Cost Estimation

INTRODUCTION

Managers are supposed to plan. Planning includes budgeting. IS budgeting includes software development and acquisition costs. Can these costs be budgeted, or shall we say "predicted"?

Over the years many have attempted to determine a priori what the cost of developing a specific application will be. Why has this been so important? Not only is the budget on the line, but many times a manager's job or reputation is as well. The make-or-buy decision must also be made, and it rests on the same estimate.

What cost is this that we are trying to estimate, determine or "predict"? We know that the cost of developing software, up until the point that it is accepted, is only a fraction of the total cost of the system over the typical life cycle of the product. However, for the purpose of this study, we will exclude the maintenance costs, and will speak only of the development costs up until acceptance. This position is consistent with that taken by those having done research in this field.

We will first review and discuss the main published methods (lines of code, function points, and objects) and some basic terminology relating to them, followed by a discussion of current trends, and finally the implications of these trends for software cost estimation.

PUBLISHED TECHNIQUES

We will look at three basic researched methodologies for a priori software cost estimation: lines of code, functions, and objects. For each we will describe the methodology used, with its accompanying advantages and disadvantages. We must note that, thus far, all researched models have approached cost estimation through estimation of effort (generally man-months) involved in the project.

LINES OF CODE

This general approach is actually subdivided into two different areas: SLOC (Source Lines of Code) and SDI (Source Delivered Instructions). The difference between the two is that the first, SLOC, takes into account all the housekeeping which must be done by the developer, such as headers and embedded comments. The second, SDI, takes into account only the number of executable lines of code.

The best known technique using LOC (Lines of Code) is COCOMO (COnstructive COst MOdel), developed by Boehm. This model, along with other SLOC/SDI-based models, uses not only the LOC but also other factors such as product attributes, hardware limitations, personnel, and development environment. These factors yield one or more "adjustment" factors which adjust the direct evaluation of the effort needed. In the case of intermediate COCOMO, there are fifteen such factors derived by Boehm. The model relates effort to LOC through a power function whose exponent is only slightly above one, so the relation between the LOC and the cost is close to linear.
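As a rough illustration of how such a model is applied, the sketch below uses the commonly quoted form of intermediate COCOMO, in which nominal effort is a power function of size and is then scaled by the product of the adjustment factors. The coefficients are the values usually cited from Boehm's 1981 book; the function name and the sample multipliers are hypothetical, and any real use would require recalibration to the local environment.

```python
from math import prod

# Commonly quoted intermediate COCOMO coefficients (a, b) per development
# mode. Treat them as assumptions to be recalibrated, not as fixed constants.
INTERMEDIATE_COCOMO = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def cocomo_effort(kloc, mode="organic", multipliers=()):
    """Estimate effort in person-months for `kloc` thousand lines of code.

    `multipliers` holds the adjustment factors; a factor of 1.0 means the
    attribute is nominal. Their product scales the nominal effort.
    """
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = INTERMEDIATE_COCOMO[mode]
    nominal = a * kloc ** b
    # prod(()) is 1, so an empty sequence leaves the nominal effort unchanged.
    return nominal * prod(multipliers)

if __name__ == "__main__":
    # Hypothetical 10 KLOC organic project with two non-nominal factors.
    print(round(cocomo_effort(10, "organic", [1.15, 0.90]), 1))
```

With no multipliers, a 10 KLOC organic project comes out at roughly 36 person-months under these coefficients.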

Another model for this category (LOC) is the Putnam Estimation Model. This model includes more variables, and is non-linear in nature. The estimation is affected not only by the SDI, but also by the software development environment and desired development time.
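The non-linearity can be seen in a small sketch. It assumes the form in which Putnam's software equation is usually stated, size = C * K^(1/3) * t^(4/3), where K is effort in person-years, t is development time in years, and C is a constant reflecting the development environment. The function name and the sample values are hypothetical.

```python
def putnam_effort(size_loc, tech_constant, dev_time_years):
    """Effort in person-years from the Putnam software equation.

    Solves size = tech_constant * effort**(1/3) * dev_time**(4/3) for effort.
    `tech_constant` stands for the development environment; it has to be
    calibrated from past projects and the values used below are made up.
    """
    if size_loc <= 0 or tech_constant <= 0 or dev_time_years <= 0:
        raise ValueError("all inputs must be positive")
    return (size_loc / (tech_constant * dev_time_years ** (4 / 3))) ** 3

if __name__ == "__main__":
    # Same hypothetical project at two desired development times.
    for years in (2.0, 1.5):
        print(years, round(putnam_effort(50_000, 8_000, years), 1))
```

Because effort varies with the inverse fourth power of development time in this form, halving the schedule multiplies the estimated effort by sixteen.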

Other models using LOC are BYL (Before You Leap) by the Gordon Group, WICOMO (Wang Institute Cost Model), DECPlan (Digital Equipment) and SLIM (Application of Putnam Estimation Model).

FUNCTIONS

Cost estimation based on expected functionality of the system was first proposed by Albrecht in 1979, and has since been researched by several people. This cost estimation relies on function points, and requires the identification of all occurrences of five unique function types: External Inputs, External Outputs, Logical Internal Files, External Interfaces, and Queries. The sum of all occurrences is called RAW-FUNCTION-COUNTS (FC). This value must be modified by a weighted rating of Complexity Factors, giving a Technical Complexity Factor (TCF). The Function Points are equivalent to FC*TCF for any given project.
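The calculation can be sketched as follows. The weights are the average-complexity weights commonly used in function point counting, and the TCF formula (0.65 plus 0.01 times the sum of fourteen ratings, each from 0 to 5) is the usual published form; both are assumptions here, since the description above names only the five function types and the product FC*TCF. All function names and sample counts are hypothetical.

```python
# Average-complexity weights commonly used in function point counting.
# Assumption: real counts rate each occurrence as low, average, or high.
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "logical_internal_files": 10,
    "external_interfaces": 7,
    "queries": 4,
}

def raw_function_count(counts, weights=AVERAGE_WEIGHTS):
    """Weighted sum of the occurrences of each function type (FC)."""
    return sum(weights[kind] * n for kind, n in counts.items())

def technical_complexity_factor(ratings):
    """TCF from fourteen complexity ratings, each between 0 and 5."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected fourteen ratings between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)

def function_points(counts, ratings):
    """Function points as FC * TCF."""
    return raw_function_count(counts) * technical_complexity_factor(ratings)

if __name__ == "__main__":
    # Hypothetical system, with every complexity factor rated 3.
    counts = {
        "external_inputs": 10,
        "external_outputs": 5,
        "logical_internal_files": 4,
        "external_interfaces": 2,
        "queries": 6,
    }
    print(function_points(counts, [3] * 14))
```

Under these assumptions the TCF ranges from 0.65 to 1.35, so the complexity adjustment can move the raw count by at most 35% in either direction.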

This technique has been evaluated by several authors, and some attempts have been made at refining the model. The refined models have proven "more successful" than the original model at estimating cost a priori.

Overall, the function-points models appear to more accurately predict the effort needed for a specific project than LOC-based models.

OBJECTS

Cost estimation based on objects has recently been introduced, given the ascendancy of Object-Oriented Programming (OOP) and Object-Oriented CASE tools. The basic approach is similar to function-based cost estimation yet, as the name implies, counts objects and not functions. Research until now has been very limited, and has not shown any improvement in reliability over function-based methods.

TRENDS

What are the current trends in software cost estimation? What changes in systems development affect software cost estimation? We will examine the major changes which have been taking place in recent times.

USE OF SLOC/SDI

In the past few years, the trend among practitioners has been to move away from SLOC and SDI and to work from function points. The reasoning is that function points are more "independent" than SLOC and SDI: they are less dependent on the language and the programming environment.

PROTOTYPING

In recent years prototyping has become a major component of many systems development efforts. Boehm and Papaccio's spiral development model is in essence a prototyping model in which a system is developed in phases, with the requirements specifications, the cost to completion, and the risk evaluated at each step.

CASE TOOLS AND PROGRAM GENERATORS

In the last few years, CASE tools and program generators have developed to the point that some companies are no longer "programming" in the traditional sense of the word. They are in essence doing an in-depth analysis which, when complete, gives them a working system. Along the way, they may generate the system many times to test it, using the system as a prototype development platform.

IN-HOUSE METRICS DEVELOPMENT

Today, most major systems developers and consultants have a methodology to determine the a priori cost of a software development project. These methodologies are proprietary, and we can only be aware of their externals. Each cost estimation methodology is linked to a specific systems analysis and design methodology, and the estimate is based on the use of that analysis methodology and on the experience of the firm.

PROBLEMS AND EVALUATION

Given the differing methodologies and current trends in software development, what research can and/or should be done? In order to see this, let's look at the overall situation, with an evaluation of the problems and advantages of each cost-estimation methodology.

It is apparent that there is room, and even desire, for improved metrics. It is clear that there is no perfect way of a priori cost estimation, but there are ways which may be acceptable. In order to evaluate the three methods outlined, we must fully understand the problems each presents.

LINES OF CODE

This, the oldest of the models, is probably not going to generate much in the way of new research. Current trends, in which software development is moving toward prototyping, CASE tools, and 4GLs, make the use of LOC much less stable. In order to get a model which suits the environment, there must be many projects of different types and sizes in a stable environment. This is generally no longer the case, as fewer and fewer organizations have significant numbers of new applications "written" entirely by programmers.

Even if enough projects exist, the calibration is not easy, due to the differing capacities of programmers and environments.
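To make the idea of calibration concrete, the sketch below fits a power-law model, effort = a * size^b, to an organization's past projects by ordinary least squares on the logarithms of size and effort. This is only one simple way to calibrate, chosen for illustration; the function name and the project data are hypothetical.

```python
import math

def calibrate_power_law(sizes, efforts):
    """Fit effort = a * size**b by least squares on log-log data.

    Returns the pair (a, b). Every size and effort must be positive, and
    at least two distinct sizes are needed.
    """
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two (size, effort) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back
    return a, b

if __name__ == "__main__":
    # Hypothetical history: size in KLOC, effort in person-months.
    a, b = calibrate_power_law([5, 12, 30, 64], [14, 40, 95, 260])
    print(round(a, 2), round(b, 2))
```

The fit is only as good as the history behind it: with few projects, or projects drawn from changing environments, the fitted constants will shift noticeably whenever a project is added or dropped.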

FUNCTION POINTS

This widely used technique has calibration problems, just as the LOC models do. However, the calibration problems seem to be simpler, and easier to define. One factor which accounts for the ease of calibration is that since function points are independent of the programming environment, it is possible to use data gathered at other sites, as is currently being done by Software Productivity Research, Inc. [DREG89]

In the past, many people have used function points to determine the LOC, and have then done the cost estimation using the LOC. This approach is flawed, in that it adds one more source of error to the equation. If function points are the only independent variable used to estimate lines of code, the LOC are not needed.

OBJECT POINTS

This methodology is too new to be evaluated empirically. In fact, the one paper available, by Banker, Kauffman, and Kumar [BANK91], gives data and correlations which I have so far been unable to verify. Either the data presented in Table 6 are incorrect, or the explanation of the variables is such that I have been unable to fit the data to the model given.

The authors noted that there were significant variances across time, as software developers became more and more familiar with the CASE development environment which was being evaluated.

The concept of using objects to estimate cost is tantalizing in its simplicity, but has yet to demonstrate viability over the long haul. It has been tested in only one case so far. However, any model which shows promise in spite of the significant variability introduced by a new software development technique should be evaluated further.

OVERALL PROBLEMS

It is clear that at the current time no well-known model is readily available to practitioners who wish to put one into practice. At the same time, we can see that different companies, such as Andersen Consulting, offer cost estimation tools to their customers, and are highly "successful" at what they do.

In my experience, and that of other practitioners who have attempted it, cost estimation is very difficult and much subject to the variability in human beings. We must realize that in psychological research any model which can explain even 50% of the variance in behavior is highly regarded. Should we not consider that human behavior is a large factor in the software development process, and therefore in the cost estimation?

Where are successful models being built? In organizations which have a large number of applications development projects and a very structured methodology for software development. I have been unable to find any published cost estimation methodology that has been shown to explain more than 70% of the variance across different organizations.
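For readers who want to check a model against their own project history, "variance explained" can be computed as the coefficient of determination (R squared) between actual and predicted effort. The sketch below is a minimal version of that calculation; the function name and the figures are hypothetical.

```python
def variance_explained(actual, predicted):
    """Coefficient of determination between actual and predicted effort.

    A value of 1.0 means the predictions match exactly; 0.0 means they do
    no better than always predicting the mean of the actual values.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long lists of at least two values")
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("actual values must not all be equal")
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

if __name__ == "__main__":
    # Hypothetical actual and estimated effort in person-months.
    actual = [10, 20, 30]
    predicted = [12, 18, 33]
    print(round(variance_explained(actual, predicted), 3))
```

A model that scores well on the projects used to calibrate it may score far lower on projects from another organization, which is the situation described above.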

FUTURE RESEARCH

In the research that has been done, and in practice, no cost estimation technique is highly predictive unless it is tied to a given development methodology. It is therefore necessary to study a given cost estimation technique in relation to a given methodology, in an attempt to develop an empirical model with higher explanatory power than that of current models.

The paper by Banker, Kauffman, and Kumar made it obvious that not only the cost estimation technique but also the development tools must be stable. It is very difficult to develop a model whose results depend on how far along an organization is in adopting a development technique.

There is currently an ongoing project by Software Productivity Research, Inc. to gather a set of over 10,000 varied projects using function point analysis [DREG89, p. 145]. This project, if completed, promises to be the first major empirical study on cost estimation across multiple development platforms and multiple development techniques. In the bibliographic search conducted, no reports of the conclusions of this study were found.

CONCLUSIONS AND IMPLICATIONS FOR PROJECT MANAGERS

While software cost prediction models are still in relative infancy, it is clear that each manager must be able to prepare a budget for the project. Of the techniques presented in this paper, the function points analysis technique is the most robust. This is not to say that it must be used to the exclusion of other techniques, but that it is the technique for which the largest body of empirical research has been conducted.

Object points is a promising technique in object-oriented CASE environments, but much remains to be studied, and SLOC models are becoming outdated due to new methodologies.

Is there a "best" technique? Yes, whatever works in the given environment. With careful calibration for a given environment it is possible for the manager to develop a cost estimation model which closely relates to the environment. This is not without effort and much time, but can be financially rewarding, as well as providing peace of mind for the manager.

BIBLIOGRAPHY

Albrecht, A. J. "Measuring Application Development Productivity." In Proceedings of the IBM Applications Development Symposium. GUIDE/SHARE (Monterey, CA, Oct. 14-17). IBM. 1979, pp. 83-92.
Bailey, John W. and Basili, Victor R. "A Meta-Model for Software Development Resource Expenditures."
Banker, R., Kauffman, W., and Kumar, R. "An Empirical Test of Object-Based Output Measurement Metrics in a Computer Aided Software Engineering (CASE) Environment." Unpublished manuscript.
Boehm, Barry W. and Papaccio, Philip N. "Understanding and Controlling Software Costs."
Cuelenaere, A. M. E., van Genuchten, M. J. I. M. and Heemstra, F. J. "Calibrating a Software Cost Estimation Model: Why and How" in Information and Software Technology, v. 29, Dec. 1987, pp. 558-567.
Dreger, J. Brian. Function Point Analysis. Englewood Cliffs, NJ: Prentice Hall, 1989.
Kemerer, Chris F. "An Empirical Validation of Software Cost Estimation Models." Communications of the ACM, 30: 416-429.
Mendelson, Haim. The Economics of Information Systems Management. Unpublished manuscript, 1989.
Miyazaki, Y., Takanou, A., and Nozaki, H. "Method to estimate parameter values in software prediction models" in Information and Software Technology, v. 33, April 1991, pp. 239-243.
Symons, Charles R. "Function Point Analysis."