Managers are supposed to plan. Planning includes budgeting. IS budgeting includes software development and acquisition costs. Can these costs be budgeted, or shall we say "predicted"?
Over the years many have attempted to determine a priori what the cost of developing a specific application will be. Why has it been so important? Not only is the budget on the line, but often a manager's job or reputation as well. The make-or-buy decision must also be made.
What cost is this that we are trying to estimate, determine or "predict"? We know that the cost of developing software, up until the point that it is accepted, is only a fraction of the total cost of the system over the typical life cycle of the product. However, for the purpose of this study, we will exclude the maintenance costs, and will speak only of the development costs up until acceptance. This position is consistent with that taken by those having done research in this field.
We will first review and discuss the main published methods (lines of code, function points, and objects), and some basic terminology relating to them, followed by a discussion of current trends, and finally the implications of these trends for software cost estimation research.

We will look at three basic researched methodologies for a priori software cost estimation: lines of code, function points, and objects. For each we will describe the methodology used, along with its advantages and disadvantages. We must note that, thus far, all researched models have approached cost estimation through estimation of the effort (generally in man-months) involved in the project.
LINES OF CODE
This general approach is actually subdivided into two different areas: SLOC (Source Lines of Code) and SDI (Source Delivered Instructions). The difference between the two is that the first, SLOC, takes into account all the housekeeping which must be done by the developer, such as headers and embedded comments. The second, SDI, counts only the executable lines of code.
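The distinction can be made concrete with a minimal line counter, a sketch assuming a simplified convention in which SLOC counts every non-blank line (comments included) and SDI counts only executable lines; real counting tools use more elaborate rules:

```python
def count_sloc_sdi(source: str) -> tuple[int, int]:
    """Count SLOC (all non-blank lines, comments included) and
    SDI (executable lines only, comments and blanks excluded)."""
    sloc = sdi = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue          # blank lines count toward neither measure
        sloc += 1             # SLOC includes comments and headers
        if not stripped.startswith("#"):
            sdi += 1          # SDI counts only executable statements
    return sloc, sdi

# A small hypothetical program with two comment lines:
program = """# compute a factorial
def fact(n):
    # base case
    if n < 2:
        return 1
    return n * fact(n - 1)
"""
print(count_sloc_sdi(program))  # → (6, 4)
```

The gap between the two counts grows with the commenting style of the developer, which is one reason models calibrated on one measure cannot be fed the other.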
The best known technique using LOC (Lines of Code) is COCOMO (the COnstructive COst MOdel), developed by Boehm. This model, along with other SLOC/SDI-based models, uses not only the LOC but also other factors such as product attributes, hardware limitations, personnel, and the development environment. These factors lead to one or more "adjustment" factors which modify the direct evaluation of the effort needed. In COCOMO's case, there are fourteen such factors derived by Boehm. The model posits an essentially linear relation between LOC and cost.
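As a sketch, the basic (unadjusted) form of the model can be written in a few lines, using Boehm's published coefficients for the three basic COCOMO project modes; the single `eaf` multiplier stands in for the product of the adjustment factors described above:

```python
# Basic COCOMO (Boehm): effort in person-months from size in KLOC.
# Coefficients per project mode are Boehm's published basic-model values;
# the intermediate model's fourteen-odd cost drivers are collapsed here
# into one effort adjustment factor (eaf) for illustration.
COEFFS = {
    "organic":      (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded":     (3.6, 1.20),
}

def cocomo_effort(kloc: float, mode: str = "organic", eaf: float = 1.0) -> float:
    """Person-months = eaf * a * KLOC**b for the chosen mode."""
    a, b = COEFFS[mode]
    return eaf * a * kloc ** b

# Hypothetical 32 KLOC organic-mode project:
print(round(cocomo_effort(32, "organic"), 1))
```

Note that the size exponent is only slightly above one, which is why the model behaves almost linearly in LOC over typical project sizes.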
Another model in this category (LOC) is the Putnam Estimation Model. This model includes more variables, and is non-linear in nature. The estimation is affected not only by the SDI, but also by the software development environment and the desired development time.
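The non-linearity can be sketched with Putnam's software equation, S = C_k * K^(1/3) * t_d^(4/3), solved for the total effort K; the technology constant and project size below are purely illustrative values, not calibrated data:

```python
def putnam_effort(sloc: float, c_k: float, t_d: float) -> float:
    """Total effort K (person-years) from Putnam's software equation
    SLOC = C_k * K**(1/3) * t_d**(4/3), solved for K.
    c_k is the technology constant; t_d the development time in years."""
    return (sloc / (c_k * t_d ** (4 / 3))) ** 3

# Illustrative 50,000-line project with a hypothetical technology constant.
# Because K scales as t_d**-4, halving the schedule raises effort 16-fold:
base = putnam_effort(50_000, c_k=5000, t_d=2.0)
rushed = putnam_effort(50_000, c_k=5000, t_d=1.0)
print(round(rushed / base, 1))  # → 16.0
```

This schedule/effort trade-off is the model's distinctive feature: desired development time enters the estimate directly, and compressing it is extremely expensive.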
Other models using LOC are BYL (Before You Leap) by the Gordon Group, WICOMO (Wang Institute Cost Model), DECPlan (Digital Equipment) and SLIM (Application of Putnam Estimation Model).
FUNCTION POINTS

Cost estimation based on the expected functionality of the system was first proposed by Albrecht in 1979, and has since been researched by several others. This approach relies on function points, and requires the identification of all occurrences of five distinct function types: External Inputs, External Outputs, Logical Internal Files, External Interfaces, and Queries. The sum of all occurrences is called the RAW-FUNCTION-COUNTS (FC). This value is then modified by a weighted rating of complexity factors, giving a Technical Complexity Factor (TCF). The function points for a given project are equal to FC*TCF.
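The computation can be sketched as follows, using Albrecht's average-complexity weights for the five function types; the occurrence counts and influence ratings below are hypothetical:

```python
# Albrecht-style function points: FP = FC * TCF.
# Weights are Albrecht's average-complexity values per function type;
# the example counts and influence ratings are made up for illustration.
WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "logical_internal_files": 10,
    "external_interfaces": 7,
    "queries": 4,
}

def function_points(counts: dict[str, int], influence_sum: int) -> float:
    """counts: occurrences of each of the five function types.
    influence_sum: total of the fourteen complexity ratings (each 0-5)."""
    fc = sum(WEIGHTS[k] * n for k, n in counts.items())   # raw function counts
    tcf = 0.65 + 0.01 * influence_sum                     # technical complexity factor
    return fc * tcf

example = {"external_inputs": 20, "external_outputs": 15,
           "logical_internal_files": 8, "external_interfaces": 3,
           "queries": 12}
print(round(function_points(example, influence_sum=35), 1))  # → 304.0
```

Since the counts come from the requirements rather than the finished code, the measure is available early in the life cycle, which is precisely what a priori estimation needs.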
This technique has been evaluated by several authors, and some attempts have been made at refining the model. These refinements have proven "more successful" than the original model at estimating cost a priori.
Overall, the function-points models appear to more accurately predict the effort needed for a specific project than LOC-based models.
OBJECTS

Cost estimation based on objects has recently been introduced, given the ascendancy of Object-Oriented Programming (OOP) and Object-Oriented CASE tools. The basic approach is similar to function-based cost estimation but, as the name implies, counts objects rather than functions. Research to date has been very limited, and has not shown any improvement in reliability over function-based methods.
What are the current trends in software cost estimation? What changes in systems development affect software cost estimation? We will examine the major changes which have been taking place in recent years.
USE OF SLOC/SDI
In the past few years, the trend among practitioners has been to move away from SLOC and SDI and to work with function points. The reasoning is that function points are more "independent" (less dependent on the language and the programming environment) than SLOC and SDI.
PROTOTYPING

In recent years prototyping has become a major component of many systems development efforts. Boehm and Papaccio's spiral development model is in essence a prototyping model in which a system is developed in phases, with the requirements specifications, cost to completion, and risk evaluated at each step.
CASE TOOLS AND PROGRAM GENERATORS
In the last few years, CASE tools and program generators have developed to the point that some companies are no longer "programming" in the traditional sense of the word. They are in essence just doing an in-depth analysis which, when complete, yields a working system. Along the way, they may generate the system many times to test it, using the system as a prototype development platform.
IN-HOUSE METRICS DEVELOPMENT
Today, most major systems developers and consultants have a methodology to determine the a priori cost of a software development project. These methodologies are proprietary, and we can only be aware of their externals. Each cost estimation methodology is linked to a specific systems analysis and design methodology, and is based on the use of that analysis methodology and the experience of the firm.
Given the differing methodologies and current trends in software development, what research can and/or should be done? To answer this, let us look at the overall situation, with an evaluation of the problems and advantages of each cost estimation methodology.
It is apparent that there is room, and even desire, for improved metrics. It is clear that there is no perfect way of a priori cost estimation, but there are ways which may be acceptable. In order to evaluate the three methods outlined, we must fully understand the problems each presents.
LINES OF CODE
This, the oldest of the models, is probably not going to generate much in the way of new research. The current movement of software development toward prototyping, CASE tools, and 4GLs makes the use of LOC much less stable. To obtain a model which suits an environment, there must be many projects of different types and sizes in a stable environment. This is generally no longer the case, as fewer and fewer organizations have significant numbers of new applications "written" entirely by programmers.
Even if a sufficient number of projects exists, calibration is not easy, owing to the differing capabilities of programmers and environments.
FUNCTION POINTS

This widely used technique has calibration problems, just as the LOC models do. However, its calibration problems seem simpler and easier to define. One factor accounting for this ease is that, since function points are independent of the programming environment, it is possible to use data gathered at other sites, as is currently being done by Software Productivity Research, Inc. [DREG89]
In the past, many people have used function points to estimate LOC, and then performed the cost estimation using the LOC. This approach is flawed, in that it adds one more source of error to the equation. If function points are the only independent variable used to estimate lines of code, the LOC are not needed.
OBJECTS

This newest methodology is too new to be evaluated empirically. As a matter of fact, the one paper available, published by Banker, Kauffman and Kumar [BANK91], gives data and correlations which I have so far been unable to verify. Either the data presented in Table 6 are incorrect, or the explanation of the variables is such that I have been unable to fit the data to the model given.
The authors noted that there were significant variances across time, as software developers became more and more familiar with the CASE development environment which was being evaluated.
The concept of using objects to estimate cost is tantalizing in its simplicity, but has yet to demonstrate viability over the long haul, having been tested in only one case so far. However, any model which shows promise in spite of the significant variability introduced by a new software development technique deserves further evaluation.
It is clear that at the current time no well-known model is available to practitioners who wish to put one into practice. At the same time, we can see that companies such as Andersen Consulting offer cost estimation tools to their customers, and are highly "successful" at what they do.
From my experience, and that of the practitioners who have attempted cost estimation, we note that cost estimation is a very difficult undertaking, highly subject to human variability. We must realize that in psychological research, any model which can explain even 50% of the variance in behavior is highly regarded. Should we consider that human behavior is a large factor in the software development process, and therefore in its cost estimation?
Where are successful models being built? In organizations which have a large number of application development projects and a very structured methodology for software development. I have been unable to find any published cost estimation methodology shown to explain more than 70% of the variance across different organizations.
Both in the research done so far and in practice, no cost estimation principle has proven highly predictive independent of a given development methodology. It is therefore necessary to study a given cost estimation technique in relation to a given methodology, in an attempt to develop an empirical model with higher explanatory power than current models.
The paper by Banker, Kauffman, and Kumar made it obvious that not only must the cost estimation technique be stable, but the development tools must be stable as well. It is very difficult to develop a model whose results depend on where in the adoption cycle of a development technique the data were gathered.
There is currently an ongoing project by Software Productivity Research, Inc. to gather a set of over 10,000 varied projects using function point analysis [DREG89, p. 145]. This project, if completed, promises to be the first major empirical study of cost estimation across multiple development platforms and multiple development techniques. In the bibliographic search conducted, no reports of this study's conclusions were found.
While software cost prediction models are still in their relative infancy, it is clear that each manager must be able to prepare a budget for a project. Of the techniques presented in this paper, function point analysis is the most robust. This is not to say that it must be used to the exclusion of other techniques, but it is the technique for which the largest body of empirical research has been conducted.
Object points are a promising technique in object-oriented CASE environments, but much remains to be studied, and SLOC models are becoming outdated due to new methodologies.
Is there a "best" technique? Yes: whatever works in the given environment. With careful calibration for a given environment, a manager can develop a cost estimation model which closely fits that environment. This requires effort and much time, but it can be financially rewarding, as well as providing peace of mind for the manager.