Software Cost Estimation: A Review of Models, Process, and Practice

FIONA WALKERDEN AND ROSS JEFFERY
Centre for Advanced Empirical Software Research
School of Information Systems
University of New South Wales
Sydney, Australia
Abstract

This article presents a review of software cost estimation models, processes, and practice. A general prediction process and a framework for selecting predictive measures and methods are proposed and used to analyze and interpret the research that is reviewed in this article. The prediction process and selection framework highlight the importance of measurement within the context of system development and the need to package and reuse experience to improve the accuracy of software cost estimates. As a result of our analysis we identify the need for further work to expand the range of models that are the focus of software cost estimation research. We also recommend that practitioners adopt an estimation process that incorporates feedback on the accuracy of estimates and packaging of experience.
1. Introduction
   1.1 Introduction to Software Cost Estimation
   1.2 Motivation for the Current Work
   1.3 Summary of the Work and Its Contributions
   1.4 Outline of This Article
2. The Prediction Process and Its Relationship to the Quality Improvement Paradigm
   2.1 Characterize Current or Future Project
   2.2 Select Predictive Measures and Methods
   2.3 Plan Measurement and Collection of Data
   2.4 Make Predictions and Assess Their Accuracy
   2.5 Evaluate Predictive Measures and Methods Used
   2.6 Package the Experience
   2.7 Discussion
3. Framework for Selecting Predictive Measures and Methods
   3.1 The Framework
   3.2 Predictive Measures
   3.3 Prediction Methods
   3.4 Purpose
   3.5 Viewpoint
   3.6 Time
   3.7 Environment
   3.8 Experience
   3.9 Discussion
4. Software Cost Estimation Processes
   4.1 Bailey and Basili Process
   4.2 Boehm Process
   4.3 DeMarco Process
   4.4 Heemstra Process
   4.5 Arifoglu Process
   4.6 PROBE
   4.7 Wideband Delphi
   4.8 Estimeeting
   4.9 Discussion
5. Software Cost Estimation Models
   5.1 Empirical Parametric Models
   5.2 Empirical Nonparametric Models
   5.3 Analogical Models
   5.4 Theoretical Models
   5.5 Heuristics
   5.6 Discussion
6. Software Cost Estimation Practice
   6.1 Cost Estimation at the National Security Agency
   6.2 Cost Estimation within an Information Systems Department
   6.3 Cost Estimation at Jet Propulsion Laboratory
   6.4 Cost Estimation at AT&T Bell Laboratories
   6.5 Survey of Cost Estimation in the Netherlands
   6.6 Survey of Cost Estimation in the United States
   6.7 Expert Judgment in Cost Estimation
   6.8 Discussion
7. Contributions
   7.1 Review of Software Cost Estimation Research
   7.2 The Prediction Process and Selection Framework
   7.3 Analysis and Interpretation of Previous Research
   7.4 Needs for Future Research and Recommendations for Practice
   7.5 Foundation for Software Cost Estimation Tool Development
8. Conclusion
   8.1 Overview of the Work That Has Been Done
   8.2 Areas That This Work Has Not Addressed
   8.3 Outcomes of This Work and Opportunities for Further Research
   8.4 Consequences for Practitioners
   8.5 Development of a Software Cost Estimation Tool
References

Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.
1. Introduction
1.1 Introduction to Software Cost Estimation

Software cost estimation is commonly regarded as making estimates of the effort required to complete the software for a system development project or activity. The effort is a measure of labor cost and is reported in units such as person-months or person-years. The purposes for which software cost estimates are required are varied. Some of these are planning and controlling projects, assessing the feasibility of system development, and preparing quotes or bids for presentation to customers.

Planning plays a key role in the successful management of system development activities. Consider a typical system development project faced with targets for both the functionality to be delivered and the time in which to deliver it. Those responsible for planning the project require estimates of the staffing levels needed to deliver the functionality within the target time. If the initial estimates suggest that the targets cannot be satisfied simultaneously, then estimates of staffing that meet either one of the targets may be required. Trade-offs between functionality and time will need to be made based on these estimates.

Prediction for system development is not restricted to estimates of the effort and time needed to deliver a system, which are the focus of the software cost estimation literature (e.g., Boehm, 1981; DeMarco, 1982). Consider the example of an organization that procures a custom-made system for a safety-critical application. The organization must allocate a budget for the cost of maintaining the system, as well as for the initial delivery cost. The system may also have stringent reliability and performance requirements. Predictions of measures such as the mean time to failure in the delivered software, defect levels, response time, object code size, and the cost of ongoing software maintenance are needed to assess whether the project will satisfy its constraints. This example illustrates that preparing estimates of initial system development cost and schedule is just one part of the overall prediction, trade-off, and management activity.
1.2 Motivation for the Current Work
Software cost estimation is as much a relevant area of research now as it was 20 years ago, when difficulties of estimating were discussed in "The Mythical Man-Month" (Brooks, 1975). Software cost estimates are typically inaccurate, and there is no evidence that the software development community is improving its ability to make accurate estimates. Two cross-organizational surveys conducted in the 1990s (Heemstra, 1992; Lederer and Prasad, 1993) report more than 60% and more than 80% of projects as overrunning their budgets. In spite of the research effort that has been directed into developing models for software cost estimation, it appears that most estimates are made informally, cost models are used infrequently, and, where used, cost models are not associated with significantly greater accuracy (Heemstra, 1992; Lederer and Prasad, 1993). This suggests that it may be proving difficult for software developers to apply existing research on software cost estimation.

Inaccurate estimates of software cost and delivery times have unacceptable consequences. For example, where effort is underestimated, a cost overrun may make a project unprofitable, and overruns in delivery time may result in lost business. Alternatively, cost or delivery time may be contained despite an underestimate, but the overall quality of the delivered system may be compromised. An overestimate of effort may also adversely affect the competitiveness of a business, for example, where a decision is made to cancel what would otherwise have been a profitable project or where the overestimate leads to subsequent overstaffing when a project is completed. The widespread inaccuracy of estimates and the consequences of inaccurate estimates are what motivate continuing research into methods and models for software cost estimation.
1.3 Summary of the Work and Its Contributions

The work presented in this article draws together a wide range of research into software cost estimation. It reviews not only software cost estimation models, but also software cost estimation processes and practice. A prediction process and a framework for selecting predictive measures and methods have been developed and used to analyze and interpret the research that is reviewed. The prediction process and selection framework highlight the importance of measurement within the context of system development, because measurement provides the data on which predictions are based. The prediction process and selection framework also emphasize the need to package and reuse experience to improve and sustain improvements in the accuracy of software cost estimates.

As a result of its analysis and interpretation of software cost estimation research, this work is able to identify needs for further research in this field and make recommendations to practitioners about how to improve the accuracy of their estimates. There is a need to expand the range of models that are the focus of software cost estimation research. Practitioners need to adopt an estimation process that incorporates feedback on the accuracy of estimates and packaging of experience. They must also improve practice in the area of measurement, so that appropriate data are available for making predictions.
1.4 Outline of This Article

The prediction process and the framework for selecting predictive measures and methods are presented in the following two sections, as these are the basis for analyzing and interpreting the software cost estimation research that is reviewed subsequently. The reviews of research into software cost estimation processes, models, and practice are each presented in sections following the selection framework. The penultimate section of this article explains the contributions of this work. The conclusion summarizes the work presented in this article, comments on areas that this work has not addressed, and gives a very brief overview of the software cost estimation tool that will be developed in the light of this work.
2. The Prediction Process and Its Relationship to the Quality Improvement Paradigm

A process for making predictions and improving their accuracy must include three activities: selecting and modeling the measures to be predicted, making the predictions, and evaluating the accuracy of the predictions. The inclusion of the first two activities is self-evident. Evaluation is needed because a goal of the prediction process is to improve the accuracy of predictions. Evaluation provides the feedback into the prediction process. Without feedback, there can be no learning from experience and no basis for improving future predictions.

A parallel can be drawn between the prediction process and the Quality Improvement Paradigm (QIP) (Basili and Rombach, 1988). Both involve selecting measures, making measurements, analyzing the data collected, and learning from experience. The prediction process can be viewed as a specialization of QIP, which focuses on making predictions about processes and products and on improving the accuracy of these predictions. Table I shows the six steps of QIP (Basili, 1995), along with the equivalent steps in the prediction process.

The prediction process is illustrated in this section, step by step. A system development scenario is used to provide examples for each step.
TABLE I
COMPARISON OF THE QUALITY IMPROVEMENT PARADIGM (QIP) AND THE PREDICTION PROCESS

QIP, Step 1: Characterize the current project and its environment with respect to models and metrics.
Prediction process, Step 1: Characterize the current or future project and its environment with respect to the constraints and targets set for it.

QIP, Step 2: Set the quantifiable goals for successful project performance and improvement.
Prediction process, Step 2: Select the predictive measures and methods by which to assess whether the project will satisfy its constraints and targets.

QIP, Step 3: Choose the appropriate process model and supporting methods and tools for this project.
Prediction process, Step 3: Plan the measurement of data needed to support the prediction process.

QIP, Step 4: Execute the processes, construct the products, collect and validate the prescribed data, and analyze it to provide real-time feedback for corrective action.
Prediction process, Step 4: Make the predictions and document the assumptions on which they are based. Collect and validate the prescribed data. Assess the accuracy of the predictions to provide real-time feedback for corrective action and to make new predictions based on it.

QIP, Step 5: Analyze the data to evaluate the current practices, determine problems, record findings, and make recommendations for future project improvements.
Prediction process, Step 5: Evaluate the predictive measures and methods used by analyzing the data collected. Determine causes of inaccuracy, record findings, and make recommendations for future improvements to predictions.

QIP, Step 6: Package the experience in the form of updated and refined models and other forms of structured knowledge gained from this and prior projects, and save it in an experience base to be reused on future projects.
Prediction process, Step 6: Package the experience as in QIP.

Source: From Basili, V. R. (1995). The experience factory and its relationship to other quality approaches. In "Advances in Computers," Vol. 41, pp. 65-82. Reprinted by permission of Academic Press, San Diego, CA.
The examples used in this section are indicative of the sorts of predictions and measurements made in a realistic scenario. However, in a realistic system development scenario many more predictions are made, based on many more assumptions, and supported by many more measurements than are discussed in the examples given here.

In our scenario, an organization that provides a public information service is seeking to replace and upgrade its existing system. The existing system is being replaced because of increased user demand for the service. It is forecast that within 18 months the system will be unable to cope with the user demand. The new system must support the functions of the existing system, plus some enhancements. The scenario uses some examples from the prediction process within the company that successfully tenders to develop the new system.
2.1 Characterize Current or Future Project
The first step in the prediction process involves gathering information about the project and its environment that is relevant to determining the constraints on it or targets for it, and to understanding and predicting whether these constraints and targets can be met. The sort of information gathered is the same as would be gathered in the first step of QIP. While tendering for the contract, the company in our scenario must understand the customer’s needs and interests and translate these into targets for the new system. It is apparent that both the performance of the new system and the schedule for its delivery are important from the customer’s point of view. Setting the right price for the new system is important from the company’s point of view, in order to be competitive with other tenderers.
2.2 Select Predictive Measures and Methods
The second step in the prediction process is to select the measures to predict for a project and the methods by which to predict them. This step is similar to the second step of QIP, which involves selecting the measures by which to assess whether project improvement goals are being met. The measures to be predicted should reflect the project’s constraints and targets, in order to assess whether these can be satisfied. We return to the differences between QIP and our framework in this respect, later. In our scenario the new system must be operational before the old system is unable to cope with user demand. The forecasts of user demand set a target for the project’s delivery date. From the customer’s perspective, replacing the old system is more important than adding the enhancements. In this situation, an appropriate measure to predict would be one that indicates how much functionality can be built and tested by a particular delivery date. This prediction can be used to negotiate with the customer about the subset of functionality on which to concentrate for an initial delivery, if the entire system cannot be completed by the target date. Function points (Albrecht, 1979) have been used as a measure of functionality, and models that relate development effort to function points have been explored (e.g., Albrecht and Gaffney, 1983). Based on assumptions about the labor hours available before the delivery date, a prediction of the number of function points to be delivered can be made. The function points for the entire system can be counted from the system specifications for comparison. The method by which the measure is predicted must also be selected. A regression model relating effort to function points could be used in reverse
to predict the function points, if sufficient data about previous, similar projects were available. However, the company in our scenario has developed only one system similar to the new system. In this situation, the prediction could be based on an analogy with this system.
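As a rough illustration of this kind of reverse use, the sketch below inverts a hypothetical linear effort-function point model to ask how many function points could be delivered within the available labor hours. The coefficients and counts are invented for illustration; in the scenario itself an analogy would be used instead, because data for only one similar project are available.

```python
# A minimal sketch of using an effort-function point regression "in reverse":
# given the labor hours available before the target delivery date, predict how
# many function points could be built and tested, and compare this with the
# function points counted from the system specification. All figures are
# hypothetical assumptions.

# Assumed regression fitted to past projects: effort_hours = a + b * function_points
a, b = 200.0, 18.0

available_hours = 14_000.0                  # labor hours before the target delivery date
deliverable_fp = (available_hours - a) / b  # invert the model to predict function points

specified_fp = 900                          # counted from the system specification
print(f"function points deliverable by the target date: {deliverable_fp:.0f}")
print(f"function points specified for the full system:  {specified_fp}")
if deliverable_fp < specified_fp:
    print("an initial delivery with a subset of the functionality must be negotiated")
```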
2.3 Plan Measurement and Collection of Data

The third step of the prediction process plans how to measure and collect the data needed to analyze the accuracy of the predictions and the data needed to make new or revised predictions. The measurement plan developed in the third step of the prediction process is no different from one that might be produced as part of the third step of QIP. The measurement plan is executed during system development.

In our scenario, effort and function points are examples of the data that need to be measured and collected during system development. The effort expended in development would need to be measured to assess whether the original assumption about the available labor was valid. Function points would need to be measured regularly to detect increases in scope, which could delay the delivery date.

Where a prediction may lead to a decision not to proceed with system development, as in a feasibility study, this step would be deferred until after the prediction and subsequent decision were made.
2.4 Make Predictions and Assess Their Accuracy
In the fourth step of the prediction process, the predictions are made. As data become available, the predictions are analyzed to assess whether they are being met. The results of this analysis are fed back both to the system developers and to those making the predictions. This feedback gives developers the opportunity to take corrective action if appropriate; it also gives those making the predictions the opportunity to revise them based on the new data. The fourth step of the prediction process is analogous to the fourth step of QIP, where data are collected and analyzed to assess and provide feedback on whether the improvement goals are being met during system development. In our prediction scenario, if it is observed that the function point counts are increasing at each measurement, the reasons can be explored and the growth controlled. The delivery date may also be revised, when predicted on the basis of the new measurements. Some predictive measures may require data that are not available prior to system development, so new as well as revised estimates will be made
as development progresses. For example, the effort required for integration testing may be predicted based on measures of system design attributes, when the design documentation is available.
2.5 Evaluate Predictive Measures and Methods Used
The fifth step of the prediction process involves evaluating the measures and methods that have been used to make predictions for system development. This step is needed to learn from experience. The reasons for inaccurate predictions need to be understood and the findings recorded. These findings should be accompanied by suggestions for improving the accuracy of future predictions. For example, in our scenario, the reasons for inaccuracy in predictions of the delivery date would be explored. If one of the reasons for inaccuracy was growth of requirements during development, a recommendation might be made to allow for requirements growth in future predictions. This step is equivalent to the fifth step of QIP, where experience is gained from analyzing the data collected during system development. There are some circumstances in which no comparison of a prediction and the outcome is possible. For example, in a feasibility study for system development, a cost estimate may be used to decide whether to proceed. The estimate and associated decision could, however, be recorded and used for comparison in future feasibility studies.
2.6 Package the Experience

The sixth and final step involves packaging the experience gained so that it can be reused by the organization in future. This step is the same as the final step in QIP. It is especially pertinent in a prediction process, as accurate predictions can only be expected where there is prior experience. Predictions made without reference to prior experience can be little more than guesses.
2.7 Discussion
Although both QIP and the prediction process are described in the context of a project, they could both be applied to any system development activity. An organization that maintains and enhances a software product could equally well employ QIP and the prediction process as a project-based organization that develops one-off systems. For example, a product-based organization may set a target for the next release of the product and
then predict how many of the possible enhancements can be included by the release date.

Both QIP and the prediction process rely on measurement. The measurement needs of a quality improvement initiative such as QIP and a prediction process could be satisfied by a single program within an organization. This can readily be achieved by specifying the measurement of process or product attributes for the purpose of prediction as QIP goals. QIP is also associated with a model of how to package and reuse experience: the Experience Factory (Basili et al., 1994). This model can equally be applied to the packaging and reuse of experience in the prediction process. The combination of the approaches within QIP to measurement and the packaging of experience suggests that organizations that employ QIP could easily adopt the prediction process.

The step that differentiates the prediction process from QIP is the second, where the predictive measures and methods are selected. This selection process differs from processes for selecting improvement goals. The next section describes a framework for selecting predictive measures and methods that can be used as a basis for the second step of the prediction process.
3. Framework for Selecting Predictive Measures and Methods

3.1 The Framework

The second step of the prediction process is a subprocess, which selects what to predict and how. The framework for selecting predictive measures and methods provides a model for use in this selection process. The measures define what is to be predicted and the methods define how a measure is to be predicted.

The framework occupies an equivalent place in the prediction process to approaches such as the Goal/Question/Metric Paradigm (GQM) in QIP (Basili and Rombach, 1988). The GQM can be used when initiating a measurement program within an organization. The GQM captures organizational and project goals and develops measures by which progress with respect to these goals can be assessed and evaluated. Where the GQM provides a detailed process for defining measurable goals, the framework for selecting predictive measures and methods is descriptive. The framework characterizes predictive measures and methods and describes the factors that influence the selection and why. The framework does not describe the process by which to make the selection.
The following factors influence the selection of predictive measures and methods (Fig. 1):
- The purpose of the prediction
- The viewpoint of those who will use the prediction
- The time at which the prediction is made
- The environment within which the prediction is made
- The experience that can be drawn on to make the prediction
Each element of the framework is described in this section. Measures and methods are described first, as their characteristics drive the selection process. Next, the elements that influence the selection process are described. Purpose, viewpoint, time, environment, and experience are considered in turn. The purpose of the prediction is the central influence on what is predicted, and so this is considered first. Each of these elements has characteristics that contribute to the selection process. The characteristics can be used as a selection criterion for measure, method, or both. Examples are provided to illustrate the contribution of each characteristic.
FIG. 1. Factors that influence selection of predictive measures and methods.

3.2 Predictive Measures

A measure is a numeric representation of a specific attribute of an entity (Fenton, 1991). Measures for system development represent attributes of process inputs, outputs, or the process itself. For example, a count of requirements is one measure of the length of an input to a system development
process, a count of delivered lines of code is a measure of the length of an output, and the number of hours spent developing the code is a measure of effort for the process of implementation.

Measures can be classified as either direct or indirect. A count of delivered lines of code is an example of a direct measure. Its value can be determined by applying a measurement technique directly to the product of interest. Indirect measures rely on measurements of one or more other attributes. Measures of productivity, which are based on rates of completing an output of the development process, such as lines of code per man-day, are examples of indirect measures. In this example the relationship between the measures is determined by the definition of the productivity measure. A measure of development effort that is calculated from a measure such as lines of code is another example of an indirect measure. This measure is based on an empirical relationship among the development process attribute, effort, and an attribute of a process output, the length of the software produced. This empirical relationship is formalized via a numerical model.

Predictions are usually based on indirect measures, because the attribute of interest is not directly measurable at the time the prediction is made. For example, the effort to implement the software for a system may be predicted using a model relating this effort to an attribute of the system design, such as its length measured by lines of text. The actual hours spent implementing the software can be measured directly. Afterward, the prediction of effort, made via the indirect measure, can be compared with the direct measure of effort to assess the accuracy of the prediction.

When defining indirect, predictive measures, it is preferable for the models to depend on attributes that are directly measurable at the time a prediction is made, rather than attributes that must also be predicted. For example, if the effort to implement software is predicted from the length of the software to be produced, a prediction of the length of the software is needed, too. It is preferable to be able to measure directly all inputs to a prediction model, as some models amplify the uncertainty in their input values. Direct measures minimize the uncertainty that is introduced into an estimate.

Models of development effort based on the length of software produced can successfully explain much of the variation in effort. Regression models of effort and lines of code typically show high correlations. For example, Conte et al. (1986) give an example of a linear regression model with a correlation coefficient, R², of 82%. However, the difficulty of predicting measures such as lines of code early in the system development process limits the usefulness of such measures in predicting effort at that time.
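The following sketch illustrates such a single-factor regression of effort on delivered lines of code, fitted by ordinary least squares. The project data are hypothetical; a real model would be calibrated on an organization's own measurements.

```python
# A minimal sketch of an indirect effort measure: a single-factor linear
# regression relating development effort (person-hours) to delivered lines
# of code. The project data below are hypothetical, for illustration only.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical completed projects: (delivered LOC, actual effort in person-hours)
loc = [12_000, 20_000, 35_000, 50_000, 8_000]
effort = [2_100, 3_600, 6_500, 9_800, 1_500]

a, b, r2 = fit_linear(loc, effort)
predicted = a + b * 30_000  # indirect prediction for a new 30 KLOC system
print(f"effort = {a:.0f} + {b:.3f} * LOC   (R^2 = {r2:.2f})")
print(f"predicted effort for 30,000 LOC: {predicted:.0f} person-hours")
```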
3.3 Prediction Methods

The framework recognizes four classes of prediction methods:
- Empirical
- Analogical
- Theoretical
- Heuristic
Any prediction that relates the attribute of interest to other measurable attributes must be based on an empirical model. This empirical model is the starting point for each prediction method.

Empirical methods analyze data to establish a numerical model of the relationship between measures of the attributes in the empirical model. Regression analysis is one example of an empirical method.

Analogical prediction methods use measures of the attributes from the empirical model to characterize the current case, for which the prediction is to be made. Known values of measures for the current case are used to search a data set for analogous cases. The prediction is made by interpolating from one or more analogous cases to the current case. Atkinson and Shepperd (1994) describe a method for estimating development effort for a software project by analogy, which represents projects by their function point components (Albrecht, 1979).

Theoretical prediction methods propose a numerical model based on the empirical model. The theoretical model must be validated empirically, by comparison with actual data for the measures. Abdel-Hamid and Madnick (1986, 1991) have developed a theoretical model of software development project management, based on dynamic feedback relationships among staff management, software production, planning, and control.

Heuristic methods are used as extensions to the other methods. Heuristics are rules of thumb, developed through experience, that capture knowledge about relationships between attributes of the empirical model. Heuristics can be used to adjust predictions made by other methods. For example, Cuelenaere et al. (1987) describe an expert system that uses rules to assist in calibrating the PRICE SP software cost estimation model.

Further examples of each of these prediction methods are found in Section 5, which reviews software cost estimation models. The models and methods by which they were developed are classified according to the framework.

Expert judgment is also recognized as a prediction method (e.g., Boehm, 1981; Heemstra, 1992). Experts may employ one or more of the other methods in making predictions, either informally or formally. It is likely
that expert judgment is employed to make predictions whenever an expert is available. Expert judgment is not included in the framework for selecting prediction methods, as this method cannot easily be characterized, and it is assumed that it is selected whenever experts are available.
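A simple way to picture the analogical method is a nearest-neighbor search over past projects characterized by function point components. The sketch below is a generic illustration using assumed data and an unweighted Euclidean distance; it is not the Atkinson and Shepperd (1994) method itself.

```python
# A minimal sketch of analogical effort prediction: characterize projects by
# a few size attributes (here, function point component counts), find the
# most similar completed projects, and interpolate their effort. All data and
# the distance/weighting choices are hypothetical illustrations.
from math import sqrt

completed = [
    # (inputs, outputs, inquiries, files, interfaces, actual effort in person-months)
    (30, 25, 10, 12, 4, 38.0),
    (55, 40, 22, 20, 6, 71.0),
    (18, 12, 6, 8, 2, 20.0),
    (70, 62, 30, 25, 9, 105.0),
]

def distance(a, b):
    """Euclidean distance over the function point component counts."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_by_analogy(new_case, history, k=2):
    """Average the effort of the k most similar completed projects."""
    ranked = sorted(history, key=lambda p: distance(new_case, p[:5]))
    nearest = ranked[:k]
    return sum(p[5] for p in nearest) / k, nearest

new_project = (40, 30, 15, 14, 5)   # component counts for the new system
effort, analogues = estimate_by_analogy(new_project, completed)
print(f"estimated effort: {effort:.1f} person-months, based on {len(analogues)} analogues")
```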
3.4 Purpose

The purpose for which a prediction is required is the central influence on what measures are to be predicted. It defines the reason for making the prediction. Examples of purposes for which predictions are required include the following:
- Exploring the feasibility of developing or purchasing a new system
- Exploring the impact of changing the functions of an existing system
- Planning how to staff a software development project
- Quoting a price or schedule for a new system
The purpose of a prediction has the most influence on decisions about which system attributes are of interest. These attributes determine the sorts of measures that are appropriate. The measures to predict may be immediately apparent from the purpose. In our system development scenario, introduced in the previous section, the company must prepare a quote for the tender. The cost of the system in either the local or a foreign currency is the obvious measure. This cost could be divided between software and hardware costs. In this case, the cost of the hardware for the system must be predicted for inclusion in the quote. The hardware cost includes the cost of the hardware for a server on which the database supporting the information service resides as well as costs for client hardware and peripherals. Before approaching a hardware vendor to request a quote for the server, the bid team must predict how much storage capacity is required, what bandwidth the communications interfaces must support, and what processing capacity is required to handle the user demand. From an initial measure of total cost, the number of measures to be predicted grows as the problem is decomposed. The purpose may also dictate how quickly a prediction is needed, and how certain a prediction needs to be. The length of time available to make a prediction influences the choice of prediction method. For example, the quote may need to be prepared quickly for a tender. If the bid team had access to numerical models developed from similar systems on which to base capacity predictions, these could quickly be used for the quote. As no existing models are available, the prediction is made by analogy with the similar system, as this is less time-consuming than developing numerical models from scratch.
In our scenario, the bid team chooses to predict the processing power as a ratio of the processing power for the similar system. The similar system is known to be able to handle a certain peak load measured in database transactions per second. From the customer’s specifications they estimate the number of user-initiated transactions per second that the new system must handle and apply a heuristic to convert these into database transactions. After calculating that the ratio of these transaction rates is approximately 1.5, the hardware vendor is approached to prepare a quote for the cost of a machine that has at least 50% more processing power than the machine used in the similar system. The need to quantify the uncertainty in a prediction also influences the choice of prediction method. The uncertainty that can be tolerated in the costs on which the quote is based may be determined by the company’s policy on contingency margins and whether the cost is perceived to be competitive. Estimating the uncertainty in a prediction may be difficult where the prediction is made by analogy. In the example just given, the uncertainty in the prediction of the relative processing power required would depend on an estimate of the uncertainty in the prediction of the peak database transaction rate required. As this was only counted from a preliminary specification and relied upon a heuristic, this is difficult to quantify. If a numerical model had been developed from previous measurements, it might have been used to estimate likely bounds on the transaction rate.
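The bid team's calculation can be sketched in a few lines of arithmetic; the transaction rates and the database-transactions-per-user-transaction heuristic below are assumed values, chosen only to reproduce the ratio of roughly 1.5 described above.

```python
# A back-of-the-envelope sketch of the bid team's capacity ratio, assuming a
# hypothetical heuristic of 3 database transactions per user-initiated
# transaction; all figures are illustrative, not from the article.
old_peak_db_tps = 40.0          # known peak load of the similar system (DB transactions/s)
new_user_tps = 20.0             # counted from the customer's preliminary specification
db_per_user_txn = 3.0           # heuristic conversion factor (assumed)

new_peak_db_tps = new_user_tps * db_per_user_txn
ratio = new_peak_db_tps / old_peak_db_tps
print(f"required processing power is about {ratio:.1f}x the similar system")
# With these numbers the ratio is 1.5, so the vendor is asked to quote a
# machine with at least 50% more processing power.
```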
3.5 Viewpoint

The viewpoint of the person or people who will use a prediction also influences the measures to be predicted. It describes the stakeholders who have a need for the prediction. Examples of people who use predictions for system development include customers, bid teams, project managers, and software developers.

Viewpoint, like purpose, contributes to the decisions about which system attributes are of interest and which lead to measures. For example, the customer for the new system and the project manager are both interested in how long it will take to deliver the completed system. The project manager and the software developers are also interested in how long individual tasks will take.

Viewpoint also influences the accuracy with which a prediction needs to be made. The need for greater accuracy can influence the selection of both measure and method. For example, the bid team and developers view the selection of system hardware differently. The bid team is interested in making a quick prediction of the required system capacity, to select and cost the system hardware.
Once the contract is awarded, the developers are constrained by the cost of the hardware in the bid, and the system's performance requirements. Before the purchase is made, the developers need to establish with certainty that the selected hardware will be adequate to meet the user demand.

In preparing the bid, an estimate of the required system capacity was made by an analogy with one similar system, and the measure was an approximate ratio of the two systems' capacities. The developers must also estimate system capacity, but use a method and measure that allow a more precise assessment of whether the selected hardware will meet the requirements. The developers make a new prediction of the required processing capacity based on a detailed analysis of the system's requirements. This allows them to develop a model of the peak load on the new system based on a set of user-initiated transactions over a period of time. Using CPU utilization statistics for the hardware platform provided by the database vendor, the developers are able to estimate the total CPU utilization for the set of transactions, and then compare this to the CPU capacity available for user transactions during the peak period.
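A sketch of this more detailed check appears below. The transaction mix, per-transaction CPU costs, and usable-capacity fraction are hypothetical placeholders for figures that would come from the requirements analysis and the database vendor's statistics.

```python
# A minimal sketch of the developers' capacity check: sum the CPU cost of the
# peak-period transaction mix and compare it with the CPU time available on
# the selected hardware. All names and figures are assumed for illustration.
peak_period_seconds = 3600            # one-hour peak period

# transaction name: (transactions during the peak hour, CPU seconds per transaction)
transaction_mix = {
    "search_catalogue": (30_000, 0.020),
    "view_record":      (45_000, 0.012),
    "submit_request":   (6_000,  0.050),
}

cpu_needed = sum(count * cpu for count, cpu in transaction_mix.values())
cpu_available = peak_period_seconds * 0.6   # assume 60% of CPU is usable for user work

utilization = cpu_needed / cpu_available
print(f"peak-hour CPU demand: {cpu_needed:.0f}s of {cpu_available:.0f}s available "
      f"({utilization:.0%} of usable capacity)")
if utilization > 1.0:
    print("selected hardware is predicted to be inadequate for the peak load")
```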
3.6 Time

Time represents when a prediction is made: the point in the system development process at which the prediction is made. The time at which a prediction is made determines what data are available as input to the prediction process. The availability of data determines which measures can be predicted and the methods by which they can be predicted. For example, in our scenario, the developers need to make a more precise prediction of the required processing capacity. This prediction can be made only when a detailed analysis of the system's requirements has been documented.

The time at which a prediction is made also influences which system attributes are of interest, and hence which measures. For example, initially the project manager is interested in an overall prediction of system testing effort. Once the predictions of the required processing capacity have been revised, the project manager is interested in a prediction of the effort required to carry out performance testing and tuning on the system, because there appears to be some risk that the system will not meet its performance requirements.
3.7 Environment

The environment for system development includes targets and constraints based on project and organizational goals, inputs to the development
process, and project plans. This view of the environment for system development is based on the description of environment used in the GQM process (Basili and Rombach, 1988).

Targets and constraints become measures in predictions that are needed to assess whether they can be met. In our scenario, the customer's primary reason for purchasing the new system is that the existing system cannot meet the forecast user demand. In consequence, two of the project's most important constraints are the customer's performance target, and the date by which the existing system should be upgraded to continue to satisfy user demands. Performance and time to delivery are hence system and development attributes of great interest to the customer. This motivates the predictions of system capacity required to meet the performance target during development and the predictions of delivery date.

The environment includes the artifacts on which measurements may be made to support predictions. The availability of such artifacts limits which measures can be used for a prediction. An example of an early life-cycle artifact is the customer's specification. In our scenario, this is the main information available prior to the contract being awarded. The bid team relies on it to estimate the number of user-initiated transactions per second that the new system must handle.

The environment also includes attributes that characterize the system and the development process, such as the application domain and the available labor. Measures of current system and development process attributes such as these are inputs to the prediction process. System and development process attributes also determine the differences and similarities between the new system development and previous experience within an organization. In our scenario, the company has developed only one similar system, and the experience from this system becomes the basis of a number of predictions for the new system.
3.8 Experience

The "experience base" in the Experience Factory (Basili et al., 1994) is an appropriate model for the sort of experience relevant to the prediction process. It includes elements such as data from measurement programs, models of products and processes, evaluations of methods and techniques, and written records of the lessons learned from previous developments.

Available experience influences the selection of both the measures to be predicted and the methods by which to predict them, because it contributes prior data, models, and expertise. In our scenario, some measurements from the similar system developed previously are available. However, the company had no models on which to base predictions of the required system
capacity while preparing the quote. As the quote was needed quickly, the prediction was made less precisely by analogy with what was known from the one similar system. The measures and method selected in this case, because of the time constraints and limited information, also made it difficult for the bid team to quantify the uncertainty in their prediction.
3.9 Discussion

The elements of the framework are not mutually independent. For example, the viewpoint of the person using a prediction or the time at which a prediction is made can be associated with the purpose. However, the examples given in the preceding sections demonstrate that purpose, viewpoint, time, environment, and experience each contribute in a distinguishable way to the selection of what to estimate and how.

The GQM process (Basili and Rombach, 1988) uses elements equivalent to purpose, viewpoint, environment, and experience as inputs to the selection of measures. The elements of the selection framework that are novel are time and method. Time has been shown to be significant in selecting predictive measures and methods because it largely determines what data are available about the system being developed. The previous sections also demonstrate that selecting the method is an important step in preparing to make a prediction.

The selection of a method depends on the following factors:
- The availability of current and prior data
- The availability of prior models developed via the method
- How much time is available to make a prediction
- How certain the prediction should be and whether this needs to be quantified
- How much expertise the people making the prediction have with the method
All but the last of these factors also contribute to the selection of the predictive measure, although the primary factor in this selection is what attribute is of interest. The selected measure also influences the selection of method. The measure is what determines the availability of current and prior data, from the perspective of the method.
4. Software Cost Estimation Processes
This section describes software cost estimation processes that have been proposed over the previous 15 years. Most of the processes (Bailey and
Basili, 1981; DeMarco, 1982; Heemstra, 1992; Arifoglu, 1993) describe how to apply a single prediction method, which is based on one or more algorithmic models. Boehm (1981) takes a broader view of the software cost estimation process. Boehm’s process includes establishing the objectives of the estimate, planning the estimation activity, and applying and reconciling estimates made by several techniques. In contrast to the other software cost estimation processes are two examples of processes for group estimation: “Wideband Delphi” and “Estimeeting.” These group estimation processes are adjuncts to the others, rather than replacements for them. They suggest two different approaches to involving a number of people in the estimation process. At the conclusion of this section, the software cost estimation processes are compared with the prediction process presented in Section 2. There are elements in these processes that are shared with, missing from, or additional to those of the prediction process. These elements are identified and discussed to evaluate the contributions of the prediction process and the limitations of the proposed software cost estimation processes.
4.1 Bailey and Basili Process
Bailey and Basili (1981) proposed the "Bailey-Basili meta model" as a process for effort estimation that can be used in a particular organization. The process has three main steps:

1. Calibrate the effort model. The effort model is calibrated by using a least-squares regression on a local data set to calculate the model's coefficients.

2. Estimate upper and lower bounds on the model's predictions. The calibration step results in the calculation of the "standard error of the estimate" (SEE) for the model. The SEE can be used to calculate upper and lower confidence intervals for an effort estimate made by the model, assuming a normal distribution.

3. Adjust the model's predictions to take account of significant cost drivers. The effect of cost drivers on effort is estimated by calculating an effort adjustment factor and applying this adjustment to the original effort estimate. The effort adjustment factor for a point used in the original regression is the number by which the actual value should be multiplied to get the predicted value. An effort adjustment model is calibrated using multiple linear regression to determine cost driver coefficients for three cost drivers, representing the methodology used in software development, complexity of the software to be developed, and experience of the personnel.
The final estimate is made by using both models: the first to make an initial estimate of effort, and the second to adjust that estimate to take account of cost drivers.
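A simplified sketch of this style of local calibration follows. The linear model form, the project data, and the single combined adjustment factor are assumptions made for illustration; they are not the published Bailey-Basili meta model or its coefficients.

```python
# A simplified sketch of Bailey-Basili style local calibration: fit an effort
# model to local project data by least squares, use the standard error of the
# estimate (SEE) for rough upper and lower bounds, and apply an effort
# adjustment factor for cost drivers. All data and coefficients are assumed.
from math import sqrt

# Hypothetical local data: (size in KLOC, actual effort in person-months)
projects = [(10, 28), (25, 62), (40, 95), (60, 140), (15, 38)]

n = len(projects)
xs = [p[0] for p in projects]
ys = [p[1] for p in projects]
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in projects) / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

# Standard error of the estimate, used for rough upper and lower bounds.
residuals = [y - (a + b * x) for x, y in projects]
see = sqrt(sum(r * r for r in residuals) / (n - 2))

size = 30                              # estimated size of the new project (KLOC)
base = a + b * size
adjustment = 1.15                      # combined cost-driver adjustment (assumed)
estimate = base * adjustment
print(f"calibrated model: effort = {a:.1f} + {b:.2f} * KLOC, SEE = {see:.1f}")
print(f"adjusted estimate: {estimate:.0f} person-months "
      f"(roughly {estimate - 2 * see:.0f} to {estimate + 2 * see:.0f})")
```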
4.2 Boehm Process
Boehm (1981) proposes a seven-step process for software cost estimation, which is summarized briefly as follows.

1. Establish objectives. The first step is to establish what the objectives of our cost estimates are, so that we avoid wasting time gathering information and making estimates that are not relevant to the decisions we are making, and so that we estimate to a consistent and appropriate degree of accuracy for these decisions.

2. Plan for required data and resources. In the next step, we should plan our estimation activity, to overcome the temptation or pressure to shortcut the process of preparing estimates.

3. Pin down software requirements. The third step requires us to establish whether the specification on which the estimates are based is costable. We need to know what we are trying to build to estimate at all accurately. Boehm suggests that a measure of whether a requirement is costable is whether it is testable, so the testability of each requirement can be reviewed.

4. Work out as much detail as feasible. The fourth step advises us to make an estimate based on as much detail as is consistent with our estimating objectives, because more detail will improve the accuracy of our estimates.

5. Use several independent techniques and sources. Boehm classifies estimation techniques as algorithmic models, expert judgment, analogy, Parkinson (work expands to fill available resources), price-to-win, top-down, and bottom-up. The fifth step of the software cost estimation process advises us to use more than one technique to make a cost estimate. Using a combination of techniques allows us to avoid the weaknesses of any single technique and to take advantage of each technique's differing strengths.

6. Compare and iterate estimates. Once we have made our estimates using two or more techniques, we should compare them. If they are inconsistent, our next step is to investigate the differences and make another estimate using the same techniques. The goal is for the estimates to converge on a more realistic estimate, rather than making an arbitrary compromise between the initial estimates.

7. Follow up. The final step in the software cost estimation process follows up on the estimates made during previous steps. We must collect data on actual costs and model parameters during the project, and compare these to the estimates. We should use the results of the comparison between
estimates and actual values to improve our estimation techniques and models. We should update our cost estimates once more accurate estimates or actual measurements of model inputs can be made. We should also reestimate costs when the scope of the project changes.
4.3 DeMarco Process

DeMarco (1982) proposes a process for developing and applying cost models based on locally collected data. He advocates the use of single-factor cost models derived by linear regression. A single-factor cost model predicts effort for all or part of a project, based on a single independent variable. The effort estimate obtained from the single-factor cost model should be adjusted by applying a correction factor. The correction factor is derived in the manner proposed by Bailey and Basili (1981).

DeMarco's approach supports bottom-up as well as top-down estimation. He suggests breaking down a project into cost components such as design effort, implementation effort, and integration effort. Each cost component is estimated by one or more models based on measures that are available at the time the estimate is made. Suggestions of which cost components to correlate with which measures are included in the process description. DeMarco also advocates the use of a time-sensitive model that relates total project effort to duration. He suggests using the COCOMO schedule model (Boehm, 1981) for this purpose. The activities that make up DeMarco's software cost estimation process are as follows.

1. Select the cost components to be modeled. The first step in the process involves selecting the components for which to make a cost estimate. The cost components, from which a total cost estimate will be derived, are based on a decomposition of system development into activities such as specification, design, implementation, and testing.

2. Select the measures from which to predict the component and total cost. Once the cost components have been selected, the next step is to select measures from which the effort to complete these components can be predicted. For example, code measures could be used to predict testing costs.

3. Develop the related single-factor cost models, based on existing data. Once the measures have been selected, the regression models from which effort is to be predicted must be calculated from existing project data.

4. Estimate cost and the standard error of the estimate for each component model. When the value of the measure used in the single-factor cost model can be estimated or measured, the model is used to predict effort for the component. If an estimate of the input measure value is used, the effort
should be reestimated when the actual value of that measure can be measured.

5. Adjust the estimates based on differences between the current and past projects. DeMarco recommends that the effort estimates calculated from the single-factor cost models should be adjusted based on a subjective assessment of the differences between the current and past projects used to derive the cost model.

6. Calculate the total cost from the component costs. Once component costs are estimated, the total cost is estimated from the sum of the component costs.

7. Calculate a total estimating error. The error in the total estimate is estimated by adding the standard error of the estimate for each component, which should be conservative in most cases.

8. Use a time-sensitive cost model to constrain the estimate of total project effort. The final step in DeMarco's process uses the estimate of total project effort as an input to a time-sensitive cost model, such as the COCOMO schedule model, in order to estimate project duration.

DeMarco (1982) emphasizes the need for estimators to have feedback on their performance and improve the accuracy of their estimates.
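The bottom-up arithmetic of steps 4 through 7 can be sketched as follows; the component models, their standard errors, and the input measures are hypothetical, and each single-factor model would in practice be calibrated from local data as described above.

```python
# A minimal sketch of DeMarco-style bottom-up estimation: each cost component
# has its own single-factor model (assumed to be calibrated elsewhere), the
# component estimates are summed, and a conservative total error is formed by
# adding the components' standard errors of the estimate.

# component: (intercept, slope, SEE in person-months, value of the input measure)
components = {
    "design":         (2.0, 0.40, 3.0, 55),   # input: number of specified functions
    "implementation": (5.0, 0.18, 6.0, 240),  # input: a code-size proxy
    "integration":    (1.5, 0.10, 2.5, 80),   # input: number of interfaces
}

total_effort = 0.0
total_error = 0.0
for name, (a, b, see, x) in components.items():
    effort = a + b * x
    total_effort += effort
    total_error += see            # simple (conservative) addition of errors
    print(f"{name:>15}: {effort:5.1f} person-months (SEE {see:.1f})")

print(f"{'total':>15}: {total_effort:5.1f} +/- {total_error:.1f} person-months")
```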
4.4 Heemstra Process
Heemstra (1992) describes a general process of software cost estimation. His process assumes the use of effort models that depend on size, and the use of cost drivers in conjunction with these models to make productivity adjustments. The process involves seven steps.

1. Create a database of completed projects. The initial step in the estimation process is the collection of local data to be used in validating and calibrating a model.

2. Validate the estimation model. The environment in which a cost model has been developed and the completed projects on which it is based may differ from the environment in which the model is used. Heemstra advises, therefore, that before a precalibrated model is used for the first time in an organization, the model should be validated by assessing its accuracy on data from projects completed in that organization.

3. Calibrate the estimation model. If the previous step indicates that the model is not valid for the environment in which it is now being used, the model needs to be recalibrated using data from projects that have been completed in the current environment.
4. Estimate size. In this step, the value of a size measure is estimated from characteristics of the software to be developed.

5. Estimate effort and productivity. Once a size estimate has been made, the next step converts the size estimate into an estimate of effort in person-months. The cost drivers that influence the effort for software development are assessed and used to adjust the effort estimate.

6. Distribute effort across project phases. In the next step, the total effort and project duration are distributed across software development phases.

7. Analyze the sensitivity of the estimate and associated risks. The sensitivity and risk analysis step of the software cost estimation process assesses the sensitivity of cost estimates to values of the cost drivers and determines risks associated with project cost estimates.
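The kind of size-based, cost-driver-adjusted model that Heemstra's process assumes can be sketched as below; the coefficients, driver multipliers, and phase percentages are illustrative assumptions rather than values from Heemstra (1992).

```python
# A minimal sketch of a size-based effort model with cost-driver adjustment,
# phase distribution, and a simple sensitivity check. All coefficients,
# driver multipliers, and phase percentages are hypothetical.

def effort_person_months(size_kloc, drivers, a=3.0, b=1.12):
    """Nominal effort a * size^b, multiplied by the product of driver ratings."""
    nominal = a * size_kloc ** b
    adjustment = 1.0
    for multiplier in drivers.values():
        adjustment *= multiplier
    return nominal * adjustment

drivers = {"tool_support": 0.95, "team_experience": 0.90, "complexity": 1.15}
phases = {"specification": 0.20, "design": 0.25, "implementation": 0.35, "testing": 0.20}

effort = effort_person_months(32.0, drivers)
print(f"adjusted effort: {effort:.0f} person-months")
for phase, share in phases.items():
    print(f"  {phase:<15}{share * effort:5.1f} person-months")

# Simple sensitivity check: vary the complexity driver and observe the estimate.
for rating in (1.0, 1.15, 1.30):
    varied = dict(drivers, complexity=rating)
    print(f"complexity {rating:.2f}: {effort_person_months(32.0, varied):.0f} person-months")
```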
4.5 Arifoglu Process

Arifoglu (1993) has also proposed a general process of software cost estimation. His process involves four steps.

1. Estimate size. This initial step is equivalent to the size estimation step of Heemstra's process (Heemstra, 1992).

2. Estimate effort and duration. The next step converts the size estimate to an estimate of total effort and an estimate of project duration. Arifoglu suggests the use of the COCOMO and COCOMO schedule models for this step.

3. Distribute effort and duration across the life cycle. Once the total effort and duration have been estimated, these are distributed across the software development life cycle.

4. Normalize effort to calendar time. The final step of Arifoglu's process converts the number of working days required to complete a project to a number of elapsed calendar days. Arifoglu advocates the use of the model of Esterling (1980) to perform this conversion. Esterling proposes a model to estimate the percentage of a working day that is spent working directly on a task. However, this step in the process follows a prediction of project duration using a model such as COCOMO. The duration predicted by COCOMO is already a calendar-time estimate (including weekends, and average allowances for vacation and sick leave), so in this case the process may take nonproductive calendar days into account twice.
4.6 PROBE
Humphrey (1995) describes a process of “proxy-based estimation” (PROBE) as part of his personal software process. The PROBE method
advocates the use of personally selected measures and regression models based on personal data to estimate the size of a software product. A proxy is a size measure that can be used to estimate a product's length in lines of code. As an example, PROBE uses a count of objects as a proxy size measure. The estimate of lines of code is used to predict the effort for an individual to produce the product. The upper and lower bounds of the prediction interval must also be estimated. The schedule for producing the product is estimated from the effort by allowing for the proportion of potential working hours spent on activities not directly related to production, such as vacations, sick leave, or paperwork. Simple regression models are used to estimate lines of code from proxy size measures and labor hours from lines of code. Feedback from each estimate, obtained by comparing it with a measurement of the actual value, is part of the personal software process. The feedback is introduced to improve an individual's estimating performance. The philosophy of the personal software process is to improve the abilities of individuals. This limits the applicability of PROBE to team projects, as individual performances vary too widely to be used to predict costs for an entire project (DeMarco and Lister, 1987).
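The following is a simplified, illustrative sketch of the PROBE idea: two regressions fitted to personal data, one from a proxy count to lines of code and one from lines of code to labor hours, with a rough prediction range. The data, the choice of a roughly 70% interval, and the fixed t-value are assumptions for illustration, not Humphrey's published procedure.

    # Simplified, illustrative sketch of PROBE-style regressions from personal data:
    # object-count proxy -> lines of code -> labor hours, with a rough prediction range.
    import numpy as np

    def fit_and_predict(x, y, x_new):
        b, a = np.polyfit(x, y, 1)
        resid = y - (a + b * x)
        see = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))   # standard error of the estimate
        return a + b * x_new, see

    objects = np.array([5, 9, 14, 20, 26.0])        # proxy counts from past personal work
    loc     = np.array([310, 600, 880, 1300, 1700.0])
    hours   = np.array([18, 35, 50, 78, 101.0])

    loc_est, _     = fit_and_predict(objects, loc, x_new=17)    # size from the proxy
    hours_est, see = fit_and_predict(loc, hours, x_new=loc_est)

    t70 = 1.2   # rough t-value for a ~70% interval with few degrees of freedom (assumed)
    print(f"estimated size: {loc_est:.0f} LOC")
    print(f"estimated effort: {hours_est:.0f} hours "
          f"(range {hours_est - t70*see:.0f} to {hours_est + t70*see:.0f})")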
4.7 Wideband Delphi

It is possible to improve the accuracy of an estimate by combining the judgment and estimates made by a number of people. Boehm (1981) describes the Wideband Delphi technique for combining the opinions of experts to make a size estimate. Each expert is provided with a specification and an estimation form. In a group meeting, the experts discuss the estimation issues. Each expert fills out the estimation form anonymously. A summary of the estimates from this round is presented to each expert, and another group meeting is held to discuss the estimates from the previous rounds. Further rounds are conducted until the estimates are judged to have converged satisfactorily. Wideband Delphi is similar to the original Delphi technique, which originated at the RAND Corporation. However, in Wideband Delphi, before each round of anonymous estimates is made, the experts meet to discuss issues and the last round's estimates; in the original technique there is no interaction between the experts. The discussions make it less likely that issues relevant to an estimate are overlooked.
4.8 Estimeeting
Taff et al. (1991) describe another process for group estimation, “Estimeeting,” which has been used by AT&T Bell Laboratories. A meeting,
known as an "estimeeting," is held to produce an estimate of the effort to develop a feature of a system. Prior to the estimeeting, a group of estimators review the requirements and high-level design for the feature. The estimators are not part of the team that is developing the feature, but are experts on subsystems that will be changed when the feature is implemented. In an estimeeting, the feature team present the requirements and the design to the estimators. The presentation is followed by a question-and-answer period. Then the estimators make estimates for the subsystems on which they are expert. The estimators consult with one another and with the feature team members. Following the meeting, a feature team member is responsible for summarizing the results of the meeting and calculating the total estimate. Both Wideband Delphi and Estimeeting take advantage of meetings to discuss and understand the feature or activity for which an estimate is to be made. However, in Wideband Delphi the final estimate is reached by consensus, whereas in an estimeeting the final estimate is a summation of the individual estimates. Estimeetings allow estimates to be made in more detail, but require a high-level design so that the allocation of requirements to subsystems is known. Wideband Delphi has the advantage of consensus on each estimate, but estimates are likely to be made in less detail, as each estimator duplicates the work of the others.
4.9 Discussion
4.9.1 Characterize Current or Future Project

The first step of both the prediction process and the Boehm process aims to characterize the project for which a prediction is to be made and its environment. The other software cost estimation processes described previously do not explicitly include such a step. However, this characterization is implicitly part of any process that involves the assessment of the influence of cost drivers on effort estimates. A possible advantage of carrying out the characterization step explicitly is that the constraints and targets set for the project will be clearly defined prior to estimates being made. This allows the constraint values to be used as inputs to predictions.
4.9.2 Select Predictive Measures and Methods

The Boehm (1981) and DeMarco (1982) processes each include steps that involve selecting the measures to predict, as is done in the second step of the prediction process. In contrast, the processes of Bailey
and Basili (1981), Heemstra (1992), and Arifoglu (1993) each assume that a single measure of system size, such as lines of code or function points, will be used to generate a prediction of overall effort. However, in a realistic system development scenario more than a single overall estimate of effort and duration will be needed. The inclusion of a step to select which measures to predict allows a process to be applied more generally than a process that assumes the measures are the same for all projects.
4.9.3 Plan Measurement and Collection of Data

Boehm (1981) includes a step to plan for the data and resources required by the estimation activity. This is analogous to the third step of the prediction process. The activity of collecting the data to support calibration of estimation models is included explicitly in the process of Heemstra (1992). The processes of Bailey and Basili (1981) and DeMarco (1982) also imply data collection, as these processes include steps in which estimation models are developed or calibrated. Data collection is also part of the PROBE process (Humphrey, 1995). As reliable data collection is likely to require significant effort in a team environment, it is arguably important to include the planning of measurement and data collection as a distinct step in an estimation process.
4.9.4 Make Predictions and Assess Their Accuracy

The fourth step of the prediction process involves assessing the accuracy of the predictions, to provide feedback to the system development team. Feedback is necessary for improving the accuracy of estimates, as well as for project planning and control. Feedback is part of the processes of Boehm (1981), DeMarco (1982), and Humphrey (1995), but it appears to be absent from the other processes.
4.9.5 Evaluate Predictive Measures and Methods Used

The fifth step of the prediction process analyzes and evaluates the measures and methods used for prediction. Improving the accuracy of the next set of predictions, by improving estimation techniques and models, is also one of the intentions of the follow-up step in the process of Boehm (1981).
4.9.6 Package the Experience

The final step of the prediction process packages what has been learned so that this experience can be reused when making predictions in the future.
Packaging what has been learned from the current experience in estimating is not an explicit part of the processes described. Including this activity explicitly in the process is valuable because it contributes to future rather than current projects and may therefore be overlooked.
4.9.7 The Prediction Process and Additional Elements of Software Cost Estimation Processes
There are a number of steps in the software cost estimation processes described earlier that do not appear either explicitly or implicitly as part of the prediction process. The validation step included by Heemstra (1992) assesses whether a preexisting software cost model is applicable in a particular environment. An assessment such as this should be part of the selection step of the prediction process, where a preexisting model may be selected. The manner in which the prediction process is described implies that models are available from the previous experience, and that selections are made among these. The final step of the prediction process, where experience is packaged, includes the activity of updating models in the experience base, so that these can be used in future predictions. In practice, it may be necessary to calibrate or develop a model from existing data at the start of the prediction process, as no prior models may be available. This scenario is reflected in the processes of DeMarco (1982) and Heemstra (1992), which include steps explicitly for model development and calibration before predictions are made. The processes of DeMarco (1982), Heemstra (1992), and Arifoglu (1993) explicitly include a step to estimate the duration or elapsed time for the software project. The prediction process does not prescribe which measures should be estimated, as the appropriate measures are assumed to be dependent on the context in which the predictions are made. As software cost estimation is often used to support project planning, it is likely that an estimate of duration is often sought in conjunction with an effort estimate. However, in situations where the duration is a constraint on the software project, it may be more effective to predict how much can be achieved in the available time, using the duration as an input to the prediction. Once the estimate of effort has been made, Heemstra (1992) and Arifoglu (1993) include a step that distributes effort across the project phases, or life cycle. These two processes are both examples of a top-down approach to estimation. A top-down software cost estimate is an estimate for total effort or duration, based on overall characteristics of a project. It must be
broken down into estimates for individual tasks if it is used in project planning. A bottom-up approach to software cost estimation decomposes the system development into a series of activities or system deliverables, estimates effort for each of these individually, and then combines these estimates into a total estimate of effort. Both bottom-up and top-down approaches can be applied to each of the prediction methods discussed in the selection framework. The prediction process does not assume that one or the other is preferable. This will depend on the circumstances in which predictions are being made. Boehm (1981) includes a step to pin down the software requirements, which emphasizes the need for defining a set of costable requirements before attempting to make estimates based on them. The importance attached to having well-defined requirements on which to base estimates is confirmed by the survey of Lederer and Prasad (1993), in which participants viewed changes and misunderstandings about requirements as the key causes of inaccurate estimates. However, in practice, estimates are often required in circumstances where no clear definition of the requirements is available or can be prepared as a basis for estimation, for example, in bidding for a contract. In these circumstances it is important that the assumptions on which an estimate is based be documented, as suggested in the prediction process. The processes of Bailey and Basili (1981), DeMarco (1982), Heemstra (1992), and Humphrey (1995) all incorporate estimates of the uncertainty or prediction intervals associated with an estimate. The prediction process does not refer to the need to do this, although consideration of the need for uncertainty estimates is part of the selection framework. Other process steps that explore the certainty of software cost estimates are sensitivity analysis (Heemstra, 1992) and the use of more than one model to estimate the same quantity (Boehm, 1981). Heemstra (1992) includes risk analysis as part of the software cost estimation process. Risk analysis and software cost estimation are interrelated project management activities. Project risks can be assessed in terms of their potential cost and schedule impacts (e.g., Fairley, 1994). Any of the assumptions on which an estimate is based poses a risk to a project, based on the likelihood that the assumption turns out to be incorrect and the impact of its being incorrect on the estimate. The assumptions documented as part of the prediction process are inputs to the risk analysis. The team processes described in this section serve as examples of how to combine the estimates generated by team members. They could be used in conjunction with any of the other described software cost estimation processes, or the prediction process.
5. Software Cost Estimation Models

This section reviews a variety of software cost estimation models. Software cost estimation models have received more attention from researchers than processes or studies of cost estimation practice. The selection framework classifies prediction methods as empirical, analogical, theoretical, and heuristic. We classify each model presented in this section according to the prediction method on which it is primarily based, as a model may be based on more than one of these methods. Models based on each method are presented in separate subsections. The classification of empirical models is extended in this section, by classifying the models as either parametric or nonparametric. A parametric model has an explicit functional form, relating a dependent variable to one or more independent variables, for example, the COCOMO model (Boehm, 1981). A nonparametric model has no explicit functional form, for example, a model developed using a machine learning technique such as an artificial neural network. Thus the subsections based on our classification are these:

- Empirical parametric models
- Empirical nonparametric models
- Analogical models
- Theoretical models
- Heuristics
There are no models that are based primarily on heuristics, and so in this case we discuss briefly how heuristics are used in conjunction with other software cost estimation models. For each of the other methods, techniques for calibrating models developed by that method are discussed, as a model must be calibrated for the environment in which it is applied if it is to be accurate (Cuelenaere et al., 1987; Jeffery and Low, 1990). The approach to estimating the uncertainty or range of possible values associated with an estimate is also reviewed for each method. The most common estimation models are empirical parametric models. Where effort is predicted based on one or more simple measures, these models have been extended, in some cases, by the use of cost drivers. The advantages and disadvantages of cost drivers are discussed in this context. In contrast to the majority of models presented, which predict effort, a couple of models have been developed for predicting the elapsed time for a project. These are presented briefly in the subsection on empirical parametric models.
In a final subsection we discuss the different methods and models presented from the perspective of the selection framework in Section 3.
5.1 Empirical Parametric Models
5.1.1 Effort Models

The simplest form of an empirical parametric model of effort is a function that relates the effort to develop a system or program to a size measure. In this context, a size measure is a count of some feature of a product of the development process, for example, a count of the number of lines of code in a program. Effort is often measured in person-days or person-months. The models are developed by fitting the function to a data set of size and effort value pairs, using regression techniques. Models with linear and exponential relationships between effort and the size measure are most commonly explored. The model of Albrecht and Gaffney (1983) is based on a linear relationship between effort and function points. COCOMO (Boehm, 1981) is based on an exponential relationship between effort and lines of code. Other functional forms, such as quadratic, have also been explored by some researchers (e.g., van der Poel and Schach, 1983; Banker et al., 1994). Viewed glibly, the development of an empirical parametric model is an exercise in curve fitting. This view emphasizes some of the pitfalls inherent in the development of these models. Courtney and Gustafson (1993) sound a warning to software researchers who set out to discover empirical relationships by trying different combinations of measures and functional forms before choosing to report the one with the highest correlation. The chances of discovering a "good" model by this approach are high with small data sets. There are also some less conspicuous pitfalls in developing empirical parametric models. Once a model is proposed, the number of observations in the data set used to fit the model must be large enough to justify the number of independent parameters that appear in the model. The model could also be invalidated if the input parameters are not mutually independent, so assumptions of their independence need to be tested. The statistical techniques that are used to derive these models also assume that the data set has certain properties and that it consists of samples from the same underlying population. This last assumption may be difficult to justify where observations are taken from a number of different organizations, because the ways in which systems are developed and the ways in which measures such as effort are collected may vary greatly.
In spite of the pitfalls, the method for developing empirical parametric models does not deserve to be disparaged, as it is arguably the simplest and most elucidating way to investigate empirical relationships. Practical guidelines for software developers seeking to develop predictive models by this method are to keep the model as simple as possible, to collect data samples carefully, and to collect as many samples as possible.
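As a minimal sketch of this approach, the exponential form EFFORT = a LOC^b can be fitted by linear regression in log space; the project data below are invented for illustration.

    # A minimal sketch of fitting the exponential form EFFORT = a * LOC^b by linear
    # regression in log space. The project data below are illustrative only.
    import numpy as np

    kloc   = np.array([5.0, 12.0, 20.0, 38.0, 55.0, 90.0])
    effort = np.array([14.0, 40.0, 66.0, 150.0, 230.0, 420.0])   # person-months

    b, log_a = np.polyfit(np.log(kloc), np.log(effort), 1)
    a = np.exp(log_a)
    print(f"effort = {a:.2f} * KLOC^{b:.2f}")
    # b > 1 would suggest diseconomies of scale, b < 1 economies of scale, but with
    # few data points the difference from 1 is rarely statistically significant.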
Models Based on Lines of Code. A great variety of effort models have been explored. One of the best-known effort models is of the form EFFORT = a LOC^b,
where EFFORT is the labor to develop a system measured in person-months and LOC is the number of lines of code to be developed, measured in thousands of lines of code. This model has been investigated by Walston and Felix (1977), Bailey and Basili (1981), and Boehm (1981) as the basic version of the COCOMO model. The value of the exponent, b, indicates either economies or diseconomies of scale as the number of lines of code to be produced increases. Walston and Felix report a value less than 1, which indicates economies of scale, whereas the other researchers report values greater than 1, which indicates the opposite. Kitchenham (1992) examines 10 data sets, including those of Bailey and Basili and of Boehm, and determines that there is no statistical support for supposing that the value of the exponent is significantly different from 1. In this case, the possibility that the relationship is linear rather than exponential cannot be rejected. However, the model does not allow for economies and diseconomies to exist within one data set. To investigate the possibility that this is confounding the conclusions, Banker et al. (1994) use a model of the form EFFORT = a + b LOC + c LOC^2.
In examining 11 data sets they found that the value of the coefficient of the quadratic term, c, was significantly different from zero in six of these. These six include the data sets of Bailey and Basili and of Boehm. This supports the view that some data sets do display nonlinearity. The quadratic form of the model should only be considered valid over the range of LOC values that appear in the data set to which the model has been fitted. Banker et al. report a negative value for the quadratic coefficient in four of the six significant cases. This is disquieting because, in cases where the quadratic coefficient is negative, the effort will eventually become negative. It is more reasonable to expect effort to be modeled by
a monotonic, increasing function of lines of code, regardless of whether economies or diseconomies of scale exist. Models based on lines of code are successful in explaining the variation in effort, whether the functional form is linear or nonlinear. Conte et al. (1986) give an example of a linear model with a correlation coefficient, R^2, of 82% and mean absolute relative error of 37%. Miyazaki and Mori (1985) give an example of a calibrated COCOMO model with a low mean absolute relative error of 20%. The difficulty with using models based on lines of code predictively is that lines of code cannot be measured until the system is complete. We discuss the implications of this later when we consider the uncertainty associated with an estimate.
Models Based on Function Points and Other Compound Measures. The function point measure was developed by Albrecht
(1979) as an alternative to lines of code for measuring software size. A number of researchers (e.g., Albrecht and Gaffney, 1983; Kemerer, 1987; Matson et al., 1994) investigate models of the form EFFORT = a + b FP,
where EFFORT is the labor in person-months to develop a system, and FP is the function points value for a system. The function point measure has been criticized for combining counts in a way which has no theoretical basis or reasonable interpretation (Kitchenham et al., 1995). A function point value for a system is derived by summing and weighting counts of five features of the system: inputs, outputs, queries, logical files, and interfaces to other systems. To add these counts together, there must be models that can convert the counts to a common unit. Consider measuring the length of a building. If part of the building's length has been measured in feet and part in meters, the values measured cannot be added directly to determine the total length. The partial lengths must be converted to the same unit first. Unfortunately, in the case of function points there are no models to guide us in converting from a count of queries to a count of files, and so on. Other size measures that have been used to predict effort can be similarly criticized. Van der Poel and Schach (1983) add together directly counts of files, data flows, and processes to derive a size measure. COCOMO 2.0 (Boehm et al., 1995) introduces an Object-Points measure that sums weighted counts of screens, reports, and 3GL modules in a system. Different weights are applied to scale the counts according to the object type and an ordinal measure of complexity, one of simple, medium, or difficult. Numeric values, such as 1, 2, and 3, can be associated with each of simple, medium, and difficult, respectively. However, summing of ordinal scale
measures is invalid within the framework of measurement theory, as it can lead to incorrect conclusions (Fenton, 1991). For example, summing ordinal values to calculate the arithmetic mean can result in contradictions if the mappings to numeric values are changed, even though ordering is preserved. Although it is difficult to construct a compound measure that is above criticism, the practical motivation for such a measure is that it allows systems to be compared via a single number, and models for estimating effort need include only a single independent variable. An alternative is to treat each component of the compound measure as a separate independent variable in the model. Where there are more than a couple of components, this increases the difficulty of the statistical analysis, although arguably there are tools readily available to facilitate this task. Matson et al. (1994) demonstrate that it is possible to develop a more accurate model by treating the function point components separately. It may not be necessary to resort to more complicated models, however. Jeffery and Stathis (1996) compare an effort model based on logical internal files alone with an effort model based on total unadjusted function points. For their data set, the simpler model was more successful in explaining the variation in effort than the function point model.
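The contrast between a single compound predictor and separate component predictors can be sketched as follows; the counts and effort figures are invented, and with real data the relative fit of the two models would decide the issue.

    # Illustrative sketch contrasting a model on a single total size measure with a
    # model that treats selected components as separate independent variables.
    import numpy as np

    inputs  = np.array([20, 35, 50, 15, 60, 40.0])
    outputs = np.array([15, 30, 45, 10, 55, 35.0])
    files   = np.array([8, 12, 20, 6, 25, 15.0])
    effort  = np.array([60, 110, 180, 45, 240, 140.0])   # person-months

    total = inputs + outputs + files   # crude unweighted total, for illustration only

    # Model 1: effort = a + b * total
    X1 = np.column_stack([np.ones_like(total), total])
    coef1 = np.linalg.lstsq(X1, effort, rcond=None)[0]

    # Model 2: effort = a + b*inputs + c*outputs + d*files
    X2 = np.column_stack([np.ones_like(total), inputs, outputs, files])
    coef2 = np.linalg.lstsq(X2, effort, rcond=None)[0]

    print("single-variable coefficients:", np.round(coef1, 2))
    print("component-level coefficients:", np.round(coef2, 2))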
Models Based on Other Size Measures. DeMarco (1982) proposes a number of measures that could be used to predict effort for particular system development activities or total effort. The measures based on the system specification are called "Function Bang" and "Data Bang," with the intention that productivity is measured as "bang per buck." These measures are designed to be calculated from system data flow diagrams and entity relationship diagrams, respectively. One disadvantage of these measures is that they are not simple counts. They rely on tables of complexity weights to modify the counts before a sum is calculated. In this respect they share some of the disadvantages of compound measures such as function points. We are not aware of any published results validating the use of these measures for predicting effort. Basili and Panlilio-Yap (1985) investigate an effort model based on the number of pages of documentation for a system, EFFORT = a + b DOCPAGES,
where pages are counted for written material that describes the system, excluding source code. This model is successful in explaining more than 85% of the variation in total effort for their data set of 23 projects. Mukhopadhyay and Kekre (1992) investigate models to estimate effort based on measures of system features, specific to the system’s application domain. This is in contrast with typical measures, which are based on
features from the domain of system development, such as lines of code. In their case the domain is manufacturing process control, and they count application features, such as whether a manufacturing process needs to communicate with upstream or downstream processes and what its position and motion control capabilities are. Based on a data set of 34 projects, they are able to demonstrate that effort estimates based on their measures are similarly accurate to estimates made ex post from actual measurements of lines of code. The technique of looking for suitable measures outside of the domain of system development is particularly interesting, as it is one which software researchers are likely to overlook in the quest for good predictors of effort. Brownlow (1994) investigates an effort model that can be applied to a system developed using object-oriented analysis and design. It is based on the number of objects and the number of services in the system. The form of the model investigated is EFFORT = a + b OBJECTS + c SERVICES.
The data set used in this investigation contained only 12 projects, and actual effort data for these projects were not available. Instead, the model was developed based on effort predicted by an alternative model. These factors limit the conclusions about model validity that can be drawn from this investigation. Lo et al. (1995) investigate models for predicting development effort for graphical user interfaces (GUIs). The models are based on counts of GUI features: action widgets and data widgets. An action widget is used to initiate commands or functions, for example, a "SAVE" button. Data widgets display or accept data, for example, a list box or text entry field. One of the models investigated is of the form EFFORT = a + b ACTION WIDGETS + c DATA WIDGETS,
where effort is the number of person-hours to code and unit test a window containing a known number of action and data widgets. This model was able to explain approximately 75% of the variation in effort within a data set for 35 windows.
Models That Include Staffing Level. One of the arguments of those who support the existence of diseconomies of scale in system development is that as the system size increases, so does the development team size. As team size increases, the number of communication paths between team members increases rapidly, leading to proportionately more time being spent in communicating development issues and decisions. Jeffery (1987b)
explores the relationship between productivity, lines of code, and maximum staffing level. The model investigated is of the form PRODUCTIVITY = a LOC MAXSTAFF^-b.
Productivity is equivalent to lines of code divided by development effort measured in person-months. This model is able to explain between 60 and 80% of the variability in productivity for the data sets investigated, and it supports the view that productivity declines as development team size increases. Conte et al. (1986) describe the COPMO model, which also models the relationships among total effort, size, and staffing. The COPMO model is based on the assumption that the effort to develop a system of a given size may be modeled by the effort to develop the system, were people to work independently, plus the effort required to coordinate the development process within an interacting team. The derived model is of the form EFFORT = a + b LOC + c STAFF^d,
where effort is measured in person-months, LOC is lines of code, and STAFF is the average staffing level. The average staffing level is equivalent to the total effort divided by the project duration measured in months. This model has been fitted successfully to subsets of the COCOMO data set. We are not aware of any direct validations of the assumptions on which the model is based. For a large system, it is arguably difficult to justify the assumption that there is any reasonable measure of the effort to develop the system without team interactions.
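A COPMO-like form can be fitted, for example, by searching over the exponent and solving the remaining linear coefficients by least squares; the sketch below uses invented data and is not Conte et al.'s own calibration procedure.

    # Illustrative sketch of fitting a COPMO-like form effort = a + b*KLOC + c*STAFF^d:
    # grid-search the exponent d and solve the linear coefficients by least squares.
    import numpy as np

    kloc   = np.array([10, 25, 40, 60, 90.0])
    staff  = np.array([3, 6, 9, 14, 20.0])        # average staffing level
    effort = np.array([30, 90, 170, 320, 560.0])  # person-months

    best = None
    for d in np.arange(0.5, 3.01, 0.05):
        X = np.column_stack([np.ones_like(kloc), kloc, staff ** d])
        coef, *_ = np.linalg.lstsq(X, effort, rcond=None)
        sse = np.sum((X @ coef - effort) ** 2)     # sum of squared errors for this d
        if best is None or sse < best[0]:
            best = (sse, d, coef)

    sse, d, (a, b, c) = best
    print(f"effort ~ {a:.1f} + {b:.2f}*KLOC + {c:.2f}*STAFF^{d:.2f}  (SSE={sse:.1f})")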
5.1.2 Elapsed Time Models

Project managers, customers, and system developers are usually keen to estimate how long it will take to develop a system from start to finish, or whether it can be completed by a particular date. There may be a desire to estimate the minimum time in which the system can be completed. The duration of system development is clearly related to the number of people on the development team, but Brooks (1975) has made us mindful that people and time cannot be simply interchanged. Given also some of the evidence for economies and diseconomies of scale, is there an optimum staffing profile for a project that maximizes productivity? Can more people be assigned to work on the same project to reduce the duration, at the expense of reduced productivity? At what point does this strategy cease to be feasible? These are some of the motivations for researchers who have sought to explore the relationship between system development effort and duration.
Putnam (1978) proposes a model relating the time to deliver a system to the total lifetime effort and size of the system: S = Ck K^(1/3) td^(4/3), where S is the size in lines of code, K is the total life-cycle effort, and td is the time at which the system is delivered. The total life-cycle effort extends over the entire lifetime of a system. The effort to deliver the system is the fraction of this expended up until the time of delivery. Ck is the technology factor, the value of which is deduced from characteristics of the project. The Putnam model assumes that staff loading over the lifetime of the system follows a Rayleigh curve. This assumption was based on an earlier empirical study (Norden, 1970). The parameter td is actually the time to reach the peak of the Rayleigh curve. The model also relies on Putnam's own empirical observations, from which he deduced that K/td^3 takes on only certain discrete values. Parr (1980) proposed a variation on Putnam's approach, replacing the Rayleigh distribution with a similar one that is nonzero at the origin. The change was proposed to model projects where some staff are already working at the start of the project. Putnam's model excludes system analysis and specification. COCOMO (Boehm, 1981) includes a model that relates development time to development effort:
Tnom = a EFFORT^b, where Tnom is the nominal development time in months and effort is measured in person-months. Based on empirical observation, Boehm speculates that there is an "impossible region" for development, which represents infeasible staffing strategies. The impossible region is defined approximately by Tdev < 0.75 Tnom, where Tdev is the compressed development time. Where the actual development time is desired to be 75% of the nominal development time, the intermediate COCOMO model recommends a cost multiplier value of 1.23, or a 23% increase in actual effort over the nominal value due to the schedule compression. Attempts to validate these models have met with mixed results. Basili and Beane (1981) attempt to fit the Putnam and Parr models to data for seven projects with average staff sizes of fewer than 10 people and durations of less than 18 months. They find that the models do not fit the data well. The projects in Putnam's data set were larger than these. Basili and Beane speculate that there may be no natural shape for the effort distribution of
projects of the scale in their data set, which is in conflict with the theory on which the Putnam and Parr models are based. Kitchenham and Taylor (1984) carry out a detailed study of both models. Neither model produces accurate predictions for their data sets. However, they point out that the models were used without calibration, and that the projects in their data set were small, rather than the medium or large projects to which Putnam suggests his model applies. Kitchenham and Taylor comment on problems they encounter in attempting to validate Putnam's model. It is difficult to select the appropriate technology constant value for a project and to collect effort data that satisfy the model's assumptions. The Putnam and COCOMO models imply that reducing the duration of a project below what would be expected for a project of a given size will increase the total effort expended. A study by Jeffery (1987a) explores the relationship between actual and expected values of effort and elapsed time in 47 projects. This study reveals a complex relationship between these values. Although some projects with shorter-than-expected duration do show the anticipated increase in effort, all other combinations appear possible: shorter duration associated with lower effort; longer duration associated with greater effort; and longer duration associated with less effort. Kitchenham (1992) confirms the existence of this relationship in a separate data set. It is possible to speculate about circumstances that may lead to any one of these four results. For example, a project that takes longer to complete and more effort than expected for its size may be poorly organized; it may have an inexperienced team; or the work may be expanding to fill the available time. A project that takes longer to complete but less effort than expected for its size may have improved productivity by having a smaller team than usual, at the expense of taking longer to complete the project. A project that takes less time and less effort to complete than expected for its size may have unusually high productivity, or may be working overtime, which is not accounted for in the reported effort, or both. Clearly there are factors that can be invoked to explain variations in project productivity that are regularly observed. Jeffery (1987b) reexamines his data set and finds that the relationship between project size and staffing level accounts for more than 70% of the variation in productivity. Many empirical parametric models use cost drivers to take account of factors that influence productivity.
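A small sketch of a COCOMO-style schedule model and the schedule-compression limit discussed above follows; the coefficients are approximately those Boehm (1981) reports for organic-mode projects and should be treated as placeholders to be calibrated.

    # Sketch of a COCOMO-style schedule model T_nom = a * EFFORT^b and the ~75%
    # schedule compression limit. Coefficient values are approximate placeholders.

    def nominal_duration(effort_pm, a=2.5, b=0.38):
        """Nominal development time in months for a given effort in person-months."""
        return a * effort_pm ** b

    def schedule_is_feasible(desired_months, effort_pm, limit=0.75):
        """Compression below ~75% of the nominal schedule falls in the impossible region."""
        return desired_months >= limit * nominal_duration(effort_pm)

    effort = 120.0                      # person-months (illustrative)
    t_nom = nominal_duration(effort)
    print(f"nominal duration: {t_nom:.1f} months")
    print("10-month schedule feasible?", schedule_is_feasible(10.0, effort))
    print(f"average staffing at nominal schedule: {effort / t_nom:.1f} people")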
5.1.3 Cost Drivers

The models discussed so far do not take into account a variety of factors, such as the experience of the system developers, which we expect to
influence productivity. These productivity factors are also known as cost drivers. Cost drivers are usually assessed on an ordinal scale, for example, low, medium, or high experience, or a nominal scale, for example, a classification of the system application domain. Fifteen cost drivers are used in Intermediate COCOMO (Boehm, 1981). The influence of a cost driver on effort is modeled by multipliers, which either increase or decrease the nominally expected effort. Once the value of a cost driver has been assessed, the value of the multiplier is determined from a table that maps cost driver values to multipliers. For example, a high level of application experience is expected to reduce effort by 9%. The function point measure also incorporates the influence of cost drivers. This allows the function point value for a system to be used as a basis for productivity comparisons between projects. Unadjusted function points are multiplied by the technical complexity factor, which is calculated from assessments of the degree of influence of 14 separate productivity factors. The influence of a productivity factor need not be modeled as a simple effort multiplier. COCOMO 2.0 (Boehm et al., 1995) models the influence of five factors as exponential multipliers of the lines of code value in its effort equation. One difficulty with using a large number of cost drivers in estimating effort is that the cost drivers may not be independent. Kitchenham (1990) demonstrates that there is a relationship between two of the COCOMO cost drivers based on projects in the COCOMO data set, and points out that relationships between input parameters will make a model unstable. One way to overcome the need to consider a large number of cost drivers is to consider only those that are significant in a particular organization or environment. Subramanian and Breslawski (1993) demonstrate techniques that can be used to choose a subset of cost drivers relevant to a specific organization or environment from a larger set, such as the set of cost drivers used by Intermediate COCOMO. These techniques are demonstrated on the COCOMO database and reduce the number of cost drivers considered from 15 to 4 without sacrificing accuracy. Kitchenham (1992) draws a similar conclusion, finding that from a set of 21 cost drivers, 7 accounted for more than 75% of the variation in productivity. There is further evidence that simpler models may be satisfactory. Jeffery and Stathis (1996) find that in the case of function points, the technical complexity factor does not significantly improve the accuracy of the model for their data set, and confirm this finding in data sets used by other researchers. Simplifying models is attractive because it makes models easier to apply and easier to calibrate.
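The use of cost drivers as effort multipliers can be sketched as follows; apart from the 9% reduction for high application experience quoted above, the driver names and multiplier values are invented for illustration.

    # Sketch of adjusting a nominal effort estimate with cost-driver multipliers, in the
    # style of Intermediate COCOMO. Multiplier values are illustrative; only the 'high
    # application experience ~ 9% reduction' echoes the figure quoted in the text.

    MULTIPLIERS = {
        "application_experience": {"low": 1.13, "nominal": 1.00, "high": 0.91},
        "required_reliability":   {"low": 0.88, "nominal": 1.00, "high": 1.15},
        "tool_support":           {"low": 1.10, "nominal": 1.00, "high": 0.95},
    }

    def adjusted_effort(nominal_effort_pm, ratings):
        factor = 1.0
        for driver, rating in ratings.items():
            factor *= MULTIPLIERS[driver][rating]   # multipliers are assumed independent
        return nominal_effort_pm * factor

    ratings = {"application_experience": "high",
               "required_reliability": "high",
               "tool_support": "nominal"}
    print(f"adjusted effort: {adjusted_effort(100.0, ratings):.1f} person-months")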
5.1.4 Calibration

When empirical models are applied outside of the organization or environment on whose data they are based, the predictions made by the model are likely to be inaccurate, unless the model is recalibrated using local data. Even more generic models such as COCOMO fail to make accurate predictions without calibration. The need for calibration has been confirmed in studies by Kemerer (1987), Kitchenham and Taylor (1984), and Jeffery and Low (1990). To calibrate an empirical parametric model one must go back to the basic functional form and fit the function to the local data set. The number of data points needed depends on the number of independent parameters in the model and their possible ranges. This constrains the complexity of the model that can be calibrated accurately. Boehm (1981) describes procedures for calibrating the COCOMO model. Miyazaki and Mori (1985) give an example of how to apply these procedures and demonstrate that the accuracy of the predictions is significantly improved by calibration. However, models that include a large number of cost drivers are difficult to calibrate. The immediate difficulty is that the data set required for calibration may be much larger than is typically available within a single organization. This difficulty can be alleviated by reducing the set of cost drivers used by the model to those that vary significantly within the environment to which the model is being calibrated. When attempting to calibrate a model with cost drivers it is crucial that the definitions of the cost drivers be applied consistently. Cuelenaere et al. (1987) observe that the time at which a user first attempts to calibrate a model is likely to be the user's first experience with the model, for example, when the model is first being introduced into an organization. They have developed an expert system to help users apply cost driver definitions correctly and consistently when calibrating the PRICE SP cost model. A further issue with cost drivers is how to calibrate the values of the effort multipliers to which cost driver values correspond. For example, if a high level of application experience decreases the expected effort by 9% in the uncalibrated model, by how much should it decrease the expected effort in the calibrated model? Miyazaki and Mori (1985) propose a method by which COCOMO effort multipliers can be adjusted to suit their environment and report that this increased both the precision and stability of their calibrated model.
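A common, simple calibration strategy is to refit only the multiplicative constant of a model such as EFFORT = a LOC^b to local data while holding the exponent fixed; the sketch below assumes invented local data and an assumed fixed exponent.

    # Sketch of recalibrating the multiplicative constant of effort = a * KLOC^b to
    # local data while holding the exponent fixed. Data and exponent are illustrative.
    import numpy as np

    fixed_b = 1.05
    kloc   = np.array([8.0, 22.0, 35.0, 70.0])
    effort = np.array([25.0, 75.0, 130.0, 290.0])    # actual person-months

    # Least-squares value of log(a) given the fixed exponent, fitted in log space.
    log_a = np.mean(np.log(effort) - fixed_b * np.log(kloc))
    a = np.exp(log_a)

    mmre = np.mean(np.abs(a * kloc ** fixed_b - effort) / effort)
    print(f"calibrated model: effort = {a:.2f} * KLOC^{fixed_b}  (MMRE = {mmre:.0%})")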
5.1.5 Uncertainty Estimates

A single value does not offer any clues about how certain or uncertain an estimate is. When estimating via a model, additional uncertainty is
introduced because any model will be unable to explain all of the variation in the dependent variable and also because the values that we supply as input parameters are themselves uncertain. Conte et al. (1986) suggest that a model is acceptable if its mean absolute relative error is less than or equal to 25%, and if at least 75% of the predicted values fall within 25% of the corresponding actual value. These measures of acceptability are widely used when the accuracy of models is assessed. Even where a model satisfies these criteria, 25% of predicted values may differ from actual values by more than 25%. The uncertainty of model estimates cannot be neglected. Input parameter values are uncertain when we are unable to measure their values directly. For example, when using a model such as Intermediate COCOMO to estimate effort at the start of a project, the number of lines of code must be estimated and judgment must be used to decide on values for the cost drivers. The uncertainty associated with input parameter values can be eliminated if it is possible to measure them directly. However, estimators may not be able to develop or calibrate a model based on direct measures, because suitable historical data are not available. Effort models based on lines of code are frequently developed, although lines of code must be estimated indirectly to make an early life-cycle prediction of effort. Arguably, measurements of total effort expended on a project and total lines of code produced are two of the simplest measures to collect, which may account in part for the popularity of such models. The reduction in uncertainty gained by using a model that is based on direct measures will be offset if the model itself is less accurate. For example, Kemerer (1987) finds that effort models based on lines of code explain more of the variation in effort than models based on function points. The success of models based on lines of code also explains their popularity. Whether an effort estimate based on function points is more certain than one based on lines of code must be assessed by comparing estimates based on function point measurements and ex ante estimates of lines of code, or ex ante estimates of both. Low and Jeffery (1990) demonstrate that function points can be estimated more consistently than lines of code, which implies that function point estimates will introduce less uncertainty into an estimate of effort than lines of code estimates. The uncertainty of the estimate still depends on both the uncertainty of the model on which it is based and the uncertainty in the model's inputs. A number of techniques can be used to estimate the range of possible values for the estimate. The standard deviation of the error in the estimate has been used to assess the uncertainty in an empirical model and estimate
a range of possible values. For example, COCOMO 2.0 (Boehm et al., 1995) suggests using a range of approximately one standard deviation around the most likely value. The standard deviation, in this context, is a measure of the variation within the historical data set. Unfortunately, this may not be a true measure of the uncertainty of a new estimate. The model on which the estimate is based is incomplete. It only takes into account some of the factors that affect the actual value. Consider making an estimate for a project affected by a factor that is constant for the historical projects on which the model is based. The model will not be able to predict the influence of this factor, and there is no guarantee that the actual value for the new project will fall within the range of values predicted. For example, an effort estimate for a new project that has a client-server architecture may be based on effort to develop similar systems that have a host-based architecture. If the change in system architectures is not identified as a cost driver or risk factor, its influence will not be taken into account by the estimate. If the change has a significant effect on productivity, the estimate may be quite unreliable. To use standard deviation as a measure of uncertainty, the data used to derive the model must also satisfy some statistical assumptions. For example, it is assumed that the error values in linear regression models are independent, normally distributed random variables with zero mean and identical standard deviations. However, the standard deviation may increase with the size of the system in some data sets (Matson et al., 1994). In summary, using standard deviation as a measure of the uncertainty in a model's estimates may be misleading. The uncertainty associated with an input parameter can be incorporated into a model by estimating the lowest, highest, and most likely value of the input parameter. Assuming a beta distribution, the expected value and standard deviation of the input parameter can be estimated as Expected Value = (L + 4M + H)/6 and Standard Deviation = (H - L)/6,
where L is the lowest, M the most likely, and H the highest value for the input parameter. This technique was employed by Putnam (1978) to obtain an estimate of system size, by adding the expected values of size for each system component, and estimating the overall standard deviation as the square root of the sum of the squares of the standard deviations for each component. The technique can be used to estimate the range of an input value
to a model. The model can then be used to calculate the range of the estimate based on the range of the input parameter. The use of fuzzy numbers is an alternative approach to assessing the uncertainty of input parameter values. Fei and Liu (1992) propose f-COCOMO, the fuzzy constructive cost model. f-COCOMO represents cost driver multipliers as fuzzy intervals rather than single values, to model the uncertainty associated with the influence of cost drivers on effort. By using fuzzy numbers for cost driver multipliers, Fei and Liu estimate upper and lower bounds for effort estimates. When a model has many input parameters, each with a range of possible values, the range of estimates generated by the model increases. Although such a wide variation in input values would not occur in practice, Conte et al. (1986) report that a variation in effort of up to 800% is possible in Intermediate COCOMO when the range from highest to lowest values for each cost driver is combined. The range of possible values for an estimate increases further when the uncertainty in input values is combined with the uncertainty associated with the model. Ideally, we would also like to be able to assess how likely it is that the actual value falls within an estimated range. This is a substantially more difficult problem than estimating a possible range of values. Estimating the likelihood of a particular value based on the standard deviation of the error in the estimate is questionable for the same reasons that the standard deviation is not likely to be a true measure of the uncertainty in a model. In particular, the assumption about the error values being normally distributed is critical to the validity of this approach. Estimating the uncertainty associated with an estimate is difficult. Calculating an estimate for a range of input values is worthwhile, however, at least to investigate the sensitivity of the model to its inputs.
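The beta-distribution approximation above, and Putnam's root-sum-square combination of component standard deviations, can be sketched as follows; the component figures and the effort model used to propagate the size range are illustrative assumptions.

    # Sketch of the three-point (beta-distribution) approximation applied to component
    # size estimates, combined by root-sum-square as Putnam did. Figures are invented.
    import math

    def three_point(low, likely, high):
        expected = (low + 4 * likely + high) / 6.0
        std_dev = (high - low) / 6.0
        return expected, std_dev

    components = [  # (lowest, most likely, highest) size in KLOC per component
        (4.0, 6.0, 10.0),
        (8.0, 12.0, 20.0),
        (2.0, 3.0, 6.0),
    ]

    expected_total = sum(three_point(*c)[0] for c in components)
    std_total = math.sqrt(sum(three_point(*c)[1] ** 2 for c in components))
    print(f"size: {expected_total:.1f} KLOC (std dev {std_total:.1f})")

    effort = lambda kloc: 3.0 * kloc ** 1.1     # assumed calibrated effort model
    low, high = expected_total - std_total, expected_total + std_total
    print(f"effort range: {effort(low):.0f} to {effort(high):.0f} person-months")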
5.2 Empirical Nonparametric Models
5.2.1 Effort Models

Briand et al. (1992) describe the optimized set reduction (OSR) technique, which is a pattern recognition approach for analyzing data sets. OSR is based in part on decision trees. OSR has been applied to a combination of the COCOMO and Kemerer (1987) data sets to predict productivity. Each project within the data set is represented by its COCOMO cost driver and effort values. The optimized set reduction technique selects a subset of projects on which to base a
prediction of productivity for the new project, where productivity is effort in person-months divided by lines of code. The projects selected for the optimal subset all share some common cost driver values with the new project, for example, all projects with nominal complexity, low reliability requirements, and high database size. OSR selects these cost driver values so that the distribution of productivity values in the subset of selected projects is measured as good, according to an established statistical criterion. An example of a good distribution is one where there is a clear peak about the average value and a statistically significant number of data points. A probability distribution for productivity is derived from the frequency distribution of the selected projects over a range of productivity intervals. Productivity for the new project is predicted by calculating its expected value based on the derived probability distribution. Briand et al. (1992) compare the accuracy of the OSR technique to a COCOMO model calibrated for the combined COCOMO and Kemerer (1987) data sets and a stepwise regression model. The OSR technique has a lower mean absolute relative error than both of the parametric models, with the COCOMO model performing least favorably. One advantage of OSR is that it can be applied with incomplete input data. It is possible to make an estimate for a project where only a subset of the cost driver values are known. Another advantage is that nominal or ordinal cost driver values can be used as inputs without being mapped to numeric multiplier values. Srinivasan and Fisher (1995) describe two further nonparametric methods for generating effort models. The first method uses a learning algorithm to derive a decision tree. The second method uses back-propagation to train an artificial neural network. These methods were also tested on the combined COCOMO and Kemerer (1987) data sets. COCOMO cost driver values and lines of code estimates are inputs to the effort models. The effort estimates from the artificial neural network had a lower mean absolute relative error than those from the decision tree. Differences in the sampling techniques mean that the results presented by Srinivasan and Fisher (1995) are not directly comparable with those of Briand et al. (1992), although the same data sets are used. It appears likely that the accuracy of both the artificial neural network and the decision tree is comparable with that of OSR and the stepwise regression model. Srinivasan and Fisher (1995) indicate that the computational cost of training the artificial neural network is high in comparison to the cost of deriving the decision tree. Briand et al. (1992) do not comment on the
computational cost of the OSR technique in comparison with the other approaches they examined, such as stepwise regression.
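A much-simplified sketch of the subset-selection idea behind OSR follows: past projects that share cost-driver values with the new project are selected, and the prediction is based on their productivity values. Real OSR chooses which drivers to condition on by optimizing a statistical (entropy-based) criterion, which this sketch omits; the driver names and data are invented.

    # Much-simplified sketch of the subset-selection idea behind optimized set reduction.
    # Real OSR optimizes which cost drivers to condition on; here they are fixed and the
    # data are illustrative only.
    from statistics import mean

    history = [   # (cost-driver values, productivity)
        ({"complexity": "nominal", "reliability": "low"},  0.055),
        ({"complexity": "nominal", "reliability": "low"},  0.048),
        ({"complexity": "high",    "reliability": "low"},  0.080),
        ({"complexity": "nominal", "reliability": "high"}, 0.060),
    ]

    def predict_productivity(new_project, condition_on):
        subset = [p for drivers, p in history
                  if all(drivers[d] == new_project[d] for d in condition_on)]
        if len(subset) < 2:              # too few points for a usable distribution
            return None
        return mean(subset)

    new_project = {"complexity": "nominal", "reliability": "low"}
    print(predict_productivity(new_project, condition_on=["complexity", "reliability"]))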
5.2.2 Calibration

To be applied confidently, each of the techniques just described requires a large number of data points because of the large number of independent variables and value ranges covered by the models. Both sets of authors comment on the small size of the COCOMO data set (63 projects) for applying their techniques. Both sets of authors also comment on the desirability of all projects in the data set coming from the same environment. These comments are not surprising, because the same considerations apply to the calibration of the COCOMO model itself. However, although the COCOMO data set may be small, it is significantly larger than many organizations could hope to collect. Where a large enough data set is available within a single organization, it may also be difficult to argue that all projects come from the same environment. Decision tree, artificial neural network, and OSR techniques can still be applied where the number of independent variables is reduced to complement the size of the available data set, for example, lines of code as the single independent variable. However, it is unclear whether these techniques are superior to simple regression techniques under these circumstances.
5.2.3 Uncertainty Estimates

The uncertainty of the model can be assessed by the mean absolute relative error. Briand et al. (1992) suggest that the uncertainty of an OSR estimate can be assessed by the information-theoretic entropy of the derived probability distribution. The value of this entropy measure is what is minimized when selecting the optimal subset. They observe that this value correlates well with the mean absolute relative error. A range estimate can also be generated by calculating estimates based on a range of possible input data values. When a range of input values is used with a parametric model, the range of estimates produced can be understood by inspection of its functional form. When a range of input values is used with a nonparametric model, the range of estimates produced may be less easy to understand. Nonparametric models based on a decision tree or OSR rely on partitioning the data set to generate an estimate. Different input values will, in general, select different subsets. There is a risk that any selected subset is nonrepresentative, in the sense that it is supposed to contain similar projects, but the actual effort values may vary widely as a result of factors not
considered by the model. This may cause estimates based on a range of input values to vary discontinuously.
5.3 Analogical Models

5.3.1 Effort Models

ESTOR is a case-based reasoning model developed by Mukhopadhyay et al. (1992) to estimate software development effort. Case-based reasoning is a form of analogical reasoning that employs five basic processes:

- Construction of a representation of the target problem
- Retrieval of a suitable case to act as source analog
- Transfer of the solution from the source case to the target
- Mapping the differences between source and target cases
- Adjusting the initial solution to take account of these differences
In ESTOR the cases are software projects, and each is represented by the values of a set of measures. The measures used by ESTOR are function point components and Intermediate COCOMO model inputs. ESTOR retrieves one case to act as a source analog based on the values of the function point components of the project for which the estimate is sought. A vector distance calculation is used to find the nearest neighbor. The initial solution, or effort estimate, for the project is the effort value for the analogous project. The differences between the analog and the new project are determined by comparing the values of their measures. The effort value for the analog is adjusted to take account of these differences by applying a set of rules. The rules used by ESTOR are derived from verbal protocols of an expert whose estimates were accurate for the data set used. The rules adjust the effort value by a multiplier if particular preconditions on the target and source project values are met. The data set used to develop ESTOR is a subset of 10 projects from the Kemerer (1987) data set. ESTOR was tested on all 15 projects of this data set, with a reported mean absolute relative error of 53%.

Shepperd et al. (1996) describe the tool ANGEL, which also supports estimation by analogy. ANGEL is based on a generalization of the model for estimating by analogy described by Atkinson and Shepperd (1994). Projects are represented by function point components. Analogous projects are neighbors of the new project, identified by calculating the vector distance from the new project to other projects in the data set. Effort
for the new project is predicted from a weighted mean of the effort values of its neighbors. In ANGEL, the user can specify the measures on which the search for analogous projects is based. ANGEL can also automatically determine an optimal subset of measures for a particular data set. ANGEL can be requested to search for one, two, or three analogous projects and calculates an unweighted mean of their effort values to predict effort for the new project. This approach is very similar to that of ESTOR. Both ANGEL and ESTOR represent projects by values of readily available measures, and both use a vector distance calculation to search for analogs. ESTOR uses only one analog on which to base its estimate, whereas ANGEL may retrieve and use the effort values from several analogs. The main difference is that ESTOR adjusts the effort value of the analogous case by applying rules, whereas ANGEL will either use the effort value directly, where only one analog is retrieved, or calculate a mean of the effort values for the analogs. ANGEL performed as well as or better than linear and stepwise regression models for effort. The regression models were based on the measures in the data set that displayed the highest correlations with effort. On the Kemerer (1987) data set, the reported mean absolute relative error for ANGEL is 62%, which compares with more than 100% for the regression models and 53% for ESTOR. Although ESTOR appears to perform better than ANGEL on this data set, the adjustment rules for ESTOR were developed based on 10 of the 15 projects in the set, and these rules may not be as successful when applied to projects from different data sets. An advantage of these approaches to estimation by analogy is that they can succeed where no statistically significant relationships can be found in the data. Shepperd et al. (1996) give an example of a small data set of eight projects, for which ANGEL gives a mean absolute relative error of 60%, which compares to 226% for a linear regression model based on the same data set.
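A minimal sketch of the analogy mechanism shared by ESTOR and ANGEL follows. The feature names, the min-max normalization, and the use of Euclidean distance are our assumptions, and refinements such as ESTOR's adjustment rules and ANGEL's automatic selection of an optimal measure subset are omitted.

```python
import math

# Hypothetical past projects described by function point components and actual effort.
projects = [
    {"inputs": 40, "outputs": 25, "files": 10, "effort": 120.0},
    {"inputs": 60, "outputs": 30, "files": 18, "effort": 210.0},
    {"inputs": 25, "outputs": 12, "files": 6,  "effort": 70.0},
    {"inputs": 55, "outputs": 28, "files": 15, "effort": 190.0},
]
features = ["inputs", "outputs", "files"]

def normalize(value, feature):
    """Min-max normalize a feature value against the historical data set."""
    values = [p[feature] for p in projects]
    lo, hi = min(values), max(values)
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def distance(a, b):
    """Euclidean distance over normalized function point components."""
    return math.sqrt(sum((normalize(a[f], f) - normalize(b[f], f)) ** 2 for f in features))

def estimate_by_analogy(target, k=2):
    """Unweighted mean effort of the k nearest analogs."""
    neighbours = sorted(projects, key=lambda p: distance(target, p))[:k]
    return sum(p["effort"] for p in neighbours) / k

target = {"inputs": 50, "outputs": 26, "files": 14}
print(f"analogy-based effort estimate: {estimate_by_analogy(target):.1f} person-months")
```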
5.3.2 Calibration The data set that is searched for analogous cases must contain cases that are representative of the case for which an estimate is to be made, if estimation by analogy is to be successful. In practice, this means data for systems developed in the same environment is needed, and the differences between the source case and target case should be as few as possible. To support estimates for a wide variety of system development projects, many data points are needed, as the input space of measures and value ranges is very large. However, analogy-based estimation can be applied
successfully with small data sets, provided these are from the same environment. In fact, Shepperd, Schofield, and Kitchenham (1996) find that partitioning data sets according to attributes of the environment improves the accuracy of ANGEL's estimates.
5.3.3 Uncertainty Estimates The uncertainty of the model can be assessed by the mean absolute relative error, and a range estimate can be generated by calculating estimates based on a range of inputs. However, as for nonparametric empirical models, estimates based on a range of input values are likely to show discontinuous variations, because different analogs are selected. It is also possible that the source case(s) selected may share the same values as the target, yet not be representative of the target case because of factors that are not considered in the representation of the cases. This may account for the improved accuracy of estimates based on partitioned data sets.
5.4 Theoretical Models 5.4.1 An Effort Model One example of a theoretical model is presented in this section. Abdel-Hamid and Madnick (1986, 1991, 1993) have developed a model of software development project management. Dynamic feedback relationships among staff management, software production, planning, and control are modeled via a simulation language. Simulations of project management scenarios can be run to investigate the effects of management policies and decisions. The model works from an initial estimate for overall effort and then explores how the actual effort is influenced by the model's assumptions about interactions and feedback between the project and management decisions. One scenario which Abdel-Hamid and Madnick have explored considers the circumstances in which more people can be added to a late project to get it finished sooner. Brooks' law (Brooks, 1975) states that "Adding manpower to a late software project makes it later." The model of Abdel-Hamid and Madnick (1991) indicates that adding manpower makes a project more costly, but not necessarily later. This is consistent with the findings of Jeffery (1987a), whose data set includes examples of projects that took more effort than expected to complete, but not more time. Provided people can be added under circumstances that result in a net increase in the project's average productivity, the project can still finish earlier. Key factors are
the point at which extra people join the project and the experience of these people. Abdel-Hamid and Madnick (1991, 1993) are able to demonstrate via their model how both underestimates and overestimates of project effort can lead to lower average productivity and increased overall effort. As estimates for new projects are based on past projects, they suggest that their model can be used to explore what the minimum effort for a completed project would have been, had it been estimated correctly at the outset. Future estimates can then be based on the corrected effort for the project.
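The following toy simulation is emphatically not the Abdel-Hamid and Madnick model, which is far richer; under invented numbers for assimilation time and mentoring overhead, it merely illustrates the qualitative behavior described above: adding people can raise total effort while still shortening, or failing to shorten, the time to finish, depending on when they join.

```python
def simulate(remaining_tasks, staff, new_hires=0, hire_week=0,
             base_rate=1.0, assimilation_weeks=4, mentoring_cost=0.25):
    """Toy week-by-week simulation of finishing a project after adding staff.

    Assumed behavior (illustrative numbers only): new hires produce nothing
    while assimilating and divert a fraction of an experienced person's time.
    Returns (weeks to finish, total person-weeks expended).
    """
    week = 0
    effort = 0.0
    while remaining_tasks > 0:
        week += 1
        productive = staff
        if week >= hire_week and new_hires > 0:
            weeks_in = week - hire_week
            if weeks_in < assimilation_weeks:
                productive = staff - mentoring_cost * new_hires
            else:
                productive = staff + new_hires
        remaining_tasks -= productive * base_rate
        effort += staff + (new_hires if week >= hire_week else 0)
    return week, effort

print(simulate(200, staff=5))                              # no extra staff
print(simulate(200, staff=5, new_hires=3, hire_week=10))   # add people early
print(simulate(200, staff=5, new_hires=3, hire_week=35))   # add people very late
```

With these invented numbers, both staffed-up runs cost more person-weeks than the baseline, but only the early addition finishes sooner, echoing the observation that added manpower makes a project more costly but not necessarily later.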
5.4.2 Calibration When this model is applied in a new environment, the model’s assumptions should be examined to see whether they are valid. For example, the model relies on assumptions about management policies that may be inaccurate in a new environment and hence invalidate the existing model. The model also relies on a number of parameters that have to be determined specifically for each environment in which it is applied, for example, distribution of effort by phase.
5.4.3 Uncertainty Estimates The model is well suited to generating range estimates, based on initial conditions, as it is intended to be used in an exploratory fashion. Since the model has been used primarily for exploration rather than prediction, the published material includes only a small number of example projects from similar environments. This makes it difficult to assess how accurate the model would be for projects from a wider range of environments.
5.5 Heuristics Cost drivers are an example of how heuristics are incorporated into empirical parametric effort estimation models. Although the influence of cost drivers can be explored statistically, the initial assessment of which factors are likely to influence cost is an example of heuristic judgment. The effort multiplier that corresponds to a cost driver value may also be arrived at heuristically, if insufficient data is available to quantify the effect of the cost driver using statistical techniques. The rules used in the case-based reasoning model of ESTOR are also examples of heuristics. The rules adjust the effort for an analogous project based on an expert's understanding of how certain circumstances influence effort.
Arguably, heuristics can be expected to be used in domains such as software cost estimation, where few theoretical models of causal relationships exist and where very many factors interact and influence outcomes in ways that are difficult to model empirically.
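As a concrete rendering of such heuristics, the sketch below encodes adjustment rules as precondition-and-multiplier pairs, in the spirit of ESTOR's rules; the particular preconditions and multiplier values are invented for the example.

```python
# Each heuristic is a precondition on the target project plus an effort multiplier.
# The preconditions and multipliers below are invented for illustration.
rules = [
    (lambda p: p["tool_support"] == "poor",         1.15),
    (lambda p: p["team_experience"] == "high",      0.90),
    (lambda p: p.get("extra_training_docs", False), 1.10),
]

def apply_heuristics(base_effort, project):
    """Adjust an analogy- or model-based estimate by every rule whose precondition holds."""
    effort = base_effort
    for precondition, multiplier in rules:
        if precondition(project):
            effort *= multiplier
    return effort

project = {"tool_support": "poor", "team_experience": "high", "extra_training_docs": True}
print(apply_heuristics(100.0, project))  # 100 * 1.15 * 0.90 * 1.10 = 113.85
```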
5.6 Discussion 5.6.1 Measures Most of the models reviewed have been developed to predict the measure of total effort for a software project or system development. Interest in these models is expected, as total effort is clearly an important measure in the domain of software cost estimation. However, when the models and methods described in Section 5 are considered from the perspective of the elements of the selection framework that influence the selection of measure and method, it is apparent that a focus on prediction of total effort is too narrow. Predictions of alternative, related measures may be more useful in some circumstances. However, models for predicting measures other than total effort, such as duration, are underrepresented in the literature.
5.6.2 Methods Of the methods described for developing models, empirical parametric methods are some of the easiest to apply. Minimal tool support is required. Linear regression lines can be calculated simply with the aid of a pocket calculator or a spreadsheet, most of which have basic statistical functions built in to speed up the process. Access to a statistical package will be desirable, however, if the model to be fitted has more than one or two independent variables. Analogy-based estimation is also straightforward to apply, provided only a small data set needs to be searched for analogs and the number of variables to consider is no more than half a dozen. Tool support will be needed once the numbers of cases and variables increase. Developing and applying models based on the examples of empirical nonparametric and theoretical methods described in this section will require specific tool support. The availability of tools suitable for developing and applying models, within the system development environment, will constrain the selection of method. Even with the tools available, setting up a model based on an empirical nonparametric method such as an artificial neural network involves more work than preparing a model based on a statistical regression. This is one example of how the time available to prepare an estimate will influence the selection of method.
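For example, a simple empirical parametric model of the form effort = a * size^b can be fitted by ordinary least squares on log-transformed data using nothing beyond basic statistical functions; the sketch below uses invented project data to show the calculation.

```python
import math

# Hypothetical (size in KLOC, effort in person-months) pairs from past projects.
data = [(10, 25), (22, 70), (35, 120), (50, 190), (80, 340)]

# Fit log(effort) = log(a) + b * log(size) by ordinary least squares.
xs = [math.log(s) for s, _ in data]
ys = [math.log(e) for _, e in data]
n = len(data)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = math.exp(mean_y - b * mean_x)

def predict_effort(size_kloc):
    return a * size_kloc ** b

print(f"effort = {a:.2f} * size^{b:.2f}")
print(f"predicted effort for 40 KLOC: {predict_effort(40):.0f} person-months")
```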
Analogical methods have proved quite robust where little data is available, whereas empirical nonparametric methods need data sets that are substantially larger. If little data is available, the most appropriate choices of method are likely to be analogical or simple empirical parametric models with a single independent variable. How similar a new system development project is to historical projects also influences the selection of method. If the new project differs from all historical projects, in a way that is recognized, then ideally an estimate should take this difference into account. For example, a heuristic could be applied to adjust an estimate based on the historical projects. It may not be practical to add a cost driver to a model to take account of a factor that is specific to a particular project, such as the need for the new project to generate additional training documentation.
5.6.3 Purpose The purposes for which software cost estimates are required are varied. Some of the purposes for software cost estimates that are regarded as important by practitioners are planning, controlling projects, assessing feasibility, and preparing quotes or bids for presentation to customers (Lederer and Prasad, 1993). An estimate of total effort for system development will be useful for all of these purposes, as will an estimate for total duration. However, for planning system development, estimates of effort and duration for groups of related system activities or system phases will also be useful. Once system development is under way, estimates or reestimates for activities will continue to be useful, as will estimates for total effort and total time to complete.
5.6.4 Viewpoint From the viewpoint of an organization’s management and from the viewpoint of a customer, the most interesting software cost estimation measures are total effort and total duration, and once development is under way, the totals to complete. Individual developers are less likely to be interested in total effort estimates. They may want to track their own productivity, however, to make effort estimates for their own activities. For example, in some organizations, developers are expected to sign up to meet a target duration for a particular activity. Estimates based on group productivity figures generally will not be satisfactory, because of the significant variations commonly found between individual developers (DeMarco and Lister, 1987).
5.6.5 Time Estimates of total effort are clearly useful prior to or at the start of system development. However, this is the time, relative to system development activities, when there is the least information available on which to base a prediction. Models that estimate total effort based on lines of code are quite common. Unfortunately, an accurate estimate of the lines of code will most likely require detailed information about the system that is not available at this time. Models for estimating total effort that are based on measures available early in the system development life cycle are clearly desirable. Models based on function points offer some improvement over lines of code, as it appears that function points can be estimated more consistently from specification and design descriptions than lines of code (Low and Jeffery, 1990). However, the models that can be developed for a particular environment are constrained by the historical data that are available. As discussed in the earlier section on empirical parametric models, it may be relatively easy to obtain data pairs of total effort and lines of code. Considerably more experience and effort are involved in counting function points than in counting lines of code, so data pairs of total effort and function points are likely to be harder to obtain. Function points and lines of code are strongly correlated, however, and so function points can be used to predict lines of code (e.g., Jeffery and Low, 1990). A model relating function points and lines of code can be developed based on values for system components rather than for entire projects, which reduces the amount of function point counting that must be done.

As initial software cost estimates are made based on limited information, reestimating when additional information is available is desirable. Once system development is under way, the interest shifts from total effort to the total effort needed to complete development. A reestimate of the total effort less the effort expended so far may not be a satisfactory measure of the total effort to complete, however. For this measure to be satisfactory, the reestimate of total effort needs to take into account the actual progress that has been made so far, as well as the effort expended so far. For example, an initial estimate of the total effort to develop a system may be based on a rough, preliminary estimate of function points. A new estimate may be calculated from a reestimate of function points, made after a high-level design is complete. The model assumes the same average productivity for system development for both estimates, however. If the productivity of the development team is substantially different from that assumed by the model, the new total effort estimate will not incorporate
this knowledge, and neither will the estimate of the total effort to complete the development. Issues such as this complicate the process of reestimation and indicate that measures that reflect actual progress will be important for accurate software cost estimation once a project is under way. Estimation models need to incorporate these measures if the total effort or time to complete is to be estimated successfully.
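One simple way to fold actual progress into a reestimate, sketched below under assumed measures (function points completed and effort expended so far), is to replace the planned productivity with the productivity actually observed before estimating the effort to complete. The figures are invented for illustration.

```python
def effort_to_complete(total_fp, completed_fp, effort_so_far, planned_productivity):
    """Reestimate effort to complete, replacing the planned productivity
    (person-months per function point) with the productivity observed so far."""
    if completed_fp > 0:
        observed_productivity = effort_so_far / completed_fp
    else:
        observed_productivity = planned_productivity
    remaining_fp = total_fp - completed_fp
    return remaining_fp * observed_productivity

# Plan assumed 0.5 person-months per function point for a 400 FP system,
# but 150 FP have consumed 90 person-months so far.
print(effort_to_complete(total_fp=400, completed_fp=150,
                         effort_so_far=90, planned_productivity=0.5))
# 250 remaining FP * 0.6 observed PM/FP = 150 person-months to complete,
# versus 125 if the planned productivity were simply reused.
```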
5.6.6 Environment The environment for system development contributes factors such as targets and constraints. When a project starts there is often a target for the delivery date and constraints on how many staff can ultimately be assigned to work on the project and on the availability of these staff. Estimates of effort, duration, and staffing are needed to plan system development or to check whether it is feasible to deliver within the desired time. Total effort, duration, and staffing are closely related and interdependent, but there may be independent constraints on all three. This makes the problem of predicting any one or two of them complex. Existing parametric models, such as the COCOMO schedule model (Boehm, 1981) and the model of Putnam (1978), have not proved widely successful in explaining the relationships among effort, duration, and staffing across a range of organizational settings. The dynamic model of Abdel-Hamid and Madnick (1993) appears able to explain interrelationships among staffing, duration, and overall cost in a qualitative way, but the model is not easy to apply, because it requires a specialized simulation tool. Software cost estimators could benefit from approaches that assist them in solving this type of multidimensional, interdependent, constrained problem. Most existing models predict a single dependent variable based on one or more independent variables, which is a considerably simpler problem.
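A practitioner-level illustration of checking such interdependent constraints is sketched below. The equations follow the general form of COCOMO-style effort and schedule models, but the coefficients, the project size, and the constraint values are invented rather than calibrated.

```python
def feasibility_check(size_kloc, max_duration_months, max_staff,
                      a=3.0, b=1.12, c=2.5, d=0.35):
    """Check a target delivery date and staffing ceiling against effort and
    duration predicted by COCOMO-style equations (coefficients are illustrative)."""
    effort = a * size_kloc ** b            # person-months
    duration = c * effort ** d             # months
    avg_staff = effort / duration
    return {
        "effort_pm": round(effort, 1),
        "duration_months": round(duration, 1),
        "average_staff": round(avg_staff, 1),
        "meets_deadline": duration <= max_duration_months,
        "within_staff_limit": avg_staff <= max_staff,
    }

print(feasibility_check(size_kloc=60, max_duration_months=14, max_staff=10))
```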
5.6.7 Experience Historical data are arguably the most important elements of an organization's experience base. The availability of historical data is critical in model development, as the measures that can be predicted are dictated by the measures for which data values have already been collected. Models that predict total effort based on lines of code have already been discussed as an example of this. Experience in developing and applying measures and models must also be cultivated within organizations, if the benefits of collecting local data are to be realized. The simplest models to develop and apply are empirical parametric models with few variables, and analogical models. Models that
are more difficult to develop and apply may nevertheless be valuable, for example, a model such as that of Abdel-Hamid and Madnick (1993), or models based on a large number of variables. Although these may take more effort to set up initially, the effort may be repaid if the experience is not lost and the models are reused.
6. Software Cost Estimation Practice Although much has been published on models, fewer researchers have studied how software costs are estimated in practice. What follows is an overview of several studies conducted within particular organizations and several cross-organizational surveys. This overview is concluded by a discussion of observations from these studies and reflection on some of the causes to which these observations may be attributed.
6.1 Cost Estimation at the National Security Agency
Masters (1985) describes how software cost models, such as the COCOMO model (Boehm, 1981) and the model of Putnam (1978), have been used at the National Security Agency (NSA) in the United States. As part of its procurement process, the NSA has required contractors to submit reports estimating a collection of cost model parameters. These estimates are then used by the NSA as inputs to four selected models. Masters (1985) claims that some contractors frequently adjust their inputs to substantiate their bids. This practice is believed to be a factor in subsequent cost overruns and schedule slippage.
6.2 Cost Estimation within an Information Systems Department
Lederer et al. (1990) report on the cost estimating process followed by a manufacturing organization with an internal information systems department (ISD). Estimates and actual values have been compared for a number of projects, with actual effort in the range of a few days to 120 days. It is reported that 60% of estimates differ from actual values by 20% or more. The authors point out that these are small projects, so the consequences of inaccurate estimates are not likely to be as serious as for large projects. The organization follows a locally developed estimation process. An initial ballpark estimate is required by user and ISD management to determine whether a project can be cost justified. This estimate is often very inaccurate, because the scope of the project is not known in detail and in some cases ISD is requested to provide an estimate without preparation.
If user management chooses to proceed based on the initial estimate, a second estimate is prepared more carefully. The estimator identifies modules and their associated functions and files, although information about the required functionality is still limited. The estimator assesses the complexity level of each module subjectively. An existing matrix records analysis and programming effort for modules of different complexities. The estimator uses this matrix to estimate effort for each module, and adjusts the sum based on experience and intuition. The subjectivity and the uncertainty about functionality in these estimates leave room for politics to influence the outcome. Effort is reestimated at each project phase, but pressure is applied to fit new estimates to the previous ones. Estimates are arrived at through negotiation. It is arguable whether an estimate that has been negotiated deserves to be called an estimate at all. If an estimate is altered, without altering the assumptions on which it is based, to indicate that a project target or constraint can be satisfied, it adds no value to the management of the project.
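The module-by-module procedure described above can be pictured as a lookup in an effort matrix followed by a subjective adjustment; the matrix values and module data in the sketch below are invented purely to illustrate the mechanics.

```python
# Hypothetical effort matrix (days) indexed by module complexity,
# split into analysis and programming effort as described above.
effort_matrix = {
    "low":    {"analysis": 2, "programming": 3},
    "medium": {"analysis": 4, "programming": 7},
    "high":   {"analysis": 8, "programming": 14},
}

def estimate_project(modules, adjustment=1.0):
    """Sum matrix effort over modules, then apply the estimator's subjective adjustment."""
    total = sum(effort_matrix[complexity]["analysis"] +
                effort_matrix[complexity]["programming"]
                for complexity in modules)
    return total * adjustment

# Estimator judges module complexities subjectively, then adjusts the sum by 10%.
modules = ["low", "medium", "medium", "high"]
print(estimate_project(modules, adjustment=1.1))  # (5 + 11 + 11 + 22) * 1.1 = 53.9 days
```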
6.3 Cost Estimation at Jet Propulsion Laboratory Hihn and Habib-agahi (1991) report the results of a survey of cost estimation techniques at the Jet Propulsion Laboratory (JPL). Survey participants were asked to estimate the size and effort to develop a piece of software, based on a design document describing the software. Size is measured in lines of code, and effort in person-days. The actual size and effort associated with this piece of software were known, as it had been implemented prior to the survey. The survey shows that the majority of technical staff at JPL use informal analogy and high-level partitioning of requirements to estimate software costs. Both effort and size estimates were inaccurate, and the average estimates were significantly lower than the actual values. Estimates of effort were found to be more accurate than estimates of size. Only 16% of the size estimates fell within 20% of the actual size, whereas 33% of effort estimates fell within 20% of the actual effort. The subgroup of people with relevant application experience performed significantly better than the overall survey group in estimates of both size and effort. In this group, 27% of the size estimates fell within 20% of the actual size, and 46% of the effort estimates fell within 20% of the actual effort. The inaccuracy of the estimates is attributed in part to a lack of emphasis on careful cost analysis within the organization. This in turn was attributed to an environment in which more money would usually be forthcoming for work when necessary, or where the budget or schedule constraints are set
in advance and inconsistencies among requirements, cost, and schedule are expected to be resolved while work is in progress.
6.4 Cost Estimation at AT&T Bell Laboratories Taff et al. (1991) describe the process followed by AT&T Bell Laboratories in estimating development effort for releases of the 5ESS Switch. The projects that develop these releases are large, with 1000 to 2000 people working on them annually. Estimates for features to be included in a release are developed over a number of months. Estimates are based on high-level designs for components, a bottom-up approach, as it is believed that these will be better understood and will match the experience of the estimators more closely. Estimators compare the component designs informally with their own experience, and they may use personal rules to arrive at an estimate. Forms and checklists are used as tools to help ensure that work is not overlooked and that assumptions are documented. In some cases these estimates are compared with ones generated by a mathematical model developed in-house. The estimates are developed during "estimeetings," or formal group estimating meetings, which are described in the earlier section on estimation processes. Total actual costs of the releases are reported as having "tracked well" against the total estimated costs. However, data are not published to allow the accuracy of the per-feature estimates to be assessed. A number of qualitative benefits of the estimating process are reported in the study. For example, project teams accept the estimates as official and project management has greater confidence in the estimates.
6.5 Survey of Cost Estimation in the Netherlands Heemstra (1992) reports findings of a field study of software development projects in 598 Dutch organizations. Among its conclusions are that 80% of the projects overrun their budgets and durations, and that the mean overrun is 50%. The field study also finds that 35% of the organizations make no estimates, in which case the planned budget and duration are presumably determined by factors external to the project. Estimation by analogy (61%) and expert judgment (26%) are the methods most commonly used by organizations that make estimates. Fourteen percent of these organizations report that they also use parametric models. The organizations surveyed may use more than one method of cost estimation. Of the organizations surveyed, 30% indicated that their estimates were constrained by cost or available resources.
Only 50% of the organizations participating in the survey record data from completed projects, which implies that where estimation methods such as analogy are being used to estimate costs, the analogy is made only informally.
6.6 Survey of Cost Estimation in the United States Lederer and Prasad (1993) report the results of a survey of 112 information systems managers and analysts from different organizations in the United States. The survey indicates that approximately 63% of projects significantly overrun their estimates, whereas 14% significantly underrun their estimates. When asked to rate how extensively they used a particular estimating method, survey participants rated comparison with similar, past projects based on personal memory significantly higher than any other method. This is consistent with the findings of Heemstra (1992) that informal estimation by analogy with past projects is the most common estimation method. Only 17% of survey participants report using a software package to help estimate development costs. This group reported a higher proportion of projects overrunning their estimates than the group that did not use software packages. This result is also consistent with the finding of Heemstra (1992) that cost models do not improve the accuracy of estimates. Lederer and Prasad (1993) canvassed the opinions of survey participants on the causes of inaccurate estimates. Four of the top five reported causes are related to difficulties in requirements elicitation and definition. The other cause of inaccurate estimates that appeared in the top five was overlooked tasks.
6.7 Expert Judgment in Cost Estimation Vicinanza et al. (1991) studied the use of expert judgment in cost estimation. Experts were asked to estimate the effort for a set of projects based on inputs to COCOMO and function point models. The estimates were generally made without relying on formal algorithmic techniques. The same set of projects had been estimated algorithmically in a previous study (Kemerer, 1987). When the results were compared, the experts' estimates correlated more highly with the actual values than the estimates made by the models, although the mean error varied more widely for the experts than for the models. The experts' success is attributed to a greater sensitivity to factors affecting productivity in a particular situation. However, there is some contrary evidence, from studies of expert performance in prediction in other domains, which suggests
that predictions made by experts are inferior to those made by simple linear models (Johnson, 1988).
6.8 Discussion These studies highlight the need for measurement, feedback, and packaging of experience to support software cost estimation. These are key parts of our prediction process that do not appear to be prevalent in practice. The studies reported previously indicate that most estimates are made informally, based on the estimator's previous experience (Lederer et al., 1990; Taff et al., 1991; Hihn and Habib-agahi, 1991; Heemstra, 1992; Lederer and Prasad, 1993). Cost models are used rarely, if at all, and have not been demonstrated to improve the accuracy of estimates (Heemstra, 1992; Lederer and Prasad, 1993). The prevalence of informal estimates is inevitable where there is little or no recorded data available on which to base an estimate. The lack of suitable historical data is likely to be one of the factors limiting the application of formal estimation techniques and cost estimation models. Heemstra (1992) reports that only 50% of participating organizations recorded data from completed projects. It is unlikely that many of the organizations surveyed had suitable data available, as it is well noted that software engineering methodology and practice have placed too little emphasis on the importance of measurement (e.g., DeMarco, 1982; Fenton, 1991). The evidence that the use of cost models is not associated with improved accuracy may also be a consequence of the lack of historical data. Cost models that are not calibrated for the environment in which they are applied are likely to be inaccurate (e.g., Mohanty, 1981; Kemerer, 1987; Jeffery and Low, 1990). The collection and storage of appropriate data as part of an experience base is the key to developing reliable local cost models. It is also possible that cost models have been used in the manner suggested by the NSA report (Masters, 1985), to substantiate that a particular cost target will be achieved. Where this is the case, it would be surprising to find the use of cost models associated with greater accuracy. Between 60 and 80% of projects are reported to overrun their estimates significantly (Heemstra, 1992; Lederer and Prasad, 1993). Although earlier surveys of software cost estimation practice have not been researched, the level of accuracy reported in the surveys of Heemstra (1992) and Lederer and Prasad (1993) does not suggest that the software development community has improved the accuracy of cost estimates over the past 20 years. One prerequisite for improvement is feedback, by comparing estimates with actual values and analyzing the reasons for accuracy or inaccuracy. Feedback is part of step 4 of our prediction process. Such feedback is
difficult where there is no defined process for estimation or the process is not supported by appropriate measurement. Although the process cannot eliminate difficulties inherent in estimation, such as allowance for changes in system requirements, organizations which have attempted to improve their estimates through better practice have reportedly been successful (Taff et al., 1991; Daskalantonakis, 1992). Even if inaccurate estimates can be largely attributed to difficulties in eliciting and defining requirements, as suggested by the survey of Lederer and Prasad (1993), the value of using an estimation process and models should not be discounted. However, where requirements change regularly, it is important to track the assumptions on which estimates are based, and to reestimate when these assumptions are no longer valid.
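Since several of the studies above summarize accuracy as the proportion of estimates falling within some percentage of actual values, or as a mean relative error, the feedback step can start with calculations as simple as those sketched below; the MMRE and PRED(25) summaries are standard, but the project data are invented.

```python
def mmre(records):
    """Mean magnitude of relative error over (estimate, actual) pairs."""
    return sum(abs(est - act) / act for est, act in records) / len(records)

def pred(records, level=0.25):
    """Proportion of estimates whose relative error is within the given level."""
    within = sum(1 for est, act in records if abs(est - act) / act <= level)
    return within / len(records)

# Hypothetical (estimated, actual) effort pairs collected at project completion.
history = [(100, 130), (80, 75), (200, 320), (50, 55), (150, 160)]

print(f"MMRE: {mmre(history):.2f}")        # average relative error
print(f"PRED(25): {pred(history):.2f}")    # share of estimates within 25% of actual
```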
7. Contributions
7.1 Review of Software Cost Estimation Research
This article draws together a wide range of research into software cost estimation. It reviews not only software cost estimation models, but also software cost estimation processes and practice. The review of processes and practice provides a context for the review of software cost estimation models. This context allows us to consider how to apply models successfully and what difficulties arise in practice.
7.2 The Prediction Process and Selection Framework The prediction process and the framework for selecting predictive measures and methods that are presented in this article highlight the importance of measurement within the context of system development. Measurements must be made to collect the data on which predictions are based. The prediction process and framework also emphasize the need to package and reuse experience. If the goal is to improve the accuracy of predictions, access to such an experience base is necessary. An experience base contains past data, models, and expertise in applying prediction methods and models. The framework for selecting predictive measures and methods identifies a number of factors that characterize the circumstances in which predictions are being made. The framework can be used to help select the measures and methods that are appropriate to these circumstances. Measures and methods are not “one size fits all.” The framework can help avoid the selection of software cost estimation models and methods primarily on the basis of factors such as how widely known they are.
The prediction process and selection framework also provide a basis for analyzing and interpreting the research into software cost estimation that is reviewed in this article.
7.3 Analysis and Interpretation of Previous Research The analysis and interpretation of the research reviewed in this article identifies a number of limitations and difficulties in software cost estimation research and practice and offers some explanations of why they have arisen. The lack of measurement within the software development industry is constraining progress in the field of software cost estimation. This is a difficulty faced by both researchers and practitioners. Researchers need data to explore and develop new models and methods for software cost estimation. Practitioners need data to use existing methods and models successfully. Models for predicting total system development effort are important and have been the focus of much previous research. If this focus continues, however, it may be a limitation, as models for predicting effort for system development activities are needed for planning and monitoring progress once system development is underway. Making accurate estimates using software cost estimation models is difficult. Uncertainty is introduced because the model explains only part of the variation in effort. Developing models that are better at explaining this variation, and hence more accurate, is a great challenge to researchers. This must be complemented by effort on the part of practitioners to calibrate and apply the models correctly. Uncertainty is also introduced where the values of input parameters cannot be measured. There are arguably too few models that are suitable for early life-cycle estimation, if the ability to measure input parameters is the criterion by which to choose them. An obvious difficulty with the practice of software cost estimation, industrywide, is that the accuracy of software cost estimates is not improving. One explanation for this is that most practitioners take an informal approach to estimation that does not incorporate the feedback which is necessary to improve.
7.4 Needs for Future Research and Recommendations for Practice
The limitations and difficulties in software cost estimation research and practice identified in this article suggest some needs for future research and some recommendations to help improve practice.
7.4.1 Research Needs Research into early life-cycle measures for predicting effort is needed, so that estimates can be based directly on the information that is available prior to or at the start of system development. The unsuitability of models based on lines of code for this purpose has been discussed already. Research into models for predicting effort for system development activities is needed as well as further research into models for total effort. Research in this area may possibly lead to more accurate models, if the models are used to predict effort for well-defined and clearly understood activities. One of the disadvantages of developing models for total system development effort is the variation which is inevitable in the interpretation of this measure. However, the fact that a value for total effort is often readily available is an advantage to researchers. Models to estimate effort for development activities will be closely coupled to definitions of life cycles and processes, which is likely to make data collection more difficult, especially across organizations. A challenging area for research is the development of accessible models for predicting the effects of the relationships among effort, duration, and staffing for system development and constraints on these. Project managers need to be able to come up with solutions that satisfy constraints on all three, and these managers would benefit from assistance in identifying optimal solutions. The simple view that total effort can be deduced from attributes of the system alone, exemplified by a model that predicts effort based on lines of code, ignores the interactions among staffing, target completion date, and expected effort. The total effort expended is determined by the effects of these interactions. Developing approaches to software cost estimation that can take advantage of knowledge about the current progress will be valuable to practitioners who reestimate during system development. Reestimation is necessary because the assumptions on which estimates are based frequently change during system development. The challenge in this area is to develop approaches to help measure progress and then incorporate these into the estimation process, so that realistic estimates of the effort or time to complete can be made.
7.4.2 Recommendations for Practice Circumstances in which system development can proceed without effort estimates are arguably very rare. As estimates are needed, they will be used regardless of how they are made or how inaccurate they are suspected to be. The problem will not go away, so we may as well try to improve our
estimates by changing our practices. Although the prospects for accurate estimates based on formal software cost estimation methods and models may not look promising, the other extreme, making estimates as one-off stabs in the dark, offers no opportunities for improvement. The most important recommendation for improving practice is to follow a process for software cost estimation, at least informally. The feedback from assessing the accuracy of the predictions and analyzing the causes of inaccuracy is necessary for improvement. Adopting a prediction process such as the one proposed in this article should provide benefits both in the short term, when selecting the measures to predict, and in the long term, as experience is gained and packaged for reuse. The other key area for improvement is measurement. A software cost estimation process that is adopted without the benefits of appropriate data to support it will not progress beyond making stabs in the dark. Software cost estimation models developed by researchers are not an overnight solution to improve estimates. Models need to be calibrated for the environment in which they are applied, if they are to be used with any confidence. Calibration of existing models and development of local models will be necessary to make accurate estimates. The software cost estimation methods and models described in this paper vary in how difficult they are to calibrate and apply. If the time available to prepare an estimate is limited, it is recommended that simple approaches be adopted, such as empirical parametric models, at least until some experience in estimation has accumulated. Practitioners may also benefit by considering measures that are specific to their application domain when developing models, rather than restricting the selection of measures to those related to software processes or products. Application-specific measures may be suitable for early life-cycle prediction. Software researchers are likely to focus on measures from the software domain, and so may miss opportunities in this area.
7.5 Foundation for Software Cost Estimation Tool Development
The work presented in this article is the foundation for the development of a software cost estimation tool, A3CE, the "Analogical And Algorithmic Cost Estimator." This development is being carried out within the Centre for Advanced Empirical Software Research (CAESAR) at the University of New South Wales, and is funded by the CSIRO-Macquarie University Joint Research Centre for Advanced Systems Engineering (JRCASE) as part of the SQUATter project (Offen and Jeffery, 1997).
In this context, one of the main contributions of the work is the perspective that it provides on the sorts of methods, measures, and models that the tool should support. It is clear that the tool needs to be flexible, because of the variety of measures and models that are potentially useful in software cost estimation. One of the objectives of the tool is to help estimators improve the accuracy of their estimates. Software cost estimation models must be calibrated for the environment in which they are applied, if their predictions are to be reliable. To meet this objective, the tool therefore needs to allow users to calibrate or develop models based on local data. To support this, and to support estimation by analogy, the tool must include a repository for this local data.
8. Conclusion 8.1 Overview of the Work That Has Been Done The work presented in this article is motivated by a desire to find ways to improve the accuracy of software cost estimates. This requires an understanding of why existing approaches to software cost estimation have not resulted in accurate estimates. A prediction process and framework for selecting predictive measures and methods have been developed and used to analyze and interpret existing research in the field of software cost estimation. Parallels can be drawn between the prediction process and the Quality Improvement Paradigm (QIP) of Basili and Rombach (1988). The selection framework fills an equivalent place in the prediction process to the Goal/Question/Metric Paradigm (GQM) in QIP.
8.2 Areas That This Work Has Not Addressed The review of the software cost estimation literature has excluded research that deals explicitly with software maintenance and reuse. Research into estimation for software maintenance is important because a large proportion of software development is done to enhance and repair existing systems. Research into estimation for systems that reuse software is important because reuse is a strategy for containing software costs and reducing the risks and unpredictability associated with the development of new software. Although these areas of research have not been addressed specifically in this work, the strategies for improving the accuracy of software cost estimates that this work identifies are fundamental to successful prediction and so are also applicable to maintenance and reuse estimates.
In considering how to make software cost estimates more accurate, the work presented in this article focuses on how to improve the way in which we make estimates. This only considers one side of the equation. The other side of the equation is a detailed analysis of why system development effort is typically so difficult to predict, and why in some cases estimates are met successfully in spite of the odds. Abdel-Hamid and Madnick (1986) observe that estimates influence the outcome. When estimates are accepted and plans are based on them, the estimates become targets for developers. The targets can keep moving, however. Lederer and Prasad (1993) report that change in requirements is regarded by practitioners as a major cause of inaccurate estimates. Software cost estimation may also reap benefits from initiatives in software process improvement. Developers may do things differently from one project to the next simply because the processes followed by the previous project are not known. Within an organization, processes that result in a consistent approach to the activities involved in developing software can be expected to eliminate some of the variability in effort expended. Any innovations that reduce the variability in effort expended on system development activities will make it easier to predict effort more accurately.
8.3 Outcomes of This Work and Opportunities for Further Research Industrywide, the accuracy of software cost estimates does not appear to be improving from one decade to the next. One explanation for this is that most practitioners take an informal approach to estimation that does not incorporate the feedback which is necessary to improve. This highlights the need for a software cost estimation process that incorporates feedback. The prediction process proposed in this article is a suitable model for this. The lack of measurement within the software development industry is constraining progress in the field of software cost estimation. Researchers need data to explore and develop new models and methods for software cost estimation. This highlights the need for a tool set such as SQUATter (Offen and Jeffery, 1997), whose objective is to provide a comprehensive environment for measurement of and research into software processes and products. The focus of much previous research in software cost estimation has been models for predicting total system development effort based on readily available measures such as lines of code, and to a lesser extent function points. There are opportunities for broadening the research into models to focus on early life-cycle effort prediction, models to predict effort for
activities as well as total effort, and models of the interrelationships among effort, duration, and staffing.
8.4 Consequences for Practitioners An important consequence of this work for practitioners is that following an estimation process, with improvement as a goal, may be the most effective way to improve. One reason for this is that feedback, centered on an analysis of the reasons for differences between estimates and actual values, is necessary to improve the accuracy of estimates. Feedback is a mandatory step in an improvement process. Adopting a prediction process such as the one proposed in this article should also provide benefits both in the short term, when selecting the measures to predict, and in the long term, as experience is gained and packaged for reuse. Another key area for improvement is measurement. A software cost estimation process that is not supported by appropriate data will fail. Data must be collected and stored so that software cost estimation models can be calibrated for the environment in which they are applied. Without calibration, the models will be unreliable and cannot be used with any confidence. A strategy for capturing experience is also important to make and sustain improvements in the accuracy of estimates. Some of the software cost estimation methods and models described in this article are neither quick to set up nor easy to apply. An experience base enables the reuse of models and the passing on of expertise developed in using them.
8.5 Development of a Software Cost Estimation Tool
The consequences of this work for practitioners lead to a number of needs that can be met by the development of a software cost estimation tool. The main features that A3CE, the “Analogical And Algorithmic Cost Estimator,” will support are the collection of measurement data, the development and calibration of models, the production of estimates and the evaluation of the accuracy of these estimates. If the tool is successful in assisting practitioners within an organization to improve the accuracy of their software cost estimates, it will have made a substantial contribution. More accurate estimates will provide financial benefits to the organization and a competitive advantage. Inaccurate estimates are a curse on management, users, and developers alike. Any initiative that can be shown to result in a significant improvement in accuracy will be welcomed with open arms.
REFERENCES

Abdel-Hamid, T. K., and Madnick, S. E. (1986). Impact of schedule estimation on software project behavior. IEEE Software, July, pp. 70-75.
Abdel-Hamid, T. K., and Madnick, S. E. (1991). "Software Project Dynamics: An Integrated Approach." Prentice Hall, Englewood Cliffs, NJ.
Abdel-Hamid, T. K., and Madnick, S. E. (1993). Adapting, correcting, and perfecting software estimates: A maintenance metaphor. Computer, March, pp. 20-29.
Albrecht, A. J. (1979). Measuring application development productivity. In Proc. Joint SHARE/GUIDE/IBM Application Development Symposium, October 1979, pp. 83-92.
Albrecht, A. J., and Gaffney, J. E. (1983). Software function, source lines of code, and development effort prediction: A software science validation. IEEE Trans. Software Eng. 9(6), 639-648.
Arifoglu, A. (1993). A methodology for software cost estimation. SIGSOFT Software Eng. Notes 18(2), 96-105.
Atkinson, K., and Shepperd, M. J. (1994). The use of function points to find cost analogies. In Proc. European Software Cost Modelling Conference, Ivrea, Italy, May 1994.
Bailey, J. W., and Basili, V. R. (1981). A meta-model for software development resource expenditures. In Proc. 5th International Conference on Software Engineering, pp. 107-116.
Banker, R. D., Chang, H., and Kemerer, C. F. (1994). Evidence of economies of scale in software development. Inform. Software Tech. 36(5), 275-282.
Basili, V. R. (1995). The experience factory and its relationship to other quality approaches. In "Advances in Computers," Vol. 41, pp. 65-82. Academic Press, San Diego.
Basili, V. R., and Beane, J. (1981). Can the Parr curve help with manpower distribution and resource estimation problems? J. Syst. Software 2, 59-69.
Basili, V. R., and Panlilio-Yap, N. M. (1985). Finding relationships between effort and other variables in the SEL. In Proc. IEEE COMPSAC, Chicago, IL, October 1985.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994). In "Encyclopedia of Software Engineering" (J. J. Marciniak, Ed.). John Wiley, New York.
Basili, V. R., and Rombach, H. D. (1988). The TAME project: Towards improvement-oriented software environments. IEEE Trans. Software Eng. 14(6), 758-773.
Boehm, B. W. (1981). "Software Engineering Economics." Prentice Hall, Englewood Cliffs, NJ.
Boehm, B. W., Clark, B., Horowitz, E., Westland, C., Madachy, R., and Selby, R. (1995). Cost models for future software life cycle processes: COCOMO 2.0. In "Annals of Software Engineering Special Volume on Software Process and Product Measurement" (J. D. Arthur and S. M. Henry, Eds.). J. C. Baltzer AG, Science Publishers, Amsterdam.
Briand, L., Basili, V., and Thomas, W. (1992). A pattern recognition approach for software engineering data analysis. IEEE Trans. Software Eng. 18(11), 931-942.
Brooks, F. P. (1975). "The Mythical Man Month." Addison-Wesley, Reading, MA.
Brownlow, L. S. (1994). Early estimating in the object-oriented analysis and design environment. In Proc. European Software Cost Modelling Conference, Ivrea, Italy, May 1994.
Conte, S. D., Dunsmore, H. E., and Shen, V. Y. (1986). "Software Engineering Metrics and Models." Benjamin Cummings, Menlo Park, CA.
Courtney, R. E., and Gustafson, D. A. (1993). Shotgun correlations in software measures. Software Eng. J., January, pp. 5-13.
Cuelenaere, A. M., van Genuchten, M. J. I. M., and Heemstra, F. J. (1987). Calibrating a software cost estimation model: Why and how. Inform. Software Tech. 29(10), 558-567.
Daskalantonakis, M. K. (1992). A practical view of software measurement and implementation experiences within Motorola. IEEE Trans. Software Eng. 18(11), 998-1010.
DeMarco, T. (1982). "Controlling Software Projects." Yourdon Press, New York.
DeMarco, T., and Lister, T. (1987). "Peopleware." Dorset House, New York.
Esterling, R. (1980). Software manpower costs: A model. Datamation, March, pp. 164-170.
Fairley, R. (1994). Risk management for software projects. IEEE Software, May, pp. 57-67.
Fei, Z., and Liu, X. (1992). f-COCOMO: Fuzzy constructive cost model in software engineering. In Proc. IEEE International Conference on Fuzzy Systems, San Diego, March 1992.
Fenton, N. E. (1991). "Software Metrics: A Rigorous Approach." Chapman and Hall, London.
Heemstra, F. J. (1992). Software cost estimation. Inform. Software Tech. 34(10), 627-639.
Hihn, J., and Habib-agahi, H. (1991). Cost estimation of software intensive projects: A survey of current practices. In Proc. 13th International Conference on Software Engineering, Austin, Texas, May 1991, pp. 13-16.
Humphrey, W. S. (1995). "A Discipline for Software Engineering." Addison-Wesley, Reading, MA.
Jeffery, D. R. (1987a). Time-sensitive cost models in commercial MIS environments. IEEE Trans. Software Eng. 13(7), 852-859.
Jeffery, D. R. (1987b). The relationship between team size, experience, and attitudes and software development productivity. In Proc. IEEE Comp. Soc. COMPSAC, Tokyo, Japan, October 1987, pp. 2-8.
Jeffery, D. R., and Low, G. C. (1990). Calibrating estimation tools for software development. Software Eng. J. 5(4), 215-221.
Jeffery, D. R., and Stathis, J. (1996). Function point sizing: Structure, validity and applicability. Empirical Software Eng. 1(1), 11-30.
Johnson, E. J. (1988). Expertise and decision under uncertainty: Performance and process. In "The Nature of Expertise" (M. T. H. Chi, R. Glaser, and M. J. Farr, Eds.). Lawrence Erlbaum Associates, NJ.
Kemerer, C. F. (1987). An empirical validation of software cost estimation models. Commun. ACM 30(5), 416-429.
Kitchenham, B. A. (1990). Software development cost models. In "Software Reliability Handbook" (P. Rook, Ed.). Elsevier Applied Science, New York.
Kitchenham, B. A. (1992). Empirical studies of assumptions that underlie software cost-estimation models. Inform. Software Tech. 34(4), 211-218.
Kitchenham, B. A., and Taylor, N. R. (1984). Software cost models. ICL Tech. J., May, pp. 73-102.
Kitchenham, B. A., Pfleeger, S. L., and Fenton, N. (1995). Towards a framework for software measurement validation. IEEE Trans. Software Eng. 21(12), 929-944.
Lederer, A. L., and Prasad, J. (1993). Information systems software cost estimating: A current assessment. J. Inform. Tech. 8, 22-33.
Lederer, A. L., Mirani, R., Neo, B. S., Pollard, C., Prasad, J., and Ramamurthy, K. (1990). Information system cost estimating: A management perspective. MIS Quarterly 14(2), 159-178.
Lo, R., Webby, R., and Jeffery, D. R. (1995). Sizing and estimating the coding and unit testing effort for GUI systems. In Proc. Australian Conference on Software Metrics, Sydney, November 1995.
Low, G. C., and Jeffery, D. R. (1990). Function points in the estimation and evaluation of the software process. IEEE Trans. Software Eng. 16(1), 64-71.
Masters, T. F. (1985). An overview of software cost estimating at the National Security Agency. J. Parametrics 5(1), 72-84.
Matson, J. E., Barrett, B. E., and Mellichamp, J. M. (1994). Software development cost estimation using function points. IEEE Trans. Software Eng. 20(4), 275-287.
Miyazaki, Y., and Mori, K. (1985). COCOMO evaluation and tailoring. In Proc. 8th International Conference on Software Engineering, London, UK, August 1985, pp. 292-299.
Mohanty, S. N. (1981). Software cost estimation: Present and future. Software Practice and Experience 11, 103-121.
Mukhopadhyay, T., and Kekre, S. (1992). Software effort models for early estimation of process control applications. IEEE Trans. Software Eng. 18(10), 915-924.
Mukhopadhyay, T., Vicinanza, S., and Prietula, M. J. (1992). Estimating the feasibility of a case-based reasoning model for software effort estimation. MIS Quarterly 16(2), 155-171.
Norden, P. V. (1970). Useful tools for project management. In "Management of Production" (M. K. Starr, Ed.). Penguin Books, Baltimore.
Offen, R. J., and Jeffery, D. R. (1997). A model-based approach to establishing and maintaining a software measurement program. IEEE Software, in press.
Parr, F. N. (1980). An alternative to the Rayleigh curve model for software development effort. IEEE Trans. Software Eng., May, pp. 291-296.
Putnam, L. H. (1978). A general empirical solution to the macro software sizing and estimating problem. IEEE Trans. Software Eng., July, pp. 345-361.
Shepperd, M. J., Schofield, C., and Kitchenham, B. (1996). Effort estimation using analogy. In Proc. 18th International Conference on Software Engineering, Berlin, Germany, March 1996.
Srinivasan, K., and Fisher, D. (1995). Machine learning approaches to estimating software development effort. IEEE Trans. Software Eng. 21(2), 126-137.
Subramanian, G. H., and Breslawski, S. (1993). Dimensionality reduction in software development effort estimation. J. Syst. Software 21(2), 187-196.
Taff, L. M., Borchering, J. W., and Hudgins, W. R., Jr. (1991). Estimeeting: Development estimates and a front-end process for a large project. IEEE Trans. Software Eng. 17(8), 839-849.
van der Poel, K. G., and Schach, S. R. (1983). A software metric for cost estimation and efficiency measurement in data processing system development. J. Syst. Software 3, 187-191.
Vicinanza, S., Mukhopadhyay, T., and Prietula, M. (1991). Software effort estimation: An exploratory study of expert performance. Inform. Systems Res. 2(4), 243-262.
Walston, C. E., and Felix, C. P. (1977). A method of programming measurement and estimation. IBM Syst. J. 16(1), 54-73.