A Model for Software Sizing
June M. Verner and Graham Tate
Massey University, New Zealand
There is a need for better software sizing models. Several existing models, including Albrecht’s Function Point Analysis and DeMarco’s system BANG, are examined as candidate models, and some of their deficiencies are noted. An architecture for a sizing model for applications systems is presented, its major components are identified, and their general functions described.
1. THE NEED FOR BETTER SOFTWARE SIZING MODELS
Notwithstanding the complexity of the problem of software costing, considerable progress has been made in software cost estimation based on a given size, notably in the COCOMO [4] and Putnam SLIM [15] models. Experience, however, still plays a large part in successful size and cost estimation [3]. Most computer installations are too small to have a separate cost estimation group, have limited access to experts experienced in size estimation, and have inadequate historical records. To make matters worse, some sizing/costing methods are installation-dependent, even to the extent of the basic unit of measurement being defined somewhat differently within the same basic model [1, 6, 8, 16]. Many managers are understandably uneasy and defensive about the basis of their estimates. As shown in Figure 1, software sizing is at present the weakest link in the software cost estimation chain. Common reasons for inaccurate sizing, usually underestimation, include optimism, imperfect recall, avoidance of confrontation situations, and incomplete product knowledge [4]. There is a need for software sizing models with levels of reliability similar to those obtained for the costing models COCOMO [4] and COPMO [7]. DeMarco [8] defines a good metric in this context as being measurable, independent (objective), accountable, precise, consistent, and available early enough to be useful.
Address correspondence to: Miss J. M. Verner, Computer Science Department, Massey University, Palmerston North, New Zealand.
Gilb has recently drawn attention to the complex interrelationships between cost, quality, design decisions, and management control in software development [12]. In practice, software sizing and costing are parts of an iterative decision-making process, involving tradeoffs between system specifications, aspects of software quality, and scheduling, within resource constraints.
2. RECENT ATTEMPTS TO SOLVE THE SIZING PROBLEM
During the past several years, there have been some useful attempts to solve the sizing problem, particularly for commercial data processing systems. In 1979 Albrecht [1] introduced Function Point Analysis (FPA). He believed that lines of code as a measure was too language-dependent and instead measured the “size” of a system in function points. In 1982, DeMarco [8] published work on a sizing and costing method that used a different unit, called System BANG, to measure system size. Itakura and Takayanagi [13] developed a language-dependent modeling technique that they used for size estimation of batch programs in a particular banking environment. FPA appears to be the most widely used of these methods, possibly because it is both language-independent and simple to use. It is popular in IBM installations and is increasingly being used for sizing systems written in fourth-generation languages, often for productivity estimation. FPA and System BANG are examined as system sizing candidates in more detail later.
2.1 Function Point Analysis
FPA was used by Albrecht to compare productivity between projects that were written in different languages and used different technology. He used the average number of lines of code required to develop a function point to show the relative productivities of COBOL, PL/I, and DMS/VS [1]. FPA was used by Rudolph to produce productivity figures for certain 4GLs [16].
The Journal of Systems and Software 7, 173-177 (1987) © 1987 Elsevier Science Publishing Co., Inc.
0164-1212/87/$3.50
Figure 1. Software sizing and costing.
Though it has been primarily used in productivity studies, FPA can also be used as the basis of a software sizing method. The function point measure is said to be based on the user’s view of system function and it is claimed that non-DP users can evaluate the measure [16]. Figure 2 illustrates the steps involved in the FPA calculations. The raw function point count is based on the number and complexity of external input types, external output types, logical internal file types, external interface file types, and external inquiry types. This initial count is summed and later modified using 14 system adjustment factors to get a final function point count for the system [2]. Since some of these system
adjustment factors could be classed as cost drivers rather than size drivers, the final function point count is more a measure of total effort than of size, but this can easily be changed. FPA treats the system as if all the inputs, outputs, queries, etc., flow into or out of a single black box process. Thus the raw function point count does not measure the effects of complex processing. FPA tries to overcome this obvious flaw by its use of system adjustment factors. Here the effects of interactive interfaces and complex processing on the system as a whole, together with other cost drivers, are used to modify the raw function point count. These adjustment factors seem to be the weakest part of the method. They are small additive corrections with equal maximum weight (an unlikely situation in the real world) and with a relatively small total effect, having a possible adjustment range from -35% to +35%. Given the subjectivity of the adjustment criteria, and the tendency of assessors towards means, the overall adjustment of the raw function points will rarely be more than 15%. Moreover, these adjustment factors appear rather old-fashioned as they seem to be aimed largely at batch processing systems.
Figure 2. Function point analysis.
Function points and counts of unique I/O operands were compared by Albrecht and Gaffney as estimators for delivered source instructions and effort in COBOL and PL/I programs. They found that development time and application size were both “strong functions” of function points and I/O data item count for the applications analyzed [2].
2.2 DeMarco’s System BANG
In 1982, DeMarco introduced another measure of effort called System BANG that measures “net usable function from the user’s point of view” [8, 9]. Since “BANG is the independent variable that drives the cost model,” it is also a candidate measure for system size. System BANG is calculated from Structured Systems Analysis specifications [11], as shown schematically in Figure 3. The system specifications are developed down to functional primitives (FP). An FP is described as ‘
Figure 3. DeMarco’s system BANG.
Structured Systems Analysis [11]. Indeed, until it is automated, it is unlikely that it will be used widely at all.
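The function point calculation outlined in Section 2.1 can be illustrated with a short sketch. The paper itself gives only the five component types, the 14 adjustment factors, and the ±35% adjustment range. The complexity weights and the 0.65 + 0.01 × (sum of ratings) form used below are the commonly published Albrecht values and are assumed here rather than taken from this text; the function names and the example counts are hypothetical.

```python
# Sketch of an FPA calculation. The weights are the commonly published
# Albrecht values (an assumption; this paper does not list them).
WEIGHTS = {
    "external_input":          {"simple": 3, "average": 4,  "complex": 6},
    "external_output":         {"simple": 4, "average": 5,  "complex": 7},
    "logical_internal_file":   {"simple": 7, "average": 10, "complex": 15},
    "external_interface_file": {"simple": 5, "average": 7,  "complex": 10},
    "external_inquiry":        {"simple": 3, "average": 4,  "complex": 6},
}


def raw_function_points(counts):
    """Sum weighted counts; keys of `counts` are (component type, complexity)."""
    return sum(WEIGHTS[kind][level] * n for (kind, level), n in counts.items())


def adjusted_function_points(raw, ratings):
    """Apply 14 adjustment factors, each rated 0 to 5 (assumed scale).

    Every factor contributes at most 5%, so the multiplier ranges from
    0.65 to 1.35, i.e. the -35% to +35% range mentioned in Section 2.1.
    """
    if len(ratings) != 14:
        raise ValueError("expected exactly 14 adjustment factor ratings")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return raw * (0.65 + 0.01 * sum(ratings))


# Hypothetical system: 10 average inputs, 5 average outputs, 3 average files.
example_counts = {
    ("external_input", "average"): 10,
    ("external_output", "average"): 5,
    ("logical_internal_file", "average"): 3,
}
example_raw = raw_function_points(example_counts)  # 10*4 + 5*5 + 3*10 = 95
example_low = adjusted_function_points(example_raw, [0] * 14)   # lower bound
example_high = adjusted_function_points(example_raw, [5] * 14)  # upper bound
```

Because every factor carries the same maximum weight and enters additively, no single factor can move the count by more than five percentage points, which is the limitation criticized in Section 2.1.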
3. AN APPROACH TO SOFTWARE SIZING
The software sizing and costing model shown in Figure 4 provides the framework for this research in software sizing. The great complexity of the whole sizing and costing problem, coupled with the relative success of existing models in costing from a given size, has led to concentration by the authors on the estimation of system size in KDSI (thousands of delivered source instructions). Objections have been raised to KDSI as a system size measure [5, 6, 10]. Gaffney, in particular, argues for a more language-independent measure of “software function” [10]. However, KDSI has the great advantage of providing a measurable output of the sizing model as well as an input to the common costing models. A more fundamental objection can be raised in relation to the sizing and costing framework shown in Figure 4, namely that it assumes that the sizing and costing processes are separable. This assumption has so far been largely vindicated by the success of COCOMO and other size-based costing models. If further research showed that many significant cost drivers were also size drivers, it might have to be questioned. At this stage, research has been restricted to data processing (DP) applications, such as those commonly found in commercial processing, implemented in what Boehm calls the organic mode [4]. This has the advantage of greatly restricting variability within the domain of interest. It also has the pragmatic advantage of the ready availability of empirical data. A complication is the increasing variety of languages in which data processing systems are now commonly written, including many so-called fourth-generation languages. Often, more than one language is employed in a single system.
The software sizing model is seen as consisting of three main components: A, B and C (see Figure 4). The inputs to the sizing model are:
1. The system model, which is, as far as possible, an abstract specification of the system free from implementation considerations or other similar constraints. It describes what the system is to be, e.g., in data flow diagrams, a data model, and associated dictionary entries.
2. The system constraints, including requirements not in the “what” category which can be consolidated into size drivers, e.g., aspects of interactiveness and user-friendliness, program clarity, and system decentralization or distribution.
Figure 4. A software sizing and costing model.
A major goal for part A of the sizing model is objectivity. There appears to be no good reason why sizeA should not be calculated directly from a computerized system model as a by-product of the system definition process. A critical issue is the unit of measure for sizeA. Candidates include lines of code in a standard language, or a language-independent measure, such as function points, System BANG, or a similar metric. Gaffney [10] suggests that “the amount of machine code produced (after taking compiler inefficiency into account) may provide a better quantification of function than the lines of source coding employed,” but this seems to exchange language-dependence for machine-dependence. It is believed that improved language-independent metrics can be developed based on experience using function points and System BANG, both of which have deficiencies as noted in Sections 2.1 and 2.2.
Part B of the sizing model has the role of mapping the language-independent sizeA to the language-dependent sizeB, which can be thought of as being the size of a “neutral” source coding of the system in the chosen language. Where several languages are used, or where the mapping is different for clearly definable parts of the system, sizeB is a weighted sum of components. The notions of language level and expansion ratios, together with work similar to that of Albrecht and Gaffney [2] relating function points to LOC in different languages, are relevant here.
Part C of the sizing model aims to apply appropriate correction factors to sizeB, or parts thereof, to obtain a realistic final size in KDSI taking into account such constraints or design decisions as user-friendliness or program clarity. The oft-quoted work of Weinberg and Schulman [17] would suggest that such adjustment factors can, in special cases, be much greater than the ±35% limit of FPA. In most systems, large adjustments would, however, only apply to parts of the system, e.g., the user interface.
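The three-part architecture described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper fixes neither the unit of sizeA nor any parameter values, so the expansion ratios, correction factors, component names, and sizes below are all hypothetical. The final function uses the Basic COCOMO organic-mode effort equation from Boehm [4], which is not part of this paper, to show how the resulting KDSI could feed a size-based costing model.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A clearly definable part of the system (all values hypothetical)."""
    name: str
    size_a: float             # part A: language-independent size
    language: str             # implementation language for this part
    correction: float = 1.0   # part C: size-driver multiplier for this part


# Hypothetical expansion ratios: source instructions per unit of sizeA.
EXPANSION_RATIO = {"COBOL": 100.0, "4GL": 20.0}


def size_b(components, ratios=EXPANSION_RATIO):
    """Part B: weighted sum mapping sizeA to language-dependent size."""
    return sum(c.size_a * ratios[c.language] for c in components)


def final_kdsi(components, ratios=EXPANSION_RATIO):
    """Part C: apply per-component corrections and convert to KDSI."""
    total = sum(c.size_a * ratios[c.language] * c.correction
                for c in components)
    return total / 1000.0


def basic_cocomo_organic_effort(kdsi):
    """Effort in person-months, Basic COCOMO organic mode (Boehm [4])."""
    return 2.4 * kdsi ** 1.05


# Hypothetical two-language system; only the user interface is corrected.
system = [
    Component("batch update", size_a=120.0, language="COBOL"),
    Component("user interface", size_a=80.0, language="4GL", correction=1.5),
]
neutral_size = size_b(system)      # 120*100 + 80*20 = 13600 instructions
estimated_kdsi = final_kdsi(system)  # (12000*1.0 + 1600*1.5) / 1000 = 14.4
effort_pm = basic_cocomo_organic_effort(estimated_kdsi)
```

Keeping the correction factor on each component, rather than on the system as a whole, reflects the observation that large adjustments usually apply only to parts of a system.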
4. SUMMARY AND FUTURE WORK
Existing methods of software sizing have a number of deficiencies, including too great a degree of subjectivity. An approach that combines and develops features of Albrecht’s FPA and DeMarco’s System BANG may be a candidate for automated software sizing as a by-product of system specification. A survey of published research has led the authors to the conclusion that the three main issues in software sizing are system function, programming language dependence, and design criteria or constraints. The isolation of these factors seems to be essential to the understanding of the sizing process. An architecture for a software sizing model suitable as a front-end to a size-based software costing model has been presented. This architecture makes explicit the roles of the abstract system model(s) (sizeA), the language(s) employed (sizeA → sizeB), and other size drivers (sizeB → sizeC) in the computation of KDSI for a software system. Partitioning the sizing process in this manner is believed to be necessary to the achievement of objectivity in the specification-related component of system size (sizeA). Work is proceeding on the refinement of the sizing model and the collection of data for determining its parameters and validating it in several commercial DP environments.
REFERENCES
1. A. J. Albrecht, Measuring Application Development Productivity, SHARE-GUIDE 83-92 (1979).
2. A. J. Albrecht and J. E. Gaffney, Jr., Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation, IEEE Transactions on Software Engineering SE-9, 6, 639-648 (November 1983).
3. B. W. Boehm, Controlling Software Projects (foreword), Yourdon, New York, 1982.
4. B. W. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.
5. A. Bryant, J. A. Kirkham, and B. W. Boehm, Software Engineering Economics: A Review Essay, Software Engineering Notes 8, 44-60 (July 1983).
6. M. L. Coleman, Measuring Software Development Productivity, Technical Report No. 3888A, Aluminum Company of America, Feb. 1982.
7. S. D. Conte, H. E. Dunsmore, and V. Y. Shen, Software effort estimation and productivity, in: Advances in Computers, Vol. 24 (M. C. Yovits, ed.), Academic Press, New York, 1985.
8. T. DeMarco, Controlling Software Projects, Yourdon, New York, 1982.
9. T. DeMarco, An Algorithm for Sizing Software Products, Performance Evaluation Review 12, 13-22 (Spring-Summer 1984).
10. J. E. Gaffney, Jr., The Impact on Software Development Costs of Using HOL’s, IEEE Transactions on Software Engineering SE-12, 496-499 (March 1986).
11. C. Gane and T. Sarson, Structured Systems Analysis: Tools and Techniques, Improved System Technologies Inc., New York, 1977.
12. T. Gilb, Estimating Software Attributes: Some Unconventional Points of View, Software Engineering Notes 11, 49-59 (January 1986).
13. M. Itakura and A. Takayanagi, A Model for Estimating Program Size and Its Evaluation, Proceedings of the IEEE 6th International Conference on Software Engineering, IEEE Catalog No. 82CH1795-1, September 1982.
14. M. H. Halstead, Elements of Software Science, Elsevier, New York, 1977.
15. L. H. Putnam, Software Cost Estimation and Life Cycle Control, IEEE Computer Society Press, New York, 1980.
16. E. Rudolph, Productivity in Computer Application Development, Dept. of Management Studies, Working Paper No. 9, University of Auckland, New Zealand, March 1983.
17. G. M. Weinberg and E. L. Schulman, Goals and Performance in Computer Programming, Human Factors 16, 70-77 (1974).