Resolving the software science anomaly


D. N. Card and W. W. Agresti
Computer Sciences Corporation, Silver Spring, Maryland

The theory of software science proposed by Halstead appears to provide a comprehensive model of the program construction process. Although software science has been widely criticized on theoretical grounds, its measures continue to be used because of apparently strong empirical support. This study reexamined one basic relationship proposed by the theory: that between estimated and actual program length. The results show that the apparent agreement between these quantities is a mathematical artifact. Analyses of both Halstead’s own data and another, larger dataset confirm this conclusion. Software science has neither a firm theoretical nor a firm empirical foundation.

INTRODUCTION

Software science [1] is one of the most comprehensive numerical models of programs and of the process of program construction. Coulter [2] sees an anomaly in software science: it has strong empirical support, but its assumptions are incorrect applications of cognitive psychology. Researchers, however, continue to use software science measures and to incorporate them in software measurement products sold to software development practitioners. Consequently, resolving this anomaly at the interface of theory and practice should be a high priority among software engineers.

Most of the empirical support for software science is based on analyses of the relationship between estimated and actual length. Researchers frequently report correlations of 0.95 or higher between these quantities [3]. Correlations of this magnitude are, however, nearly unprecedented in studies of other human activities.

The purpose of this study was to reexamine the fundamental relationship in software science: that between the estimated and actual program length. Halstead’s own data [1] were reanalyzed and compared with a larger sample from the Software Engineering Laboratory (SEL) database [4]. The SEL is a research project sponsored by the National Aeronautics and Space Administration/Goddard Space Flight Center and supported by Computer Sciences Corporation and the University of Maryland. It has collected data from more than 45 production software projects. This study was undertaken as part of a larger SEL investigation of software measures [5].

Address correspondence to D. N. Card, Computer Sciences Corporation, 8728 Colesville Road, Silver Spring, MD 20910.

MATHEMATICAL DEPENDENCE

The high correlations between estimated and actual program length are a consequence of one being mathematically dependent on the other. This can be shown easily. Halstead [1] proposed that the actual length of a program can be estimated with the following equation:

$$\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2 \tag{1}$$

where N̂ = estimated program length, n₁ = number of unique operators, and n₂ = number of unique operands. The log₂ function is a consequence of Halstead’s model of how programmers select and combine operators and operands. Researchers attempting to validate software science have found high correlations between the actual length and this length estimate. These correlations cannot, however, be accepted at face value. Consider this representation of the actual length:

$$N = N_1 + N_2 \tag{2}$$

where N = actual program length, N₁ = total number of operators, and N₂ = total number of operands. For any given program, values A and B can be found so that the total numbers of operators and operands can be expressed as functions of the numbers of unique operators and operands:

$$N_1 = n_1 A \tag{3}$$

$$N_2 = n_2 B \tag{4}$$

where A and B are repetition coefficients specific to this program. Substituting Equations (3) and (4) into Equation (2) produces

$$N = n_1 A + n_2 B \tag{5}$$
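As a minimal sketch of Equations (1) through (5), the Python below computes the length estimate N̂ and the repetition coefficients A and B for a single module. The counts are hypothetical values chosen so that the logarithms come out exact. They are not taken from Halstead’s data or the SEL data, and the function names are illustrative only.

```python
import math


def halstead_length_estimate(n1: int, n2: int) -> float:
    """Equation (1): N_hat = n1*log2(n1) + n2*log2(n2)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def repetition_coefficients(n1: int, n2: int, N1: int, N2: int) -> tuple[float, float]:
    """Equations (3) and (4): A = N1/n1 and B = N2/n2."""
    return N1 / n1, N2 / n2


# Hypothetical counts for one module (illustration only).
n1, n2 = 16, 32      # unique operators, unique operands
N1, N2 = 120, 100    # total operators, total operands

A, B = repetition_coefficients(n1, n2, N1, N2)
N = N1 + N2                               # Equation (2)
assert math.isclose(N, n1 * A + n2 * B)   # Equation (5) holds by construction
N_hat = halstead_length_estimate(n1, n2)

print(N, N_hat)                              # 220 224.0
print(A, math.log2(n1), B, math.log2(n2))    # 7.5 4.0 3.125 5.0
```

Equation (5) is an identity. A and B are defined from the same counts that make up N, so the check always passes. The only empirical content lies in whether log₂ n₁ and log₂ n₂ track A and B. In this made-up module they do not: 4.0 versus 7.5, and 5.0 versus 3.125.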

The Journal of Systems and Software 7, 29-35 (1987)
© 1987 Elsevier Science Publishing Co., Inc.
0164-1212/87/$3.50


Comparing Equations (1) and (5) shows that N and N̂ are both increasing functions of n₁ and n₂. Because the coefficients A, B, log₂ n₁, and log₂ n₂ are always positive, a positive correlation must exist between N and N̂. An increase in vocabulary (n₁ + n₂) also increases program length (N). Correlation is a measure of dependence, but N and N̂ are dependent by definition, so naturally the correlation coefficients are high. Consequently, the usual interpretation of the correlation coefficient does not apply in this case [6]. Furthermore, tests of significance cannot be defined without first determining the intrinsic correlation between N and n = n₁ + n₂. The correct way to test the software science theory is to determine whether log₂ n₁ and log₂ n₂ predict the repetition factors A and B, respectively. That is, do the log₂ terms add any information about N that is not already contained in n₁ and n₂? An examination of Halstead’s data and of another, larger database shows that they do not.
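The dependence argument can be checked with a small simulation, sketched here in Python under stated assumptions. Module vocabularies are drawn at random. The repetition factors A and B are drawn independently of log₂ n₁ and log₂ n₂, so by construction the logarithms carry no information about A and B. The value ranges, the uniform distributions, and the seed are arbitrary illustrative choices and do not come from the paper. The hand-written `pearson` function computes the ordinary sample correlation. Python 3.10+ `statistics.correlation` would serve equally well.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(num_modules: int = 1000, seed: int = 1) -> tuple[float, float]:
    """Return r(N, N_hat) and r(N, n) for synthetic modules whose repetition
    factors A and B are independent of log2(n1) and log2(n2)."""
    rng = random.Random(seed)
    lengths, estimates, vocabularies = [], [], []
    for _ in range(num_modules):
        n1 = rng.randint(9, 82)       # assumed range of unique operators
        n2 = rng.randint(8, 433)      # assumed range of unique operands
        A = rng.uniform(2.0, 10.0)    # drawn with no reference to log2(n1)
        B = rng.uniform(2.0, 6.0)     # drawn with no reference to log2(n2)
        lengths.append(n1 * A + n2 * B)                             # Eq. (5)
        estimates.append(n1 * math.log2(n1) + n2 * math.log2(n2))  # Eq. (1)
        vocabularies.append(n1 + n2)
    return pearson(lengths, estimates), pearson(lengths, vocabularies)


r_est, r_vocab = simulate()
print(f"r(N, N_hat) = {r_est:.2f}   r(N, n) = {r_vocab:.2f}")
```

If the argument above holds, both correlations come out high even though Halstead’s logarithmic terms are irrelevant to the repetition factors by construction. A high r(N, N̂) therefore says little, on its own, about the log₂ model.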

Halstead’s Data

Halstead provides values of n₁, n₂, and N for 20 programs in Table 5.1 (page 29) of his book [1]. Table 1 shows some summary statistics calculated from Halstead’s data. The table indicates that the software science factor log₂ n₁ consistently underestimates the n₁ repetition factor (A), whereas log₂ n₂ consistently overestimates the n₂ repetition factor (B). It is fortuitous that, to some extent, these differences compensate for each other. For these data, the vocabulary (n = n₁ + n₂) is about as good a predictor of actual length (N) as the software science estimator (N̂). Both vocabulary and the length estimate (N̂) show a 0.99 correlation with actual length (N). (Correlation coefficients are used for comparison with other literature.) That is, a single constant does as good a job of predicting the number of repetitions of n₁ and n₂ in a program as the log₂ n₁ and log₂ n₂ coefficients proposed by Halstead. Ockham [7] suggested that the simpler model should be preferred in such cases.
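The comparison in the preceding paragraph sets a single fitted constant applied to the vocabulary against Halstead’s two logarithmic coefficients. It can be sketched in Python as follows. The fitting method (least squares through the origin) and the error measure (mean absolute relative error) are assumptions made for this illustration. The paper reports correlation coefficients rather than these quantities, and the sample modules are invented.

```python
import math


def fit_vocabulary_constant(vocab: list[int], lengths: list[int]) -> float:
    """Least-squares slope c for the one-parameter model N ~ c * n."""
    return sum(n * N for n, N in zip(vocab, lengths)) / sum(n * n for n in vocab)


def mean_abs_relative_error(predicted: list[float], actual: list[int]) -> float:
    """Average of |predicted - actual| / actual over all modules."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def compare_predictors(modules: list[tuple[int, int, int]]) -> dict[str, float]:
    """modules: one (n1, n2, N) triple per program. Returns the fitted
    constant and the mean absolute relative error of each predictor of N."""
    vocab = [n1 + n2 for n1, n2, _ in modules]
    lengths = [N for _, _, N in modules]
    c = fit_vocabulary_constant(vocab, lengths)
    by_vocab = [c * n for n in vocab]
    by_halstead = [n1 * math.log2(n1) + n2 * math.log2(n2) for n1, n2, _ in modules]
    return {
        "c": c,
        "error_vocabulary": mean_abs_relative_error(by_vocab, lengths),
        "error_halstead": mean_abs_relative_error(by_halstead, lengths),
    }


# Invented (n1, n2, N) triples, for illustration only.
sample = [(10, 12, 90), (16, 32, 220), (25, 60, 480), (40, 150, 1200)]
print(compare_predictors(sample))
```

Because c is fitted to the same modules it is evaluated on, this in-sample comparison slightly favors the vocabulary model. A fairer test would hold out some programs.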

Table 2. SEL Data Summary*

Measure                          Mean   Standard Deviation
Executable statements              65    58
Unique operators (n₁)              18     8
Unique operands (n₂)               47    37
Vocabulary (n)                     65    44
Actual program length (N)         430   478
Estimated program length (N̂)      370   315

* Based on 1193 newly developed FORTRAN modules.

SEL Data

A sample of 1193 newly developed FORTRAN modules was selected from the SEL database for this study. Table 2 summarizes some of the relevant measures. These programs are substantially larger than those represented in the Halstead sample. However, the correlation results from the SEL data (reported in Table 3) still support the previous conclusion: the software science model does not provide any additional information useful for predicting program length. Table 3 demonstrates the level of empirical support obtained when the correct pairs of numbers are correlated. Although log₂ n₁ and log₂ n₂ show slight correlations with the repetition factors A and B, their net contribution (relative to n alone) as predictors of N is effectively zero. The correlations of N̂ and n with N are about equal. A comparison of Figures 1 and 2 provides a more graphic demonstration of this. Figure 1 plots the actual program length (N) against the software science estimate (N̂). In Figure 2, a simple count of unique symbols, or vocabulary (n = n₁ + n₂), produces the same relationship. An examination of the residuals (differences between the actual and predicted values) for the Halstead length estimator shows that it consistently underestimates for large modules (Figure 3). The discrepancy between the means of N and N̂ in Table 2 supports this contention. Other researchers have also noted this phenomenon [8].
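The correlation pairs reported in Table 3 can be computed per module roughly as follows. This is a sketch, not the authors’ analysis code. The input layout, one (n1, n2, N1, N2) tuple per module, and the function names are assumptions. The pairs (A, log₂ n₁) and (B, log₂ n₂) are the meaningful test. The last two pairs involve variables that are dependent by definition.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def table3_correlations(modules: list[tuple[int, int, int, int]]) -> dict[str, float]:
    """modules: (n1, n2, N1, N2) counts for each module."""
    A = [N1 / n1 for n1, _, N1, _ in modules]            # Equation (3)
    B = [N2 / n2 for _, n2, _, N2 in modules]            # Equation (4)
    log_n1 = [math.log2(n1) for n1, _, _, _ in modules]
    log_n2 = [math.log2(n2) for _, n2, _, _ in modules]
    N = [N1 + N2 for _, _, N1, N2 in modules]            # Equation (2)
    N_hat = [n1 * math.log2(n1) + n2 * math.log2(n2)     # Equation (1)
             for n1, n2, _, _ in modules]
    n = [n1 + n2 for n1, n2, _, _ in modules]
    return {
        "A, log2 n1": pearson(A, log_n1),
        "B, log2 n2": pearson(B, log_n2),
        "N_hat, N": pearson(N_hat, N),   # dependent by definition
        "n, N": pearson(n, N),           # dependent by definition
    }


# Invented modules in which A equals log2(n1) exactly (illustration only).
modules = [(4, 5, 8, 10), (8, 9, 24, 30), (16, 20, 64, 50)]
for pair, r in table3_correlations(modules).items():
    print(f"{pair:12s} {r:.2f}")
```

In this invented example A equals log₂ n₁ exactly, so the first pair correlates perfectly. For the SEL modules, Table 3 reports only 0.25 and 0.40 for the first two pairs.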

Table 1. Summary of Halstead’s Data

Measure     Mean    Minimum   Maximum
n₁          22.0      9.0      82.0
log₂ n₁      4.2      3.2       6.4
A            9.0      2.6      28.2
n₂          75.6      8.0     433.0
log₂ n₂      4.6      2.4       8.8
B            3.6      2.4       5.4

Table 3. SEL Correlation Results

Pair           Correlation Coefficient*
A, log₂ n₁     0.25
B, log₂ n₂     0.40
N̂, N           0.85†
n, N           0.85†

* Based on 1193 newly developed FORTRAN modules.
† Correlated variables dependent by definition.
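The residual analysis behind Figures 3 and 4 takes actual length minus estimated length and examines it against module size. It can be sketched in Python as follows. The modified estimator substitutes √n for log₂ n, as the paper describes for Figure 4. The split into “small” and “large” modules at a length threshold and the function names are assumptions for illustration only.

```python
import math
from collections.abc import Callable

Estimator = Callable[[int, int], float]


def halstead_estimate(n1: int, n2: int) -> float:
    """Original estimator, Equation (1)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def sqrt_estimate(n1: int, n2: int) -> float:
    """Modified estimator: sqrt(n) substituted for log2(n)."""
    return n1 * math.sqrt(n1) + n2 * math.sqrt(n2)


def _mean(values: list[float]) -> float:
    """Arithmetic mean, or NaN for an empty group."""
    return sum(values) / len(values) if values else float("nan")


def mean_residual_by_size(modules: list[tuple[int, int, int]],
                          estimator: Estimator,
                          threshold: int) -> tuple[float, float]:
    """Mean residual (N - estimate) for modules with N below, and at or above,
    the threshold. modules: (n1, n2, N) per module. A positive mean means the
    estimator underestimates; a negative mean means it overestimates."""
    small, large = [], []
    for n1, n2, N in modules:
        residual = N - estimator(n1, n2)
        if N >= threshold:
            large.append(residual)
        else:
            small.append(residual)
    return _mean(small), _mean(large)


# For small n the two growth terms stay close; they are equal at n = 4 and n = 16.
for n in (4, 8, 16, 32):
    print(n, math.log2(n), round(math.sqrt(n), 2))
```

The paper defines a residual as actual minus predicted. Under that definition, the underestimation of large modules seen in Figure 3 would show up here as a positive mean in the large-module group for `halstead_estimate`.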

[Figure 1 plot; horizontal axis: ACTUAL LENGTH]

Figure 1. Relationship between actual length and Halstead estimated length.

The Halstead length estimator can be modified to center the residuals around zero by substituting √n₁ for log₂ n₁ and √n₂ for log₂ n₂ (Figure 4). For small values of n, √n and log₂ n behave similarly. Unfortunately, the accuracy of the estimator does not improve. Thus, as Hamer and Frewin [9] suggested, alternative formulations of equal or greater validity can be found. However, such changes imply shifts in the theoretical basis of software science. The data appear to support a wide range of theories equally well.

DISCUSSION

Software science has been criticized on many fronts. Malengé [10] presented a thorough review, finding several errors in methodology, notably in Halstead’s use of logarithmic transformations. Coulter [2] focused on the incorrect use of human memory models. Lister [8] and Hamer and Frewin [9] criticized the experimental methods. Shen et al. [11] summarized the criticisms of software science and pointed out the questionable derivation of some software science formulas. The results of this study are fundamental in that they challenge the program length relation, which stands as the cornerstone of the set of relations that make up software science. For Halstead, it was “the first equation found to hold among the software parameters” [1]. Malengé confirms the centrality of program length as “the basic relation of software physics (like pV = nRT or F = ma) in the sense that, in what follows, Halstead uses N or N̂ depending on the circumstances” [10]. Thus, the failure of the length relation threatens the foundation of software science.

[Figure 2 plot; horizontal axis: ACTUAL LENGTH]

Figure 2. Relationship between actual length and vocabulary.

Operators and operands may represent the wrong level of logical detail for modeling programs and the program construction process. Knuth [12] observed that programs contain recurring patterns. For example, 68% of assignment statements from his sample of FORTRAN programs contained exactly one operator and two operands. Assignment statements constituted 51% of all executable statements in his sample. This finding (reported in 1971) is not consistent with the software science model of program construction as a process of random selection of individual operators and operands. Programmers do not think in terms of individual operators and operands but rather in terms of paradigms, patterns, and plans [13].

The mathematical dependency noted earlier explains why the length equation has appeared to have deceptively strong empirical support. The (artificially) high correlations between estimated and actual program length should not be surprising. Given Knuth’s [12] findings, neither should the strong relation reported between estimated length and executable statements. Several researchers [14, 15] have reported that program length (or volume) is highly correlated with simpler measures, such as executable statements and decisions. Furthermore, another recent study [16] showed that software science measures are not as effective as the simpler measures in predicting the effort and error content of programs.

[Figure 3 plot; horizontal axis: ACTUAL LENGTH]

Figure 3. Residuals from original Halstead estimator.

CONCLUSIONS

This study has directly responded to the anomalous mismatch between the theory and practice of software science noted by Coulter [2] and expressed by Shen et al. [11] as follows:

“Since the length equation has been found to be a valid and useful formula in many different environments, there should be a better way to support it theoretically than that offered by Halstead. But, to our knowledge no sound justification has been offered to date.”

This study resolves the software science anomaly by demonstrating that the basic relation of software science lacks empirical, as well as theoretical, support. Other models based on counts of operators and/or operands [17, 18] are liable to similar analysis errors and theoretical limitations. Future models of the program construction process should be based more closely on observations of what programmers actually do [13].

[Figure 4 plot; horizontal axis: ACTUAL LENGTH]

Figure 4. Residuals from modified Halstead estimator.

The challenge currently before the software engineering research community is to resolve its ambivalence toward software science. Practitioners need a clear statement about the validity of software science measures. At present, that statement must be negative.


ACKNOWLEDGMENT

The authors would like to thank J. Elshoff and F. McGarry for their comments on an earlier draft of this paper.

REFERENCES

1. M. H. Halstead, Elements of Software Science, Elsevier, New York, 1977.
2. N. S. Coulter, Software Science and Cognitive Psychology, IEEE Trans. Software Engineering 9(2), 166-171 (March 1983).
3. A. Fitzsimmons and T. Love, A Review and Evaluation of Software Science, ACM Computing Surveys 10(1), 3-18 (March 1978).
4. D. N. Card, F. E. McGarry, G. T. Page, et al., SEL-81-104, The Software Engineering Laboratory, NASA/GSFC, February 1982.
5. D. N. Card, F. E. McGarry, G. T. Page, et al., SEL-83-002, Measures and Metrics for Software Development, NASA/GSFC, March 1984.
6. O. J. Dunn and V. A. Clark, Applied Statistics: Analysis of Variance and Regression, John Wiley & Sons, New York, 1974, 243-246.
7. William of Ockham (c. 1285-1349), The Law of Parsimony (Occam’s Razor).
8. A. M. Lister, Software Science: The Emperor’s New Clothes?, Australian Computer J. 14(2), 66-71 (May 1982).
9. P. G. Hamer and G. D. Frewin, M. H. Halstead’s Software Science: A Critical Examination, Proceedings of the Sixth International Conference on Software Engineering, Computer Society Press, New York, 1982, 197-206.
10. J. P. Malengé, Critique de la Physique du Logiciel, Université de Nice, France, Pub. Inform. IMAN-P-23, October 1980 (M. Marcotty, trans.).
11. V. Y. Shen, S. D. Conte, and H. E. Dunsmore, Software Science Revisited: A Critical Analysis of the Theory and Its Empirical Support, IEEE Trans. Software Engineering 9(2), 155-165 (March 1983).
12. D. E. Knuth, An Empirical Study of FORTRAN Programs, Software-Practice and Experience 1(1), 105-133 (1971).
13. E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Trans. Software Engineering 10(5), 595-609 (September 1984).
14. V. R. Basili, Evaluating Software Development Characteristics: Assessment of Software Measures in the Software Engineering Laboratory, Proceedings of the Sixth Annual Software Engineering Workshop, NASA/GSFC, December 1981.
15. D. Kafura, The Independence of Software Metrics Taken at Different Life-Cycle Stages, Proceedings of the Ninth Annual Software Engineering Workshop, NASA/GSFC, November 1984.
16. V. R. Basili, R. W. Selby, and T. Phillips, Metric Analysis and Data Validation Across FORTRAN Projects, IEEE Trans. Software Engineering 9(6), 652-663 (November 1983).
17. G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Reading, MA, 1949.
18. S. H. Zweben, A Study of the Physical Structure of Algorithms, IEEE Trans. Software Engineering 3(3), 250-258 (May 1977).