Programming Methodology, Organizational Environment, and Programming Productivity

M. J. Lawrence, University of New South Wales
Programming data involving 278 commercial-type programs were collected from 23 medium- to large-scale organizations in order to explore the relationships among variables measuring program type, the testing interface, programming technique, programmer experience, and productivity. Programming technique and programmer experience beyond 1 year were found to have no impact on productivity, whereas on-line testing was found to reduce productivity. A number of analyses of the data are presented, and their relationship to other studies is discussed.
INTRODUCTION

With the steeply rising cost of software relative to hardware there has been continued interest in developing new techniques and support environments to assist the programming function, and considerable activity in retraining programmers in these new approaches. Regrettably, though, there has been little effort expended in empirically validating the effectiveness of the new techniques and environments. Research work on this topic has been undertaken using data from experimental settings [8, 19, 22] (usually using students as subjects), from the day-to-day work of a single organization's programming group [6, 23], and from theoretical models [20]. A major study that attempted to analyze the influence of various factors on practical programming productivity was reported by Walston and Felix [23] in 1977. One of their objectives was to provide data for the evaluation of improved programming technologies. They defined productivity as the ratio of delivered source lines of code to the total effort (in man-hours) required to produce the product, and identified a number of factors heavily influencing productivity.
Address correspondence to Dr. M. J. Lawrence, Department of Information Systems, The University of New South Wales, Post Office Box 1, Kensington, New South Wales, Australia 2033.
The Journal of Systems and Software 2, 257-269 (1981)
© Elsevier Science Publishing Co., Inc., 1982
A limiting constraint of that study from the standpoint of this paper is that data were collected on a project basis. This introduces two effects. First, because programming is generally held to be less than half the total effort, the potential influence of improved programming technologies is considerably diluted and more difficult to isolate. Second, for long projects there is a distinct likelihood that changes and modifications will increase work but have little effect on delivered source lines of code, thus confounding results concerning only the programming stage of the development life cycle. An additional limitation of Walston and Felix's results is that the effect of each variable on productivity was considered independently and, furthermore, that because all data were drawn from one organization no light is shed on interorganizational differences. In this paper we analyze data involving a sample of 278 commercial-type programs collected from 23 medium-to-large organizations in Australia, representing semigovernmental organizations, manufacturing and mining concerns, and the banking and insurance industries. The data represent only the programming-through-unit-testing stage of software development. The sample is analyzed to explore a number of hypotheses, some relating to claims made concerning the improved programming technologies and some relating to organizational impact. However, since the hypotheses relate to programming productivity, we shall discuss the definition of productivity and give the measure used in this study before formulating the hypotheses.
PRODUCTIVITY: PROBLEMS OF DEFINITION

Productivity can be defined as the rate of accomplishment of satisfactory output per unit input. Thus to measure programming productivity there needs to be a quality control function to determine if output is satisfactory,
a means of measuring output, and a means of measuring input. Alternative measures proposed for programming productivity differ in their measurement of each of these three factors. Input is variously defined as dollars, man-hours, man-days, man-months, and man-years, and may cover only program development or the full range of activities associated with a systems project, including design, development, implementation, and all subsequent repair maintenance. There is a similar lack of consensus in the measurement of output, where suggested alternatives include lines of code, number of operands and operators, statements, verbs, transfers of control, functional units (equivalent to a COBOL perform paragraph), and lines of executable or procedural code [13, 17, 24]. Since lines of code, number of statements, number of verbs, number of functional units, and number of transfers of control are all very highly correlated with one another [10, 3], the simpler and more widely used measure of lines of code is considered representative of these code-oriented measures of output. But code-oriented measures are open to objection on the grounds that they are subject to accidental or deliberate manipulation by the programmer; furthermore, they measure an intermediate product, not the task the program has been written to accomplish. If systematic effects are present, such as the fact that experienced programmers typically implement a specification in fewer lines of code than inexperienced programmers, then productivity calculations based on lines of code as an output measure will be biased. These objections to code-oriented measures can be overcome if it is possible to demonstrate that the same specification is generally implemented in a program of about the same length, independent of the identity of the programmer. Some evidence along these lines has been advanced. Jeffery and Lawrence [15] studied a sample of 93 programs produced by three organizations and concluded that for the two organizations where the programming standard was equivalent, the program length measured by the number of lines of procedure code was broadly equivalent for jobs of equal size. Chrysler [7] had previously studied 36 programs from one organization and concluded that program size could be predicted solely from program characteristics. Basili and Reiter [4] conducted an experiment using different software development approaches and found no impact on program size measured in modules or statements among the disciplined team, the ad hoc team, and ad hoc individuals. Additionally, the empirical findings of software science relative to program length [10, 11] contribute further, though perhaps indirect, evidence. Thus we conclude there is some evidence supporting the notion that program length is very highly correlated with the task size.
Alternatively stated, if two programmers are given the identical program specification, there is evidence suggesting a high probability that they will produce programs of about the same length if they use the same language and operate under the same standards. It appears therefore that lines of code is a reasonable measure of output. It has the additional advantage over other measures of being widely used, in effect the de facto standard [17], thus enabling comparison of results from the various studies. However, not all organizations in this study use the same language and standards; therefore some systematic error will be present when using this measure. All the measures of output are insufficient to deal with the reuse of sections of code previously written for another application. Simply removing reused code from the line count (or other output measure) ignores the time taken to design the program, identify reusable modules, integrate the modules, and carry out any additional testing effort due to the reused segments. To illustrate, consider a program that can be made up entirely of prewritten modules. There will still be design effort to understand the specification and identify the modules required, and time is required to assemble and test the program and perhaps remove bugs from the previously tested modules. The total time taken should be substantially less than writing the program afresh, but it will not be insignificant. The solution adopted to this problem was to eliminate any program that had more than an insignificant amount of reused code.
THE PRODUCTIVITY METRIC USED
After lines of code had been chosen as an output measure, it had to be decided whether to use total lines of code, lines of executable code, or lines of procedure code. Taking a rather pragmatic view, measures based on the total program length had to be discarded owing to the high proportion of reused code in the data declaration section (the COBOL data division), which is typically set up once in a copy library at the start of a project. Thus procedure lines of code was adopted as the measure of program size. Program quality is the hardest aspect to define in operational terms. In this study a program was deemed to be complete and of acceptable quality when it met the in-house programming standards and correctly processed the program test data. In summary, productivity is defined in this study as PL/T, where PL is the number of procedural lines of code and T is the total time (in man-hours) put into the programming job by the programmer from the receipt of program specifications to the completion of program testing. In viewing the productivity analyses in this paper it should be borne in mind that biases among organizations are present, because not all of the organizations in the sample have the same programming standards.
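To make the calculation concrete, the following is a minimal sketch in Python (the function name and the figures in the example are ours, for illustration only; they are not part of the study's data collection):

    def productivity(procedure_lines, total_hours):
        # PL/T: procedure lines of code per man-hour, where the hours run
        # from receipt of the program specification to the completion of
        # program testing.
        if total_hours <= 0:
            raise ValueError("total programming time must be positive")
        return procedure_lines / total_hours

    # Illustrative values only: an 800-procedure-line COBOL program
    # completed in 100 man-hours gives PL/T = 8.0.
    print(productivity(800, 100))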
PRINCIPAL RESEARCH HYPOTHESES
The data collected and the analyses performed were designed to enable a number of hypotheses and questions of interest to be investigated. Some of the principal hypotheses are listed in this section, grouped under the headings of programming environment, programmer experience, and programming methodology.

Programming Environment

H1. Better testing facilities improve productivity.
Reflecting the Sackman et al. [19] experiment, which found on-line program development more efficient, most large computer-using organizations have invested heavily to provide their programmers with on-line access. The justification for this investment would appear to be improved productivity. This hypothesis is investigated by collecting data on the testing interface (batch, on-line, and on-line editing with batch compilation and execution) and average testing turnaround (one, two, and three or more shots per day) for each program.

H2a. Programming productivity is highly industry dependent.

H2b. Programming productivity is highly organization dependent.

We hypothesized that organizations such as banks, insurance companies, and software houses, where edp (electronic data processing) is the lifeblood of the business, would have more highly professional edp management and thus more productive programmers than organizations engaged in manufacturing and mining, for example, where edp is in general less critical. Another factor tending to suggest significant productivity differences among organizations is the much discussed order-of-magnitude variability of individual programmer productivity observed by Sackman et al. [19]. If individual productivity varies significantly then it seems reasonable to hypothesize a large variation for organizations.

Programmer Experience

H3. Programming productivity increases with experience.
The typical pattern in industry is to gear programmers' salary packages to their years of experience, with the salary rising quite steeply in the first few years before flattening out. This hypothesis is formulated to test the assumption underlying this policy and, if it proves correct, to estimate the size of the effect.

H4. Productivity is higher for simpler types of programs.

In commercial systems, report-type programs are thought to be simpler than edit programs, with update programs generally held to be the most complex. This hypothesis will explore this ranking by examining productivity.

Programming Methodology

H5. The use of structured programming and walkthroughs increases the proportion of time spent on program design and coding while reducing the proportion of time devoted to testing.

H6. Use of structured programming, top-down design, and walkthroughs either singly or together increases programming productivity.
Improved programming techniques, including structured programming, top-down design, and program walkthroughs, are held to result in higher productivity by reducing program testing. Yourdon [25] cites claims of a fivefold increase in productivity. These two possibilities are contained in hypotheses H5 and H6.
RESEARCH METHOD
Forty medium-to-large, mature edp-using organizations were approached to participate in this study of programming productivity. A number of organizations were unable to participate because they were in a stage of no development, owing to major equipment upgrading or to a heavy maintenance schedule. Some organizations that originally agreed to supply data were later unable to because of delays in the commencement of new program development work. In the end 23 organizations were able to supply data, which were collected over a 6-month period. In 20 organizations the data were collected for individual programs developed during the course of the data collection period, while for three organizations data were also extracted from records maintained by the edp department relating to recently developed programs. A breakdown of the industry class of the 23 participating organizations is given in Table 1, together with the number of programs contributed. The types of data collected are listed in Tables 2 and 3. The counts refer to the total number of programs having the specified attribute.
Table 1.

Type of organization                    Organizations   Programs contributed
Semigovernmental or public authority                4                     42
Banks and insurance                                  6                     75
Manufacturing and mining                            11                    145
Software house                                       2                     16
Total                                               23                    278
Table 2. Descriptive Data

                                                     Count
1. Programmer experience
   1 year                                               67
   2 years                                             104
   more than 3 years                                   107
2. Program language
   COBOL                                               216
   ASSEMBLER                                            30
   RPG                                                   1
   PL/I                                                 13
   BASIC                                                10
   other                                                 8
3. Program type
   utility                                               3
   data base interface                                  59
   edit                                                110
   update                                                3
   processing                                           42
   file extract                                         22
   report                                               30
   other                                                 9
4. Intended program use
   once                                                  3
   ad hoc                                               15
   as part of operational system                       260
5. System environment
   batch                                               265
   on-line                                              13
6. Data base interface
   yes                                                  52
   no                                                  221
   blank                                                 5
7. Programming technique used
   top-down design                                     178
   structured programming                              167
   walkthrough                                          83
   bottom-up design                                      1
   blank                                                 1
8. Reason given for divergence of budget to actual time
   poor estimate                                        27
   hardware problems                                     3
   system software problems                             15
   system analysis deficiencies                          5
   staff turnover on this project                        2
   more modules than expected                           26
   fewer modules than expected                           2
   other                                                32
   blank                                               198
9. Testing interface
   on-line                                              41
   batch                                               203
   CRJE or equivalent                                   19
   blank                                                15
10. Average testing turnaround
   one shot per day                                    149
   two shots per day                                    56
   three or more shots per day                          58
   blank                                                15
The collected data for programmer experience in years (item 1) were reduced to the three groups shown: up to 1 year, 1-2 years, and 3 years or more. This step was taken to focus attention sharply on trainee, intermediate, and experienced programmers. Intended program use (item 4) indicates the subsequent use to which the program will be put; this has been claimed to affect the quality, clarity, and completeness of source code and documentation, and thus productivity (see, e.g., Brooks [5]). It was hypothesized that program type (item 3), future system environment (item 5), and data base interface (item 6) all affect the complexity of the system and should therefore have a bearing on productivity. Rigorous definitions of the programming techniques (item 7) were not given. Rather, the concepts were discussed with the programmers when the project was initiated, and no significant difference of opinion was detected as to the meaning of these terms. For the purposes of this study the following definitions were used: Top-down design identifies the major functions to be accomplished, then proceeds from there to identify the next-level functions required to implement the major functions, and so on. This leads to a tree structure of functions. Bottom-up design is the mirror image of the top-down approach: the lowest level functions are defined first and the upper-level integrating functions developed later.
Structured programming (SP) is the discipline of writing a program with a limited set of constructs. For example, it might be specified that each module have one entry and one exit, or that the program use only basic control structures, indentation and formatting of code, or hierarchical structure. Davis et al. [9] was used as the reference for structured COBOL.
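As an illustration only (in Python rather than the study's COBOL, and entirely ours), a module written under such a discipline confines itself to sequence, selection, and iteration, with a single entry and a single exit:

    def summarize_batch(records):
        # Single entry, single exit; only the basic control structures
        # (sequence, if/else selection, loop iteration) -- no arbitrary jumps.
        accepted = 0
        rejected = 0
        for record in records:              # iteration
            if record.get("valid", False):  # selection
                accepted += 1
            else:
                rejected += 1
        return accepted, rejected           # the single exit point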
A walkthrough is a session in which the programmer invites one or more peers to "walk through" a program together by reading its code, with the objective of detecting errors.
The programmer was given an opportunity in item 8 to comment on any reasons why productivity was affected. Items 9 and 10 summarize the testing environment.
It can be seen from the counts that the majority of programs were written in COBOL as part of an operational system, were tested in batch with one test shot per day, and did not interface with a data base management system.
Table 3. Quantitative Data

Variable                                     Average      Range
1. Estimated man-hours
2. Actual man-hours
   preparation and design (%)                     15       1-52
   coding (%)                                     35       6-81
   debug to clean compile (%)                     12       1-50
   test (%)                                       28       3-72
   other, including document (%)                  10       1-68
   total time T (hr)                              99      7-760
3. Total number of lines of code (L)            1135    30-9445
4. Number of procedure lines of code (PL)        540     0-6000
In addition to the descriptive data listed in Table 2, the quantitative data shown in Table 3 were collected. The programmers were requested to note the program development time estimate made immediately before the start of the job. However, subsequent meetings with respondents have cast a measure of doubt on the reliability of these data, as it appears that some programmers may have recorded estimates made after starting coding. The development time was requested to be recorded under the five headings in item 2 of Table 3. Preparation and design includes all work done by a programmer after receiving the program specification and before commencing coding; typically it includes a reading of the specification, meetings with the analyst, task familiarization, and program design. Coding is the work of translating the specification into code. The testing time is broken into two sections to differentiate syntax error removal from logic error removal. In Table 3 each category of time has been calculated as a percentage of the total program development time and averaged over all programs. The number of lines of code, not including blank lines or comments, is reported in item 3. Item 4 is the number of lines of procedure-type code, equivalent in COBOL to the procedure division lines of code; blank lines and comments are excluded. The assembly programs in general have had their total length reported, with zero procedure length, which accounts for the zero shown as the minimum value in the range. Data collection proceeded for about 4-5 months before preliminary analyses were carried out, and the data were reviewed with the respondent organizations. In this review a number of problems with misunderstood data definitions were uncovered. The most important concerned the time that should properly be attributed to the writing of a program: for example, some organizations were excluding the time taken to attend a program-related meeting and some were excluding test data development time. Following this review some organizations undertook a review of their data in light of the clearer definitions. It is the revised data that are reported in this paper.
A validation of the claim that structured programming was used was carried out by sampling some 90 programs. By visual examination of the source code listings, the programmers' descriptions of the programs as structured or not structured were verified as correct, except for one case where a programmer described his ten structured programs as nonstructured because GOTO statements were used occasionally. Thus some misattribution of programming methodology occurred, but it appeared only in a minority of cases. It was evident, however, when reviewing the development practices of a sample of programmers, that no perceptible differences existed between those claiming to use top-down design and those not making this claim. Thus the top-down technique is not analyzed. It appeared from discussions with respondents that some selection had taken place in data collection. A few organizations did not report programs in which major difficulties had been experienced; the programmers reasoned these were freaks and unrepresentative of their usual productivity. Thus we can expect some bias in the direction of higher productivity.
PRODUCTIVITY ANALYSIS
In this section we report the results of the analyses carried out on the data. These analyses have been designed to test our principal research hypotheses, explore the statistical characteristics of the observed productivity data, and identify the factors that chiefly contribute to the observed variation in productivity. Since the productivity metric chosen excludes a calculation for assembly language programs, the sample for productivity analysis is reduced to 248 programs from 22 organizations. This results in a situation where the 216 COBOL programs make up about 90% of the analyzed data. To remove language as a source of variation, a number of the analyses are done using only the COBOL programs.

Organization: Hypotheses H2a and H2b
Table 4 displays the average productivity and standard deviation for each organization and the average for each industry group. It can be seen that the average productivity for the three industry groups (semigovernment, banks and insurance, and manufacturing and mining) is almost identical at PL/T = 7.7 or 7.8 procedure lines of code per hour, with the average for the two software houses slightly greater, at 8.5. However, this small difference is not significant and could have arisen by chance.
Table 4. Productivity by Organization

Type of organization       Organization code   Number of programs   Average PL/T   S.D.
Semigovernmental                          11                   29            8.5    2.4
                                          14                    3            8.4    1.4
                                          15                    4            5.8    0.9
                                          22                    6            5.7    3.1
  (group)                                                      42            7.7
Banking and insurance                      1                   10            5.6    2.1
                                           2                   13            8.7    4.6
                                           7                    1            8.6    4.4
                                           9                    7            8.4    2.1
                                          20                    8            5.5    5.4
  (group)                                                      39            7.8
Manufacturing and mining                   3                    6            5.9    2.3
                                           4                    6            3.6    1.4
                                           6                   21            9.0    4.2
                                           8                    4           11.9    4.4
                                          10                    3            3.9    1.1
                                          13                   36            6.5    3.0
                                          16                    6            6.2    2.1
                                          17                    7           12.3    4.1
                                          18                   20            6.2    4.7
                                          21                   11            6.6    5.0
                                          23                   31           10.5    3.6
  (group)                                                     151            7.8
Software house                             5                    8            9.7    2.2
                                          12                    8            7.3    1.6
  (group)                                                      16            8.5
Overall                                                       248            8.0   3.74
Although hypothesis H2a is rejected, it is interesting to observe the far greater scatter in the manufacturing and mining industry data than in the banks and insurance sector or the software houses. A majority of the individual organization-to-organization productivity differences are not, statistically speaking, very great. Seven of the 22 organizations in Table 4 have productivity values significantly different from the population mean that could not reasonably be expected to have arisen through sampling error. These are organizations 4, 8, 10, 13, 17, 18, and 23, all from the manufacturing and mining sector. The rest of the organizations can be regarded as having statistically the same productivity. While there is limited support for hypothesis H2b, the extreme variability of productivity observed by Sackman et al. [19] is not present at the organization level.
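The size of an organization's departure from the population mean can be gauged from the summary figures in Table 4 with a one-sample t statistic; the sketch below is our illustration only, since the paper does not state the exact test used:

    import math

    def t_statistic(org_mean, org_sd, n, population_mean=8.0):
        # One-sample t statistic computed from summary figures.
        return (org_mean - population_mean) / (org_sd / math.sqrt(n))

    # e.g., organization 4 (mean 3.6, S.D. 1.4, 6 programs):
    print(t_statistic(3.6, 1.4, 6))  # about -7.7, far below the population mean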
Frequency Distribution

The productivity frequency plots shown in Figures 1 and 2 relate to the COBOL programs in the sample. Figure 1 shows PL/T and, to illustrate how misleading a figure L/T (total lines of code per hour) could be, Figure 2 shows L/T. PL/T has a distribution with a mean value of 8.0 and an elongated right-hand tail, as one would expect. The minimum productivity is 0.9, the maximum 19.5, and the standard deviation 3.74. Figure 2 for L/T shows the absurd situation one can encounter with no control on reused code in the productivity measure. The maximum value of 124 lines per hour would probably be beyond physical capability to write or type, let alone check and test. This figure can serve as a warning against careless use of productivity data.

Allocation of Time and Effect of Programming Technique: Hypotheses H5 and H6

Programmers were requested to allocate their time over the following activities: preparation and design, coding, debugging to clean compile, testing, and other time, including documentation. A total of 213 programs had time allocations recorded (155 COBOL programs). A few organizations were unwilling to record time breakdowns because it did not mesh with their own internal systems. The programmer was also asked to tick one or more boxes on each program's data sheet indicating the techniques used for that program. Table 5 summarizes the programming techniques indicated. Almost all programmers indicated they were using at least one of the techniques, and many indicated that a number were being used.

Table 5. Programming Techniques Used

                                                    Number of programs   Percent
Top-down only                                                       31        11
Top-down and structured programming                                104        37
Top-down, structured programming, and walkthrough                    5         2
Structured programming only                                         58        21
Walkthrough only                                                    40        14
Top-down and walkthrough                                            38        14
None                                                                 2         1
Total                                                              278       100
Table 6. Allocation of Programming Time, COBOL Programs

                          All programs     Using SP        Not using SP    Using walkthrough
                          (155 cases)      (129 cases)     (26 cases)      (11 cases)
                          Mean    S.D.     Mean   S.D.     Mean   S.D.     Mean   S.D.
Preparation and design     16%    11.2      15%     11      21%     13      15%     11
Coding                     37%    13.9      38%     14      32%      9      38%     13
Total debugging            40%    14.1      40%     14      39%     14      38%     13
  compiling                12%     7.9      11%      8      11%      7      13%      8
  testing                  28%    13.1      29%     13      28%     11      27%     11
Other                       7%     7.2       7%      6       8%      7       7%      4
PL/T                       8.1     3.1      7.9    3.8      8.6    3.6      7.4    3.9
The time spent on each activity has been divided by the total programming time for that program and converted to a percentage. These percentages have been averaged for all programs, for all programs using structured programming (SP), for all programs where SP was not used, and for all programs where a program walkthrough was used. Table 6 shows the allocation of programming time for COBOL programs broken out by programming technique; it leads to the following observations:

The SP group used a slightly greater fraction of time in coding but a smaller fraction in design.

Overall there is an amazing similarity in the percentage allocation of time for the groups, particularly in the total debugging time.

The differences in productivity of the groups are statistically insignificant. However, the averages show the productivity of the programmers using walkthroughs to be a little lower than that of the non-SP group.

The use of SP does not influence the distribution of time.

Walkthroughs do not impact the distribution of time.

Because the time spent debugging is virtually the same for all groups, it appears that hypothesis H5, relating to the influence of SP and walkthroughs on allocated time, cannot be supported. As an explanation for this counterintuitive result, the possibility of significant error in the SP claim can be eliminated: it has already been mentioned that this aspect was verified by sampling source code listings.
[Figure 1. Procedure lines of code per hour, COBOL programs. Histogram of frequency against PL/T, ranging from 0 to 20.]
[Figure 2. Lines of code per hour, COBOL programs. Histogram of frequency against L/T, ranging up to about 120.]
However, the claim that walkthroughs were used was not verified, and because the discussion of walkthroughs with the programmers did not include mention of their detail or extent, they may on occasion have been a very cursory exercise. Indeed, it would appear from the data that they must have been.

Hypothesis H6, which states that top-down design, structured programming, and walkthroughs improve productivity, can be tested by observing the average productivity for the various cases given in Table 6. The insignificant differences in productivity among the three groups cause the hypothesis to be rejected. No statistical test is warranted because in fact the average productivity is slightly higher for the group using COBOL but not SP. It is possible that systematic errors in the productivity measure are responsible for the lack of support for H6, but it is difficult to see how H6 could be accepted (for the programming-through-unit-testing interval of the systems life cycle) in light of H5's being rejected, because the productivity increase of the techniques must come through reduced testing effort.

Table 7 gives the same data as Table 6 but for all programs. It presents a similar picture to that observed for the COBOL programs.

Table 7. Allocation of Programming Time, All Programs

                          All programs     Using SP        Not using SP
                          (213 cases)      (159 cases)     (54 cases)
                          Mean    S.D.     Mean   S.D.     Mean   S.D.
Preparation and design     15%      10      15%     10      18%     10
Coding                     35%      14      37%     14      28%     10
Total debugging            39%      15      40%     15      35%     14
  compiling                11%       8      11%      9      10%      7
  testing                  28%      13      29%     14      25%     11
Other                      11%      11       8%      6      19%     16
PL/T                       8.0     3.7      7.8    3.8      8.5    3.7

Some further observations can be drawn on the overall time averages:

The proportion of time devoted to getting a clean compile, 12%, was much higher than programming managers expected.

Not a great deal of time is spent before coding commences, probably owing to the simple nature of most of the programs and to similar programs having been previously written.

Total debugging time, that is, time spent fixing mistakes made in coding, is the major activity in programming and appears not to be significantly affected by programming technique.
Productivity by Language

Unfortunately there is insufficient data to present meaningful results on productivity for any language apart from COBOL. For the sake of completeness, however, results are given in Table 8 for COBOL, PL/I, and BASIC.

Table 8. Productivity by Language

Language    Cases   Average PL/T   S.D.
COBOL         216            8.1    3.1
PL/I           13            6.4    2.3
BASIC          10            1.3    4.1

Time Estimation

A total of 190 programs contained an estimated time for the job. Unfortunately no record was kept of whether the estimate was made by the programmer, the supervisor, or both together. The analysis consisted of a correlation study between estimated time and actual time. A correlation of 0.92 exists between these two values, with the regression line having an equation of the form

estimated time = 0.86 x (actual time) + 2.6.

This indicates a systematic low bias for estimated time of about 10%. Thus when the actual time is 100 hr the estimate would be 88.6. This result agrees with the popular notion of the optimism of programmers. The correlation between actual time and PL was 0.85. Thus the program estimators appear to have done slightly better than had they used a simple formula based on the number of procedure lines of code (assuming an accurate estimate of PL). On these figures it would appear that program development time is being estimated rather accurately. However, it needs to be mentioned that although it was requested that the time estimate be recorded before coding commenced, no control was possible over this, and it is possible that some later, updated estimates were substituted.
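The reported bias is easy to verify numerically; the sketch below simply applies the regression line quoted above (the helper name is ours):

    def estimated_hours(actual_hours):
        # Regression line reported in the text:
        # estimated time = 0.86 * (actual time) + 2.6
        return 0.86 * actual_hours + 2.6

    print(estimated_hours(100))  # 88.6 hr, i.e., roughly a 10% underestimate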
Productivity by Experience and Program Type: Hypothesis H3

Table 9 gives the average productivity by program type and experience category. The breakdown by program type is to enable comparisons of experience to be made roughly within the same program complexity category. Only edit programs have sufficient depth in each experience class to give a good basis for comparison. Trainee programmers (0-1 year) show a slightly lower productivity than programmers with 2-3 years experience, but no increase is observable between the intermediate and the experienced groups. An equivalent result is obtained when all programs are considered together. This supports results obtained by Jeffery and Lawrence [15] and agrees with a number of previous studies [16]. It thus gives limited support for hypothesis H3 but suggests a peaking of productivity after about 1 year's experience.
Table 9. Average Productivity by Experience and Program Type

                        Trainee (0-1 yr)    Intermediate (2-3 yr)    Experienced (more than 3 yr)
Program type            Cases     Mean      Cases     Mean           Cases     Mean
Utility                     1      6.4          1     10.0               1      8.4
Data base interface         6      5.4         23      7.6              28      7.7
Edit                       29      7.1         35      8.3              21      8.4
Update                      1      4.2          1     19.3               1     11.1
Processing                  6      4.8          6      7.9              17      7.4
File extract                5      9.8         11     11.2               4      8.4
Report                      7      8.7          9      6.5              14      8.2
Other                       1     11.2          1      8.7               6      9.1
All programs               56      7.1         87      8.4              98      8.1

Regression Analysis of Productivity: Hypothesis H1

In order to avoid having an overdetermined matrix (for a discussion of this problem see Nie et al. [18]), a base case eliminating complementary variables was selected before the regression analysis was run. The following characteristics were chosen as representative of the majority of programs:
programmer experience greater than 3 years,
COBOL language,
edit program,
regular use,
batch system,
no data base interface,
no reason given for variance between estimated and actual time,
batch testing,
1-day turnaround, and
organization 23.

Three separate regression analyses, using the stepwise procedure, were performed on productivity as the dependent variable, using the following in turn as independent variables:

1. all program-related variables and organization identity,
2. all program-related variables, and
3. organization-identity variables.

The organization-identity variables were included to investigate whether any factors unique to the organizations' environment and not contained in the set of program and programmer variables affected productivity. The results of these three regressions are given in Table 10, in which the independent variables are listed in order of decreasing explained variance.
Table 10. Regression Analysis Results: Productivity

Regression 1: 38 programmer and program variables and 22 organization-identity variables (R2 = 36%)

  Variable in regression equation    Coefficient   Significance
  3 or more shots per day                    2.4           .000
  PL                                       0.002           .000
  "blank" for testing interface             -2.5           .005
  on-line testing                           -2.5           .001
  organization 8                             5.6           .001
  organization 4                            -4.5           .001
  data base interface                       -1.3           .012
  organization 17                            3.6           .003
  2-3 years experience                       1.1           .013
  organization 5                             2.4           .014
  organization 6                             1.5           .040
  report program                             1.1           .051
  constant term                              6.3

Regression 2: 38 programmer and program variables (R2 = 25%)

  3 or more shots per day                    3.1           .000
  PL                                       0.002           .000
  "blank" for testing interface             -2.8           .002
  on-line testing                           -2.9           .000
  data base interface                       -1.3           .014
  update program                            +3.6           .058
  processing program                        -1.3           .066
  constant term                              7.1

Regression 3: 22 organization-identity variables (R2 = 17%)

  organization 17                            1.8           .002
  organization 4                            -6.8           .004
  organization 13                           -3.9           .006
  organization 1                            -4.9           .012
  organization 10                           -6.5           .023
  organization 8                             1.4           .055
  organization 3                            -4.5           .068
These results show that the inclusion of both organization-identity variables and program and programmer variables provides a 50% increase in R2 (the fraction of variance explained by the regression equation) compared with either set of variables taken by itself. When considered independently, the 38 program and programmer variables do a better job, albeit a poor one, of explaining productivity than the 22 organization-identity variables, but the difference is not very large. On the basis of these results there are organization-environment variables that, if identified and measured, could lead to an appreciably better regression equation for productivity.

The variables and coefficients in the regression equation are interesting. Good testing turnaround has a beneficial impact on productivity, while lack of any indication for the testing interface variable was a 2.5 PL/hr liability. (Maybe the programmers who can't fill in a form correctly are equally poor at programming!) On-line testing shows up as a liability that runs counter to hypothesis H1. Having an interface to a data base system reduces productivity. The presence of an update program as a positive influence can be traced to a single update program with a very high productivity. The positive impact of PL on productivity is odd. A number of the very long programs had above-average productivity, but the reason for this effect is not known, although the finding is supported by a number of other studies [e.g., 2, 16]. Note that 2-3 years experience has a small positive coefficient relative to the base case when all variables are included but does not appear in the regression equation for program-related variables alone. This is as we would expect given the previous discussion on the role of experience. In line with the conclusions reached in our discussion of hypotheses H5 and H6, none of the programming-technique variables entered the regression.
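For readers who wish to reproduce this style of analysis, the following is a minimal sketch of forward stepwise selection over dummy (0/1) variables. It is our reconstruction of the general procedure for illustration only, not the stepwise runs actually used in the study [18]:

    import numpy as np

    def forward_stepwise(X, y, names, max_vars=12, tol=1e-6):
        # Greedy forward selection: at each step add the candidate column
        # that most increases R^2 of an ordinary least-squares fit.
        n = len(y)
        total_ss = ((y - y.mean()) ** 2).sum()
        chosen, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
        while remaining and len(chosen) < max_vars:
            scores = []
            for j in remaining:
                cols = chosen + [j]
                A = np.column_stack([np.ones(n)] + [X[:, k] for k in cols])
                coef, *_ = np.linalg.lstsq(A, y, rcond=None)
                resid = y - A @ coef
                scores.append((1.0 - (resid @ resid) / total_ss, j))
            r2, j = max(scores)
            if r2 <= best_r2 + tol:  # stop when no candidate improves the fit
                break
            best_r2 = r2
            chosen.append(j)
            remaining.remove(j)
        return [names[j] for j in chosen], best_r2

    # Columns of X are measured against the base case, e.g., a 0/1 column
    # for "on-line testing" or "organization 8", plus PL as a numeric column.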
Programming Time

The metric that has been chosen for productivity suggests that the variable most highly correlated with programming time should be PL. This is indeed the case, the correlation between PL and T being 0.85. Performing a stepwise regression analysis with programming time as the dependent variable and programmer and program variables as independent variables gives the results shown in Table 11. The base case is the same as in the previous section. Each of the variables in Table 11 is significant at better than the 0.008 level.

Table 11. Regression Results: Programming Time

Variable                          Coefficient
PL                                      0.102
Testing interface blank                    62
Utility program                            23
Bottom-up programming                     116
3 or more test shots per day              -18
Constant                                   15
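Read as an equation, Table 11 predicts programming time as T = 15 + 0.102 PL + 62 (testing interface blank) + 23 (utility program) + 116 (bottom-up programming) - 18 (3 or more test shots per day), with each indicator taking the value 0 or 1 against the base case. A minimal sketch applying it (the function and argument names are ours):

    def predicted_hours(pl, blank_interface=False, utility=False,
                        bottom_up=False, three_shots=False):
        # Applies the Table 11 regression equation; the keyword arguments
        # are 0/1 indicator departures from the base case.
        return (15 + 0.102 * pl + 62 * blank_interface + 23 * utility
                + 116 * bottom_up - 18 * three_shots)

    # A base-case program of average size (PL = 540):
    print(predicted_hours(540))  # about 70 man-hours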
DISCUSSION Analysis of this productivity data reveals very little or no influence on productivity due to experience beyond 1 year, on-line computer access, the use of structured programming, or the use of walkthroughs. This is consistent with the findings of a number of other studies that sought to demonstrate empirically the benefits of programming methodology [21]. However, it needs to be remembered that in this study we are examining productivity benefits only. Other benefits, such as fewer latent bugs or reduced maintenance costs, may be realized through the use of a disciplined programming methodology and with greater programmer experience. These results at first may appear to be in conflict
with those of Basili and Reiter [4], who in a carefully designed and executed experiment showed that the use of a disciplined programming methodology (including SP, walkthroughs, and team management) favorably influenced a large number of program development and final product metrics. The metrics they used include computer job steps, program changes, statements, IF statements, decisions, tokens, and data variable scope counts. However, none of these metrics directly measures productivity or can be taken as its surrogate. The closest are computer job steps and program changes, both of which are related to the number of errors found during computer testing. These two metrics may be reduced through the use of a disciplined programming methodology if the programmer who is using the disciplined methodology makes fewer errors in initially writing the code, if errors are detected by manual walkthroughs, or both. Because the time taken to either write the code or carry out the walkthrough is not reflected in these metrics, they are unable by themselves to contribute to an understanding of productivity or (using the breakdown in this article) to the percentage of time allocated to programming. Thus, on reflection, the findings reported in this article do not conflict with the Basili and Reiter results. Bailey and Basili [1] found higher productivity over the whole system life cycle to be associated with the use of disciplined programming methodology. These findings, together with those reported in this paper, suggest that the chief benefits of disciplined programming methodology are experienced in the early design and test stages. The new programming technologies are based on a view of programming as a creative problem-solving exercise in which a programmer constructs a program from scratch, as it were. In this view structured programming provides a methodology for designing and constructing a program using a limited number of basic building blocks. Perhaps in reality much programming, particularly commercial work, is a mundane, repetitive task in which experienced programmers have seen each type of program they are called upon to write many times. Given a specification, they select a standard template from a mental library and go to work on the changes necessary to tailor it to the problem at hand. It may make little difference to the experienced programmer whether the template is structured or unstructured, and the finding of a constant proportion of debugging time suggests that errors are made at the same rate in making these changes, regardless of methodology. This "minimal change" model would also explain the very similar productivity rates observed in two-thirds of the organizations, and it suggests why dramatic productivity differences among individual programmers (Sheil [21] suggests factors of 5-100) were not observed.
These large variations in productivity represent comparisons made among programmers undertaking complex tasks at different stages of their learning curve. Perhaps once programmers develop programs of a similar nature to those previously written, the productivity differentials tend to vanish. It is likely that in commercial work, with its limited range of program types, learning is so phased as to avoid large, disruptive jumps in productivity and thus further reduce variation.
CONCLUSIONS

An acceptable though perhaps crude definition of programming productivity is the number of procedural lines of code written per hour, measured from when a programmer receives the program specification to the completion of program testing. The research reported here, drawing on a wide sample of commercial programming work by professional programmers, casts doubt on a number of popularly accepted notions in the edp field. It finds no productivity increase sufficiently large to be detected with the measure adopted from programming experience beyond the first year; from on-line testing facilities, which in fact are associated with lower productivity; from structured programming; or from walkthroughs. An increase in productivity is associated with better batch turnaround (three or more test shots per day) and a decrease in productivity with on-line testing and an interface to a data base.

A great variety of programming techniques was reported, with almost half the programs having more than one technique used in their development. Concerning the distribution of effort during programming and productivity, there was no noticeable difference among the groups using structured programming, using walkthroughs, and not using structured programming. If anything, those not using structured programming took more time in the preparation and planning stage and less time coding, yet achieved higher productivity. The programmers were given an opportunity to attribute unusual productivity patterns to a number of causes (hardware difficulties, specification modifications, etc.). None of these causes was found to have a significant impact on productivity.

The results indicate that some organizations are obtaining significantly higher productivity than others, although no industry pattern is evident. When regression analyses were run on productivity, the presence of the organization-identity variables had a marked impact on the explanatory power of the result, increasing R2 from 25% to 36%. This suggests that identification and measurement of the proper organization variables, in conjunction with program and testing type variables, could contribute to a better understanding of programming productivity variations. It is felt that these variables should reflect factors such as morale, group norms, personality, management style, level of supervision, training, and employee selection criteria.

ACKNOWLEDGMENT
The author would like to acknowledge the assistance of D. R. Jeffery in collecting the data that form the basis of this study.
REFERENCES

1. J. W. Bailey and V. R. Basili, A Meta-Model for Software Resource Expenditures, Proc. 5th Int. Conf. on Software Eng., 1981.
2. V. R. Basili and K. Freburger, Programming Measurement and Estimation in the Software Engineering Laboratory, J. Syst. Software 2, 47-57 (1981).
3. V. R. Basili and D. H. Hutchens, Analyzing a Syntactic Family of Complexity Metrics, Tech. Rep. TR-1053, Computer Science Technical Report Series, University of Maryland, May 1981.
4. V. R. Basili and R. W. Reiter, A Controlled Experiment Quantitatively Comparing Software Development Approaches, IEEE Trans. Software Eng. SE-7(3) (May 1981).
5. F. P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, MA, 1975.
6. E. Chrysler, Some Basic Determinants of Computer Programming Productivity, Commun. ACM 21(6) (1978).
7. E. Chrysler, The Impact of Program and Programmer Characteristics on Program Size, AFIPS Conf. Proc. 47, 581-587 (1978).
8. B. Curtis, S. B. Sheppard, and P. Milliman, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, Proc. 4th Int. Conf. on Software Eng., Munich, September 1979.
9. G. B. Davis, M. H. Olson, and C. R. Litecky, Elementary Structured COBOL: A Step by Step Approach, McGraw-Hill, New York, 1977.
10. J. L. Elshoff, Measuring Commercial PL/I Programs Using Halstead's Criteria, ACM SIGPLAN Not., May 1976.
11. A. B. Fitzsimmons and L. T. Love, A Review and Evaluation of Software Science, ACM Comput. Surv. 10, 3-18 (1978).
12. K. Freburger and V. R. Basili, The Software Engineering Laboratory: Relationship Equations, Tech. Rep. TR-935, Computer Science Technical Report Series, University of Maryland, May 1979.
13. T. D. Grossman, Programmer Productivity Measurement, Systems/Stelsel, March 1978.
14. M. H. Halstead, Elements of Software Science, Elsevier, New York, 1977.
15. D. R. Jeffery and M. J. Lawrence, An Inter-organizational Comparison of Programming Productivity, Proc. 4th Int. Conf. on Software Eng., Munich, September 1979.
16. D. R. Jeffery and M. J. Lawrence, Some Issues in the Measurement and Control of Programming Productivity, Inf. Manage. 4 (1981).
17. T. C. Jones, Measuring Programming Quality and Productivity, IBM Syst. J. 17(1) (1978).
18. N. H. Nie, C. H. Hull, J. G. Jenkins, K. Steinbrenner, and D. H. Bent, Statistical Package for the Social Sciences, McGraw-Hill, New York, 1975, p. 374.
19. H. Sackman, W. J. Erikson, and E. E. Grant, Exploratory Experimental Studies Comparing On-line and Off-line Programming Performance, Commun. ACM 11(1) (January 1968).
20. R. F. Scott and D. B. Simmons, Predicting Programming Group Productivity: A Communications Model, IEEE Trans. Software Eng. SE-1(4) (1975).
21. B. A. Sheil, The Psychological Study of Programming, ACM Comput. Surv. 13, 101-120 (1981).
22. B. Shneiderman, Exploratory Experiments in Programmer Behavior, Int. J. Comput. Inf. Sci. 5(2) (June 1976).
23. C. E. Walston and C. P. Felix, A Method of Programming Measurement and Estimation, IBM Syst. J. 16, 54-73 (1977).
24. W. J. C. Wooton, Effective Control and Management: The Key to Higher Programming Productivity, Datafair 73, Nottingham, April 1973.
25. E. Yourdon, Programmer Attitude and Reactions Towards Programming Productivity Techniques, Proc. Annual Computer Personnel Research Conf., 1975.
Received 16 July 1980; revised 16 December 1981; accepted 2 February 1982