Human Factors in Software Engineering: A Review of the Literature

K. Ronald Laughery, Jr. and Kenneth R. Laughery, Sr.
Micro Analysis and Design, Boulder, CO

Address correspondence to Dr. K. Ronald Laughery, Micro Analysis and Design, 9132 Thunderhead Dr., Boulder, CO 80302.
A critical factor in the increased utilization of computer technology is the availability of software. Techniques must be developed to reduce the effort required to develop and maintain software, and the availability of software developers must be increased. Both of these approaches could benefit from significant input from the human factors engineering community. Where should our efforts be focused? This paper will: (1) provide a framework for studying human factors in software engineering; (2) summarize the literature on human factors in software engineering; and (3) identify some research areas that should be addressed.
1. INTRODUCTION

Five years ago, when we began reviewing the literature on human-computer interaction, research directed towards the human factors in software engineering was sparse. A strong body of literature was developing on the human-computer interface, yet almost no attention had been given to improving the software development process from a human factors perspective. Reasons for this are unclear. Perhaps it was because few had anticipated the shortage of software developers, or because not enough human factors specialists were sufficiently qualified computer programmers to appreciate the problems. In the past few years the situation has changed markedly. Concern over the shortage of software developers and interest by the human factors profession has sparked a significant amount of quality research. The areas where human factors may prove useful to software engineering are being defined and addressed. The opportunities for improvement of the software development process are immense, and so is the need. It has been estimated that the demand for computer
programmers currently outstrips supply by at least 50,000, and that gap is likely to increase [10,62]. Without major changes in software development technology, the need for programmers could reach 1.5 million by 1990, more than triple the number working today. Clearly, major changes in software development technology are required, and the improvement of human factors in software engineering can play a strong role in effecting these changes. This opportunity is beginning to be considered by the software development community, as is evidenced by Curtis [16].

What Are the Tasks of the Software Development Individual or Team?

In order to have a meaningful discussion on human factors in software engineering, we should first define the phrase "software engineering." Software design and development involves many phases, all of which involve different cognitive tasks for the human. We should, therefore, have a context within which to discuss the various cognitive processes. Weinberg [65] identifies the following six phases: problem definition, analysis, flow diagramming, coding, testing, and documentation. These phases, Weinberg proposes, need not be performed in the order presented here. Brooks [8], focusing only on code generation, proposes only three phases: understanding, method-finding, and coding. Shneiderman [48] proposes a slightly different set of phases, including the following:

1. Learning the programming language to be used
2. Program design
3. Composition of program
4. Comprehension of an existing program
5. Program testing
6. Debugging
7. Documentation
8. Modification
Weinberg seems to place greater emphasis on differentiating the early stages of problem definition, analysis, and flow diagramming (although flow diagramming
may be more generally referred to as program structure development). Shneiderman combines these three categories into one category of program design. Weinberg does not, however, seem to look beyond initial software development by the programmer experienced in the language being used. He does not consider learning the language. In today's world of highly specialized languages, even a programmer with considerable experience may be required to learn the specific language to be used. Additionally, it has been estimated that up to 75% of programming work involves program modification. Therefore, this phase of software engineering should clearly be included in any categorization of tasks that may be performed by the software engineer.

It is our belief that initial program design is perhaps the most critical phase of software development and, therefore, should be as well defined as possible. Additionally, software engineering should include preliminaries (learning the language) and postmortems (modification). We propose the following set of categories:

1. Learning the programming language/techniques
2. Problem definition
3. Problem analysis
4. Program structure development
5. Composition/coding
6. Program testing and debugging
7. Documentation
8. Modification
We pose this taxonomy of software engineering tasks primarily to provide a meaningful structure within which to discuss human factors in software engineering. As is evidenced by Shneiderman [49], this process is undergoing changes; therefore, the taxonomy should be considered only a reflection of current practices. The next section will review the literature on human factors in software engineering. Virtually all areas of the literature will be addressed, since they are all relevant. After the literature has been reviewed, a brief summary will be presented of the salient points relevant to modeling the underlying processes of human programming behavior.
2. WHAT DO WE KNOW-THE LITERATURE ON HUMAN FACTORS IN SOFTWARE ENGINEERING

The phases of software development delineated in the previous section are presented as a high-level description of the tasks involved in software engineering. It is proposed that these phases and their hypothesized relationships should be used to focus future research activities on the specific processes being studied, as well as on which processes should be studied together. However, the literature to date has not generally followed such an organization. Rather, the literature has fallen more along the lines of disciplines such as cognitive psychology, industrial/organizational psychology, and computer science. To present the current state of the art, a different organization is required to reflect the scientific community's research thrusts. The following organization will be used for this paper:

1. The psychology of programmer behavior (e.g., the internal processes involved in program development)
2. Individual differences in software development (e.g., the characteristics that make a good programmer)
3. Training computer programmers
4. Language/coding practice effectiveness (e.g., the utility of structured programming, natural language programming)
5. Human errors in computer programming
6. Debugging computer software
7. Estimating programmer performance and software cost
8. Evaluating software quality

The first three categories involve research directly aimed at the human: what capabilities does the programmer use while programming, how does the human develop software, and what can we do to improve programmer performance? The second three categories are directed towards the interface between the software developer and the computer: simplification of program design, debugging, and modification through improving the computer language, and other interface design issues (e.g., error messages). The final two categories are ultimately aimed at providing the data processing manager with tools for evaluating program, as well as programmer, quality. Also, the literature on software quality assessment is necessary to the researcher who wishes to link any independent variable (e.g., individual attributes, environmental variables) to software development performance. We need good quantitative dependent measures of the performance of software developers. The following subsections will highlight the literature in each of the eight areas and identify findings relevant to the proposed study.

2.1 The Psychology of Computer Programming

Attention has been given to psychological factors in computer programming for many years. Probably the first discussion in the area of the psychology of computer programming appeared as early as 1968 in Sackman, Erikson, and Grant [44]. One of the first comprehensive endeavors in this area was a book by G. M. Weinberg entitled The Psychology of Computer Programming [65]. In this and later writings (e.g., [64]), he
recognized that in the past no attention had been given to the human dimension of programming performance. He proposed that this lack of attention was attributable to the nature of computers. Computers were introduced in part to eliminate "personnel problems" by replacing humans with computers. Since human involvement in a system was less critical, why worry about human factors? Weinberg proposed that programmers, by their very nature, preferred to work with machines, not people. This inattention to the psychological dimensions of the software development process probably led to the feeling that programming was an art not easily explained or taught.

In Weinberg's book, many areas relating to the underlying cognitive processes were explored. At the time, there was little research to support many of the arguments. However, the book did provide valuable insights into the human variables that might affect programming performance and how these variables could be studied in the future. As verification of this statement, almost every article addressing the psychology of computer programming published since 1971 references Weinberg's book.

Most of the subsequent studies of human factors in software development have focused on the effects of specific coding and/or language characteristics on observed human performance, such as program readability (these studies will be discussed in Section 2.4). Few, however, have addressed the underlying theoretical problem-solving mechanisms involved in programming behavior. It is easier to manipulate an independent variable, such as the use of program comments, and observe some behavior, such as program recall, than to directly assess specific cognitive factors contributing to programming performance. Obviously, both types of work have merit. By examining the desirability of specific programming techniques we can immediately improve the programming process. Also, by noting factors influencing performance, we can gain insights into what cognitive mechanisms might explain the differences. Alternately, developing and validating models of the cognitive processes involved in programming will allow us to make inferences about how the programming process may be improved. The symbiotic nature of these approaches is necessary for rapid development of our understanding of the human factors in software engineering.

Recently, Card, Moran, and Newell [11] have made substantial advances in bringing together various cognitive theories into models of human-computer interaction. However, they did not focus on the programming task. More work of this nature is warranted in the area of software development.

In the classic book by Newell and Simon, Human
Problem Solving [37], a theory of how humans solve problems is presented. Some of the salient characteristics of their theory as applied to computer programming are the following:
1. Different individuals will use different techniques for solving the same class of problems. Therefore, to study specific problem-solving behavior (e.g., computer programming) we should model the specific behavior of individuals rather than aggregates in order to accurately define the underlying cognitive processes.
2. Problem-solving behavior is controlled by a "production system," which consists of a set of pairs of conditions and actions to be performed when the conditions are met.
3. Humans will select the actions which maximize goal attainment. Goal attainment is determined by the values of a set of goal state variables.
4. Performing the actions will cause some change in the state of the data (including goal state variables) that define the conditions. Therefore, as actions occur, different conditions may result, leading to the selection of different actions.

This theory is highly detailed, including: the definition of primitive units of data considered in evaluating the conditions, the various media for human information storage (e.g., LTM, STM), and what types of data are stored in the various media. The theory has significantly influenced the way the human as a problem solver is perceived. Clearly, software development is a subset of human problem-solving behavior.
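The production-system idea can be made concrete with a minimal sketch of a recognize-act cycle. The sketch below is our illustration, not Newell and Simon's formalism: the working-memory variables and the two toy rules (set up an accumulator, then write a summation loop) are invented for the example.

```python
# Minimal production system sketch: condition-action rules fire against
# a working memory of goal-state variables. Illustrative only; the rules
# below are toy examples, not rules elicited from real programmers.

state = {"goal": "sum_array", "init_written": False, "loop_written": False}

def want_init(s):      # condition: summing goal, accumulator not yet set up
    return s["goal"] == "sum_array" and not s["init_written"]

def write_init(s):     # action: emit initialization code, update the state
    print("total = 0")
    s["init_written"] = True

def want_loop(s):      # condition: accumulator exists, loop not yet written
    return s["init_written"] and not s["loop_written"]

def write_loop(s):     # action: emit the loop body
    print("for x in data: total += x")
    s["loop_written"] = True

rules = [(want_init, write_init), (want_loop, write_loop)]

# Recognize-act cycle: repeatedly fire the first rule whose condition
# holds; actions change the state, which changes which rules apply next.
fired = True
while fired:
    fired = False
    for condition, action in rules:
        if condition(state):
            action(state)
            fired = True
            break
```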
Brooks [8,9] applied the Newell and Simon theory to modeling the coding phase of computer programming. By collecting 23 verbal protocols of a single subject who was writing code for a specific set of computer problems, Brooks was able to define the conditions leading to each of 73 identified actions which were performed by the programmer in 4 of the 23 protocols. These 73 rules were then used to simulate the programmer's generation of the code. The simulation was able to generate all of the lines of code that the programmer generated, although two lines were inverted. Brooks also examined the number of new rules that had to be generated for each additional protocol to be simulated. He estimated that somewhere between 4 and 10 new rules per protocol were required. Based upon this finding, he estimated that the total number of rules necessary to represent a programmer's total repertoire of programming knowledge would be on the order of tens of thousands. This would be consistent with the Chase and Simon [12] estimate that a chess master (another high-level problem solver) can recognize approximately 31,000 primitive piece configurations. According to Brooks, accurately modeling a programmer using this approach would represent a nearly impossible undertaking.

However, this may not be the case. It is our expectation that there is a relatively small (e.g., less than 1000) number of tools (i.e., actions) available to the programmer, and that the conditions defining when each tool should be applied are consistent. When we develop rules from a verbal protocol, we can expect that the first protocol will generate the most new rules, and each succeeding protocol will yield relatively fewer new rules, until some point at which all rules have been defined. (Brooks did not present the data in sufficient detail to verify this; however, the fact that he obtained 73 rules from the first 4 protocols and only 4-10 rules per protocol thereafter seems to support this assumption.) However, the feasibility of delineating programmer activities is speculation. It does seem that research on the number and types of rules used by programmers is a potentially fruitful area of further research that could result in an improved understanding of programming behavior and better methods of teaching these rules to novice programmers.

Currently J. R. Anderson of Carnegie-Mellon University is applying the production system human information processing concept to modeling computer programmers. The thrust is to apply a computer simulation model, named ACT, to modeling the programming process. This work has just begun and no conclusive results are yet available.

Shneiderman and Mayer [56] also developed a nominal model of programmer behavior. The salient characteristic of their model is the differentiation between semantic and syntactic programmer knowledge and behavior. By their definition, "semantic knowledge consists of general programming concepts that are independent of specific programming languages," whereas syntactic knowledge involves details concerning conditional and assignment statements, valid character sets, or the names of library functions. They claim that there are many different levels of semantic knowledge concepts, from the high levels of "binary searching" or "recursion by stack manipulation" to lower levels such as "interchanging the contents of two registers" or "summing an array." Perhaps the best evidence for this distinction is the relative difficulty of learning the first programming language compared to learning subsequent similar languages. Learning the first language involves a large component of semantic knowledge development coupled with learning the language syntax. The learning of subsequent similar languages, however, is primarily syntactic knowledge development and, therefore, requires less total effort. Programmers find this a very compelling explanation of a universally observed phenomenon.
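The distinction is easy to illustrate with a sketch of our own devising: the semantic plan "interchange the contents of two variables" is language-independent, while each realization below is pure syntax. (Both variants are written in Python purely for convenience; the first mimics the explicit-temporary idiom a FORTRAN- or Pascal-trained programmer would carry over.)

```python
# One semantic plan, two syntactic realizations. Our illustration, not
# an example taken from Shneiderman and Mayer.

a, b = 3, 7

# Syntax 1: explicit temporary variable, the idiom of most languages.
temp = a
a = b
b = temp          # now a == 7, b == 3

# Syntax 2: Python's tuple assignment expresses the identical plan.
b, a = a, b       # swapping a second time restores a == 3, b == 7
```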
Shneiderman and Mayer further developed the concepts of the programmer model, although not to the extent of Brooks [8]. The Shneiderman and Mayer model is a more global attempt at modeling programmer behavior with little detail, whereas Brooks models one programmer on one type of problem in great detail. Both approaches offer valuable insights.

Weinberg [64] and Weinberg and Schulman [66] clearly identified the criticality of goal specification to a computer programmer. Several studies referenced in these articles provide strong evidence as to the effect of goal clarification. For example, a group told to develop a program "fast" required only 42% of the number of runs to prepare the program as did a group instructed to develop an "efficient" program. Alternately, the programs prepared by the efficient group required only half of the execution time of the programs prepared by the "develop it fast" group. One of the most serious implications of these findings concerns the need for clear specification of programming goals in both the study and practice of programming. This notion is consistent with the Newell and Simon theory, which indicates that behavior will be affected by goal definition.

To summarize, theories of human problem solving may offer the greatest insight into the underlying mental processes involved in computer programming. Moreover, techniques associated with these theories may be the best techniques for studying the cognitive processes of computer programmers. Until a greater understanding is gained of these cognitive processes, classic experimental techniques may be "shots in the dark." This is not to say that classic experimental techniques are not useful for testing specific hypotheses. Rather, a general theory of software psychology may come more quickly if other techniques, such as protocol analysis and large-scale ecological data collection and analysis, are employed.
2.2 Individual Differences
What makes a good programmer? Studies have shown that there are differences in the ability level of experienced programmers on the order of 25 to 1 [43]. It would certainly be useful to determine the source of these differences. There are several reasons for wanting to know why programming abilities differ among individuals, including the development of selection tests for future programmers and the assignment of the "right" programmer to the "right" job. Also, knowing the relationships between individual variables and programming ability will provide insight into the internal thought processes involved in software development.

One major empirical study has been conducted to date attempting to evaluate predictors of programming performance. This work was done by Love [29]. In this set of experiments, four measures of information processing ability were collected for several classes of introductory computer programming at the University of Washington. These measures were (1) performance on a paired-associates task, (2) digit span, (3) perceptual speed, and (4) subjective organization of words in a free-recall learning paradigm. These measures were then correlated with a number of measures of computer programming performance. Some significant correlations were observed, although there may be some doubt as to their validity. Out of 116 correlation coefficients computed, only 8 were significant at the p < .05 level; with that many tests, roughly 6 (116 x .05) would be expected to reach significance by chance alone, leading one to believe that at least some of the significant correlations were due to chance. Although the findings were consistent with expectations, no conclusions regarding selection criteria that may be used for programmers may reasonably be drawn from these data.
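A short calculation makes the chance argument explicit. This is our addition, and it assumes the 116 tests were independent, which the original correlated measures may not satisfy:

```python
# Probability of observing at least 8 "significant" results among 116
# independent tests when every null hypothesis is true (alpha = .05).
from math import comb

n, k, alpha = 116, 8, 0.05
p_at_least_k = sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
                   for i in range(k, n + 1))
print(f"expected by chance: {n * alpha:.1f}")          # about 5.8
print(f"P(8 or more by chance): {p_at_least_k:.2f}")   # roughly 0.2
```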
No known studies are available examining other cognitive variables such as intelligence, mathematical ability, reasoning ability, or the plethora of other potentially relevant variables. Even in Ramsey and Atwood's [40] extensive review of the literature on human factors in computer systems, only two references were cited under the topic of individual differences in programming. Neither of these studies was aimed at developing predictors of performance. Nor has anyone explored personality traits as predictors of programming performance, although Wynne and Dickson [69] determined that personality factors were related to performance in a man-machine interactive system. Shneiderman [49] suggested and justified the following personality factors as potential determinants of an individual's programming ability:

1. Assertive/passive
2. Introverted/extroverted
3. Internal/external locus of control
4. High/low anxiety
5. High/low motivation
6. High/low tolerance for ambiguity
7. Compulsive precision
8. Humility
9. Tolerance of stress
Weinberg [65] summarized the potential relation of personality to programming performance in the following manner:

Because of the complex nature of the programming task, the programmer's personality-his individuality and identity-are far more important factors in his success than is usually recognized ... there seems to be evidence that critical personality factors can be isolated and associated with particular programming tasks-at least in the sense of their possession rendering one incapable of performing that task well.

Consequently, attention to the subject of personality should make substantial contributions to increased programmer performance. There are some existing tests of programmer aptitude, but their predictive ability has not been verified. The Computer Programmer Aptitude Battery examines an individual's ability in verbal meaning, mathematical reasoning, letter series, and number ability. The Wolfe Computer Aptitude Testing Company has tests for systems analysis and programming ability. Other tests, such as the Berger Test of Programmer Proficiency, measure programmer ability, not aptitude, and are not intended to predict who can become a good programmer.

Finally, no one has attempted to develop predictors of ability in the higher levels of programming skills. Love [29] only tried to distinguish poor novices from good novices, not poor experts from good experts. The reason for this lack of research should be obvious: a study examining cognitive, personality, and other measures of ability to predict long-term acquisition of computer programming skill would require tracking individuals through several years of training and work. Long-term studies are avoided by researchers because of the difficulties frequently encountered, although they are sometimes necessary. In this case it appears that one is necessary. The problems could be reduced if programming skill acquisition were monitored in a university environment, where attrition and its associated difficulties could be minimized.
2.3 Training Computer Programmers

Very little research has been conducted on the ways and means of teaching people to program computers. This area could prove very fruitful if we could find ways to speed up the learning process, both for novices and for experts learning new techniques. Lewis [28] reviewed the approaches which have been used to teach novices FORTRAN, although this review was oriented more towards how FORTRAN is taught than towards evaluating the effectiveness of these techniques. Soloway, Bonar, Woolf, Barth, Robin, and Ehrlich [60] and Bonar, Ehrlich, and Soloway [5] employed a model of programmer behavior and analytical techniques to determine what misconceptions students had that led to program bugs. The techniques and conclusions could be useful for further research on student programmers. Work is currently being performed at Texas Instruments in the development of computer-assisted instructional programs for computer programming. Also, there are several ongoing efforts being supported by the Office of Naval Research and the Army Research Institute examining the differences between experts and novices for different classes of problem-solving skills. These efforts may shed some light on how people acquire the types of skills required to develop software, although little is being done on that topic directly.

Some very interesting work has been done by Mayer [31,32] that specifically addresses instructional techniques to be employed while training programmers. Mayer applied some of the concepts of advance organizers and elaboration to novices learning computer programming. There was clear evidence supporting the effectiveness of providing a simple model of the computer as an organizer prior to teaching a programming language. However, this was only explored at the novice level. The utility of the concept of an advance organizer for intermediate-level programmers (e.g., after 6 months of experience) would certainly be interesting. Additionally, forcing subjects to elaborate upon their actions while they were initially learning programming techniques proved useful for retention and transfer. The data presented in this article provided support for the concept of internal model development being a key component of acquiring programming skill. Indications are that the better the models are developed (via elaboration) and the better the links between units are understood (via advance organizers), the better the ability to program. However, these data tell us little about the elements or structure of the internal models necessary for good programming performance. We can also assume that models will be different at different levels of ability.

Shneiderman [48] found that an experienced programmer could recall meaningful lines of computer code better than a novice, whereas random lines of computer code were not better remembered. This study, analogous to the chess board recall study by Chase and Simon [12], indicates that the elements of an experienced programmer's internal models may involve chunks of information. These larger elements may then be more efficiently stored and manipulated than the smaller chunks available to the novice. McKeithen et al. [33] also confirmed that meaningful computer code was better remembered by experts than novices, whereas random code was not. Additionally, they conducted an experiment to examine how experts and novices learned computer language reserved words. Observed differences indicated that there was a correlation between programming expertise and the mental organization of programming concepts. The study, however, did not permit a clear statement of the way in which the mental organization changed as a function of experience. Additionally, the larger chunks of meaningful information probably used by programmers were not explored (e.g., several lines of code performing a common function, such as summing an array).
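As a concrete illustration (ours, not a stimulus from the cited recall studies), an experienced programmer would presumably encode the following fragment as a single "sum an array" chunk, where a novice must retain each line separately:

```python
# A plausible "chunk": three statements an experienced programmer would
# recall as one meaningful unit ("sum an array") rather than as three
# independent lines. Hypothetical example for illustration only.
data = [4, 8, 15, 16]

total = 0
for value in data:
    total = total + value

print(total)  # 43
```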
While the McKeithen et al. article does provide a strong basis for examining the mental models incorporated during the software development process as a function of experience, the direct implications for the training of computer programmers are unclear.

No known studies have been conducted on training experienced programmers in new techniques or languages. As already noted, Shneiderman and Mayer [56] proposed a syntactic/semantic interactive model of programmer behavior. It may be inferred from this model that the experienced programmer, familiar with the semantics of software design, could adapt relatively easily to a new syntax. Ehrman [20], on the other hand, sees the multiplicity of languages as an impediment to programmer productivity. The nature and extent of the relationships between programmer experience and syntactic/semantic understanding are unknown. However, there is evidence that these constructs are valid descriptors of human information processing variables [56].

2.4 Language/Coding Practice Effectiveness

A sizeable body of literature is beginning to develop on the merits of various coding practices. Many programming methods and language features have been tested and their worth verified or denied. For the sake of brevity we will only summarize this literature. Some of the findings are in Table 1, where the left column identifies the language/coding practice, the center column identifies the trend of the findings, and the right column states the references.

Dunsmore and Gannon [19] studied a variety of other programming features as they related to ease of software construction, comprehension, and modification. They found that ease of construction seemed to be related to average nesting depth, percentage of global variables used for data communication, and the average number of live variables per statement.

Probably the coding practice which has had the greatest significance in the last decade is structured programming. While the power of structured design is evident to those who use it, Basili and Reiter [3] provided solid empirical evidence regarding its efficacy, particularly for large programming efforts requiring a programming team.

In a critical review of this literature, Sheil [45] states the following:

One of the most salient characteristics of psychological research on programming is its preoccupation with the issues of contemporary computing practice. While the practical concern is understandable, many of the studies are so narrowly focused in an attempt to settle some debate among computer scientists that they are of dubious scientific value.
Table 1. Summary of Findings

Language/coding practice | Findings | References
Conditionals | Nesting (if-then-else) appears to be superior to test and jump (if-goto) | [59,26,1,58,25]
Control flow | Structured programming appears to be easier to comprehend | [29,46,59,30,68]
Flow charting | Flow charting does not appear to be a particularly useful form of documentation; however, it may be useful in learning programming and in debugging | [52,32,49,7]
Indenting | May be a useful practice | [34,29,57,67]
Use of mnemonic variable names | Weak evidence indicates that mnemonic names may be useful; does not appear to have a strong effect | [67,47,49]
Commenting | There appears to be an interaction between the level of commenting desired and the program complexity | [67,52,47]
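The contrast examined in the conditional experiments can be illustrated with a fragment of our own construction. The cited studies used special-purpose micro-languages rather than Python, and Python has no goto, so the jump-like style is simulated here with repeated guarded tests:

```python
# The same decision written two ways. Hypothetical example; not a
# stimulus from the Sime, Green, and Guest experiments.

wet, hot = True, False

# Nested if-then-else: the code structure mirrors the decision tree.
if wet:
    if hot:
        action = "boil"
    else:
        action = "soak"
else:
    action = "roast"

# Test-and-jump style: each test effectively "jumps" past code, so the
# reader must trace every condition to see which action applies.
action2 = None
if not wet:
    action2 = "roast"
if wet and not hot:
    action2 = "soak"
if wet and hot:
    action2 = "boil"

assert action == action2 == "soak"
```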
Throughout most of this review, Sheil repudiates many of the studies that were conducted. We take exception to some of Sheil's comments. Many of the referenced studies have focused narrowly on examining contemporary programming practices. However, if these contemporary programming practices are not studied, they may become embedded, never to be changed, regardless of experimental findings. The sooner they are studied, the better. Additionally, many of the findings were counterintuitive (e.g., the apparent ineffectiveness of flow charting, indenting, and mnemonic names). These findings provide scientifically valuable information regarding the processes involved in program development behavior. Certainly, all the answers are not in. However, some of the pioneering work by Weissman, Shneiderman, Sime, Green, Sheppard, Love, and others should be treated as a good first step in answering many of the questions which must be asked.
2.5 Human Errors in Computer Programming
One of the ways to study human behavior is through an analysis of critical incidents. Examining human errors in software development is a type of critical incidents
analysis. By simply examining the types of errors made, we can gain insight into the behaviors underlying the error and the cognitive processes involved. Another approach is to use the likelihood of errors as a dependent variable for studying some independent variable, such as a language feature. Both approaches have been employed.

Boies and Gould [6] examined the likelihood of syntax errors causing compilation failures in an operational environment. They found that syntax errors occurred in less than 20% of the attempted compilations. This was a somewhat lower percentage than was found by Moulton and Muller [36]; however, the populations studied were different. Moulton and Muller also classified the compilation errors by the statement type in which they occurred, and the execution errors by their input/output, reference and definition, or logic faults. One of the conclusions that can be drawn from these studies is that syntactic errors do not appear to be the major bottleneck in program development. This finding is also somewhat supportive of the Shneiderman and Mayer [56] syntactic/semantic model of programmer behavior discussed in Section 2.1.

Glass [22] studied software errors found during program maintenance, as opposed to development. He made a strong argument that bugs found after the program has been put into operational use are an order of magnitude more expensive to repair than if they were found earlier. He developed a categorization scheme by examining errors found in large programs (100,000-300,000 lines of source code) that were in use. By examining the types of errors, he concluded that the persistent cause of errors is "the failure of the problem solution to match the complexity of the problem to be solved." His solution is that higher level languages should be developed. In terms of understanding human behavior, the findings and solution beg the question. However, the accuracy of his proposed solution can be seen in the proliferation of high-level programming techniques such as simulation languages and CAD/CAM techniques.

2.6 Debugging Computer Software

Reviewers (e.g., [48,45]) tend to group studies of human errors and debugging together. On the surface this is reasonable, since debugging is simply the correction of errors caused by the human. However, the common types of errors are more likely to be indicators of mechanisms involved in program development, whereas the ways and means of debugging should tell us more about program comprehension and understanding. For this reason, we have kept the two issues separate.

Boehm [2] estimated that 25-50% of initial program development time is spent debugging.
K. R. Laughery, gram development time is spent debugging. Probably the best effort in attempting to understand the process is summarized in Gould and Drongowski [24] and Gould [23]. In this. effort, experienced programmers were asked to debug programs with planted bugs. Gould and his associates also looked at the utility of various information sources such as an input listing, output listing, and specification of error type. Their findings indicated that: (1) there was large individual variation in ability to debug; (2) there was little if any effect of information source available on speed or accuracy of debugging performance; (3) different bug types (e.g., assignment vs. interaction) required different efforts to find; and (4) after a subject had debugged a program once he could find new bugs much more quickly. Based upon these data coupled with protocol analyses, Gould [23] developed a simple model of the debugging process that involved three steps. Step 1 involves the selection of a debugging tactic. In Step 2 the debugging tactic gives clues as to the specific nature and location of the bug. If the clue is sufficiently strong, stop and correct the bug. If there is no clue generated, go to Step 1. If there is a weak clue, go to Step 3. Step 3 involves the generation of an hypothesis which can be tested by a new debugging tactic by going back to Step 1. This model is described by Figure 1. The basic structure of this model is amenable to a production system modeling approach as proposed by Newell and Simon
Figure 1. Gould model of the debugging process.

Other research on debugging (e.g., [54,7]) has attempted to evaluate the utility of various forms of documentation for aiding the debugger. Shneiderman reported little success with flow charting, although high-level comments, functionally organized program modules, and mnemonic names seemed to be somewhat helpful. Brooke and Duncan, on the other hand, indicated that flow charting may speed up the debugging process. One criticism of Brooke and Duncan's experimental procedure, however, is that they used nonprogrammers. It is unsafe to assume that findings for nonprogrammers will generalize to programmers until we better understand the cognitive changes which occur while learning to program.
2.7 Estimating Programmer Performance and Software Cost

This is one of the areas that has been addressed at some length, most likely because of industry's need for cost-estimating tools when developing software. Much of this literature is in systems management journals, not systems analysis or programming journals. Putnam and Fitzsimmons [39], for example, provided models for estimating the time to develop software based upon the number of lines of source code, the number of personnel, and several other variables (a generic sketch of this kind of model appears at the end of this section). No attention is given in these models to factors affecting individual programmer performance. Other articles (e.g., [21,63]) included human variables in estimating overall software cost, although not on an individual-by-individual basis. Other efforts (e.g., [13,17,18]) have addressed specific factors that will mediate an individual's programming performance. Unfortunately, but not surprisingly, the models become less quantitative as they become more directed towards individual programmer performance and less directed towards overall system cost. This, again, illustrates our lack of a basic understanding of the information processes underlying computer software development.

An interesting article not directly linked to cost estimating was published by Lemos [27]. This study involved the examination of correlations among several measures of programming ability in the classroom. The most interesting findings were that the ability to read programs was highly correlated with program writing ability, and that language grammar knowledge was not linked to either program reading or writing. This could have substantial implications for the experimental designs of future research in software design, since evaluating an individual's ability to read a program is much easier than evaluating an individual's ability to write a program.

In a related article by Persio et al. [38], the use of the memory construction technique as a predictor of beginner programmers' ability in the classroom was explored. The memory construction technique has been used in many studies (e.g., [46,33]) and involves the ability of subjects to correctly recall a program from memory after viewing it for several minutes. This ability was found to correlate highly with other performance measures, including test grades and class grades. Again, this may provide an easier way of evaluating program writing ability.
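Returning to the cost models mentioned at the start of this section, their common shape can be conveyed with a generic parametric sketch. The power-law form and the coefficients below are illustrative assumptions of ours, not Putnam's published equations; real models calibrate such parameters against historical project data.

```python
# Generic parametric cost model: effort grows faster than linearly with
# program size, and calendar time then follows from effort and staffing.
# Coefficients are invented for illustration only.

def effort_person_months(kloc, a=2.4, b=1.05):
    """Estimated effort as a power law in thousands of lines of code."""
    return a * kloc ** b

def schedule_months(effort, staff):
    """Naive schedule: effort divided by staff, ignoring the coordination
    overhead that makes people and months imperfectly interchangeable."""
    return effort / staff

e = effort_person_months(kloc=32)
print(f"effort:   {e:.0f} person-months")
print(f"schedule: {schedule_months(e, staff=6):.1f} months with 6 staff")
```

Note that nothing in such a model varies with who does the programming, which is precisely the criticism raised above.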
2.8 Evaluating Software Quality

The evaluation of programmer abilities involves an evaluation of the quality of the software they produce. Shneiderman [48] states that "software quality measurement is an infant discipline." However, as evidence to the contrary, several techniques have been developed, most notably Boehm, Brown, and Lipow's metrics [4], Gilb's software metrics, Halstead's software science, Gordon's metric, and McCabe's complexity measure. Each of these techniques takes into account a large number of factors such as ease of use, maintainability, and software portability. For any given software quality assessment situation, there will be different factors that are relevant. To quote Weinberg [65], "Each program has an appropriate level of care and sophistication dependent upon the uses to which it will be put." For example, the portability of the software (ease of transfer to another computer) will be considerably less relevant to a student in an introductory computer course than to an author of software packages to be commercially marketed. The task, in any software quality analysis, therefore, is to select those measures that are relevant. Boehm, Brown, and Lipow identify 51 candidate metrics; which of these are chosen will be a crucial factor in any study.

Less attention has been given to assessing the relationship of software quality to the psychological complexity of the software. Curtis, Sheppard, Milliman, Borst, and Love [17] conducted an experiment correlating several software complexity measures with the percentage of program statements recalled after viewing a program for several minutes. A significant correlation was found for several measures, indicating that objective measures can predict the ease of understanding a program. In fact, some very simple measures (e.g., number of lines of code, operator/operand counts, comments) accounted for most of the variance. Through this type of research we may gain a better understanding of the causal mechanisms linking elements of software to human comprehensibility.

To researchers studying programmer performance, it is imperative that we have valid criterion measures. Usually, these criterion measures should reflect the quality of the software produced by the programmer since, in the real world, this is what will count. We are encouraged by the variety and quality of measures that have been developed. Good performance measurement techniques should not be a stumbling block for future research.
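As an illustration of how simple such objective measures can be, the following sketch counts decision points to approximate a McCabe-style complexity score for a Python function. This is our simplification for illustration; McCabe's actual measure is defined on the control-flow graph (edges - nodes + 2P).

```python
import ast

# Rough McCabe-style complexity: one plus the number of decision points.
# A simplification of the graph-theoretic definition, offered only to
# show that useful metrics can be computed from very simple counts.

DECISIONS = (ast.If, ast.For, ast.While, ast.BoolOp, ast.Try)

def complexity(source: str) -> int:
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

sample = """
def classify(x):
    if x < 0:
        return "negative"
    for d in range(2, x):
        if x % d == 0:
            return "composite"
    return "prime or small"
"""
print(complexity(sample))  # 1 + if + for + if = 4
```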
3. SUMMARY-WHAT WE DO AND DON'T KNOW
The search through the literature indicates much of what we do and do not know about human factors in software engineering. This knowledge (and lack thereof) can be summarized as follows:

1. There is tremendous variability in programming performance, even among experienced programmers. We do not know if these differences are attributable to differences among individuals' speed of information processing while programming, the nature of the information processes themselves, or some combination of the two.
2. Careful attention must be paid to the goals presented to the programmer regarding the software to be developed, since there are definite tradeoffs in the allocation of programming effort. It is imperative, therefore, that we consider stated goals when evaluating software.
3. We can simulate an individual programmer's performance, at least on a small subset of tasks. We do not know the scope of the task required to model all of a programmer's ability, although it is estimated to require simulating on the order of 10,000-100,000 specific actions.
4. Personality variables are expected to affect an individual's ability to develop software, although no studies have shown which variables actually do influence performance.
5. A simple model of a computer, when explained to novices, can aid learning to program.
6. Some contemporary coding practices are worthwhile and some are probably not. More evidence seems to be required before the software development community can be convinced of the findings, however.
7. Syntactic errors do not appear to be the bottleneck in program development. Rather, logic or semantic errors require the greater effort to correct.
8. There are many quantitative models for software cost estimation. However, few of them consider human variability and none consider causal mechanisms for human variability (although not all models should necessarily be this specific).
9. There are many good measures of software quality and complexity. The selection of a particular metric should consider design goals (as discussed in #2 above). Depending upon the particular goals, reasonable metrics can be selected for software development project management.
10. Reconstruction of a program from memory correlates highly with other measures of program comprehension, making it a useful technique for experimentation.

Laboratory techniques are now available for relatively quick (i.e., one year) studies of languages and coding practices. Modeling techniques are available to simulate individual programmer performance. Software quality measurements are available for validation. All of these are potentially fruitful areas for further research. What is perhaps needed most is a general theory of programmer behavior. A robust theory would reduce the otherwise infinite number of experiments required. In the meantime, experimental techniques for studying specific software engineering questions are available. The human factors research community now has the tools for testing the human software development process and, ultimately, improving it.
REFERENCES

1. A. T. Arblaster, M. E. Sime, and T. R. G. Green, Jumping to Some Purpose, Computer J. 22 (1975).
2. B. Boehm, Software and Its Impact: A Quantitative Assessment, Datamation 19, 49-59 (1973).
3. V. R. Basili and R. W. Reiter, A Controlled Experiment Quantitatively Comparing Software Development Approaches, IEEE Trans. Software Engineering SE-7 (May 1981).
4. B. Boehm, J. Brown, and M. Lipow, Qualitative Evaluation of Software Quality, Software Phenomenology Working Paper of the Software Lifecycle Management Workshop, 1977, pp. 81-94.
5. J. Bonar, K. Ehrlich, and E. Soloway, Collecting and Analyzing On-Line Protocols from Novice Programmers, Behavioral Research Methods and Instrumentation 14 (April 1982).
6. S. Boies and J. Gould, Syntactic Errors in Computer Programming, Human Factors 16, 253-257 (1974).
7. J. B. Brooke and K. D. Duncan, Experimental Studies of Flowchart Use at Different Stages of Debugging, Ergonomics 23, 1057-1091 (1980).
8. R. Brooks, Towards a Theory of the Cognitive Processes in Computer Programming, International J. Man-Machine Studies 9, 737-751 (1977).
9. R. Brooks, A Model of Human Cognitive Behavior in Writing Code for Computer Programs, doctoral dissertation, Carnegie-Mellon University, Pittsburgh, 1975.
10. Business Week, 1980.
11. S. K. Card, T. P. Moran, and A. Newell, The Psychology of Human-Computer Interaction, Lawrence Erlbaum Associates, Hillsdale, NJ, 1983.
12. W. Chase and H. Simon, Perception in Chess, Cognitive Psychology 4, 55-81 (1973).
13. E. Chrysler, Some Basic Determinants of Computer Programming Productivity, Commun. ACM 21, 472-483 (1978).
14. E. Chrysler, The Impact of Program and Programmer Characteristics, Proc. Nat. Computer Conf. 47, AFIPS Press, Montvale, NJ, 1978, pp. 581-587.
15. J. E. Cooke and R. B. Bunt, Human Error in Programming: The Need to Study the Individual Programmer, INFOR 13 (1975).
16. B. Curtis (ed.), Human Factors in Software Development, tutorial initially presented at COMPSAC 81, November 16-20, 1981, Chicago, IL; available from IEEE, LC 81-84180, EHO185-9.
17. B. Curtis, S. A. Sheppard, and P. Milliman, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, IEEE Trans. Software Engineering SE-5, 96-104 (March 1979).
18. F. R. Donati, The Evaluation and Appraisal of Programmer's Performance, J. Systems Management (Oct 1971).
19. H. E. Dunsmore and J. D. Gannon, Analysis of the Effects of Programming Factors on Programming Effort, J. Systems and Software 1 (Feb 1980).
20. J. R. Ehrman, The New Tower of Babel, Datamation (March 1980).
21. B. Esterling, Software Manpower Costs: A Model, Datamation (March 1980).
22. R. L. Glass, Persistent Software Errors, IEEE Trans. Software Engineering SE-7 (March 1981).
23. J. Gould, Some Psychological Evidence on How People Debug Computer Programs, International J. Man-Machine Studies 7, 151-182 (1975).
24. J. Gould and P. Drongowski, An Exploratory Study of Computer Program Debugging, Human Factors 16, 258-277 (1974).
25. T. R. G. Green, IFs and THENs: Is Nesting Just for the Birds? Software Practice and Experience 10, 373-381 (1980).
26. T. R. G. Green, M. E. Sime, and M. Fitter, Behavioral Experiments on Programming Languages, Memo 66, MRC Social and Applied Psychology Unit, University of Sheffield, England, 1975.
27. R. S. Lemos, Measuring Programming Languages Efficiency, AEDS J. (Summer 1980).
28. D. W. Lewis, A Review of Approaches to Teaching Fortran, IEEE Trans. Education E-22 (1979).
29. T. Love, Relating Individual Differences in Computer Programming Performance to Human Information Processing Abilities, unpublished doctoral dissertation, University of Washington, 1977.
30. H. Lucas and R. Kaplan, A Structured Programming Experiment, The Computer J. 19, 136-138 (1976).
31. R. E. Mayer, The Psychology of How Novices Learn Computer Programming, Computing Surveys 13 (March 1981).
32. R. E. Mayer, Different Problem-Solving Competencies Established in Learning Computer Programming With and Without Meaningful Models, J. Educational Psychology 6, 725-734 (1975).
33. K. B. McKeithen, J. S. Reitman, H. H. Rueter, and S. C. Hirtle, Knowledge Organization and Skill Differences in Computer Programmers, Cognitive Psychology 13, 307-325 (1981).
34. R. J. Miara, J. A. Musselman, J. A. Navarro, and B. Shneiderman, Program Indentation and Comprehension, Technical Report #1172, Department of Computer Science, University of Maryland, June 1982.
35. R. G. Mills, Man-Machine Communication and Problem Solving, Ann. Rev. Information Science and Technology, 223-254 (1967).
36. P. Moulton and M. Muller, DITRAN-A Compiler Emphasizing Diagnostics, Commun. ACM 10, 45-52 (1967).
37. A. Newell and H. Simon, Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ, 1972.
38. T. D. Persio, D. Isdister, and B. Shneiderman, Experiment Using Memorization Reconstruction as a Measure of Programming Ability, International J. Man-Machine Studies 13, 339-354 (1980).
39. L. H. Putnam and A. Fitzsimmons, Estimating Software Costs, Datamation (1979).
40. H. R. Ramsey and M. E. Atwood, A Critically Annotated Bibliography of the Literature on Human Factors in Computer Systems, Science Applications Incorporated Report SAI-78-070-PEN, Denver, Colorado, May 1978.
41. P. Reisner, Human Factors Studies of Database Query Languages: A Survey and Assessment, Computing Surveys 13, 13-31 (1981).
42. H. Sackman, Experimental Analysis of Man-Computer Problem-Solving, Human Factors 12, 187-201 (Dec 1970).
43. H. Sackman, Man-Computer Problem Solving, Auerbach Publishers, Princeton, NJ, 1970.
44. H. Sackman, W. J. Erikson, and E. E. Grant, Exploratory Experimental Studies Comparing Online and Offline Programming Performance, Commun. ACM 11 (1968).
45. B. A. Sheil, The Psychological Study of Programming, Computing Surveys 13 (1981).
46. S. Sheppard, B. Curtis, P. Milliman, M. Borst, and T. Love, First Year Results from a Research Program on Human Factors in Software Engineering, Proc. Nat. Computer Conf. 48, 73-79 (1979).
47. S. B. Sheppard, B. Curtis, P. Milliman, and T. Love, Modern Coding Practices and Programmer Behavior, Computer 12, 41-49 (Dec 1979).
48. B. Shneiderman, Group Processes in Programming, Datamation (Jan 1980).
49. B. Shneiderman, Software Psychology: Human Factors in Computer and Information Systems, Winthrop Publishers, Cambridge, MA, 1980.
50. B. Shneiderman, Improving the Human Factors Aspect of Database Interaction, ACM Trans. Database Systems 3, 417-439 (1978).
51. B. Shneiderman, Perceptual and Cognitive Issues in the Syntactic/Semantic Model of Programmer Behavior, in Proceedings of the Symposium on Human Factors and Computer Science, Human Factors Society, Santa Monica, CA, 1978.
52. B. Shneiderman, Measuring Computer Program Quality and Comprehension, International J. Man-Machine Studies 9 (1977).
53. B. Shneiderman, Teaching Programming: A Spiral Approach to Syntax and Semantics, Computers and Education 1, 193-197 (1977).
54. B. Shneiderman, Exploratory Experiment in Programmer Behavior, International J. Computer and Information Sciences 2, 123-143 (1976).
55. B. Shneiderman (ed.), Database Management Systems, Information Technology Series 1, AFIPS Press, Montvale, NJ, 1976, pp. 59-61.
56. B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model for Experimental Results, Intern. J. Computer and Information Sciences 7, 219-239 (1979).
57. B. Shneiderman and D. McKay, Experimental Investigations of Computer Programming, Debugging, and Modification, paper presented at the 6th International Congress of the International Ergonomics Association, College Park, MD, July 1976.
58. M. E. Sime, A. P. Arblaster, and T. R. G. Green, Reducing Programming Errors in Nested Conditionals by Prescribing a Writing Procedure, Intern. J. Man-Machine Studies 9, 119-126 (1977).
59. M. Sime, T. Green, and D. Guest, Scope Marking in Computer Conditionals-A Psychological Evaluation, Intern. J. Man-Machine Studies 9, 107-118 (1977).
60. E. Soloway, J. Bonar, B. Woolf, P. Barth, E. Robin, and K. Ehrlich, Cognition and Programming: Why Your Students Write Those Crazy Programs, Proceedings of the National Educational Computing Conference, Denton, Texas, June 1981.
61. W. Tracz, Programming and the Human Thought Process, Software: Practice & Experience 9, 127-138 (Feb 1979).
62. Wall Street Journal, 1982.
63. D. A. Watson, Some Factors Involved in the Establishment of Programmer Performance Standards, Computer Bulletin 13, 192-194 (1969).
64. G. M. Weinberg, The Psychology of Improved Programming Performance, Datamation (Nov 1972).
65. G. M. Weinberg, The Psychology of Computer Programming, Van Nostrand Reinhold, New York, 1971.
66. G. M. Weinberg and E. Schulman, Goals and Performances in Computer Programming, Human Factors 16, 70-77 (Jan 1974).
67. L. Weissman, A Methodology for Studying the Psychological Complexity of Computer Programs, unpublished doctoral dissertation, University of Toronto, 1974.
68. L. Weissman, Psychological Complexity of Computer Programs: An Experimental Methodology, ACM SIGPLAN Notices 9 (1974).
69. B. E. Wynne and G. W. Dickson, Experienced Managers' Performance in Experimental Man-Machine Decision System Simulation, Acad. Management J. 1, 25-40 (1976).