Int. J. Man-Machine Studies
Categories of programming knowledge and their application
An introduction to a special issue
RUVEN BROOKS
Schlumberger Laboratory for Computer Science, 8311 North RR 620, P.O. Box 200015, Austin, TX 78720-0015, USA

(Received 28 February 1990)

Writing or designing computer software requires a broad range of knowledge; consider the following pieces of knowledge that a software developer might use:

• the form of a structure definition in CommonLisp;
• the methods used to pass parameters to a subroutine in a stack language;
• the way to build a hash table using arrays;
• the requirements for writing an LR(1) grammar;
• the use of baton passing as a control mechanism in distributed systems;
• the algorithms for filling and justifying text optimally to eliminate widows and orphans; and
• the correction for the presence of potassium in the drilling mud in natural gamma ray spectroscopy.
The list is in rough order, ranging from knowledge highly specific to particular programming environments to knowledge which (while it is necessary to develop particular pieces of software) is more about the application than about the software itself. The first item is clearly dependent on the syntax of a particular programming language, but also involves more general knowledge of programming data types. Stack parameter passing and using a hash table are both knowledge about how to use particular programming language constructs, but these constructs occur in multiple programming languages. How to write LR(1) grammars and the use of baton passing architectures are programming knowledge which is relatively independent of particular programming language constructs; one could, for example, write an LR(1) parser in nearly any programming language. The last two items are examples of knowledge that is specific to particular application domains; it is still "programmer" knowledge, however, in that a programmer or software designer must know it in order to develop applications in the domains of typesetting or of hydrocarbon wireline log interpretation.

As these examples illustrate, the knowledge needed to design and develop software is quite diverse. From a behavioural standpoint, this raises the following questions:

• How is programming knowledge organized? Is all knowledge represented internally in the same fashion or are there different representations for different types of knowledge?
• What characteristics of the problem being solved determine when a particular piece of knowledge is invoked? Is some knowledge available only at particular stages of software development? How do individuals differ in the cues that elicit particular pieces of knowledge?
• What strategies do software developers use to manage their work and to solve problems? How do these strategies interact with software knowledge?

0020-7373/90/030241+06 $03.00/0 © 1990 Academic Press Limited
1. Application domain knowledge

The goal of software development is to solve an application problem, not just to create code that performs a certain computation. As the work of Curtis, Krasner and Iscoe (1988) points out, understanding the properties of the application and matching them to appropriate software architectures is a major issue in software design. Guindon (1990) likewise observes that the requirements as presented to the software designer or programmer are usually incomplete and imprecise. Part of what takes place during design is that the designer supplies the missing information by inferring it or by adding it from personal knowledge of the application.

In many cases, determining the requirements for the software is itself part of a larger application design process. In the case studied by Visser (1990), a mechanical engineer designs a multiple-step machining process based on the properties of the machine tools and of the part to be manufactured. This process design, in turn, serves as a set of specifications for what the software will do. (In this case, the design of the steps to be taken in the machining process nearly determines the steps to be executed by the software.) In some situations, the process even goes backwards; for systems with embedded microprocessors, properties of the software design are part of what is used to determine what size of computer will be needed.

Although the use of application knowledge is clearest during the specification and design of software, it probably permeates nearly all software development tasks. Designing an effective test suite for a software system, for example, requires knowledge of which paths through the system are most likely to be executed; in turn, this requires knowledge of how the system will be used. A similar argument applies to tasks such as performance tuning. Since most of these tasks are done using only code and not the original requirements documents, processes similar to those seen by Guindon in elaborating requirements might also take place to re-create the application domain knowledge that was used for creating the code in the first place. The work of both Guindon and Visser thus suggests that application knowledge may play a much more important role in successful software development than has previously been assumed.
2. Program structure knowledge

Four of the articles in this issue (Guindon; Détienne and Soloway; Rist; and Robertson and Yu) use or propose constructs to describe the organization of programmers' or software designers' knowledge about the organization of computations. In the Guindon work, these are called "design schemas"; in the other three, they are referred to as "plans". Although these constructs share a common flavour, they may, in fact, represent very different kinds of knowledge.

In the case of "design schemas", the construct specifies the functional decomposition of a system into subsystems. Schemas are applied hierarchically, breaking subsystems down into still smaller pieces. As described by Guindon, though, schemas only specify the functions to be performed by a subsystem; they do not describe the sequence of computations to be performed or the data structures to be used in performing them.
In contrast, the plan constructs used in the other three articles describe specific computational steps that must be implemented in order to produce an executable program. Even among these three, there are differences in the way plans are used, particularly in the ways plans are composed. In Détienne and Soloway and in Robertson and Yu, the plans have a flat structure: a sequence of steps that are concatenated to achieve a goal. Rist, on the other hand, assumes a more complex plan structure with three distinct composition mechanisms: concatenation, interleaving, and hierarchy.

Design schemas and plans are not mutually exclusive; both constructs are probably needed to describe the program structure knowledge used by software developers. Design schemas provide a good accounting of how large computations are structured into manageable pieces; they account for the construction of programs at the level of modules, procedures and functions. Plans, on the other hand, describe knowledge at the level of the internal construction of each of the individual components.
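The three composition mechanisms named above (concatenation, interleaving and hierarchy) can be pictured with a small Python sketch. It is an illustration only: the `Plan` class, the helper functions and the example steps are hypothetical names introduced here, and are not drawn from Rist's formalism or from any of the other articles.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Plan:
    """Hypothetical plan: a named list of steps; a step may itself be a Plan."""
    name: str
    steps: list


def concatenate(*plans):
    """Flat composition: every step of one plan, then every step of the next."""
    return Plan("+".join(p.name for p in plans),
                [s for p in plans for s in p.steps])


def interleave(*plans):
    """Merge plans step by step, so corresponding steps sit side by side."""
    merged = []
    for group in zip_longest(*(p.steps for p in plans)):
        merged.extend(s for s in group if s is not None)
    return Plan("|".join(p.name for p in plans), merged)


def flatten(plan):
    """Expand nested sub-plans (hierarchy) into one flat list of steps."""
    out = []
    for step in plan.steps:
        out.extend(flatten(step) if isinstance(step, Plan) else [step])
    return out


# Two independent plans (assumed examples): a running total and a counter.
sum_plan = Plan("sum", ["initialise total", "add item to total"])
count_plan = Plan("count", ["initialise count", "increment count"])

# Hierarchy: an "average" plan that contains a composed sub-plan as one step.
average_plan = Plan("average",
                    [interleave(sum_plan, count_plan), "divide total by count"])

print(flatten(concatenate(sum_plan, count_plan)))
print(flatten(interleave(sum_plan, count_plan)))
print(flatten(average_plan))
```

The same two plans yield differently ordered step sequences depending on the composition mechanism chosen.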
3. Interpersonal communication knowledge

Effectively, software is developed as much for human comprehension as it is for machine execution. In addition to the need to understand programs for maintenance and modification, many software development processes involve transformations of formalism, such as going from specification language to programming language, that are performed manually. Successful software developers must therefore have knowledge that enables them both to understand software that others have developed and to construct their own software in a way that will permit other software developers to understand it.

Détienne and Soloway (1990) and Soloway, Adelson and Ehrlich (1988) argue that interpersonal communication knowledge, which they refer to as "rules of discourse", plays a very important role in the comprehension of computer programs by creating expectations of the likely surface features in the program text that go with particular program plans. What still needs more exploration is the role of interpersonal communication knowledge in the software generation process. Does it only appear at the level of translating plans into code, or is it taken into account in the selection of plans or other, higher level constructs?
4. Problem-solving strategy knowledge

In addition to plan knowledge, strategy knowledge plays an important role in determining programmer behaviour. This is knowledge that guides the software developer in deciding which parts of the problem to work on next and which goals to pursue.

Guindon notes three different types of strategic knowledge in design. The first is the use of explicit design methodologies such as the Jackson System Development method. This type of strategic knowledge is used to "provide operator sequence knowledge and control knowledge for the application of these operations". This sort of strategic knowledge is highly specific to software design.

The second type of strategic knowledge is what Guindon refers to as "heuristics", which are used to guide the search for information about the problem structure and for a design solution during global or high-level design. The heuristics observed by Guindon are very general and relatively weak and would be expected to be applicable to nearly all types of engineering design, but with different priority orderings. For example, the heuristic of "consider a simpler problem" will probably assume a more dominant role for engineering design tasks that are decomposable, such as those in electrical engineering, than it would in something like mechanical engineering linkage design, in which changing a single link may change the entire behaviour of the linkage.

Guindon's third type of strategic knowledge is the set of criteria for evaluating possible solutions. The primary example she cites is the use of the criterion "achieve high reliability" by one designer to radically prune the space of possible designs. Guindon views the adoption of particular criteria as relatively invariant personal characteristics of designers; a particular individual might always use the same set of criteria across all designs, or, at least, all designs sharing a few general characteristics. An alternative possibility would be that these criteria are given explicitly or implicitly in the requirements specification; for example, the requirements for a transaction system might explicitly state: "customer records must be retrieved in one second or less". Under these conditions, such criteria might well get incorporated directly into the pattern part of a schema: "if retrieval performance must be very fast, then use a hashing scheme".

Rist discusses how different strategies in generating and instantiating coding plans can lead to different final programs. He sees strategies as playing an important role in two different stages in code generation. The first is in guiding the coding plan generation process by determining the order in which parts of the plan are created. In this regard, these strategies appear analogous at the code level to the role design methodologies play at the global design level. The second area in which Rist sees strategies as playing an important role is in the generation of the actual code from the plan, whether in deciding the general order of coding or in deciding how to combine multiple actions into a single programming language statement. Although Rist does not mention the possibility explicitly, much of what is referred to as "programming style" may be a consequence of this sort of strategic knowledge.

Détienne and Soloway identify another kind of strategic knowledge that is used by programmers to understand an existing program. They found that combinations of four strategies could be used to describe the behaviour of their programmer subjects in attempting to understand both programs that exhibited the structure of well-known plans and those in which this structure was obscured. Strategies ranged from those with strong preconditions and high information yields to those with weaker preconditions and smaller information yields.

Why is it useful to differentiate between strategic knowledge and structure or plan knowledge? The answer is that strategic knowledge is knowledge about process. This has two consequences: on the one hand, the use of strategic knowledge strongly affects the final artifact that is produced; on the other, the strategic knowledge itself can rarely be deduced from the final form of the artifact. The problem used by Rist was carefully chosen to use plans with parallel, independent components that could be combined in either of two orders; this made it possible to analyse the completed programs to discover which strategy was used.
In general, however, these conditions would not hold, and definite inferences about strategy from the final form of the program will not be possible. In turn, this situation has implications for experimental work in this area. Since the final form of programs is so difficult to relate back to strategic knowledge, experimental work is more likely to obtain significant results if it is based on data collected during a software process than if the data consist only of overall effort measures for the process or of the final artifacts created during the process. Verbal protocol data of the type used by Détienne and Soloway or Guindon are good examples of useful data for studying strategic knowledge, but other types of data may also be appropriate.

5. What we don't know

If we were to go into an organization engaged in large scale software development, we might find individual software developers or programmers engaged in a range of activities. A partial listing would include:

(1) analysing a potential computing application to determine what kinds of computer programs should be used to support it;
(2) writing requirements for the computations that a particular program is to perform;
(3) checking a set of requirements for completeness and accuracy;
(4) doing a high level design for the overall structure of a large program;
(5) designing a single module which is part of a large system;
(6) coding and testing a single module;
(7) locating and repairing a software defect;
(8) deciding which versions of a set of modules to use in assembling a large system;
(9) designing a test suite for a large system;
(10) finding performance bottlenecks in a program;
(11) verifying that the implementation of a system meets all of the specifications; and
(12) deciding how to add a new function to an existing system.
The ordering of this list is probably the order in which these activities might first be observed on a large project, but, across the life of a project or software system, they may occur simultaneously on different parts of the system. In many cases, they are done by different individuals. Most work to date in this area has concentrated on coding and debugging of modules; the Visser work on writing requirements and the Guindon work on high level design are among the few pieces of research in their respective areas and, to date, there do not appear to be any studies on application analysis, the design of test suites, or performance analysis. Since some of these other activities are among the most problematic and costly for software developers, they deserve behavioural investigation.

Beyond the properties of individual software development tasks, a further question that must be addressed is: to what extent is the grouping of all of these activities under "software development" or "programming" just an accident of organizational structure, with the knowledge and skills required being largely disjoint? Is there, in fact, a single core of cognitive skills or strategies that they all involve which justifies the grouping? For those activities that are closely involved with the executable code, there clearly must be common knowledge; debugging a module is usually not possible without knowledge of the programming language in which the module is written and of typical programming cliches used in that language. As attention moves to aspects of a system that are further removed from the executing code, arguments for the existence of common knowledge cannot be made on the basis of the use of the same notational system, since multiple notations may be used.

Is the kind of high level design studied by Guindon the same as the program design studied by Rist? Although Rist places his work in the same framework as Guindon, most software developers would probably claim that individual module coding was a well-defined task that didn't really belong to the "ill-structured" category at all. Defining the relationship between the different tasks involved in software development and the different types of information required to accomplish them remains a challenge for future work.
References

CURTIS, W., KRASNER, H. & ISCOE, N. (1988). A field study of the software design process for large systems. Communications of the ACM, 31(11), 1268-1287.
DÉTIENNE, F. & SOLOWAY, E. (1990). An empirically-derived control structure for the process of program understanding. International Journal of Man-Machine Studies, 33, 323-342.
GUINDON, R. (1990). Knowledge exploited by experts during software system design. International Journal of Man-Machine Studies, 33, 279-304.
RIST, R. S. (1990). Variability in program design: the interaction of process with knowledge. International Journal of Man-Machine Studies, 33, 305-322.
ROBERTSON, S. P. & YU, C. (1990). Common cognitive representations of program code across tasks and languages. International Journal of Man-Machine Studies, 33, 343-360.
SOLOWAY, E., ADELSON, B. & EHRLICH, K. (1988). Knowledge and processes in the development of computer programs. In M. T. H. CHI, R. GLASER & M. J. FARR, Eds, The Nature of Expertise, pp. 129-152. Hillsdale, NJ: Lawrence Erlbaum Associates.
VISSER, W. (1990). More or less following a plan during design: opportunistic deviations in specification. International Journal of Man-Machine Studies, 33, 247-278.