An identification and investigation of software design guidelines for using encapsulation units

Daniel Joyce
Villanova University, Villanova, Pennsylvania

A study was made to identify the types of conceptual units that should be captured within encapsulation units. Based on this study, guidelines to aid in the architectural design and implementation of software systems, in order to make optimal use of encapsulation units, were enumerated. An experiment was performed to investigate the consequent effects of guidelines and encapsulation unit use during design and coding on the adaptive maintenance cost of a program. Results of the experiment were inconclusive as to the relative effect of guideline and encapsulation unit use on the initial phases of the maintenance process. However, there were indications that programs designed with the aid of the guidelines and coded with encapsulation units were less likely to be corrupted during maintenance changes. It is argued that in the long run these programs would be more easily and successfully maintained. This article describes the guidelines, gives an example of their application, and analyzes the information extracted from the experiment.

1. ENCAPSULATION UNITS

The high cost of software production and maintenance has induced researchers to look for ways to simplify the design process and facilitate the production of clear, easily maintained code. One promising avenue towards this goal is the study of the syntax, semantics, and use of programming language constructs. In the past few years there has been a marked increase in the importance attached to the use of high-level modularization constructs. A particular high-level modularization construct, the encapsulation unit, has received a great deal of attention. Proponents of the encapsulation unit claim that its proper use will result in programs that are easily understood and maintained. Thus, the proper use of the encapsulation unit should effectively decrease the lifetime cost of software.

Address correspondence to Daniel Joyce, Mathematical Sciences Department, Villanova University, Villanova, PA 19085.

The term “encapsulation unit” is used in this paper to describe a type of modularization mechanism provided in some recent languages. The units allow a programmer to introduce a level of program abstraction higher than that achieved with the standard functions and procedures found in high-level languages. Examples of encapsulation units of some current languages include “packages” in Ada [19] and “modules” in Modula-2 [20].

An “encapsulation unit” is defined to be a syntactically supported feature of a high-level language that:

1. Can contain standard high-level objects such as constant, type, and variable declarations along with local procedures.
2. Can provide procedures that can be called from outside the unit by other program units.
3. Can include local variables whose values persist between calls by other program units to the unit.
4. Can include local variables and procedures that can only be accessed or called from within the encapsulation unit.
5. Requires explicit control over the interface information between it and other program units.

The main impetus for the development of encapsulation units appears to have been the desire to support the concept of data abstraction. Most of the articles authored by the designers of the various languages that provide encapsulation units cite examples of how the programmer can use the encapsulation unit to provide an abstract data type. An encapsulation unit could narrowly be defined as a “data abstraction facility.” However, encapsulation units can be used to provide more than just data abstraction. Many alternate uses of these units are envisioned. For this reason the more general term “encapsulation” is used in this paper as opposed to the term “data abstraction.”

The most important feature of an encapsulation unit that allows it to do more than just provide abstract data types is the persistence of values of its local variables between calls to its procedures and functions. By employing this feature, an encapsulation unit can be given an internal state. It can therefore control its response to calls based not only upon the call statement and the call parameter values but also upon the previous history of call sequences and parameters.

The Journal of Systems and Software 7, 287-295 (1987)
© 1987 Elsevier Science Publishing Co., Inc.
0164-1212/87/$3.50

2. THE GUIDELINES

The design of a software system should be tailored to ensure proper and complete use of the modularization facilities of the implementation language. Consider the following points:

• Proper use of a language’s modularization facilities is perceived as being “good.”
• Translation of a design into code is aided by use of a programming language that supports the structures identified during the architectural design process [18].
• Evidence suggests that knowledge of the relationship between a language’s modularization features and an architectural design technique can enhance a designer’s understanding of the technique, resulting in a better design process and structure [2].

Based on the premise that the architectural design of a software system should encourage proper and complete use of a language’s modularization mechanism, a set of guidelines pertaining to the use of encapsulation units was identified. The prime impetus for the guidelines was a desire to reduce the future cost of program maintenance by encouraging the development of understandable and modifiable programs. The guidelines are a synthesis of many current software engineering ideas that the software designer should keep in mind during the architectural design process.

The guidelines are certainly not complete, but they should furnish a foundation upon which further ideas can be built. The guidelines are not intended to be unbendable rules, nor are they intended to provide an exhaustive design methodology. However, if a problem is first approached with consideration of the guidelines, major design units should be identified and the architectural and detailed stages of design will hopefully proceed more smoothly.

Some of the guidelines help identify a specific design unit that should be considered for inclusion in the design and that can be easily implemented with an encapsulation facility. Other guidelines are general concepts to be followed in order to take full advantage of the properties of encapsulation units.

Guideline One: Use Information Hiding

The term “information hiding” was coined by David Parnas in the early 1970s [14, 15]. Parnas espoused the importance of program units being designed so that the only information they can use about other units is the calling conventions and expected actions. All other information should be hidden within the units. Parnas emphasized that the hiding should be syntactically enforced, i.e., that attempted use of nonessential information be caught during compilation. Encapsulation units can be used for this purpose.

The information hiding guideline is to be interpreted as a general philosophy to be followed while designing a system. When dividing a system into units the designer should keep in mind at all times that only essential information should be used about a unit by other units. The extent of what is considered essential information should be effectively minimized. Interface information should be explicitly enumerated at the design stage so that the designer can ensure adherence to this guideline and make decisions about the overall design strategy based on the amount and type of interconnections. Early enumeration of interface information also aids the smooth translation of design into code.

Guideline Two: Use Data Abstraction

Data abstraction refers to the use of abstract data types (ADTs) [7, 10]. An ADT can be considered an extension of the traditional concept of type, because the implementor defines a set of values for the ADT, together with the set of legal operations. A typical example is a user defined stack type along with user coded procedures for operations on the stack such as push and pop.

An ADT restricts direct access to a set of objects (structures or simple variables) to a limited number of procedures/functions. These supply the operations on the ADT. Obviously the concept of abstraction is involved. Knowledge of the structure of the ADT’s data objects and of the implementation details of the operations is of no direct concern outside the realm of the ADT. Thus the user of the ADT need only concern himself with the underlying essential properties of the data.

Data abstraction helps limit the scope of changes in a program. Changes to the ADT’s structure or operation implementation details, which do not affect the specification of the ADT, can be made with no explicit effect upon the rest of the system. Thus, data abstraction use increases the modifiability of programs [11].

To use the data abstraction guideline, the designer should identify an abstract view of major data objects, possibly including the input and output data streams, as early as possible in the design process. For each data object he should use an encapsulation unit whose sole purpose is to provide the data abstraction. Procedures within the encapsulation unit that can be called from outside the unit should be used to provide the operations on the data structure. All implementation detail should be effectively hidden.
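As a minimal sketch of this guideline, the integer stack can be written so that callers see only its operations. Python is used here purely for illustration (the paper's own examples are Modula-2 definition modules). The class name `IntStack` and the list-based representation are assumptions, and Python's leading-underscore privacy is only a convention, not the compiler-enforced hiding that Parnas calls for.

```python
class IntStack:
    """Stack of integers whose representation is hidden behind its operations."""

    def __init__(self):
        # Hidden representation (an assumption of this sketch):
        # a Python list with the top of the stack at the end.
        self._items = []

    def push(self, value: int) -> None:
        """Place value on top of the stack."""
        self._items.append(value)

    def pop(self) -> int:
        """Remove and return the top value; popping an empty stack is an error."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def empty(self) -> bool:
        """Report whether the stack holds no values."""
        return not self._items


if __name__ == "__main__":
    s = IntStack()
    s.push(1)
    s.push(2)
    print(s.pop(), s.pop(), s.empty())
```

Because callers use only `push`, `pop`, and `empty`, the list could later be replaced by some other structure without any change outside the class, which is the limited scope of change the guideline aims for.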

DEFINITION MODULE INTSTACK;

EXPORT QUALIFIED STACK, PUSH, POP, EMPTY, INIT;

TYPE STACK;  (* opaque type . . . structure hidden within corresponding implementation module *)

PROCEDURE PUSH(INT: INTEGER; VAR S: STACK);
PROCEDURE POP(VAR S: STACK): INTEGER;
PROCEDURE EMPTY(S: STACK): BOOLEAN;
PROCEDURE INIT(VAR S: STACK);

END INTSTACK.

Figure 1. ADT stack.

Figure 1 contains a short example of how an ADT stack of integers could be defined in Modula-2. The encapsulation facility of Modula-2 is the module. The “definition part” of a module consists of the information about constructs within the module that must be made visible to other system units, i.e., the required interface information.

Guideline Three: Model Real World Objects

If a program system is a model of a real world process, and there exist in that process objects with associated operations, then the program should contain encapsulation units that model these objects and operations. This can be accomplished in some cases by using an encapsulation unit to provide an abstraction of an object as an ADT. The operations supplied on the ADT should coincide with the real world operations that are automatically associated with the real world object. Such a design should facilitate the later understanding of the program by a maintenance programmer because he may already possess knowledge about the objects and operations. This design technique has recently attracted a lot of attention and is called “object oriented design” [5]. (See Figure 2 for an example of the definition part of a Modula-2 module that provides a simple ADT “bank account.”)

Guideline Four: Isolate Resources, Requirements Likely to Change, and Major Design Decisions

One way of limiting the scope of changes needed during maintenance is to isolate within encapsulation units any dependencies on resources that are outside the system. This includes machine dependencies, input/output details, etc. In this way, if resources change or if the system must be ported to a different environment, the necessary code changes should be easily identified.

A similar method for reducing the cost of maintenance is to anticipate requirements changes that might be necessary during the life of the system and to structure the original design to facilitate such changes. If a particular requirements change is expected, it may be possible during the design stage to collect within a single encapsulation unit all parts of the system that would be directly affected by the change. In this way the future change would be more easily implemented.

A related way to facilitate future system maintenance is to collect within an encapsulation unit all parts of a system related to a particular design decision. Then during the development of the system any necessary restructuring of the design decision would be facilitated [14].

DEFINITION MODULE BANKACC;

EXPORT QUALIFIED ACCOUNT, OPEN, CLOSE, WITHDRAW, DEPOSIT, QUERY;

TYPE ACCOUNT;

PROCEDURE OPEN(VAR ACC: ACCOUNT);
PROCEDURE CLOSE(VAR ACC: ACCOUNT);
PROCEDURE WITHDRAW(VAR ACC: ACCOUNT; AMOUNT: REAL);
PROCEDURE DEPOSIT(VAR ACC: ACCOUNT; AMOUNT: REAL);
PROCEDURE QUERY(ACC: ACCOUNT): REAL;

END BANKACC.

Figure 2. ADT bank account.

Guideline Five: Capture Major Data Transformations

“Data independence” is a term from the realm of database concepts which means “immunity of applications to change in storage structure and access strategy” [6]. The same benefits provided by data independence within a database system can be obtained by a form of data independence within an application program.

In many cases, programs can be viewed as a series of transformations on the input data that result in the output information. If each specific transformation in the series could be hidden in a unique encapsulation unit, then a form of data independence would exist for the application program. In other words, one unit would be immune to changes in the structure and/or access strategy (operation implementation) of another. An example of a series of transformations, each of which could be provided by a unique encapsulation unit, is shown in Figure 3.

The goal of data independence within an application program is attained by capturing within an encapsulation unit all the details necessary to transform data from one form to another. The extent of the transformation should be determined by using standard design considerations.
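The idea of giving each transformation in a series its own unit, as described under Guideline Five, can be sketched as follows. Python stands in for Modula-2 here. The two transformations (ordering a word list, then counting adjacent duplicates) are an assumed reading of the labels in Figure 3, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def order_words(words: Iterable[str]) -> List[str]:
    """Transformation 1: word list -> ordered word list.

    The sorting method is a hidden detail; callers rely only on the
    result being in ascending order.
    """
    return sorted(words)


def count_words(ordered_words: Iterable[str]) -> List[Tuple[str, int]]:
    """Transformation 2: ordered word list -> (word, frequency) list.

    Relies only on equal words being adjacent, not on how the ordering
    was produced.
    """
    result: List[Tuple[str, int]] = []
    for word in ordered_words:
        if result and result[-1][0] == word:
            # Same word as the previous one: bump its count.
            result[-1] = (word, result[-1][1] + 1)
        else:
            result.append((word, 1))
    return result


if __name__ == "__main__":
    words = ["pear", "apple", "pear", "fig"]
    print(count_words(order_words(words)))
```

In this sketch, replacing the body of `order_words` with a different ordering algorithm leaves `count_words` untouched, which is the kind of immunity between units that the guideline describes.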

[Figure 3. Series of transformations (diagram not reproduced). Surviving labels: WORD LIST; ORDERED WORD LIST; TRANS 2; WORD, FREQUENCY LIST.]

If possible, only one basic transformation should be provided by each unit. The result of such a design will be a system that is easily maintained because of the independence of the system’s design units. For example, if a particular transform is deemed inefficient, it could be replaced by a more carefully designed though functionally equivalent transform without affecting the rest of the system. Also, if the structure of the data storage is altered at some point, the effects of the alteration would not reach beyond the next transformation point. Both the data flow design technique [12] and the Jackson design technique [8] can lead to program decompositions that isolate data transformations and thus provide data independence at the design stage.

There are many reasons why the encapsulation unit is an ideal language mechanism for implementing a data transformation design unit. The strong information hiding capability that the encapsulation unit provides heightens the data independence aspect of the transformation. The fact that data local to an encapsulation unit persists from one call to the next is valuable because the unit can then act like a buffer in cases where the transformation is not a simple one-to-one mapping. The fact that the encapsulation unit can contain its own local procedures and functions will aid in the clear description of the implementation of complex transformations. Finally, the explicit control over interface information required by the encapsulation unit provides a natural means of documenting the input and output data of the transforms.

Guideline Six: Generalize

An intrinsic part of the value of design with encapsulation units is that library units, i.e., units that have already been designed and tested, can easily be used. The first step in the architectural design of a program is the identification of the units to be used. The next step is to determine whether any of the identified units or similar ones already exist on the system. One is more likely to find the required units if a general view is taken of their function than if a narrow specific view is taken. For example, if a unit is required to provide a specific type of tree structure, it is possible that a pre-existing unit that provides for the general tree structure can be utilized. In any case, the services of a system librarian or some sort of design unit information database would be beneficial in locating and using pre-existing units.

Aho, Hopcroft, and Ullman [1] exhort a programmer to “be a toolsmith.” They believe that when designing a program unit one should determine if the unit could be written in a more general way with a little extra effort. The resultant program would then serve as a tool, i.e., as a program unit with a variety of uses, which could be used by others more easily than if it had been designed for a specific task. This approach is essentially the converse of what is described in the previous paragraph. When searching for a reusable unit one should take a general view of what one needs in order to increase the chances of finding something. Conversely, when designing a unit one should do so in a general way, so that the unit created will be more easily used in other applications. It is this approach that is suggested by the sixth guideline.

In addition to the facilitation of library use, adherence to this guideline may sometimes simplify the detailed design of an encapsulation unit. A general view of a unit’s function may lead to the recognition of suitable, well-known data structures and algorithms for its implementation.

3. AN EXAMPLE

The purpose of this section is to demonstrate the use of the design guidelines by means of a sample problem and its solution. The sample problem is defined along with an overview of a solution derived from standard functional decomposition techniques. Next, an outline of the steps that were followed in using the guidelines to help solve the problem is given. Finally, a comparison of the two solutions is presented.

The problem involves modeling the actions of customers entering a bank’s system of teller queues. The solution is to generate statistics about bank activities such as the average amount of time a customer spends in the bank. The problem is described in detail on page 190 of the text, Data Structures Using Pascal, by Tenenbaum and Augenstein [17]. In short, a bank system with four teller queues is to be modeled by a program.
Input to the program will consist of the entry time and required teller service time for each person in a list of people who will “enter” the bank. The input is assumed to be ordered by entry time. A person entering the bank should go to the shortest teller queue.

From the solution detailed in the textbook it is obvious that the authors used a functional top-down refinement approach. As a result, the code for implementing the queue and list operations is distributed throughout the program. For example, the procedure that inserts an element into a queue is contained within another procedure that handles the arrival of a customer, this being the only place where an insert into the queue operation is required. This is exactly the kind of procedural hierarchy one would expect to be created from the standard approach. The final syntactic decomposition of the textbook solution, along with statement count information, is shown in Figure 4a.

If the guidelines are brought to bear on the Bank Simulation Problem, a solution significantly different from that produced by Tenenbaum and Augenstein can be created. Such an alternative solution and the steps followed during the design process are described below. For this solution, the Modula-2 local module facility was used as an encapsulation unit.

The first step in designing the program was to conceptualize the highest level of program decomposition. The overriding guideline to be followed for this step, in this example, is guideline 3, Model Real World Objects. Thus, the decomposition of the problem was based on the construction of software modules that model the bank queues, the bank itself, and the line of customers outside of the bank, these being the primary real world objects involved in the problem. Following guideline 1, Use Information Hiding, and guideline 2, Use Data Abstraction, each of the models was developed as an Abstract Data Type, i.e., each of them provided only a set of operations to the rest of the system, with the implementation details hidden from view. The choice of using a module to model the line of customers outside the bank was also influenced by guideline 5, Capture Major Data Transformations, because the module will transform the input lines to person records.

Consideration of each of the identified data abstractions is done separately. By keeping in mind the problem definition, it is not difficult to refine the description of each abstraction. The description should include a summary of the data encapsulated by the abstraction and a list of the required operations on this data. These operations, including high-level descriptions and probable interface details, should be carefully noted even in the early stages of design. As an example, the module used as a model of the bank provides procedures for having a person enter the bank and leave the bank, for determining the next time a person should leave the bank, and for determining whether or not the bank is empty. As design progresses, the operation descriptions and related interfaces will become more precise and refined.

The design process continues with the development of a high-level pseudo-code solution to the problem. This solution should make use of the operations lists already identified for each module. Coinciding with the refinement of the pseudo-code would be the refinement of the module’s operations lists and definitions. During refinement, the pseudo-code very quickly takes the form of the actual program code that ultimately will be used.

Once the definitions of the operations lists have been completed and the interface details have been finalized, detailed design of each of the modules can proceed. Since these modules provide models of real world situations, the driving concern during detailed design is to make sure that the appropriate real world situation is always properly modeled by the values of the module’s local data structures. This is easy to do because values of local module variables persist between calls to the module. All local variables are initialized to their correct starting values by the initialization code of the module.

Figure 4. Bank simulation solutions. Indentation indicates syntactic subprocedure hierarchy. Boxes indicate module boundaries. Arrows indicate exported procedures. Numbers in brackets indicate number of executable statements. (a) Functional decomposition; (b) Guideline based.

[Figure 4 diagram not reproduced. Surviving labels, with hierarchy, boxes, and arrows lost and illegible counts shown as [?]:]
(a) GETNODE [?], FREENODE [2], EMPTY [1], PLACE [11], PUSH [5], INSAFTER [?], POPSUB [7], ARRIVE [22], INSERT [10], DEPART [9], REMOVE [12], MAIN [45]
(b) ENQUEUE [11], DEQUEUE [?], NEXTTIME [1]; BANK MODULE: ENTER [12], LEAVE [8]; MAIN [23]
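The bank module described above can be sketched as a unit with persistent internal state. This is a hypothetical illustration in Python, not the paper's Modula-2 module: the operation names follow the ENTER, LEAVE, and NEXTTIME labels of Figure 4, but the representation (each queue holding the departure times of its customers) and all other details are assumptions.

```python
from collections import deque

NUM_QUEUES = 4  # the problem statement specifies four teller queues


class Bank:
    """Model of the bank; the teller queues are hidden internal state."""

    def __init__(self):
        # Plays the role of the module's initialization code:
        # every queue starts empty. Each queue stores the departure
        # times of the customers waiting in it (assumed representation).
        self._queues = [deque() for _ in range(NUM_QUEUES)]

    def enter(self, entry_time, service_time):
        """A customer joins the shortest queue; return the departure time."""
        queue = min(self._queues, key=len)
        # Service starts on arrival, or when the person ahead departs.
        start = max(entry_time, queue[-1]) if queue else entry_time
        departure = start + service_time
        queue.append(departure)
        return departure

    def empty(self):
        """True when no customer is in any queue."""
        return not any(self._queues)

    def next_time(self):
        """Earliest pending departure time."""
        if self.empty():
            raise ValueError("bank is empty")
        return min(q[0] for q in self._queues if q)

    def leave(self):
        """Remove the customer who departs next; return the departure time."""
        if self.empty():
            raise ValueError("bank is empty")
        queue = min((q for q in self._queues if q), key=lambda q: q[0])
        return queue.popleft()


if __name__ == "__main__":
    bank = Bank()
    bank.enter(0, 5)
    bank.enter(1, 2)
    while not bank.empty():
        print(bank.leave())
```

Every operation leaves `_queues` consistent with the modeled situation, so callers never need to inspect or repair the queues themselves.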

The code for any operation performs its function by returning to the caller any necessary values and by leaving the module in the correct state, i.e., by updating the values of any pertinent local module variables. Thus the module will always correctly model the real world situation.

The decomposition of the guideline based solution along with statement count information is shown in Figure 4b. It is obvious that the functional decomposition solution and the guideline based solution to the problem differ greatly in their basic program structure. Further clarification of this difference is provided by a comparison of the two solutions based on the values of the program metrics shown in Table 1. The most notable metric difference is the number of references to global variables from within procedures: 35 in the functional decomposition solution and none in the guideline based solution. All variables that must be shared by several procedures, in the guideline based solution, are hidden within the appropriate module. It is believed that this greater locality of variable reference makes the guideline based solution easier to understand and change than the standard solution.

Table 1. Bank Simulation Solutions

Metric                     Functional Decomposition   Guideline Based
Modules                    0                          3
Procedures                 11                         15
Statements                 139                        115
Statements/Procedure       8.5                        5.2
Parameters (a)             20                         18
Global Variables           14                         4
Local Variables            14                         12
Local Module Vars          0                          9
Global Var References (b)  35                         0
Module Var References (b)  0                          34
Lines of Declarations      48                         78

(a) Function return value counts + 1.
(b) From within procedures.

4. THE EXPERIMENT

An experiment was conducted at Villanova University to investigate the effects of guideline and encapsulation unit use during the design and coding stages of the software life cycle on the maintenance stage. Twenty-seven subjects were matched on performance in a pre-experimental task and then randomly divided into three groups. Each subject was given a program to study. One experimental group’s program was coded with the use of encapsulation units from a guideline based design, another’s was coded from the same guideline based design but without the use of encapsulation units (the units were mimicked with global variables, procedures, and comments), and the third’s was the result of using a top-down, functional refinement design. Subjects’ performance on several tasks, such as answering questions about the program and performing adaptive maintenance, was measured. The experiment was repeated for three separate programs, with group members switched each time, providing supplemental within-subject data.

The statistical analysis of the experiment was inconclusive. An analysis of variance applied to eight performance related dependent variables for each of the three phases of the experiment produced only a few scattered statistically significant results, from which no conclusions could be drawn about the value of guideline use or encapsulation unit support. Furthermore, a within-subjects correlation analysis showed that the relative performance of an individual, in terms of the time taken to understand the original program, design the maintenance related changes, and debug the updated program, was consistent across groups. Figure 5 shows the correlation between a subject’s performance while in one group and his performance while in another on these three time related measures. Also shown in the figure is the probability significance level of the correlation, i.e., the probability that the observed correlation between matched groups is due to chance. A more detailed description of the experiment and the statistical analysis can be found in [9].

          MNDT
Groups    Corr   Prob
XX-GX     .56    .01
XX-GL     .55    .02
GX-GL     .49    .04

Figure 5. Within-subjects statistics. XX: no guidelines; GX: guidelines without language support; GL: guidelines with language support. Dependent Variables: UNDT: time to understand original program; MNDT: time to design maintenance updates; EXTT: time to debug the revised program.

In view of the correlation results, it might be concluded that guideline use and language support did not have any positive effect on the maintainability of the programs. However, a detailed investigation of the programs produced by the subjects to implement the assigned adaptive maintenance uncovered some important and surprising relationships among the solution approaches used by the subjects of the three groups.

4.1 Effect of the Design Basis

An analysis of the final programs produced by the subjects during the first stage of the experiment showed the most obvious and interesting differences across the groups. The Bank Simulation Problem and its solutions described previously were used during this phase of the experiment. Part of the maintenance task assigned to the subjects was to update the program so that whenever it was advantageous, a bank customer could “jump” bank queues. This update was not trivial, due to the need to handle the special case of jumping into an empty queue. There were four basic approaches taken by the subjects:

• Solution 1: “CORRUPT DATA.” Change certain data values so that they are not equal to what they should represent during the entire course of the program’s execution. However, by the time the program is finished running, all values balance out and thus correct output is achieved.
• Solution 2: “CORRUPT MODEL.” Insert the code that tests for and implements the jump before the code that implements the departure of the customer that makes the jump advantageous. In this case there is no possibility of having to jump into an empty queue, and the problem is greatly simplified.
• Solution 3: “AUXILIARY ENQUEUE.” Create alternate enqueueing code to be used only when a person is jumping queues.
• Solution 4: “REVISE ENQUEUE.” Revamp the current enqueueing code to be more general in its actions so that it can be used to handle a new person entering a queue or a person jumping in from another queue.

The first two solutions seriously weaken the integrity of the program design. Both solutions work, but in each it is a case of two wrongs making a right, or two inconsistencies canceling each other out, resulting in functional correctness. Future maintenance programmers could have trouble in dealing with either the inconsistent variable use or the incorrect modeling of the order of events. However, in terms of the effort required to perform the updates, both of these solutions are cheaper to implement than the third or fourth solution.

Any of the four solution approaches could have been used within each of the original programs. A study of Table 2 reveals an interaction between the basis of the original design of the program and the solution approach adopted by the subject. Of the nine subjects who had a functional decomposition based original design, seven used one of the “corrupt” solutions, whereas only two of the 18 subjects whose original designs were based on the guidelines did the same. A chi-square test confirms that the differences between the groups are significant at the .01 level. It is likely that the structure of the guideline based programs dissuaded subjects from using the corrupt solution approaches or perhaps even kept them from discovering these solutions in the first place.

Table 2. Solution Approach Counts

Group           Corrupt Solution   Other Solution
No guidelines   7                  2
No support      1                  8
Both            1                  8

4.2 Effect of Language Support

In addition to the solution comparisons made for the Bank Simulation assignment, final programs from all three phases were studied to see if the use of language support for the guidelines had any obvious effect on the update implementations made by the subjects. For this purpose, only the two groups that were given guideline based programs were compared. The only difference between the original programs given to these groups was the use or nonuse of the Modula-2 module facility. Within the programs given to the “without language support” group, global variables, procedures, and comments were used to mimic the existence of modules. Therefore, both groups were given programs in which there were obvious “boundaries” between parts of the program identified as design units.

An investigation focusing on whether or not these boundaries were respected and preserved during the maintenance process was conducted. Five potential boundary transgressions were identified for study. The transgressions involved the misuse of a design unit’s local variables by code that was outside the unit. Note that even though a variable is hidden within a module, a maintenance programmer with access to the module can make the variable visible to the outside and therefore has the potential to transgress the boundary. Table 3 shows for each of the five cases the number of subjects from each group who improperly accessed the hidden variables and who properly accessed the hidden variables. There were a total of nine subjects in each group.
The cumulative totals show that the group without language support misused the variables in 53% of the cases (24 of 45), while the group with language support misused them in only 24% of the cases (11 of 45). A chi-square test confirms that the difference between the groups’ totals is significant at the .01 level. It appears that the use of the module facility helped preserve the integrity of the original design of the programs. It should be noted that the “improper” solutions represent easier solutions in terms of the amount of coding necessary to perform an update. Thus it appears that the use of an encapsulation facility to support the modularization of a program will help ensure that the intended modularization of the program will not be ignored during maintenance updates for the sake of quick and easy fixes.

Table 3. Counts of Subjects Who Improperly and Properly Accessed Hidden Variables

          Without language support      With language support
Case      Improper      Proper          Improper      Proper
1             2            7                3            6
2             4            5                1            8
3             6            3                5            4
4             3            6                0            9
5             9            0                2            7
Total        24           21               11           34
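The two chi-square results can be checked from the published counts. The sketch below makes several assumptions: the paper does not say which form of the test was used, so a Pearson chi-square on a 2x2 table without continuity correction is assumed; for Table 2, the two guideline based groups are pooled to match the "two of the 18 subjects" comparison in the text; and the function names are illustrative only. The critical value 6.635 is the standard .01 threshold for one degree of freedom.

```python
import math

CRITICAL_01_DF1 = 6.635  # chi-square critical value, alpha = .01, 1 degree of freedom


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Table 2: rows are functional decomposition vs. guideline based (pooled, assumed);
# columns are corrupt solution vs. other solution.
table2 = [[7, 2], [2, 16]]

# Table 3 totals: rows are without vs. with language support;
# columns are improper vs. proper access.
table3 = [[24, 21], [11, 34]]

if __name__ == "__main__":
    for label, table in (("Table 2", table2), ("Table 3", table3)):
        chi2 = chi_square_2x2(table)
        verdict = "significant" if chi2 > CRITICAL_01_DF1 else "not significant"
        print(f"{label}: chi-square = {chi2:.2f}, p = {p_value_df1(chi2):.4f}, "
              f"{verdict} at .01")
```

Under these assumptions both statistics exceed the .01 critical value, in agreement with the reported results. The Table 2 check should be read with some caution, since one expected cell count there is only 3, which is small for the chi-square approximation.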

5. CONCLUSIONS

During the course of the research described in this paper, many programs were developed based upon the guidelines and encoded with the aid of encapsulation. It was felt that the identification of the major design units for these programs was greatly facilitated by the application of the guidelines and that translation of the design into code proceeded smoothly due to the strong support for the design concepts supplied by encapsulation.

The numerical data collected during the experiment did not provide the expected corroboration of the value of the guidelines and the use of encapsulation. In retrospect, many factors can be identified that may have served to bias the experiment against the groups working with the guideline and encapsulation related programs. Consider the following:

1. Subjects were not given any instruction in the design philosophy behind the guideline based programs. Nor did the subjects have any experience in working with such programs.
2. The only experience the subjects had with encapsulation units was provided by two short preliminary projects.
3. The maintenance tasks assigned during the experiment were explicitly designed not to favor the guideline based programs. For example, tasks requiring code changes that could be completely isolated within a single design unit were not used. This is in direct contrast to one of the underlying contentions of the guideline based design philosophy, namely that many typical maintenance tasks will be restricted to a single design unit.
4. It is commonly understood that the major benefits of encapsulation will be derived when working with large programs [4]. While the programs used in the experiment were much larger than those typically used in classroom based experimentation, averaging 870 lines including comments, they may have still been too small for the benefits of the guidelines and encapsulation to become observable.
5. There are many claims as to the advantages of using encapsulation units other than those directly addressed in this research. These claims include the facilitation of library use, sharing of high-level design constructs across programs, team programming, translation of design into code, and the development of program proofs.

Even though the analysis of the dependent variable data from the experiment did not provide any proof of the value of the guidelines and encapsulation, a detailed analysis of the subjects’ final programs did indicate that programs based on the guidelines emerge from the initial maintenance process with fewer inconsistencies than programs not based on the guidelines. Furthermore, it was shown that use of language support in the form of an encapsulation unit helps preserve the integrity of the original design of a guideline based program during maintenance. These results take on added significance when it is considered that about 70% of the overall cost of software is spent on maintenance [16] and that one of the major goals associated with maintenance is the “preservation of the structural integrity of the initial systems design” [13].
Belady and Lehman [3] created a model for large program development based on a study of data from updates to OS/360. Two “laws” that they developed are relevant to the observations about the preservation of program integrity during the maintenance of guideline based, encapsulation supported programs. Their Law of Continuing Changes states:

    A system that is used undergoes continuing change until it is judged more cost efficient to freeze and re-create it.

and their Law of Increasing Entropy states:

    The entropy of a system (its unstructuredness) increases with time, unless specific work is executed to maintain or reduce it.

From these two laws it can be deduced that a software system will continually be changed, becoming less structured with each change, until such a time when it is too complicated to be of further use, at which time it will be recreated. If the increase of “entropy” of a software system can be slowed, the program will remain in service for a longer time, and therefore be more cost effective. Belady and Lehman suggest that such a scenario could be obtained by originally using “a design methodology that expresses the understanding and intention of the design unambiguously.” Results of the detailed study of the subjects’ final programs indicate that this goal is approached by the use of the guidelines and encapsulation. However, since only one stage of maintenance was performed on any program during the experiment, the effects of “entropy” reduction were not observed through the statistical analysis of the experimental dependent variables.

While the data collected during the experiment did not clearly support the use of guidelines and encapsulation, neither did it clearly show that more traditional techniques are any better. Considering all the biases within the experiment against the guidelines and encapsulation, considering that many of the benefits of encapsulation were not used to advantage within the experiment, and considering the results of the detailed study of the subjects’ final programs, it is reasonable to conclude that the set of guidelines and the encapsulation unit are both valuable tools for the development of software.

It is suggested that further research efforts could provide more convincing corroboration of the value of guideline and encapsulation unit usage. The following approaches rectify perceived shortcomings of the current experiment. In all cases it is suggested that a large number of subjects be used.

1. Expand the current experimental design to include a sequence of maintenance tasks to ascertain if the observed deterioration of the nonguideline based programs has any associated long-range costs.
2. Repeat the current experiment using programmers who have been instructed in the philosophy of the guidelines and who have ample experience in the use of encapsulation units.
3. Repeat the experiment using large programs and using maintenance teams.
4. Repeat the experiment using program systems from the “real world,” employing the actual history of their maintenance tasks.

REFERENCES

1. A. Aho, J. Hopcroft, and J. Ullman, Data Structures and Algorithms, Addison-Wesley, Reading, MA, 1983, p. 27.
2. V. Basili et al., Monitoring an Ada Software Development Project: Newsletter 2, Ada Letters (August 1982).
3. L. Belady and M. Lehman, A Model of Large Program Development, IBM Systems J. 15, 225-252 (1976).
4. V. Berzins, M. Gray, and D. Naumann, Abstraction-Based Software Development, Commun. ACM 29, 402-415 (1986).
5. G. Booch, Software Engineering with Ada, Benjamin/Cummings, New York, 1983.
6. C. J. Date, An Introduction to Database Systems, Vol. I, third ed., Addison-Wesley, Reading, MA, 1982, p. 13.
7. J. Horning, Some Desirable Properties of Data Abstraction Facilities, Proceedings of the Conference on Data: Abstraction, Definition, and Structure, SIGPLAN Notices 11, 60-62 (1976).
8. M. Jackson, Principles of Program Design, Academic Press, New York, 1975.
9. D. Joyce, An Identification and Investigation of Software Design Guidelines for Using Encapsulation Units, Ph.D. thesis, Temple University, 1986(7).
10. B. Leavenworth, The Use of Data Abstraction in Program Design, IBM Research Report RC7637(33034) (1979).
11. T. Linden, The Use of Abstract Data Types to Simplify Program Modifications, ACM SIGPLAN Notices (Feb. 1976), pp. 12-23.
12. G. Myers, Composite/Structured Design, Van Nostrand Reinhold Company, New York, 1978.
13. G. Parikh, The World of Software Maintenance, Tutorial on Software Maintenance, IEEE Cat EH0201-4, p. 8, 1983.
14. D. Parnas, On the Criteria to Be Used in Decomposing Systems into Modules, Commun. ACM 15, 1053-1058 (1972).
15. D. Parnas and D. Siewiorek, Use of the Concept of Transparency in the Design of Hierarchically Structured Systems, Commun. ACM 18, 401-408 (1975).
16. I. Sommerville, Software Engineering, Addison-Wesley, Reading, MA, 1982.
17. A. Tenenbaum and M. Augenstein, Data Structures Using Pascal, Prentice-Hall, Englewood Cliffs, NJ, 1981, p. 190.
18. A. Wasserman, Principles of Systematic Data Design and Implementation, Tutorial on Software Design Techniques, 3rd ed., IEEE Cat EH0161-0 (1980).
19. P. Wegner, Programming with Ada: An Introduction by Means of Graduated Examples, Prentice-Hall, Englewood Cliffs, NJ, 1980.
20. N. Wirth, Programming in Modula-2, Springer-Verlag, New York, 1983.