The Journal of Systems and Software 44 (1999) 187–197

Assessing the maintenance process through replicated, controlled experiments

Giuseppe Visaggio*

Department of Informatics, University of Bari, Bari, Italy

Received 1 June 1997; accepted 1 April 1998

* Corresponding author. E-mail: [email protected].

Abstract

This work describes a controlled experiment comparing maintenance processes derived from two different paradigms: Quick Fix (Q.F.) and Iterative Enhancement (I.E.). It was repeated twice with undergraduate students and once with professional developers. The first run served to improve the material for successive replications. The experiment aimed to ascertain the quality of the maintenance process in terms of the correctness and completeness of the modifications made, and whether Q.F. processes are more timely than I.E. From the results of the experiments it would seem that, on the one hand, Q.F. is not appreciably more timely than I.E., while on the other, it results in lesser correctness, completeness and traceability. Thus, Q.F. damages the comprehensibility of the system more than I.E. does. In any case, the software quality must be safeguarded after maintenance, as even I.E. is not entirely free from harmful effects. © 1999 Elsevier Science Inc. All rights reserved.

1. Introduction

Owing to the high cost of maintenance, this phase in the life-cycle of a software system has received close attention from the software engineering community. In Basili (1990) three paradigms are described: Quick Fix (Q.F.) makes the necessary changes to the code and then to the accompanying documentation; Iterative Enhancement (I.E.) modifies the highest-level document affected by the changes and then propagates these changes down as far as the code; Full Reuse (F.R.) builds a new system using components from the old system and others available in the repository. This study compared the Q.F. and I.E. paradigms. F.R. was not assessed, partly because it is still relatively scarce in the industrial environment and also because suitable technology for managing the repository and the reuse of software components is still lacking.

The maintainer often prefers Q.F.-type process models to I.E.-type ones and adduces the urgency of the modification to justify this choice. Unfortunately, being pressed for time, he often fails to update the documentation coherently with the modifications made to the code. This leads to internal inconsistencies being generated and renders the documentation inadequate for understanding the system.


Since comprehension is essential for carrying out maintenance efficiently and efficaciously (Von Mayrhauser and Vans, 1995), the quality of the maintenance process tends to degrade and with it, the quality of the existing system. In short, a decline of the system rapidly ensues.

The aim of this study is to make an objective assessment of whether the characteristics outlined below, generally credited to Q.F.-type processes (Basili, 1990), really belong to them:
· effectiveness in carrying out small changes, i.e. those involving few instructions, together with reliability and lack of damage to the quality of the system in this context;
· greater timeliness than the other paradigms in rendering the modifications operative.

Hence, the goal of the present study was to analyze the characteristics of Q.F. and I.E. as regards their relationship with the level of comprehensibility of the resulting system from the maintainer's viewpoint. The study was performed by means of three controlled experiments. The first two (Experiment I and Experiment II) were carried out with undergraduate students from the course on Software Engineering at the University of Bari, and the third (Experiment III) with professional maintainers in a Public Administration, from now on referred to as the Company, which maintains software developed by third parties. To persuade the software engineers of this Company to participate, their anonymity and that of the Company had to be guaranteed.


Experiment I served to hone the tools used, while Experiment III served to verify the repeatability of the experiment in a production environment and to make a first comparison between the results obtainable in an academic and in a production milieu. In the rest of this paper the acronyms Q.F. and I.E. are used to indicate both the paradigms and the corresponding process models used.

Section 2 of this study contains a description of the experiment, the experimental material produced, its design, the preparation and execution of the processes, and the metrics used to assess the results of the experiment. An overview of Experiment I is given in Section 3, leading on to the reasons behind the changes to the maintenance requirements introduced in the second and third replications. In Section 4, the results of the last two experiments are described as regards the dependent variables: completeness, correctness, timeliness and traceability. The final section draws the conclusions and outlines the direction of future work.

2. The experiment

Experiment I and Experiment II were carried out by students in the third and fourth undergraduate years following the course on Software Engineering over two successive academic years. They had studied the maintenance paradigms used as references and each of the derived process models. The requests for change implemented during the experiment referred to two different projects of the same application, named EDIGEST (System for Managing a Publishing House). The systems had been produced by students following the previous academic year's course on Software Engineering. None of the students taking part in the experiment belonged to the two study groups which had produced the systems used in the experiment. Experiment I and Experiment II differed as regards both the requests for change the students had to satisfy and the extent of their relative impact.

Experiment III was conducted using the same materials as had been used in Experiment II. The participants in this last experiment were all professional developers whose activities largely involved software system maintenance. Their working experience in the maintenance field ranged from a minimum of two to a maximum of six years. It was possible to select as participants only those with degrees in computer science and, fortunately, they were all familiar with the programming language used in EDIGEST. In any case, they had been informed that this was the language involved in the experiment and had independently brushed up their knowledge of it.

2.1. Material

The material used comprised the documentation of the project model and the implementation model of three different systems. The first is a System for Managing Mini-football Fields (MAFOOT) and was used during training for the experiment, while the other two are different versions of the same system, named EDIGEST.

The project model includes:
· the structure chart describing the use and call relationships between the modules and the data they exchange in calls;
· the specifications of each module in structured language;
· the dependency diagrams between the data, showing the independent data, the dependent data and what data they depend upon, with what cardinality; the partial and total dependencies are also represented (Smith, 1985);
· the normalized tables and navigation paths between them, obtained from the preceding dependency diagrams;
· the data dictionary with the semantics of all the data handled by the modules and of those present in the dependency diagrams, i.e. those constituting the tables;
· the table of cross-references among the tables and the modules that use them.

The implementation model includes:
· the code of all the modules present in the structure chart, suitably commented, written in PASCAL;
· the full data dictionary, including the semantics of the local data of each module and the internal representation of all the data. The local data are volatile and therefore not present either in the data bases or in the interfaces between the modules.

2.2. Design

The controlled experiment had four independent variables:
· Maintenance paradigm: Q.F., I.E.;
· Maintenance rounds: two, using the two different projects implementing EDIGEST, and a different maintenance paradigm for each group;
· Types of maintenance performed: corrective and adaptive, which are the maintenance types most frequently requiring timely accomplishment;
· Team selection: two 15-member groups (A and B), randomly chosen.

There were four dependent variables:
· Correctness: how many components needing modification were correctly modified;
· Completeness: how many components needing modification to satisfy the requirement for change were correctly identified;


· Timeliness: how rapid the execution of the software modification was;
· Traceability: how far the modification maintained the consistency between the design and the implementation.

The experimental design involved a partial factorial plan whereby each group executed both corrective and adaptive maintenance with each paradigm, alternately; one group used the Q.F. paradigm in the first session and I.E. in the second, while the other group did the reverse. Table 1 shows the organization of the experiment. The maintenance paradigm constituted the main treatment. The other independent variables assessed different threats to the internal validity of the experiment. The two different rounds made it possible for all the students to use both paradigms, while avoiding their getting tired. Because the maintenance paradigm and the software system were different in the second round, the students could not rely on the knowledge acquired in the first round to improve their performance in the second. It should be noted that although the two systems used in the two rounds had the same objective, their design and hence their structure were different; thus, from the point of view of the analysis of the realization of the changes and of their impact, it was as if two different systems had been used. This was done to reduce the effects of the maturity variable. The same types of maintenance were performed in both rounds to ensure that the data collected for each person were consistent in both trials. Each person used the processes belonging to the two paradigms in a different order, so as to reduce the effects of presentation. Data collection was performed with the same procedure throughout, to prevent the results from being affected by any changes in measuring the effects of the observables and calculating the dependent variables. Thus the effects of changes in the instrumentation variable could be avoided. The participants making up the two groups were chosen entirely randomly and had analogous characteristics in terms of knowledge of the process models, techniques and systems used. There was therefore no selection bias.

2.3. Preparation and training

The students involved in the experiment had a knowledge of the system models mentioned in the previous section, of the techniques used to construct them and of the best way to use them.


The professional developers were given a preliminary seminar in which the reasons for the experiment were explained and their anonymity was guaranteed, both as regards publication of the results and making them known within the Company. To ensure that the professional developers, too, were sufficiently conversant with the design techniques and representation models of the project used by EDIGEST, four hours of reinforcement lessons were held. All participants were given a two-hour lesson on the techniques for analyzing the impact of the changes and on the maintenance paradigms used in the experiment (Arnold and Bohner, 1993; Abbattista et al., 1994; Lanubile and Visaggio, 1995). After these lessons, they trained in the use of the two paradigms using the first system, carrying out two rounds of practice in exactly the same conditions as the experiment to follow. At the end of the training, the participants conducted a trial experiment on a system called MAFOOT. The quantitative and qualitative results were analyzed to understand what difficulties the participants had encountered. After this analysis, a further two-hour lesson was found necessary in all the experiments, as reinforcement of the maintenance process and the method for carrying out the modifications.

2.4. Execution

In each round, the people executing the Q.F. were first given the components of the implementation model and, when they declared they had finished, they asked for the project components, to be able to propagate the modifications on to them. This served to check the process adopted, the time required for modifying the code and that for propagating the modifications to the documentation. Those executing I.E., on the other hand, received first the components of the project model and then those of the code. In Experiment I each round lasted a maximum of three hours. The time limit was eliminated in the successive replications, for reasons which will be examined below. Obviously, those finishing earlier could hand in their work as soon as they wished. There was a week's interval between the two rounds, during which maintenance was not discussed with the participants.

Table 1
Organization of the experiment

Maintenance paradigm     I round                          II round
                         M1: Adaptive    M2: Corrective   M3: Adaptive    M4: Corrective
Iterative Enhancement    A               A                B               B
Quick Fix                B               B                A               A
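To make the counterbalancing concrete, the assignment in Table 1 can be encoded as a simple lookup; this is an illustrative sketch (the study itself used no code, and the labels are taken from the table):

    # Table 1 as a lookup: (paradigm, modification) -> group.
    # Illustrative only; groups were assigned on paper in the study.
    design = {
        ("I.E.", "M1"): "A", ("I.E.", "M2"): "A", ("I.E.", "M3"): "B", ("I.E.", "M4"): "B",
        ("Q.F.", "M1"): "B", ("Q.F.", "M2"): "B", ("Q.F.", "M3"): "A", ("Q.F.", "M4"): "A",
    }

    def paradigm_for(group, modification):
        """Return the paradigm a group applied to a given modification."""
        for (paradigm, m), g in design.items():
            if m == modification and g == group:
                return paradigm
        raise KeyError((group, modification))

    # Group A uses I.E. in round I (M1, M2) and Q.F. in round II (M3, M4):
    assert paradigm_for("A", "M1") == "I.E." and paradigm_for("A", "M3") == "Q.F."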


2.5. Metrics

The metrics used to assess completeness and correctness were:
· Modified Component Set (MCS): the set of components modified by the participants;
· Expected Component Set (ECS): the set of components to be modified for correct, complete execution of the change required;
· Correct Component Set (CCS): the set of components belonging to the MCS which were modified correctly;
· Completeness: COMP = #(MCS ∩ ECS)/#ECS;
· Correctness: CORR = #(CCS ∩ ECS)/#ECS;
where #(MCS ∩ ECS), #(CCS ∩ ECS) and #ECS indicate the cardinality of the corresponding sets.

The completeness factor expresses how well the analysis of the impact was made. A score of 1 was given when the components individuated corresponded to all those which had to be modified, and only those. Correctness expresses how correctly the modification was made. If CORR = 1 then the modification was correct. If components which did not belong to the set to be modified were individuated, the modification was judged incorrect.

To assess the timeliness of the modification, the following metrics were used:
· Tc: the time needed for the code modifications made with the Q.F. process to become executive;
· Td: the time needed for completing the design modification in the Q.F. process;
· Tq, Ti: the total times needed for executing the modifications with the Q.F. and I.E. processes, respectively.

To measure the degradation of the system, metrics on the quality of the project or code would not have been particularly significant, because the changes required involved only a few instructions per module, so the quality of the modified components or of the structure could not be significantly altered. Hence, the quality parameter assessed was traceability between design and code. In fact, if traceability is not maintained, the design becomes inconsistent with the code and the former cannot be used to understand the latter, creating a vicious circle leading to degradation of the software system. The metrics used to assess traceability were:
· Ns: the number of subjects maintaining traceability;
· N: the total number of subjects;
· Tr = Ns/N × 100: the percentage of correct traceability.
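These definitions translate directly into set operations. The following sketch (with hypothetical component identifiers, not data from the study) computes COMP, CORR and Tr as defined above:

    # COMP, CORR and Tr as defined in Section 2.5 (sketch with invented data).

    def completeness(mcs, ecs):
        """COMP = #(MCS ∩ ECS) / #ECS: share of expected components found."""
        return len(mcs & ecs) / len(ecs)

    def correctness(ccs, ecs):
        """CORR = #(CCS ∩ ECS) / #ECS: share of expected components correctly modified."""
        return len(ccs & ecs) / len(ecs)

    def traceability(ns, n):
        """Tr = Ns / N × 100: percentage of subjects preserving traceability."""
        return ns / n * 100

    ecs = {"M1", "M2", "M7"}       # components that had to be modified (invented)
    mcs = {"M1", "M2", "M5"}       # components actually modified
    ccs = {"M1"}                   # modified components that were correct
    print(completeness(mcs, ecs))  # 2/3 ≈ 0.67
    print(correctness(ccs, ecs))   # 1/3 ≈ 0.33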

2.6. Impact of modifications

To understand the experiment fully, the extent of the impact of the changes made needs to be known. Table 2 shows the extent of the impact of the changes made in Experiments II and III. The reason why these impacts changed is explained below. The experiment with the professional developers was performed using the same changes as in Experiment II at the University.

3. Overview of Experiment I

This section presents an overview of the results of Experiment I and justifies the changes introduced in Experiments II and III; the results of the University experimentation are then compared with those obtained in the production environment. To represent the data distributions, the box-and-whiskers diagram (box-plot), illustrated in Fig. 1, was used. Each data set is represented by a box: the lower and upper borders correspond to the 25th (Q1) and 75th (Q3) percentiles, respectively, while the horizontal segment inside the box corresponds to the 50th percentile (median) of the data set. The length of the box is equal to the interquartile range (Q3 - Q1). The vertical segments at the edges of the box are called whiskers and represent the tails of the distribution; they extend from the box to the points corresponding to the theoretical upper and lower tail values. The first is the point Q3 + 1.5 × (Q3 - Q1), and the second the point Q1 - 1.5 × (Q3 - Q1). The theoretical values are approximated to the nearest real value of the data set. The values outside the interval [Q1 - 1.5 × (Q3 - Q1), Q3 + 1.5 × (Q3 - Q1)] are called outliers and are represented by points.

Fig. 1. Box-plot diagram.
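As a concrete reading of this definition, the sketch below computes the box-plot statistics on invented data; the quartile interpolation used by the paper's plotting tool is not specified, so a simple linear rule is assumed:

    # Box-plot statistics as described above (invented data; the original
    # tool's quartile interpolation is an assumption).
    def boxplot_stats(data):
        xs = sorted(data)

        def quantile(p):
            i = p * (len(xs) - 1)          # linear-interpolation quantile
            lo = int(i)
            hi = min(lo + 1, len(xs) - 1)
            return xs[lo] + (i - lo) * (xs[hi] - xs[lo])

        q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
        iqr = q3 - q1
        lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # Whiskers end at the nearest real data values inside the fences.
        whiskers = (min(x for x in xs if x >= lo_fence),
                    max(x for x in xs if x <= hi_fence))
        outliers = [x for x in xs if x < lo_fence or x > hi_fence]
        return {"Q1": q1, "median": median, "Q3": q3,
                "whiskers": whiskers, "outliers": outliers}

    print(boxplot_stats([0.2, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.0]))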

3.1. Completeness

Fig. 2 shows the box-plot of the completeness metrics for each of the paradigms. It seems that analysis of the impact of the modification is invariably more reliable for the processes derived from the I.E. paradigm than for those derived from the Q.F.

Fig. 2. Completeness in Experiment I.

The measures for correctness are shown in the box-plot in Fig. 3. There are no substantial differences between the results of the two paradigms. These results can be explained by the fact that once a component to be modified had been identified, its correctness after the change depends on the programmer's ability rather than on the maintenance paradigm.

Fig. 3. Correctness in Experiment I.

It should be noted that although the modules to be modified were always quite small, the correctness of the modification was often unsatisfactory. Smallness of the module does not seem to be a guarantee of correctness of the modification.

The mean times necessary for each requirement for change are reported in Fig. 4. Comparison shows that the difference between Ti and Tc reduces as the extent of the impact of the request for change increases. Thus it seems that the Q.F. paradigm is only more timely when the extent of the impact is small.

The values of traceability Tr are shown in the graphs in Fig. 5, corresponding to each request for change carried out with both Q.F. and I.E. It can be observed that Tr is invariably higher for I.E.

These results suggested that if the impact of the modifications were greater and more differentiated, the two paradigms would give more widely divergent results, which would be easier to assess. For this reason, the modifications to be effected were revised for Experiment II and Experiment III, as shown in Table 2. Furthermore, many participants in Experiment I declared that with more time they would have been more certain of the correctness of their modifications. The time limit was therefore eliminated and the participants were left free to decide when the correction was complete and tested, or to stop when they were tired.

Fig. 4. Mean times in Experiment I.


Fig. 5. Traceability in Experiment I.

Table 2
Extent of the changes

Experiment I
  Project:  M1: 1 table and 3 modules; M2: 2 modules; M3: Dependency Diagram, 1 map and 1 module; M4: 2 modules
  Code:     M1: 3 modules; M2: 2 modules; M3: 3 modules

Experiments II and III
  Project:  M1: 3 modules modified, 1 module created, the structure chart modified; M2: 3 modules; M3: 2 modules, the structure chart modified
  Code:     M1: 3 modules modified and 1 module created; M2: 3 modules modified; M3: 2 modules modified, 1 module created

4. Results of the second and third replications

In this section, to assess how far each dependent variable is affected by an independent variable, a one-way analysis of variance is made for each of the independent variables. For each independent variable, the null and alternative hypotheses can be formulated as follows:

H0: there is no difference between the various treatment conditions expressed by the independent variables with respect to the dependent variables.
Ha: there is a difference between the various treatment conditions expressed by the independent variables with respect to the dependent variables.

The results of the analysis of variance are summarized in Tables 3-12, in which SS represents the sum of squares and F the value of the statistic used to test the null hypothesis. The F distribution depends on the degrees-of-freedom pair (df_model, df_error) and on the error probability α of rejecting H0 when H0 is true. In our case we established α = 0.001, which means we had a probability of 0.1% of making an error; df_model and df_error equal 1 and 206, respectively, so for H0 to be rejected F must be at least F_α = 10.83 (Hatcher and Stepanski, 1994; Hogg and Tanis, 1994).

Two-way analysis was used to assess how a combination of variations of paired independent variables explains the variation of the dependent variable. This can help to understand whether the influence of each treatment on the dependent variable is independent. The results of this analysis are expressed by df, SS and F. Again we established α = 0.001; here df_model and df_error equal 1 and 200, respectively, and for H0 to be rejected F must be at least F_α = 10.83.
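For illustration (this is not the paper's analysis script), the SSModel, SSError and F values of a one-way analysis like those in Tables 3-12 can be computed from (treatment, measurement) pairs as follows; the data below are invented:

    # One-way ANOVA sketch: SS_model, SS_error and F for a single factor.
    def one_way_anova(pairs):
        """pairs: iterable of (treatment_label, measurement)."""
        groups = {}
        for label, y in pairs:
            groups.setdefault(label, []).append(y)
        values = [y for g in groups.values() for y in g]
        grand_mean = sum(values) / len(values)
        means = {k: sum(g) / len(g) for k, g in groups.items()}
        ss_model = sum(len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items())
        ss_error = sum((y - means[k]) ** 2 for k, g in groups.items() for y in g)
        df_model, df_error = len(groups) - 1, len(values) - len(groups)
        f = (ss_model / df_model) / (ss_error / df_error)
        return ss_model, ss_error, f

    # e.g. completeness by paradigm (made-up scores); with df = (1, 206) the
    # critical value at alpha = 0.001 is 10.83, so reject H0 if F >= 10.83.
    data = [("I.E.", 1.0), ("I.E.", 0.9), ("I.E.", 0.8),
            ("Q.F.", 0.5), ("Q.F.", 0.6), ("Q.F.", 0.4)]
    print(one_way_anova(data))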

4.1. Completeness

Fig. 6 shows the box plot of the completeness results for Experiment II. It can be seen that with the I.E. paradigm, 70% of the students obtained a completeness score of between 0.8 and 1; of these, 52.5% reached a completeness score equal to 1. Conversely, with the Q.F. paradigm only 50% of the students obtained more than 50% completeness. To verify that this difference in completeness between the samples was really due to the maintenance paradigm, one-way analysis was performed, which yielded the results shown in Table 3. This demonstrates that the independent variables which significantly influence completeness are the paradigm and the maintenance type. In turn, to verify that the dependency of completeness on the paradigm is independent of that on the maintenance type, two-way analysis was then made, obtaining the results in Table 4. These confirm that the independent variables do not have a mutual influence. Hence, the dependency of completeness on the paradigm is statistically ascertained.


Fig. 6. Completeness in Experiment II.

Fig. 7. Completeness in Experiment III.

Table 3
One-way analysis of Completeness in Experiment II

Independent variable   SSModel   SSError   F
Paradigm               3.919     13.396    60.26
Round                  0.809     16.506    10.09
Maintenance type       2.320     14.994    31.88
Group                  0.013     17.302    0.16

Table 5
One-way analysis of Completeness in Experiment III

Independent variable   SSModel   SSError   F
Paradigm               3.99      5.27      155.97
Round                  0.026     9.235     0.58
Maintenance type       0.505     8.756     11.88
Group                  0         9.26      0.02

It could be hypothesized that the dependency on the maintenance type is not direct but derives from the extent of the impact; in fact, the adaptive modifications had a greater impact than the corrective ones.

The completeness figures for Experiment III are summarized in Fig. 7. As can be observed, there are no outliers and the completeness is concentrated around the highest values obtained in the previous experiment, even for the Q.F. paradigm. This can be explained by the greater experience of the developers, particularly in the use of maintenance processes of the Q.F. type, together with a greater care and awareness of the importance of the impact analysis of the modification, whichever paradigm was used.

Table 5 shows that in this case, too, the different results for completeness can largely be attributed to the maintenance paradigm. I.E. was found most efficacious as regards completeness, even for those people who already possessed good skills in the use of the Q.F.-type paradigm. This is despite the fact that in practice the latter tends to be used virtually exclusively. Two-way analysis was also used in this case, yielding the results shown in Table 6, to verify that the paradigm and the type of change are independent and that the relationship observed between the paradigm and the completeness does not depend on the type of change.

Table 4
Two-way analysis of Completeness in Experiment II

Independent variable     SS       MS       F
Paradigm                 3.919    3.919    80.73
Round                    0.809    0.809    16.66
Maintenance type         2.32     2.32     47.81
Group                    0.013    0.013    0.27
Paradigm × Maint. type   0.466    0.466    9.61
Round × Maint. type      0        0        0
Maint. type × Group      0.079    0.079    1.62
Error                    9.708    0.048
Total                    17.315

Table 6
Two-way analysis of Completeness in Experiment III

Independent variable     SS       MS       F
Paradigm                 3.99     3.99     183.83
Round                    0.026    0.026    1.20
Maintenance type         0.505    0.505    23.27
Group                    0        0        0.03
Paradigm × Maint. type   0.106    0.106    4.87
Round × Maint. type      0.177    0.177    8.16
Maint. type × Group      0.114    0.114    5.25
Error                    4.341    0.022
Total                    9.26
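The interaction rows of Tables 4 and 6 correspond to a standard two-way ANOVA with interaction terms; the following sketch uses statsmodels on an invented data frame (column names and values are illustrative, not the study's data):

    # Two-way ANOVA with interaction, in the spirit of Tables 4 and 6.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.DataFrame({
        "completeness": [1.0, 0.9, 0.8, 1.0, 0.5, 0.6, 0.4, 0.7],  # invented
        "paradigm":     ["I.E."] * 4 + ["Q.F."] * 4,
        "mtype":        ["adaptive", "corrective"] * 4,
    })

    # Main effects plus the Paradigm x Maintenance-type interaction.
    model = ols("completeness ~ C(paradigm) * C(mtype)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=1))  # df, SS and F per term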


4.2. Correctness

Fig. 8 shows a summary of the data for correctness in Experiment II. They are similar to those for completeness, but less good, because even when impact analysis was correctly performed, the changes made to each component could be incorrect. From the results of the one-way analysis shown in Table 7, it is clear that the difference between the values for correctness is due to the two paradigms. As to the dependency of the results on the maintenance type, the same applies as for completeness.

Fig. 8. Correctness in Experiment II.

The same considerations are valid for the professional developers. As can be seen from Fig. 9, there is less difference between completeness and correctness than for the students. Table 8 shows the results of the one-way analysis. Again, this result is due to their greater experience and skill in using the tools, which improves the quality of their product.

Fig. 9. Correctness in Experiment III.

Table 7
One-way analysis of Correctness in Experiment II

Independent variable   SSModel   SSError   F
Paradigm               4.385     14.285    63.23
Round                  0.465     18.204    5.27
Maintenance type       2.404     16.267    30.44
Group                  0.01      18.659    0.12

Table 8
One-way analysis of Correctness in Experiment III

Independent variable   SSModel   SSError   F
Paradigm               4.024     5.176     160.15
Round                  0.061     9.138     1.38
Maintenance type       0.604     8.595     14.48
Group                  0         9.199     0.01

4.3. Timeliness

Fig. 10 compares the timeliness of the changes with both paradigms in Experiment II. It shows that with I.E. 50% of the subjects concluded the changes required within 90 min, whereas 25% took between 230 min and a maximum of 370 min. With Q.F., on the other hand, 50% of the subjects took 120 min, while all finished within about 315 min. From the above, it is apparent that the median times for executing the changes with I.E. were in fact markedly lower than those for Q.F. The greater range of times for I.E. could be due to the more complete impact analysis, which induced many subjects to modify a greater number of components.

The reason given for choosing Q.F. is normally that the modified program becomes executive much faster. A modification made with Q.F. can be run straight after modifying the code, whereas with I.E. the whole change must be completed. For this reason, comparison between Ti and Tc becomes meaningful. Fig. 11, showing this comparison, demonstrates that the median value for both metrics is alike: 90 min versus 85 min. Thus, the time taken for a change made with Q.F. to become executive is not noticeably shorter than that for I.E.


Fig. 10. Timeliness (Ti - Tq) of Experiment II.

Table 9
One-way analysis Ti - Tq in Experiment II

Independent variable   SSModel      SSError       F
Paradigm               132.48       2038589.96    0.01
Round                  776.94       2037945.5     0.08
Maintenance type       1700315.56   338406.88     1035
Group                  1765.56      2036956.88    0.18

Table 10
One-way analysis Ti - Tc in Experiment II

Independent variable   SSModel      SSError       F
Paradigm               39821.56     1826909.96    4.49
Round                  2450.94      1864280.58    0.27
Maintenance type       1485172      381559        801.83
Group                  4051.56      1862679.96    0.45

Fig. 11. Timeliness (Ti - Tc) of Experiment II.

One-way analysis shows that there is no statistically significant difference (see Tables 9 and 10) between the values of either Ti and Tq or Ti and Tc. The same measurements in Experiment III yielded the results shown in the graphs in Figs. 12 and 13. The times taken to execute I.E. were shorter and less spread out than in Experiment II. This can be imputed to the maintainers' greater experience, which allowed them to make even changes with a greater impact within more reasonable time-spans. Comparison between Ti and Tc prompts the same considerations as above. Consequently, one-way analysis does not reveal statistically significant differences (Tables 11 and 12) between the two paradigms in this case either.

Fig. 12. Timeliness (Ti - Tq) of Experiment III.

Fig. 13. Timeliness (Ti - Tc) of Experiment III.


Table 11
One-way analysis Ti - Tq in Experiment III

Independent variable   SSModel      SSError       F
Paradigm               1617.31      1213172.52    0.27
Round                  3672.48      1211117.35    0.62
Maintenance type       1045689.92   169099.9      1273.87
Group                  52           1214737.83    0.01

Table 12
One-way analysis Ti - Tc in Experiment III

Independent variable   SSModel      SSError       F
Paradigm               12648.48     1097225.75    2.37
Round                  1993.92      1107880.31    0.37
Maintenance type       931836.94    178037.29     1078.19
Group                  536.33       1109337.9     0.10

4.4. Traceability

The values for Tr obtained in Experiment II are shown in Fig. 14. Tr is invariably better for I.E. The greater traceability for M2 may derive from the 1:1 ratio of the modifications to the code and those to the project. The low values for Tr with Q.F. imply that after changes the code loses its consistency with the documentation, so that the latter can no longer be used to understand the software system better. In short, the quality of the system degrades. It should be noted that even for I.E., Tr was not equal to 100. This means that modifications trigger system degradation even with this paradigm. This is an open problem which will have to be solved.

Fig. 14. Traceability in Experiment II.

The same measurement conducted in Experiment III confirmed the same tendency, as shown in Fig. 15, but yielded worse values for Tr. This deterioration in the Tr values could be due to undisciplined behavior on the part of the professional developers when using the paradigms, particularly Q.F. Although they had promised to execute the maintenance processes correctly and completely, during the experiment the habit of rushing through the updating of the documentation without particular care probably prevailed, in the interests of greater apparent efficiency.

Fig. 15. Traceability in Experiment III.

5. Conclusions

The results of these experiments impinge on some common assumptions made about the Q.F. and I.E. paradigms and provide some indications as to how to improve the maintenance process. The Q.F. is only more timely than I.E. when the impact of the modification involves few components. This is true even when comparing the time necessary for the modification to become operative, which for Q.F. is only the time necessary for changing the code, but for I.E. means the whole time necessary for changing all the system representation models, i.e., the entire process.


Analysis of the impact of the modification is invariably more reliable for the processes derived from the I.E. paradigm than for those derived from the Q.F. This implies that modifications performed with the latter process will, at the testing phase or during running, reveal defects due to the poor analysis of the impact. Apart from reducing the perceived quality of the application, these defects will require further effort for reworking the change. This further effort should be added to the original effort spent on the same modification to obtain a true idea of how long the request for change really takes to achieve completion.

Furthermore, the processes derived from the I.E. paradigm degrade the maintainability of the system less than those derived from the Q.F. It may be concluded that even if the Q.F. paradigm is sometimes more timely, great attention must be paid to avoid degradation of the system, which is very often more harmful than the greater time spent on the modification. Thus, to avoid rapid deterioration of the system it is wise to assess the extent of the change required very carefully, as well as whether timeliness is really indispensable. It should be noted that although the modules to be modified were always quite small, the correctness of the modification was often unsatisfactory. Smallness of the module does not seem to be a guarantee of correctness of the modification.




Another problem that has been left open is the fact that the processes derived from the I.E. paradigm are not entirely free from harmful effects on the system either. The maintenance process should therefore provide for periodical, automatic revision of the internal consistency of the process models and of their quality. In this way, the harm caused by Q.F., whose use is sometimes indispensable, and even that which could be provoked by I.E., may be repaired.

A side issue which was interesting to note was the reaction of the professional developers when we presented the results of their experiment. They virtually all agreed that the results of the direct comparison between the two types of paradigm were highly significant and had convinced them that I.E. should be preferred. This implies that experiments can help to spread culture, and confirms how important it is to let professionals in the field know about the results obtained when experimenting with modifications to processes they commonly use.

Acknowledgements

I would like to express my gratitude to the referees for their constructive comments, which helped to improve this paper. I would like to thank Dr. Piernicola Fiore and the undergraduate student Mauro Fumarola for their diligent and efficacious assistance in all the preparatory operations for the experiment, as well as in the collection and analysis of the data. I am also grateful to all the students in my course on Software Engineering for having taken part in the experiment, and to all the professional developers who participated in its replication. I would like to thank Ms. Mary V.C. Pragnell, B.A., for her contribution as technical writer, and Ms. Claudia Capurso for her careful and accurate typing of the text.

This work has been partially supported by the funds of the CINI (Consorzio Interuniversitario Nazionale per l'Informatica) under the project "Informatics in Public Administration", theme N. 2, PROGRESS.

References

Abbattista, F., Lanubile, F., Mastelloni, G., Visaggio, G., 1994. An experiment on the effect of design recording on impact analysis. In: Proceedings of the International Conference on Software Maintenance, Victoria, Canada, 1994. IEEE Computer Society Press, Silver Spring, MD.

Arnold, R.S., Bohner, S.A., 1993. Impact analysis - towards a framework for comparison. In: Proceedings of the Conference on Software Maintenance, Montreal, Canada, 1993. IEEE Computer Society Press, Silver Spring, MD, pp. 292-301.

Basili, V., 1990. Viewing maintenance as reuse-oriented software development. IEEE Software, January.

Hatcher, L., Stepanski, E.J., 1994. A Step-by-Step Approach to Using the SAS System for Univariate and Multivariate Statistics. SAS Institute Inc.

Hogg, R.V., Tanis, E.A., 1994. Probability and Statistical Inference. Macmillan Publishing Company, New York.

Lanubile, F., Visaggio, G., 1995. Decision-driven maintenance. Journal of Software Maintenance: Research and Practice 7, 91-115.

Smith, H.C., 1985. Database design: composing fully normalized tables from a rigorous dependency diagram. Communications of the ACM 28 (8).

Von Mayrhauser, A., Vans, A., 1995. Program comprehension during software maintenance and evolution. IEEE Computer, August.

Giuseppe Visaggio, after graduating in Physics at the University of Bari, continued to work in the University environment and became Professor at the Informatics Department of the University of Bari. His research interests are in maintenance, focusing particularly on processes, quality improvement and legacy systems. He is Chief of Research at the Software Engineering Research Laboratory (SER-Lab) in the Informatics Department at the University of Bari. SER-Lab hosts several basic research projects and carries out controlled and in-the-field experimentation. For many years he has worked as a member of the Program Committee for the IEEE International Conference on Software Maintenance (ICSM), the Workshop on Program Comprehension (WPC) and the Workshop on Empirical Studies of Software Maintenance (WESS). He acted as Program Chair of the ICSM in 1997 and as Program Chair of the WPC in 1998. Over the next three years he will serve the ICSM on its Steering Committee. He is a member of the IEEE Computer Society, ACM and AICA (the Italian Computer Society).