International Journal of Project Management 18 (2000) 243–250
www.elsevier.com/locate/ijproman
Testing in the software development life-cycle: now or later
Faizul Huq*

Information Systems and Management Sciences, College of Business Administration, University of Texas at Arlington, Arlington, TX 76019, USA

Received 28 July 1998; received in revised form 25 February 1999; accepted 12 March 1999

* Tel.: +1-817-272-3528. E-mail address: [email protected] (F. Huq). The author wishes to acknowledge Dennis Liebold's contributions to early versions of the paper.
Abstract

The most critical and visible principle of TQM in the Information Systems (IS) environment is the focus on zero defects in a software development project to achieve customer satisfaction. This paper presents results from a simulation study suggesting that greater progress towards zero defects can be achieved by testing after each phase of the software development life-cycle (concurrently), as opposed to testing only after the coding of the system has been completed. The simulation model is developed using four benchmark software development projects, and is validated using the Software Life-Cycle Management Model (SLIM), developed by Quantitative Software Management (QSM). Three performance measures are used in comparing the two testing philosophies: the time required to produce the software, the effort expended and the cost. The results indicate that the two approaches do not differ significantly on the first two measures; however, the costs are found to be dramatically different. This difference in costs leads to the conclusion that concurrent testing is superior. The cost superiority of concurrent testing can be attributed to the distribution of hours worked throughout the life-cycle, as opposed to being back-loaded in the maintenance of the system, when costs are much higher than in the earlier phases of development. © 2000 Elsevier Science Ltd and IPMA. All rights reserved.

Keywords: Software development; Concurrent testing; Life-cycle cost; Simulation; TQM; Reliability
1. Introduction

The advantages of implementing Total Quality Management (TQM) in the manufacturing environment are well known. Accordingly, TQM policies should be applied to the software development process. Ensuring the quality of a product throughout the development life-cycle could have a profound effect on the money a company spends on software development tasks, consequently affecting its bottom line. Zadrozny and Tumanic[1] state that there are several principles of TQM that can be applied to the Information Systems (IS) environment. The most critical and highly visible of these principles appears to be the focus on zero defects to achieve customer satisfaction. The purpose of this study is to identify how zero defects may be achieved through concurrent testing, and to assess the benefits of testing after each phase in the development life-cycle versus testing only after the coding of a system has been completed.

Code inspections and testing have become an integral part of the software development life-cycle over the past decade. However, as stated by Blakely,[2] there is a lack of evidence that the time and effort expended on such tasks have played any role in reducing defects and improving overall software quality. Barnett and Raja[3] and Jones,[4] however, believe that quality can be built into the software throughout the development life-cycle through a series of progressively integrated tests, including unit testing, string testing, integration testing, system testing, and acceptance testing. Those in favor of the application of TQM in software engineering believe that there must be a shift from the product focus that introduces quality through
inspections, to a process focus that introduces quality into each software development activity. As stated by Franz and Shih,[5] companies such as Hewlett-Packard have experienced great success in detecting defects early in the development life-cycle by using the inspection process. This process has a great impact on reducing overall costs. Objectives of the inspections must be clearly stated for success to be achieved. At a minimum, objectives should include checking for consistency in coding practices and ensuring the supportability of the final product. As stated by Zadrozny and Tumanic,[1] successful quality assurance programs are based on four things: a quality improvement plan; defining software metrics and implementing procedures to capture them; monitoring quality indicators; and improving the software engineering process. Jones[4] believes there are five steps to producing quality software: establishing a software quality metrics program, establishing tangible software performance goals, establishing meaningful software quality assurance, developing a leading-edge corporate culture, and determining the strengths and weaknesses of software development for the organization. This research attempts to provide some insight into the software development process and into the effect that trying to instill quality throughout this process has on overall performance.

2. Research methodology

Simulation of a hypothetical, generic software development project is used as the research methodology. The results are generalizable across multiple development platforms and development languages. Four benchmark projects are used to achieve the desired level of confidence in the developed simulation model before conducting the experiments and analysis. Since the aim of this study is to show how the cost, effort, and schedule of a software development project are affected by incorporating testing into each phase of the software development process, it is necessary to ensure that the developed model can reproduce the cost, effort, and schedule results of an industry-acknowledged software estimating tool. The tool used to validate the developed simulation model is the Software Life-Cycle Management Model (SLIM), offered by Quantitative Software Management (QSM).

The performance measures used are: (1) the time required to produce the system (months); (2) the effort expended (man-months); and (3) the cost. These measures identify the phases of the development life-cycle in which the most time, effort and money are expended. With this information, management will be able to identify project cost drivers, enabling them to implement controls to reduce the amount of variation of actual performance from expected performance.

This study seeks to determine where in the software development life-cycle it would be appropriate to implement testing procedures, and to determine the effects on time to deliver, effort required for production and cost of production. Two scenarios are used in this simulation. In the first scenario, the testing procedures are performed only at the end of the development life-cycle, when the coding has been deemed complete by the programming staff. This stage is commonly referred to as the `operations stage'. The second scenario is one in which testing procedures are implemented throughout the development life-cycle. In this scenario, testing occurs at the end of the functional design, twice at the end of the main build according to Grady,[6] and throughout the operations stage (after implementation) until the system is 95% reliable according to Putnam.[7] The following sections describe the benchmark software development project used in the simulation, the simulation program, and the data collection and analysis procedures.

3. Benchmark project description

There are three phases in the benchmark project development process: the feasibility study, the functional design and the main build (coding). The phases of the development process occur in the order in which they are stated. Maintenance, although recognized as a critical phase in the successful design and implementation of any software project, is not included in the benchmark. It is also recognized that there are certain amounts of overlap between these phases, as presented in Fig. 1.

Fig. 1. Phases and overlap.

In the first phase, the feasibility study is conducted to determine the viability of a project and how it should be done if justified, as concluded by Ahituv, Neumann, and Riley.[8] The main purpose of the feasibility study is threefold: (1) to produce an outline of all possible alternatives for a solution to the problem the proposed system will solve; (2) to outline reasons for recommending the chosen alternative; and (3) to outline the methodology by which successful implementation will be achieved. After management has reviewed the feasibility report and approved the project, the functional design will begin. There is no overlap between the feasibility study and the functional design.

The functional design phase develops a technically feasible, modular design within the scope of the requirements outlined in the feasibility study, as Putnam[7] concluded. This step is commonly referred to as systems design and includes such activities as designing system output, system input, system files, databases and system processing methods, and preparing specifications from which programmers and procedure writers will work. The functional design phase is followed by the main build, also referred to as the coding phase. The functional design and the main build can overlap if the organization adheres to a software development methodology in which the functional design does not need to be fully complete before the main build begins. This puts into practice the idea of writing the least time-consuming (easiest) requirements early in the process to lay the foundation for the more time-consuming and complex requirements. The amount of overlap is company dependent. Forty percent overlap is used for the benchmark software estimate, meaning that 60% of all requirements are completed before the main build begins.

The final phase included in the benchmark software estimate is the main build. In this phase, a working system is produced that meets the user's requirements in terms of performance and reliability. Various levels of testing, which are also company dependent, are performed during this phase. The software development process is complete at the culmination of this phase, at which time the system has reached full operational capability.

It is assumed that the amounts of effort expended in the feasibility study and the functional design are functions of the amount of effort expended in the main build. Effort expended in the main build is a function of total development time. Details on the calculation of effort required in the main build can be found in Putnam.[7] Putnam states that the feasibility study will take 13% of the effort required for main build completion, while the functional design will take 31% of the effort required for main build completion.

It is assumed that the amounts of time taken to complete the feasibility study and the functional design are functions of the amount of time taken to complete the main build. Time required for the main build is a function of effective source lines of code (ESLOC) and development team productivity. Details on the calculation of time required in the main build can be found in Putnam.[7] Industry average percentages of 45% of the main build time for the feasibility study, and 70% of the main build time for the functional design, were used.

It is assumed that the programmers on the benchmark project have a productivity index (PI) of 18. A PI of 18 is chosen because, as concluded by Putnam,[7] the average PI for business system development is between 16 and 20. The productivity index is a management index that represents the overall process productivity achieved by the development team during the main build, as stated by Putnam.[7] The productivity index is a function of the ESLOC produced, the amount of effort (man-months) expended, a special skills factor that accounts for the learning curve that will occur on large projects, and the amount of time (months) taken to produce the final product.

It is assumed that the labor rate for the development team, including all levels of management, programming staff and support personnel, is an average of $65 per hour. The number of hours representing a productive work-month is 173. These figures will also vary by company, with hours in a productive work-month subject to the company's holiday, sick-time, leave-time and training policies. The fewer the hours in a productive work-month, the more effort the development project will take.

The driving factor in the software development process is the number of ESLOC to be produced. Four different levels of ESLOC, ranging from 50 000 to 250 000, are used to validate the simulation model.
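To make the phase arithmetic concrete, the sketch below derives feasibility-study and functional-design figures from given main-build figures, using the percentages cited above (13% and 31% of main-build effort; 45% and 70% of main-build time), the $65/h labor rate and the 173-hour work-month. It is a minimal sketch, not the SLIM model itself: the paper takes its main-build time and effort from Putnam's equations, which are not reproduced here, so the main-build values enter as given inputs and the example figures are hypothetical.

```python
# A minimal sketch of the benchmark's phase arithmetic. Main-build time and
# effort are taken as given inputs (the paper derives them via SLIM/Putnam);
# the function name and example values below are illustrative assumptions.

HOURS_PER_WORK_MONTH = 173   # productive hours per work-month (from the text)
LABOR_RATE = 65.0            # blended labor rate, $/h (from the text)

def phase_estimates(mb_time_months: float, mb_effort_mm: float) -> dict:
    """Derive feasibility-study and functional-design figures from the main
    build, using the industry-average percentages cited in the text."""
    phases = {
        "feasibility study": {"time": 0.45 * mb_time_months, "effort": 0.13 * mb_effort_mm},
        "functional design": {"time": 0.70 * mb_time_months, "effort": 0.31 * mb_effort_mm},
        "main build":        {"time": mb_time_months,        "effort": mb_effort_mm},
    }
    for p in phases.values():
        p["cost"] = p["effort"] * HOURS_PER_WORK_MONTH * LABOR_RATE
    return phases

# Example with hypothetical main-build figures:
for name, p in phase_estimates(mb_time_months=9.0, mb_effort_mm=100.0).items():
    print(f"{name:18s} time={p['time']:5.2f} mo  effort={p['effort']:6.2f} mm  cost=${p['cost']:,.0f}")
```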
4. Simulation model

The study was executed using a discrete event simulation model of the software development process. The SLAMSYSTEM 4.5 simulation language was used to develop the code for the model. Fig. 2 is a flowchart of the overall simulation logic for the development process in which testing is performed only at the end of the coding phase.

Fig. 2. Development life-cycle with testing after coding.

The development process begins when someone within an organization determines there is a problem that could be solved by a computer system. A proposal for such a system is then created for review. If the project is approved, a feasibility study will be performed. The results of the feasibility study are analyzed to determine whether the proposed system represents a viable solution to the problem. If it does, the functional design is performed, followed by coding and testing. Upon completion of testing, the product's reliability is deemed acceptable or unacceptable. If the product is acceptable (95% reliable), implementation takes place. If the product is unacceptable, it continues through an iterative testing and re-coding process until it reaches an acceptable level. The key point to note in this model is that testing is performed only after the initial coding has been completed.

Fig. 3 is a flowchart similar to Fig. 2, with the exception that testing is implemented at the culmination of each phase in the development life-cycle, not just after the coding phase.

Fig. 3. Life-cycle with testing after each phase.

Two models are used in this analysis to allow comparisons between the testing methodologies, to determine the effects on schedule, effort expended, and cost.
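The iterative test-and-fix loop at the heart of both flowcharts can be sketched as follows. This is a simplified stand-in for the control flow only, not the SLAMSYSTEM model: the 60% per-pass find rate is borrowed from Grady's figure used later in the paper, and the function name is illustrative.

```python
# A simplified stand-in for the testing loop in Figs. 2 and 3: defects are
# found and fixed in batches until the 95% reliability threshold is reached.

def test_until_reliable(total_defects: int, find_rate: float = 0.60,
                        threshold: float = 0.95) -> int:
    """Return the number of test/fix passes needed to remove at least
    `threshold` of the defects, finding `find_rate` of the remainder per pass."""
    remaining = total_defects
    passes = 0
    while (total_defects - remaining) / total_defects < threshold:
        remaining -= round(find_rate * remaining)   # fix what this pass finds
        passes += 1
    return passes

print(test_until_reliable(350))   # -> 4 passes at a 60% per-pass find rate
```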
5. Validation
The results of this analysis show that the developed simulation model is 99.3% accurate when compared with the results of the SLIM model at the various levels of ESLOC. Fig. 4 shows comparisons of aggregate (from all four test runs) development time, effort and cost figures generated by the SLIM model versus the corresponding aggregate figures generated by the developed simulation model.

Fig. 4. Simulation results vs. SLIM results.
6. Analysis of simulation results

In the following sections, results of the simulation are analyzed. Simulation parameters for the lines of code to be produced, productivity index, phases included and overlap, effort percentages, and time percentages remain the same for both models. The only factor that changes between the two approaches is that testing takes place at the end of the development life-cycle in the first model, versus at the end of each phase in the development life-cycle in the second model.

7. Consequences of testing at the end of the development life-cycle

Results from the model with testing at the end of the coding phase are presented here. Grady[6] states that the defect frequency for any software development project will be 7 defects for every 1000 lines of code. Given that the simulated project was to produce 50 000 lines of code, the number of errors that will exist by the time the testing phase is reached, using Grady's[6] estimate, is 350 (50 000 × 0.007). Putnam[7] states that a software product has reached full operational capability when it is 95% defect-free. Given Putnam's[7] assertion, the number of defects that must be repaired before the system reaches full operational capability is 333 (350 × 95%). Grady[6] also suggests that errors are found and fixed in the following time frames:

25% of defects will be found and fixed in 2 h per defect;
50% of defects will be found and fixed in 5 h per defect;
20% of defects will be found and fixed in 10 h per defect;
4% of defects will be found and fixed in 20 h per defect;
1% of defects will be found and fixed in 50 h per defect.

Using these find and fix rates, the total number of hours to repair 333 defects will be 2098 h. Jones[4] states that it costs 100 times the normal rate per hour to remove defects that are detected near or at system implementation time. This is a result of the effect that correcting a defect, after a good deal of work has already been completed, will have on the system as a whole. Therefore, if a defect is detected later in the development life-cycle, the cost of fixing it will be higher because system rework may be required. Given an hourly rate of $65.00 (consistent with the validated model), the cost per hour to fix a defect would be $6500, resulting in a $13 636 350 maintenance bill. This price is above and beyond the cost figures for the phases of the development life-cycle that have already been completed. Assuming the staff available for defect finding and fixing is 4 (this is company specific), the amount of effort expended, given a 173 hour work-month, would be 12.1 man-months. The time required to find and fix the defects would be 3.03 months.

In summary, the total time required to produce a 95% reliable product using the methodology of testing only after the coding phase in the development life-cycle would be 13.87 months. Total effort expended would be 163.2 man-months. Total cost would be $15 298 000.
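The repair-hour and maintenance-cost arithmetic above can be reproduced in a few lines. The sketch below assumes the 333-defect figure and Grady's distribution as given in the text; the names are illustrative.

```python
# Reproducing this section's arithmetic: hours to repair 333 defects under
# Grady's find-and-fix distribution, and the maintenance bill when each
# repair hour costs 100x the normal $65 labor rate.

FIX_DISTRIBUTION = [        # (fraction of defects, hours to find and fix one)
    (0.25, 2), (0.50, 5), (0.20, 10), (0.04, 20), (0.01, 50),
]

def repair_hours(defects: int) -> float:
    """Expected total find-and-fix hours for a pool of defects."""
    return sum(frac * hours * defects for frac, hours in FIX_DISTRIBUTION)

defects_to_fix = 333                      # 95% of the 350 injected defects
hours = repair_hours(defects_to_fix)      # 2097.9 h, i.e. ~2098 h
cost = hours * 65 * 100                   # $6500 per repair hour
print(f"{hours:.0f} h to repair, ${cost:,.0f} maintenance bill")
# -> 2098 h to repair, $13,636,350 maintenance bill
```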
8. Consequences of testing at the end of each phase in the life-cycle

In this model the same defect figures from the previous model are used. The major difference between the previous model and the model employed here is that testing is now performed at the end of each phase, with the exception of the feasibility study. After the coding phase, there are two testing phases. This is consistent with the description that Grady[6] gives of the software development methodology employed at Hewlett-Packard. Testing continues until the product has reached full operational capability (95% defect-free).

Error detection rates, according to Grady,[6] vary depending on which phase of the life-cycle has just been completed. Fifty-five percent of all defects in the system will be found after the functional design phase (333 × 55% = 183). These usually occur as a result of misunderstood requirements. At this point, the cost to find and fix the defects would be the same labor rate that was applied to the functional design, which is $65/h. Using the same defect find/fix rates as outlined in the previous section, the total hours required for finding and fixing functional design defects would be 1213, at a cost of $78 828. These additional hours also affect the schedule and the effort expended on the project: seven extra man-months of effort would be worked and the schedule would slip by 1.75 months (assuming 4 persons available to find/fix).

In the first testing phase after the coding phase has been completed, 60% of the remaining defects will be found and fixed.[6] Ninety-four of the 157 remaining defects would be found and fixed during this phase. The cost to repair those defects now doubles to $130/h as a result of the effect the fixes will have on the system as a whole. Total hours expended in this phase will be 593, costing a total of $77 150. Effort expended on
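The phase-by-phase cascade described above can be sketched as follows, assuming the average of 6.3 find-and-fix hours per defect implied by Grady's distribution. The per-phase hours and costs it prints land close to, but not exactly on, the figures quoted in the text, which appear to use slightly different rounding of the defect counts; the escalating rate structure is the point.

```python
# A sketch of the concurrent-testing arithmetic: the defect pool shrinks
# phase by phase while the cost per repair hour escalates. Defect counts and
# rates follow the text; helper names are illustrative.

AVG_FIX_HOURS = 6.3    # expected h/defect from Grady's distribution
STAFF = 4              # people available to find/fix
HOURS_PER_MONTH = 173

phases = [             # (phase, defects fixed, $ per repair hour)
    ("after functional design", 183, 65),    # 55% of 333, at the normal rate
    ("first post-coding test",   94, 130),   # 60% of the remainder, rate doubled
    ("second post-coding test",  38, 650),   # 60% of the remainder again
    ("maintenance",              24, 6500),  # 95% of the last 25, at 100x
]

total_cost = 0.0
for name, fixed, rate in phases:
    hours = fixed * AVG_FIX_HOURS
    cost = hours * rate
    total_cost += cost
    slip = hours / (STAFF * HOURS_PER_MONTH)   # schedule slip in months
    print(f"{name:26s} {hours:6.0f} h  ${cost:>9,.0f}  slip {slip:.2f} mo")
print(f"total repair cost: ${total_cost:,.0f}")
```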
repairs will be 3.42 man-months, with the delivery schedule slipping 0.87 months.

In the second testing phase after the coding phase has been completed, 60% of the remaining defects will again be found and fixed.[6] Remaining defects total 63, so 38 defects would be found and fixed during this phase. The cost to repair those defects now increases to $650/h as a result of the effect the fixes might have on the rest of the system. Total hours expended in this phase will be 238, costing a total of $154 791. Effort expended on repairs will be 1.38 man-months, with the delivery schedule slipping 0.34 months.

The maintenance phase continues after implementation has occurred. There are 25 remaining errors in the system; with 95% (24) of those found and fixed in the maintenance phase, there will be only 1 error left in the system. One hundred and fifty (150) hours will be devoted to reaching this efficiency level, at a cost of $6500/h ($65 × 100), resulting in a total cost of $972 563. Effort expended during this phase will be 0.87 man-months. It will take only 0.22 months (with 4 people) to reach this efficiency level.

In summary, the total time required to produce a 95% reliable product using the methodology of testing after each phase in the development life-cycle would be 11.34 months. Total effort expended would be 189.47 man-months. Total cost would be $4 083 000.

9. Policy comparisons

Comparisons can be made between the two testing models to determine the effects of testing throughout the systems development life-cycle. The time comparison in Fig. 5 shows that more time is spent on the functional design and the main build in the model that integrates testing after each phase. This result was expected. The major finding in this graph is the difference in time spent in maintenance: the first model had a total time of 3.03 months, while the second requires only 0.22 months. This is a result of the errors being found throughout the system life-cycle and not all at once, as in the first model. The total time figures for models 1 and 2 are 13.87 and 11.34 months, respectively.

Fig. 5. Time.

The effort comparison shown in Fig. 6 shows essentially the same results as the time comparison. More effort is expended in the functional design of model 2 because time is being taken to find and fix errors. In the main build, the effort expended in model 2 is very close to that expended in model 1. This is a result of the two testing phases employed after the coding phase. As expected, effort expended in the maintenance phase of model 2 is significantly less than that expended in model 1. The overall result for effort is that more effort is expended in model 2 (189.47 man-months) than in model 1 (163.2).

Fig. 6. Effort.

The cost comparison presents the most compelling data. As expected, as a result of more effort being expended during the functional design and main build phases, the cost of these phases is higher in model 2 than in model 1. The most visible difference is in the cost of the maintenance phase. Because significantly fewer hours are expended in maintenance in model 2 (1948 fewer than in model 1), the cost saving is tremendous. The critical point is that if testing is performed only once during the life-cycle, the cost to develop this 50 000 line product will be $15 298 000. If testing is performed throughout the life-cycle, total development cost will be $4 083 000, a cost reduction of $11 215 000.
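Rolling the two policies' totals together (variable names are illustrative; the figures are those reported in Sections 7 and 8):

```python
# Roll-up of the totals reported in Sections 7 and 8 for the two policies.
totals = {  # policy: (months to deliver, man-months of effort, total cost $)
    "model 1: test after coding":     (13.87, 163.20, 15_298_000),
    "model 2: test after each phase": (11.34, 189.47, 4_083_000),
}
for policy, (months, effort_mm, cost) in totals.items():
    print(f"{policy:32s} {months:5.2f} mo  {effort_mm:6.2f} mm  ${cost:,}")

saving = totals["model 1: test after coding"][2] \
       - totals["model 2: test after each phase"][2]
print(f"cost reduction from concurrent testing: ${saving:,}")  # $11,215,000
```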
10. Conclusions

The following conclusions may be formulated as a result of this analysis:

1. Testing is a major event in the software development life-cycle. It plays a significant role not only in determining the end product's reliability, but also in how much it will cost. The time of delivery and the amount of effort expended during the development life-cycle are not dramatically different from one model to the other. The cost is so dramatically different because of how the hours worked are distributed throughout the life-cycle. In model 1, hours are back-loaded in the maintenance of the system, where the cost is orders of magnitude higher than in earlier phases. In model 2, the hours are distributed more evenly across the life-cycle, leaving fewer maintenance hours to be worked when costs are extremely high.

2. Testing should occur as often as is realistic during software development. The consequences of not performing tests could be detrimental not only to the project at hand, but to the organization as a whole. This simulation was based on one project. The results of not testing after each phase in the development life-cycle could be much more dramatic in software development shops where several systems are being developed simultaneously.

3. A further conclusion, more of a cautionary note, must also be mentioned. There is a chance that work-force efforts to minimize errors may be affected negatively as a result of concurrent testing.

4. Several factors influence the software development life-cycle. Organizations must be able to identify the factors that are going to affect the time to delivery, the amount of effort expended, and ultimately the cost of the system, and learn to control them.

A great deal of research has been done over the last decade on the behavior of the software development life-cycle. This study has reaffirmed earlier findings that the software development process is influenced by many factors and is something that should be tightly controlled. Further research is needed to identify effective ways to control the variables that affect the software development life-cycle.
References

[1] Zadrozny MA, Tumanic RE. Zero-defects software: the total quality management approach to software engineering. Chief Information Officer Journal 1992;4(4):10–16.
[2] Blakely FW, Boles ME. A case study in code inspections. Hewlett-Packard Journal 1991;42(4):58.
[3] Barnett WD, Raja MK. Application of QFD to the software development process. International Journal of Quality and Reliability Management 1995;12(6):24–42.
[4] Jones C. Applied software measurement: assuring productivity and quality. New York: McGraw-Hill, 1991.
[5] Franz LA, Shih JC. Estimating the value of inspections and early testing for software projects. Hewlett-Packard Journal 1994;45(6):60.
[6] Grady RB. Practical software metrics for project management and process improvement. Englewood Cliffs, NJ: Hewlett-Packard Professional Books, 1992.
[7] Putnam LH, Myers W. Measures for excellence: reliable software on time, within budget. Englewood Cliffs, NJ: Yourdon Press, 1992.
[8] Ahituv N, Neumann S, Riley HN. Principles of information systems for management. Dubuque, IA: Wm. C. Brown Communications, 1994.