Microelectronics Reliability, Vol. 36, No. 5, pp. 645-650, 1996
Copyright © 1996 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
0026-2714/96 $15.00 + .00

Pergamon    0026-2714(95)00157-3
MODELLING AN IMPERFECT DEBUGGING PHENOMENON IN SOFTWARE RELIABILITY

P. K. KAPUR
Department of Operational Research, University of Delhi, Delhi, India

and

SAID YOUNES
Department of Information Engineering, Faculty of Electrical and Electronics Engineering, University of Aleppo, Syria
Abstract—Several Software Reliability Growth Models (SRGMs) have been developed in the literature assuming the debugging process to be perfect, thus implying that there is a one-to-one correspondence between the number of failures observed and errors removed. However, in reality it is possible that an error which is supposed to have been removed may cause a failure again. This may be due to the spawning of a new error because of imperfect debugging. If such a phenomenon exists, the Software Reliability Growth is S-shaped. In this paper, we develop a model which can describe an imperfect debugging process and has the inbuilt flexibility of capturing a wide class of growth curves. Earlier attempts at modelling such a process were able to capture only a particular curve. In other words, if the failure observation phenomenon is exponential then the error removal is again modelled by an exponential growth curve. Applicability of the model has been shown on several data sets obtained from different software development projects.
1. INTRODUCTION
The rapid advancements in the technology of computer hardware production have made computer systems easily available for a wide range of applications in our daily life. The cost of computer hardware has been declining while, on the contrary, the cost of computer software is increasing. The production of computer software systems is seen to be the most prominent industry for this decade and the foreseeable future. Due to the vitality of computer software systems and the manifold increase in their development cost, it is of utmost importance to develop high quality software systems. The quality of a software system is described by many metrics such as complexity, portability, maintainability, availability, reliability, etc. Software reliability, which is a user oriented metric, is considered to be the only metric which can physically quantify the quality of the software system. Software reliability is defined as the probability that the system will work without failure for a specified span of time under a given usage environment. A software failure is the departure of the software output from the system specification. Software failures are the manifestation of software errors which are introduced by the system analysts, designers, programmers and managers during the different phases of the software development cycle. In order to detect and remove these errors, the software system is tested. The quality of the software system in terms of its reliability is measured by the removal
of these errors. The Software Reliability Growth Model (SRGM) is defined as the mathematical relationship between the number of software errors removed and the testing time. The SRGM can be used to monitor the error removal process and to measure and predict the reliability of the software system and some other operational metrics such as the testing cost and the software release time. The first attempt to model the software error removal process is attributed to Jelinski and Moranda [1]. Their model has a simple structure and assumptions. Musa [2] proposed the Basic Execution Time Model, which has assumptions similar to those of the J-M model. This model made a major contribution to the understanding of the error removal phenomenon and its relation to calendar and execution time. Goel and Okumoto [3] (G-O) proposed the first Non Homogeneous Poisson Process (NHPP) SRGM. They assumed that the failure observation (error removal) phenomenon follows an NHPP. Except for the assumption of NHPP, the G-O model is similar to the Basic Execution Time model of Musa. In all three models, the cumulative number of errors removed grows exponentially with the testing time. The exponential growth curve is due to the assumption that the error removal intensity is linearly related to the remaining number of software errors. In many software development projects it was observed that the relation between the cumulative number of errors removed and the testing time is S-shaped. The causes of S-shapedness are many and have been discussed
by Yamada et al. [4], Ohba [5], Bittanti et al. [6] and others. Yamada et al. [4] ascribed the S-shapedness to the time delay between the failure observation and its subsequent error removal. Ohba [5] attributed it to the mutual dependency between the software errors. Bittanti et al. [6] attributed it to the increase in the error removal rate as the testing progresses. All the above-mentioned models assume the error removal process (error debugging) to be perfect, i.e. when an attempt is made to remove an error (the cause of the failure) the error is removed with certainty. This assumption may not be realistic. Due to the complexity of software systems and the incomplete understanding of the software requirements specification and structure, the testing team may not be able to remove the errors perfectly, and the original error is replaced by another error. The new error may generate new failures when this part of the software system is traversed during the testing. The error can be removed perfectly when the testing team properly understands the nature of the error and takes the necessary steps to remove it. The multiple removal of the original errors and their successors, i.e. the errors which replaced the original error, slows down the removal of the original errors and gives rise to an S-shaped growth curve. The concept of imperfect debugging was first introduced in software reliability models by Goel [7] when he introduced the probability of imperfect debugging to the J-M model. Kapur and Garg [8] introduced imperfect debugging in the G-O model. They assumed that the error removal rate per remaining error is reduced due to the imperfect debugging and thus the number of failures observed by time infinity is more than the initial error content. Although the last two models describe the imperfect debugging phenomenon, the Software Reliability Growth curve of these two models is always exponential.
Moreover, both models assume that the probability of imperfect debugging is independent of the testing time. Thus they ignore the role of the learning process during the testing phase by not accounting for the experience gained with the progress of software testing. Actually, the probability of imperfect debugging is supposed to be maximum at the early stages of the testing phase and is supposed to reduce with the progress of the testing phase. In this paper we propose an SRGM based on NHPP. The model describes the error removal phenomenon under an imperfect debugging environment. The learning process is taken into consideration by assuming that the probability of imperfect debugging is dependent on the number of errors remaining (removed). The proposed model has a flexible structure. It can describe different growth curves ranging from pure exponential to highly S-shaped. The model is tested on real software error data obtained from various software development projects. The data sets are cited from Brooks and Motley [9], Ohba [5] and Misra [10]. The performance of the model is compared to the imperfect
debugging model proposed by Kapur and Garg [8]. We give below the definitions of some of the key terms relevant to this paper.

Error debugging process: the process of analysing the cause of the software failure, locating the erroneous part of the software system and taking the necessary steps to remove the software error.

Imperfect debugging: occurs when the error debugging process does not lead to the removal of the software error.

Original errors: the errors latent in the software system at the commencement of the testing phase.

Spawned errors: the errors which have replaced the original errors due to imperfect debugging.
2. MODEL ASSUMPTIONS
(1) The error removal/failure observation phenomenon follows an NHPP.
(2) The software system is subject to failures at random times caused by software errors remaining in the software.
(3) The expected number of failures observed in time (t, t + δt) is proportional to the expected number of errors remaining in the software.
(4) On the observation of a software failure, the effort to remove the cause of the failure (the error) may not be perfect and thus another version of the error may replace the original error.
(5) The probability of imperfect debugging decreases with testing time and is proportional to the number of errors remaining in the software.
(6) The imperfect error debugging does not increase the initial error content (the number of original errors).
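One way to read these assumptions numerically is a simple Euler integration of the expected removal count: removal attempts occur at rate b per remaining error, and an attempt is imperfect with a probability proportional to the fraction of errors still remaining (assumption (5)). The parameter values below are purely illustrative, not estimates from the paper:

```python
import math

N = 100.0   # initial error content (illustrative)
b = 0.05    # removal-attempt rate per remaining error (illustrative)
c = 0.03    # initial imperfect-debugging rate, c < b (illustrative)

def simulate(t_end, dt=0.05):
    """Euler-integrate the expected number of original errors removed."""
    t, m = 0.0, 0.0
    while t < t_end:
        remaining = N - m
        p_imperfect = remaining / N                  # assumption (5): decays as errors are removed
        m += (b - c * p_imperfect) * remaining * dt  # only perfect removals accumulate
        t += dt
    return m

for t_end in (20, 60, 200):
    print(t_end, round(simulate(t_end), 2))
```

Because the imperfect-debugging probability is highest early on, removals accumulate slowly at first and then accelerate, which is exactly the mechanism behind the S-shape discussed later.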
3. NOTATION

The following notation is used:

N         The initial error content at the beginning of the testing phase (the number of original errors).
b         The error removal rate per remaining error.
c         The initial imperfect debugging rate per remaining error.
m_r(t)    The expected number of original errors removed in time (0, t).
P(t)      The expected imperfect debugging probability at time t, where P(t) = (N − m_r(t))/N; P(0) = 1, P(∞) = 0.
m_f(t)    The expected number of failures observed in time (0, t).
λ_r(t) = δm_r(t)/δt    The error removal intensity function.
λ_f(t) = δm_f(t)/δt    The failure observation intensity function.
4. MODEL ANALYSIS AND FORMULATION

Under assumptions (2) and (4), the expected number of original errors removed in time (t, t + δt) satisfies the following differential equation:

    dm_r(t)/dt = b[N − m_r(t)] − cP(t)[N − m_r(t)],    m_r(0) = 0.    (1)

The first term b[N − m_r(t)] represents the intensity of errors debugged, while the negative term cP(t)[N − m_r(t)] represents the intensity of imperfectly debugged errors. In other words, the intensity of errors perfectly removed is the intensity of errors debugged minus the expected intensity of the imperfectly debugged errors. Because of the increased experience of the testing team with the progress of the test (the learning process), the rate of imperfect debugging is expected to decrease as the test progresses and to become negligible towards the end of the testing phase. As given before, the probability of imperfect debugging P(t) = [N − m_r(t)]/N decreases with the increase in the number of errors removed. The product cP(t) gives the instantaneous imperfect debugging rate. This rate is maximum at the beginning of the testing phase (equal to c) and tends to its minimum value (zero) when all the original errors are removed.

Solving equation (1) under the boundary condition m_r(0) = 0, we get

    m_r(t) = N(b − c)(1 − e^(−bt)) / [(b − c) + c e^(−bt)],    b > c > 0.    (2)

From equation (2) we can find that m_r(0) = 0 and m_r(∞) = N. This indicates that the original errors (the initial error content) are removed completely after a long time of testing. Further, if imperfect debugging does not exist, i.e. c = 0, equation (2) reduces to m_r(t) = N(1 − e^(−bt)), which is the mean value function of the Goel-Okumoto model [3]. In other words, if imperfect debugging does not exist, the reliability growth curve is exponential.

According to assumption (3), the number of failures observed in time (t, t + δt) satisfies the following equation:

    dm_f(t)/dt = γ[N − m_r(t)]    (3)

where γ is the failure observation rate per remaining error. Solving equation (3) under the condition m_f(0) = 0, we get

    m_f(t) = (Nγ/c) ln[ b / ((b − c) + c e^(−bt)) ].    (4)

If we consider the failure observation rate γ to be equal to the error removal rate b (for the sake of simplicity), equation (4) is written as

    m_f(t) = (Nb/c) ln[ b / ((b − c) + c e^(−bt)) ]    (5)

and

    m_f(∞) = (Nb/c) ln[ b / (b − c) ].

It may be noted that the value m_f(∞) is greater than N. This implies that the number of failures observed by time infinity is more than the initial error content N. This increment is due to the imperfect debugging, as the imperfectly debugged errors spawn new versions of their own.

The intensity function of the error removal is given as

    λ_r(t) = δm_r(t)/δt = N b²(b − c) e^(−bt) / [(b − c) + c e^(−bt)]².    (6)

The inflection point is given by

    t* = (1/b) ln[ c / (b − c) ].    (7)

The inflection point exists when b < 2c. The model in (2) generates an S-shaped growth curve if the initial rate of imperfect debugging c is more than half the rate of error removal b. This indicates that the error removal will slow down, leading to an S-shaped growth curve. The intensity function of the failure observation phenomenon is given as

    λ_f(t) = N b² e^(−bt) / [(b − c) + c e^(−bt)].    (8)

The failure observation model given in equation (5) has an inflection point at t = ∞. Thus, the failure observation growth curve is exponential.
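The mean value functions and the inflection point can be sanity-checked numerically; the sketch below uses illustrative parameter values chosen so that b < 2c (the S-shaped case), not estimates from any of the paper's data sets:

```python
import math

N, b, c = 500.0, 0.10, 0.07   # illustrative values with b > c > 0 and b < 2c

def m_r(t):
    """Expected original errors removed, equation (2)."""
    return N * (b - c) * (1 - math.exp(-b * t)) / ((b - c) + c * math.exp(-b * t))

def m_f(t):
    """Expected failures observed with gamma = b, equation (5)."""
    return (N * b / c) * math.log(b / ((b - c) + c * math.exp(-b * t)))

def lam_r(t):
    """Error removal intensity, equation (6)."""
    d = (b - c) + c * math.exp(-b * t)
    return N * b * b * (b - c) * math.exp(-b * t) / (d * d)

# Inflection point of the removal curve, equation (7); exists here since b < 2c.
t_star = math.log(c / (b - c)) / b

print(round(t_star, 2))        # time at which the removal intensity peaks
print(round(m_r(1000.0), 1))   # saturates at N
print(round(m_f(1000.0), 1))   # saturates above N under imperfect debugging
```

With these values the removal intensity peaks at t* ≈ 8.5, m_r(t) saturates at N, and m_f(t) saturates above N, as equations (2), (5) and (7) predict.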
5. PARAMETER ESTIMATION

The Maximum Likelihood Estimation (MLE) method is used to obtain the parameter estimates of the models given in equations (2) and (5). If the data are given in the form of errors removed in a given span of time, then the model in equation (2) can be used. Otherwise, if the data are of failure observations, then the model in equation (5) can be used. The error removal data should be of those errors which are reported to be removed when the original errors and the consequent spawned errors are perfectly removed. Suppose the error removal (failure observation) data are grouped into k points (t_i, y_i), i = 1, 2, …, k, where y_i is the cumulative number of errors removed (failures observed) at time t_i and t_i is the accumulated test time spent to remove (observe) y_i errors (failures). Since the error removal (failure observation) is assumed to
follow an NHPP (assumption (1)), the models in equations (2) and (5) are the mean value functions of the NHPP. Thus the likelihood function L is given as:
    L = Π_{i=1}^{k} [m(t_i) − m(t_{i−1})]^(y_i − y_{i−1}) e^(−[m(t_i) − m(t_{i−1})]) / (y_i − y_{i−1})!    (9)

where m(t_i) is equal to m_r(t_i) if the data are of error removals, and m(t_i) is equal to m_f(t_i) if the data are of failure observations. Taking the natural logarithm of equation (9), we get

    ln L = Σ_{i=1}^{k} (y_i − y_{i−1}) ln[m(t_i) − m(t_{i−1})] − m(t_k) − Σ_{i=1}^{k} ln[(y_i − y_{i−1})!]    (10)

(with t_0 = 0 and y_0 = 0; the exponential terms telescope to −m(t_k)). The MLE of the parameters of the models m_r(t) and m_f(t) can be obtained by maximizing the function in equation (10). The DNCONF subroutine of the IMSL/MATH Library [11] is used to obtain the MLE of the parameters of the respective model. This subroutine provides accurate parameter estimation in a short time.

AIC was first proposed as an SRGM comparison criterion by Khoshgoftaar and Woodcock [12]. It is defined as AIC = −2 × (value of the log likelihood function at its maximum) + 2 × (number of parameters fitted when maximizing the log likelihood function). A low value of AIC indicates better fitting and predictive validity [12]. To keep the paper self-contained, we give below a brief description of the Kapur and Garg imperfect debugging model (K-G model). The mean value function of the K-G imperfect debugging model is given by

    m_r(t) = N(1 − e^(−pbt))

where m_r(t) is the mean number of errors removed in time (0, t) and p is the probability of perfect debugging. The mean number of failures observed is given by

    m_f(t) = (N/p)(1 − e^(−pbt)).
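The IMSL DNCONF routine is a proprietary constrained optimizer; equation (10) itself, however, is straightforward to evaluate, after which any general-purpose optimizer can maximize it. The sketch below uses synthetic data generated from the model itself with illustrative parameters, and only demonstrates that the log likelihood discriminates between parameter choices:

```python
import math

def m_r(t, N, b, c):
    """Mean value function of the proposed model, equation (2)."""
    return N * (b - c) * (1 - math.exp(-b * t)) / ((b - c) + c * math.exp(-b * t))

def log_lik(params, data):
    """Grouped-data NHPP log likelihood, equation (10).
    data: list of (t_i, y_i) with cumulative counts y_i; t_0 = 0, y_0 = 0."""
    N, b, c = params
    ll, prev_t, prev_y = 0.0, 0.0, 0
    for t, y in data:
        d = m_r(t, N, b, c) - m_r(prev_t, N, b, c)   # expected removals in (t_{i-1}, t_i]
        k = y - prev_y
        if d <= 0.0:
            return float("-inf")
        ll += k * math.log(d) - math.lgamma(k + 1)   # lgamma(k+1) = ln(k!)
        prev_t, prev_y = t, y
    ll -= m_r(data[-1][0], N, b, c)                  # exponential terms telescope to -m(t_k)
    return ll

# Synthetic cumulative error counts generated from the model itself
true = (300.0, 0.10, 0.06)
data = [(t, round(m_r(t, *true))) for t in range(5, 55, 5)]

print(log_lik(true, data) > log_lik((150.0, 0.30, 0.05), data))
```

In practice the maximization over (N, b, c) would be done under the constraints N > 0 and b > c ≥ 0, for example with a bounded quasi-Newton method.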
6. MODEL VALIDATION
To check the validity of the model, it is tested on three software error data sets obtained from various software development projects. The software error data are cited from Brooks and Motley [9], Ohba [5] and Misra [10]. Since the cited data are of software errors removed in a given time, the model in equation (2) is used to describe the error removal process. The performance of the model is compared to the imperfect debugging model of Kapur and Garg [8]. The comparison criteria are: (1) the Mean of Squared fitting Errors (MSE); (2) the Akaike Information Criterion (AIC). The MSE is defined as:

    MSE = Σ_{i=1}^{k} [m̂(t_i) − y_i]² / k    (11)

where the term Σ_{i=1}^{k} [m̂(t_i) − y_i]² is the sum of squares of the differences between the actual number of errors removed (failures observed) by testing time t_i (i = 1, 2, …, k) and its estimated value m̂(t_i). A lower MSE indicates less fitting error and thus better performance.
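Both comparison criteria are simple to compute once a model has been fitted; the fitted mean function, data points and log likelihood value below are hypothetical placeholders for illustration, not values from the paper:

```python
def mse(mean_fn, data):
    """Equation (11): mean of squared fitting errors over the k data points."""
    return sum((mean_fn(t) - y) ** 2 for t, y in data) / len(data)

def aic(max_log_lik, n_params):
    """AIC = -2 * (max log likelihood) + 2 * (number of fitted parameters)."""
    return -2.0 * max_log_lik + 2.0 * n_params

# Hypothetical fitted model and data, for illustration only
fitted = lambda t: 10.0 * t
data = [(1, 12), (2, 19), (3, 31)]
print(mse(fitted, data))    # ((10-12)^2 + (20-19)^2 + (30-31)^2) / 3 = 2.0
print(aic(-100.0, 3))       # -2*(-100) + 2*3 = 206.0
```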
The value of the perfect debugging probability is supposed to be known. In all the existing software error data sets, this value is not provided. The estimation of p from the error data is also not possible, since the parameter estimates tend to be unstable. Thus the value of p is assumed. This value is the lower limit of the perfect debugging probability. In this paper, we have assumed p = 0.75.
7. DATA ANALYSIS

7.1. DS1 [10]

The software system is a real time command and control system. The software size is 317 KLOC (Kilo Lines Of Code). The software system was tested for 35 months. The software error data are given in the form (y_i, t_i), i = 1, 2, …, 35, where t_i is measured in CPU hours. The estimation results and the comparison criteria of the proposed model and the K-G model are given in Table 1. From Table 1, the proposed model estimates the initial imperfect debugging rate c = 0.001. This indicates the presence of the imperfect debugging phenomenon in this project. The condition b < 2c is not satisfied, and as a result the error removal growth curve (the reliability growth curve) is exponential. The values of MSE and AIC of the proposed model are lower than those of the K-G model. Thus, the proposed model represents the error removal process better. Figure 1 graphically illustrates the fitting of the proposed model and the K-G model as compared with the actual error removal data.
Table 1. DS1 [10]

Model     N      b          c (p)    MSE    AIC
m_r(t)    1347   0.22e-2    0.001    1872   662
K-G       1438   0.168e-2   0.755    3169   696

[Fig. 1. DS1: observed (+) and estimated cumulative errors removed by the proposed model (Δ) and the K-G model (○), versus time (hours).]

7.2. DS2 [5]

The software system is a PL/I data base system. The software size is 1317 KLOC. The software system was tested for 19 weeks. The software error data are given in the form (y_i, t_i), i = 1, 2, …, 19, where t_i is measured in CPU hours. The data report (Ohba [5]) gives the total number of errors removed after a long time of testing to be 358 errors. If the time of the test is large enough, this value can be approximately considered to be the number of original errors (N = 358). This value will be used as an additional comparison criterion. From Table 2, the proposed model estimates c = 0.062. The condition b < 2c is satisfied and thus the error removal growth curve (the reliability growth curve) is S-shaped. In this project, the value of the initial imperfect debugging rate c approaches the error removal rate b. The model estimates N = 352, which is close to the reported value (if we consider the data collection error). The K-G model estimates N = 455, which is clearly an overestimation. The values of MSE and AIC of the proposed model are lower than those of the K-G model. Thus the proposed model estimates the error removal process better. The poor performance of the K-G model as compared to that of the proposed model can be attributed to the fact that the K-G model fits the error removal data by an exponential growth curve, while the error removal data pertain to an S-shaped growth curve. Figure 2 graphically illustrates the fitting of the proposed model and the K-G model as compared with the actual error removal data.

Table 2. DS2 [5]

Model     N     b        c (p)   MSE   AIC
m_r(t)    352   0.0841   0.062   153   234
K-G       455   0.0309   0.86    207   250

[Fig. 2. DS2: observed (+) and estimated cumulative errors removed by the proposed model (Δ) and the K-G model (○), versus time (CPU hours).]

7.3. DS3 [9]

The software system is a real time command and control system. The data are given in the form (y_i, t_i), i = 1, 2, …, 38, where t_i is measured in CPU hours. From Table 3, the proposed model estimates c = 0.0; thus the error removal process is perfect and the error removal growth curve is exponential. In this project the proposed model reduces to the exponential model of G-O [3]. The K-G model estimates nearly the same value of N, while the error removal rate of the proposed model is higher than that of the K-G model. Both models give the same value of MSE and AIC. Figure 3 graphically illustrates the fitting of the proposed model and the K-G model as compared with the actual error removal data.

Table 3. DS3 [9]

Model     N     b           c (p)   MSE   AIC
m_r(t)    477   0.269e-3    0.0     30    179.2
K-G       471   0.3564e-3   0.75    30    179.2

m_r(t): the proposed model. K-G: the Kapur and Garg imperfect debugging model [8]. (p): the probability of perfect debugging (K-G model).

[Fig. 3. DS3: observed (●) and estimated cumulative errors removed by the proposed model (Δ) and the K-G model (○), versus time (CPU hours).]
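The shape classifications reported for the three data sets follow directly from the condition b < 2c applied to the proposed model's estimates in Tables 1-3:

```python
# (b, c) estimates of the proposed model as reported in Tables 1-3
estimates = {
    "DS1": (0.22e-2, 0.001),
    "DS2": (0.0841, 0.062),
    "DS3": (0.269e-3, 0.0),
}

for name, (b, c) in sorted(estimates.items()):
    # equation (7) has a finite inflection point only when b < 2c
    shape = "S-shaped" if b < 2 * c else "exponential"
    print(name, shape)   # DS1 exponential, DS2 S-shaped, DS3 exponential
```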
8. CONCLUSION

In this paper, we have developed a new way of modelling an imperfect debugging phenomenon in software reliability which may give rise to an S-shaped growth curve. The S-shapedness may be attributed to the spawning of new errors during the debugging process. We have interpreted the probability of imperfect debugging in terms of the increased experience (learning) of the testing team while detecting and removing the software errors. Another point worth mentioning about the model is its inbuilt flexibility. It can describe an imperfect debugging phenomenon as an exponential or S-shaped growth curve depending on the severity of the spawning of errors. At the same time, it can describe the situation where the imperfect debugging phenomenon does not exist, as seen in DS3 [9].

Acknowledgements—This research work is partly supported by ICCR (India) and Aleppo University (Syria).

REFERENCES

1. Z. Jelinski and P. B. Moranda, Software reliability research. In Statistical Computer Performance Evaluation (Ed. W. Freiberger), pp. 465-497. Academic Press, New York (1972).
2. J. D. Musa, A theory of software reliability and its application. IEEE Trans. Software Engng SE-1, 312-327 (1975).
3. A. L. Goel and K. Okumoto, Time dependent error-detection rate model for software reliability and other performance measures. IEEE Trans. Reliab. R-28, 206-211 (1979).
4. S. Yamada et al., S-shaped reliability growth model for software error detection. IEEE Trans. Reliab. R-32, 475-478 (1983).
5. M. Ohba, Software reliability analysis models. IBM J. Res. Dev. 28, 428-443 (1984).
6. S. Bittanti, P. Bolzern, E. Pedrotti, M. Pozzi and R. Scattolini, A flexible modelling approach for software reliability growth. In Software Reliability Modelling and Identification (Eds G. Goos and J. Hartmanis), pp. 101-140. Springer, Berlin (1988).
7. A. L. Goel, Software reliability models: assumptions, limitations and applicability. IEEE Trans. Software Engng SE-11, 1411-1423 (1985).
8. P. K. Kapur and R. B. Garg, Optimum software release policies for software reliability growth models under imperfect debugging. Recherche Operationnelle 24(3), 295-305 (1990).
9. W. D. Brooks and R. W. Motley, Analysis of discrete software reliability models. Technical Report RADC-TR-80-84, Rome Air Development Centre, New York (1980).
10. P. N. Misra, Software reliability analysis. IBM Systems J. 22, 262-279 (1983).
11. IMSL MATH/LIBRARY, FORTRAN Subroutines for Mathematical Applications (1987).
12. T. M. Khoshgoftaar and T. G. Woodcock, Software reliability model selection: a case study. Proc. Int. Symp. on Software Reliability Engng, pp. 183-191 (1991).