Reliability Engineering 18 (1987) 205-221
A New Model for Software Failure

I. P. Schagen

National Foundation for Educational Research, The Mere, Upton Park, Slough, Berkshire SL1 2DQ, Great Britain

(Received: 4 December 1986)

ABSTRACT

A new model is proposed for the occurrence of errors during software testing, which assumes that software is composed of distinct sections and these sections are activated randomly as testing progresses. This model has been fitted to an extensive set of data, and useful insights can be given as to the number of errors remaining to be detected in future testing phases.
INTRODUCTION

In a previous paper1 the present common models for predicting software reliability, based largely on the Jelinsky-Moranda (J-M) model (see, e.g., Refs 2-4), were criticised as not being reasonable representations of the behaviour of real software. In particular, study of the large amount of data available in the report of Angus et al.5 shows that the assumption of monotonically increasing software reliability is in many cases just not valid. Mechanisms to explain this behaviour were explored using simulation models, and eventually a simple logistic curve was shown to give a reasonable predictive capability for this data. However, in order to gain more insight into the failure behaviour of software it is necessary to do more than just fit an ad hoc functional model to it. The aim is to develop a model based on theoretical assumptions about the nature of the software debugging process, which will still be sufficiently simple to fit easily to observed data and use for prediction. Further study of the data of Angus et al. has led to the development of such a model, which appears to offer hope for the development of a systematic theory of software failure, not relying on the assumption of increasing software reliability.

Reliability Engineering 0143-8174/87/$03.50 © Elsevier Applied Science Publishers Ltd, England, 1987. Printed in Great Britain
INSIGHTS FROM SOFTWARE FAILURE DATA

The report by Angus et al.5 contains an excellent and useful compilation of data based on US defence software. The various software modules passed through four separate testing phases: IT (integration testing), SD (independent testing), ST (systems test) and IN (installation testing). Results are not presented for all modules in all test phases, but many of the modules have failure rates reported for several of these test phases. Figure 1 of Schagen and Sallih1 shows profiles of number of errors detected versus time for five of these data sets. From this it can be seen that not all data sets satisfy the assumption of increasing reliability. The latter assumption should lead to convex upwards plots, but only two of the five have this shape, and the rest are S-shaped.
Fig. 1. Standard error rate against time for different phases of module APS/APC.
A look at the error rates is also instructive. To do this we compute a 'standardised error rate' as follows. Given r times, denoted by t1, ..., tr; the cumulative number of bugs detected at these times, N1, ..., Nr; and with t0 = 0, N0 = 0; let the standard error rate during the ith time period be

si = [(Ni − Ni−1)/(ti − ti−1)] × (tr/Nr)    (1)
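As a sketch of this computation, the standardised error rate of eqn (1) can be coded directly; the cumulative counts below are illustrative values only, not taken from the Angus et al. data.

```python
def standard_error_rates(t, N):
    """Eqn (1): s_i = ((N_i - N_{i-1}) / (t_i - t_{i-1})) * (t_r / N_r)."""
    r = len(t)
    rates = []
    for i in range(r):
        t_prev = t[i - 1] if i > 0 else 0.0
        N_prev = N[i - 1] if i > 0 else 0
        rates.append((N[i] - N_prev) / (t[i] - t_prev) * t[r - 1] / N[r - 1])
    return rates

# Illustrative cumulative bug counts at four observation times.
t = [10.0, 20.0, 30.0, 40.0]
N = [5, 15, 18, 20]
s = standard_error_rates(t, N)

# The time-weighted average of s_i equals 1.0 by construction.
avg = sum(si * (t[i] - (t[i - 1] if i > 0 else 0.0)) for i, si in enumerate(s)) / t[-1]
```

The normalisation by tr/Nr is what forces the weighted average to 1.0, so that fluctuations about this value can be compared across modules and phases.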
In Figs 1 and 2 this computed statistic is plotted against the percentage of elapsed time for the modules APS/APC and APS/ZEZ, and for the various testing phases. Although the average value must equal 1.0, it is clear that
Fig. 2. Standard error rate against time for different phases of module APS/ZEZ.
quite extensive fluctuations take place about this value, in particular during the integration test (IT) phase. This leads us to suspect that software errors, rather than occurring totally independently, can on occasions occur in clusters. This may especially be the case when the testing enters a previously unexplored region of the software and suddenly uncovers a number of bugs which have hitherto been unsuspected. Two insights may therefore be gained from this study of the data:

(1) Software reliability is not always monotonically increasing; i.e., it is possible to have S-shaped curves of number of errors detected versus time.
(2) It is possible for errors to be clustered, and not to occur totally independently.

A suitable statistical model will allow both these aspects of software failure to be represented.
A SECTIONAL MODEL OF SOFTWARE FAILURE

The proposed software failure model assumes that a particular piece of software contains m 'sections' or 'nodes'. These need not represent actual sections of code, but rather different functions of the software or modes of testing the software. Each section i contains a total of ni bugs, and ni is a random variable. For the present we shall assume ni has a Poisson distribution with mean μ. Thus, the expected number of bugs in total = mμ. For a bug to be detectable, the section in which it lies must be entered or 'activated'. All sections are inactive at time 0. Assume that for each section the time to activation is negatively exponentially distributed; i.e., within a small time period δt the probability of it being activated is λ1 δt, where λ1 is the rate of activation. The mean time to activate a section is 1/λ1 time units. Once the section within which it lies is activated, each bug has a time to detection which is also negatively exponentially distributed with rate λ2; i.e., within a small time period δt the probability of its detection is λ2 δt.

Notice that this model allows for the possibility, depending on the values of the parameters, of the behaviour we noted at the end of the last section. Initially, all sections are inactive and few bugs are detected. As time progresses and sections are activated, more bugs are detected and the total profile is likely to have an S-shape. Furthermore, if λ2 ≫ λ1, then once a section containing many bugs is activated, a sharp peak in the error detection rate is likely to occur.

From the outline of this model it is possible to derive an expression for the expected number of bugs detected up to any time t. Let p(t) = P[Given bug
detected by time t]. For it to be detected, the section in which the bug lies must have been activated at some time u (< t), and then the bug detected during the period t − u, with probability 1 − exp[−λ2(t − u)]. Thus
p(t) = ∫0,t λ1 e^(−λ1 u) [1 − e^(−λ2(t−u))] du
     = 1 − e^(−λ1 t) − [λ1/(λ2 − λ1)] [e^(−λ1 t) − e^(−λ2 t)]    (2)
There is a special case if λ1 = λ2 (= λ, say), when

p(t) = 1 − e^(−λ t) (1 + λ t)    (3)
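A direct transcription of eqns (2) and (3) may help; this is a sketch, and the function name is mine rather than the paper's.

```python
import math

def p_detect(t, lam1, lam2):
    """P[a given bug is detected by time t]: eqn (2), or eqn (3) when lam1 == lam2."""
    if math.isclose(lam1, lam2):
        # eqn (3): special case lam1 = lam2 = lam
        return 1.0 - math.exp(-lam1 * t) * (1.0 + lam1 * t)
    # eqn (2): general case
    return (1.0 - math.exp(-lam1 * t)
            - lam1 / (lam2 - lam1)
            * (math.exp(-lam1 * t) - math.exp(-lam2 * t)))
```

As a check, eqn (2) with λ2 close to λ1 approaches the special case (3), and p(t) tends to 1 as t grows, as required of a detection probability.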
This detection probability is the same for each bug in the system; thus:

expected number of bugs detected by time t = mμ p(t)    (4)
As well as the expected number of bugs detected by time t, it is useful to be able to derive a formula for the variance in the number of bugs detected. This can be done, at least approximately, as follows. The variance in the number of bugs detected at time t is comprised of the following three sources of variance:

(1) uncertainty in the number of active sections;
(2) uncertainty in the number of bugs per section;
(3) uncertainty in the number of bugs found per section.
We can estimate the variances due to each source, and sum them to obtain an estimate of the total variance.

(1) Variance due to uncertainty in the number of active sections = Var(NA × Nf), where NA = number of active sections and Nf = expected number of bugs found per active section. Now

P[Bug found | Section active] = p(t)/(1 − e^(−λ1 t))    (5)

Therefore

Nf = μ p(t)/(1 − e^(−λ1 t))    (6)
Therefore

required variance = Nf² Var(NA) = μ² p²(t) Var(NA)/(1 − e^(−λ1 t))²    (7)

Var(NA) is given by the variance of a binomial distribution with m trials and success probability 1 − e^(−λ1 t):

Var(NA) = m e^(−λ1 t) (1 − e^(−λ1 t))

Therefore

required variance = m μ² p²(t) e^(−λ1 t)/(1 − e^(−λ1 t))    (8)
(2) Variance due to uncertainty in the number of bugs per node is given by a Poisson distribution for the number of detected bugs per active node. This has mean Nf (as above, eqn (6)) and thus the same variance. The expected number of active nodes is m(1 − e^(−λ1 t)). Therefore

required variance = m(1 − e^(−λ1 t)) Nf = mμ p(t)    (9)
(3) Variance due to uncertainty in the number of detected bugs per active node. The latter is a binomial random variable with μ trials and success probability p(t)/(1 − e^(−λ1 t)). Thus

required variance = m(1 − e^(−λ1 t)) μ [p(t)/(1 − e^(−λ1 t))] [1 − p(t)/(1 − e^(−λ1 t))]
                  = mμ p(t) [1 − p(t)/(1 − e^(−λ1 t))]    (10)
Summing these three variances, we obtain an estimate for the total variance in the number of bugs detected at time t:

total variance = mμ v(t)    (11)

where, summing the contributions of eqns (8), (9) and (10),

v(t) = p(t) {2 + [μ p(t) e^(−λ1 t) − p(t)]/(1 − e^(−λ1 t))}

Note that as t approaches ∞, p(t) tends to 1 and so also does v(t). In the limit, the mean and variance of the number of detected errors are both equal to mμ. Depending on the values of the parameters, v(t) may be greater than 1 for certain values of t.
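Under this reading of the summed variance, the mean and variance of eqns (4) and (11) can be sketched as follows (valid for t > 0; p_detect is my transcription of eqns (2)/(3)):

```python
import math

def p_detect(t, lam1, lam2):
    # eqns (2)/(3): probability a given bug is detected by time t
    if math.isclose(lam1, lam2):
        return 1.0 - math.exp(-lam1 * t) * (1.0 + lam1 * t)
    return (1.0 - math.exp(-lam1 * t)
            - lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t)))

def mean_and_variance(t, lam1, lam2, m, mu):
    """Eqns (4) and (11): mean m*mu*p(t) and approximate variance m*mu*v(t)."""
    p = p_detect(t, lam1, lam2)
    active = 1.0 - math.exp(-lam1 * t)   # P[a given section is active by time t]
    # v(t): sum of the three variance contributions (8)-(10), divided by m*mu
    v = p * (2.0 + (mu * p * math.exp(-lam1 * t) - p) / active)
    return m * mu * p, m * mu * v
```

In the limit of large t both p(t) and v(t) tend to 1, so the mean and variance both tend to mμ, consistent with the remark above.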
VALIDATION OF FORMULAE

To validate the formulae derived in the previous section, in particular the formula for the variance, a simulation model was constructed and Monte Carlo approximations to the mean and variance were compared to the formulae. The simulation process itself was straightforward: each iteration merely required the generation of the random number of bugs for each section from a Poisson distribution, followed by the simulation of bug detection. In a small time period δt, if NA sections are active and nA bugs are active but not detected,

P[New section activated] = (m − NA) λ1 δt
P[New bug detected] = nA λ2 δt

In the former case, NA is incremented by 1 and nA by the number of bugs associated with that section. In the latter case, nA is decreased by 1 and the total number of bugs found is increased by 1. In this way an entire simulation can be carried out until the specified end time. Five hundred such iterations were carried out for each case, and the mean and variance of the number of detected bugs computed at certain time points. The four cases considered were:
Case    λ1      λ2      m       μ
1       1.0     1.0     8       5
2       1.0     2.5     8       5
3       0.25    1.0     8       5
4       0.25    1.0     4       10
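A minimal Monte Carlo sketch of this validation follows. Rather than stepping in small δt increments as described above, it draws the exponential activation and detection times directly, which is statistically equivalent; the variable names are mine.

```python
import math
import random

def poisson(mu, rng):
    """Poisson variate by Knuth's product method (adequate for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def detected_by(T, lam1, lam2, m, mu, rng):
    """One realisation: number of bugs detected by time T in the sectional model."""
    total = 0
    for _ in range(m):
        activation = rng.expovariate(lam1)
        if activation >= T:
            continue                     # section never activated within [0, T]
        for _ in range(poisson(mu, rng)):
            if activation + rng.expovariate(lam2) < T:
                total += 1
    return total

rng = random.Random(1)
# Case 1 of the table above: lam1 = lam2 = 1.0, m = 8, mu = 5, so m*mu = 40.
samples = [detected_by(10.0, 1.0, 1.0, 8, 5, rng) for _ in range(500)]
sim_mean = sum(samples) / len(samples)
sim_var = sum((x - sim_mean) ** 2 for x in samples) / (len(samples) - 1)
```

With 500 iterations the simulated mean and variance at t = 10 should both lie close to mμ = 40, in line with eqns (4) and (11).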
Figures 3-6 contain plots of the results of these simulations for these four cases, plotted as mean and standard deviation in the number of bugs detected versus time. The results predicted by eqns (4) and (11) are also plotted. It is clear that the formulae derived are very good matches to the simulated results, and can be used to predict the mean and variance for this model.

FITTING MODEL TO SOFTWARE FAILURE DATA

The next step is to be able to fit the parameters of this model to real data relating to software failure. Various approaches are possible, and the one chosen is not necessarily the best.
Fig. 3. Mean and standard deviation of number of errors detected versus time: Case 1.
Fig. 4. Mean and standard deviation of number of errors detected versus time: Case 2.
Fig. 5. Mean and standard deviation of number of errors detected versus time: Case 3.
Fig. 6. Mean and standard deviation of number of errors detected versus time: Case 4.
Assume r sets of values (Ni, ti), where Ni is the cumulative number of bugs detected after time ti (N0 = 0, t0 = 0). Let ni = Ni − Ni−1, i = 1, ..., r. Now, the number of bugs found in the period (ti−1, ti) has the expected value

mμ (p(ti) − p(ti−1)) = Ki, say    (12)
Let us assume that this number is approximately a Poisson random variable, so that

P[ni bugs in (ti−1, ti)] = e^(−Ki) Ki^ni / ni!    (13)

Taking logarithms and summing over i, we get the following function to minimise for the approximate maximum likelihood estimation of λ1, λ2, m and μ:

F = Σ(i=1 to r) (Ki − ni ln Ki)    (14)
However, estimation using this function was found to be not always stable, and to produce in some cases extreme values for the parameters. Thus a Bayesian approach was adopted, and prior distributions for the four parameters were assumed. These were:

λ1: negative exponential with mean b1;
λ2: negative exponential with mean b2;
m: Poisson with mean b3;
μ: Normal with mean b4, variance b5.

The revised function for minimisation then becomes:

F* = F + λ1/b1 + λ2/b2 − m ln b3 + ln(m!) + (μ − b4)²/(2b5)    (15)
For fitting the data of Angus et al.,5 the following (rather arbitrary) values of the Bayesian prior parameters were used:

b1 = 0.07,  b2 = 0.6,  b3 = 5,  b4 = 8.0,  b5 = 4.0
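A sketch of the penalised objective of eqn (15), under my reconstruction of the prior penalties (the exponential priors contribute λ/b, the Poisson prior −m ln b3 + ln m!, the Normal prior a quadratic term); Ki comes from eqn (12) and p(t) from eqns (2)/(3):

```python
import math

def p_detect(t, lam1, lam2):
    # eqns (2)/(3): probability a given bug is detected by time t
    if math.isclose(lam1, lam2):
        return 1.0 - math.exp(-lam1 * t) * (1.0 + lam1 * t)
    return (1.0 - math.exp(-lam1 * t)
            - lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t)))

def F_star(lam1, lam2, m, mu, times, counts,
           b1=0.07, b2=0.6, b3=5.0, b4=8.0, b5=4.0):
    """Eqn (15): Poisson log-likelihood term F of eqn (14) plus prior penalties."""
    F, t_prev = 0.0, 0.0
    for ti, ni in zip(times, counts):        # counts are the per-interval n_i
        Ki = m * mu * (p_detect(ti, lam1, lam2) - p_detect(t_prev, lam1, lam2))
        F += Ki - ni * math.log(Ki)          # eqn (14) summand
        t_prev = ti
    return (F + lam1 / b1 + lam2 / b2
            - m * math.log(b3) + math.lgamma(m + 1)   # ln(m!) via lgamma
            + (mu - b4) ** 2 / (2.0 * b5))

# Illustrative call with made-up interval data; in practice F_star would be
# minimised numerically over (lam1, lam2, m, mu).
val = F_star(0.05, 0.3, 10, 7.0, [5.0, 10.0, 15.0], [8, 12, 9])
```

The times and counts in the call are invented for illustration; the b values are those quoted above.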
RESULTS OF MODEL FITTING

Ten different data sets from the report of Angus et al.5 were selected as displaying a wide variety of failure profiles and containing a reasonable number of data points. In all cases the model was successfully fitted to the data and reasonable values of the parameters were estimated. These values are shown in Table 1.
TABLE 1
Results of Model Fitting

Data set        Nr     λ1       λ2       m     μ       mμ − Nr   Future bugs
APS/APC/IT      34     0.0134   0.0472    7    7.637    19.46     35
APS/ZEZ/SD      73     0.0896   0.1271   14    6.454    17.35     34
APS/ASZ/IT      57     0.0073   0.0941   11    7.891    29.80     29
APS/ACZ/IT      24     0.0153   0.1019    5    7.660    14.30      8
APS/MEZ/IT      29     0.0181   0.1001    6    7.105    13.63     16
APS/MIZ/SD      27     0.0471   0.2451    7    4.945     7.61     15
APS/DAZ/IT      59     0.1017   0.1594   11    5.707     3.82     64
APS/SAD/IT      50     0.1113   0.1940   10    5.417     4.17     70
APS/ZBZ/SD      79     0.1506   0.1629   15    5.509     3.63     41
SUS/CON/IT      29     0.0051   0.1019    6    9.564    28.38     21

Fig. 7. Comparison of data and model for APS/ASZ/IT.
The apparent goodness of the fit between the model and the data varies according to the shape of the data set. Figures 7-9 show plots of three data sets and the fitted models. For APS/ZBZ/SD (Fig. 8) the fit is apparently very good, while for the other two the mean curve does not appear to reflect the shape of the data. However, it is important to distinguish between the mean number of bugs detected as predicted by the model and the actual number achieved in any particular realisation. For example, both Figs 7 and 9 contain plots of a single simulation of the model to illustrate that this can have a very similar shape to that given by the data. Thus agreement between model and data is not to be judged solely on the curve of mean number of bugs detected.

One of the most interesting results of the model is the predicted total number of bugs in the software, which is just mμ. If we subtract Nr, the
Fig. 8. Comparison of data and model for APS/ZBZ/SD.
Fig. 9. Comparison of data and model for SUS/CON/IT.
cumulative number detected at the end of the data set, we have the predicted number of undetected bugs in the software. We can compare this with the actual number of bugs detected in further testing phases of the same software. This is given in the last column of Table 1.

From Table 1 it can be seen that for many of the data sets there is a reasonable agreement between the two quantities. However, several of them (the seventh to ninth in the table) have a gross underestimate of the number of future bugs based on the model parameters. Studying the model parameters, we note that these three data sets have a large estimated value of λ1, i.e. they assume a high rate of activating software sections. To investigate the relationship between prediction mismatch and the value of λ1, a plot was made of the ratio of predicted to actual numbers of future bugs versus λ1, both on logarithmic scales. This is shown as Fig. 10. Five of the data points lie on a straight line with slope −1, and the others fall above it. However, we should note that the value of actual number of future bugs is a lower limit, as
Fig. 10. Relationship between predicted/actual bug ratio and λ1.
there may exist bugs not detected by later testing phases. Therefore, we might use the following relationship, as given by the plotted straight line:

Undetected bugs = (mμ − Nr) × λ1/0.006    (16)

However, this should not be taken too seriously, but mostly as an indication that high fitted values of λ1 are 'suspicious'. They probably imply that only a small, relatively bug-ridden part of the software has really been tested. After this has been debugged, it is tempting to assume that the software is now bug-free, when the truth is quite different. Therefore, the proposed model may be very useful as a monitoring device to detect this kind of situation during testing.
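The slope −1 line of Fig. 10 implies a correction of the form (mμ − Nr) × λ1/c for some constant c (roughly 0.006 as read off the fitted line here); as a numerical check, apply it to APS/DAZ/IT from Table 1:

```python
# APS/DAZ/IT (Table 1): lam1 = 0.1017, predicted residual m*mu - N_r = 3.82,
# against 64 future bugs actually found in later testing phases.
lam1 = 0.1017
residual = 3.82                      # m*mu - N_r from Table 1
corrected = residual * lam1 / 0.006  # slope -1 line of Fig. 10
# corrected is roughly 65, far closer to the observed 64 than the raw residual.
```

This is only a consistency check of the plotted relationship on one data set, not a substitute for the caution expressed above.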
CONCLUSIONS

The simple model proposed here seems to capture some features of the behaviour of real software systems during testing. It could be extended in various ways, and its simplifying assumptions varied to reflect other conditions. It has enabled us to fit and predict behaviour of some of the data sets in Angus et al.5 and could be used to develop monitoring tools during software testing.
REFERENCES

1. Schagen, I. P. and Sallih, M. M., Fitting software failure data with stochastic models, 9th Adv. Reliab. Techn. Symp., University of Bradford, UK, 1986.
2. Jelinsky, Z. and Moranda, P., Software reliability research, in Statistical Computer Performance Evaluation (ed. W. Freiberger), Academic Press, New York, 1972, pp. 465-84.
3. Keiller, P. A., Littlewood, B., Miller, D. R. and Sofer, A., Comparison of software reliability predictions, IEEE 13th Ann. Int. Symp., 1983, pp. 128-34.
4. Barlow, R. E. and Singpurwalla, N. D., Assessing the reliability of computer software and computer networks, Am. Stat., 30(2) (1985), pp. 88-94.
5. Angus, J. E., Bowen, J. B. and Vandenberg, S. J., Reliability Model Demonstration Study, report by Hughes Aircraft Company, 1983.