On Estimating the Number of Defects Remaining in Software
Kai-Yuan Cai
Department of Automatic Control, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
There have been a number of methods used to estimate the number of defects remaining in software. In this paper we present an analysis of the method of dynamic software reliability models and that of empirical models, particularly the Halstead model. We then develop a new static model for estimating the number of remaining defects and use a set of real data to test the new model. The new model coincides with the Mills model in a particular case and is attractive for its applicability to a broader scope of circumstances. Bayesian versions of the Mills model and the new model are also developed. © 1998 Elsevier Science Inc.
1. INTRODUCTION
Since the “software crisis” emerged and was widely discussed in the late 1960s, much effort has been devoted to software reliability research and practice, and consequently many notable results have been attained [Cai, 1995; Lyu, 1996]. As the most important attribute of software quality, software reliability has been attracting an increasing amount of attention in the software engineering community. According to [Musa and Everett, 1990], software reliability has become, besides functionality, cost and schedule, one of the main issues that must be taken into account in software development processes. Accompanying this trend, various software reliability metrics, including reliability, run reliability, failure intensity, and MTTF, have been proposed and tens of software reliability models developed [Shanthikumar, 1983; Xie, 1991]. In fact, one may attempt to develop a software reliability metric system that comprises management metrics, control technical metrics, semicontrol technical metrics and auxiliary
Address correspondence to Dr. K.-Y. Cai, Department of Automatic Control, Beijing University of Aeronautics and Astronautics, Beijing 100083, People’s Republic of China.
J. SYSTEMS SOFTWARE 1998; 40:93-114 © 1998 Elsevier Science Inc. All rights reserved. 655 Avenue of the Americas, New York, NY 10010
technical metrics and apply it throughout the software development phase [Cai, 1995]. As commonly understood, dynamic software reliability behavior is heavily dependent on software operational profiles. An identical software system may demonstrate dramatically different reliability behavior in different operational environments. In contrast, the number of defects remaining in software is a static software reliability metric and is independent of software operational profiles. Because of this characteristic and the observation that any software failure is attributed to one or more remaining software defects, this static metric draws much attention, particularly from software development personnel. Shooman argued that a commercial software system may contain 3 or 4 remaining defects per thousand lines of statements and a military software system may contain 1 or 2 remaining defects per thousand lines of statements [Shooman, 1983]. There have been a number of methods of estimating the number of remaining defects, and the most famous may be due to Mills’ idea [Mills, 1972; Schick and Wolverton, 1978; Duran and Wiorkowski, 1981]. In this paper we discuss some of these methods, but it is not our intention to present a systematic survey or review of them. An aim of this paper is to present a new model for estimating the number of remaining software defects and show that it coincides with the Mills model. Section 2 considers the estimation of the number of remaining software defects using dynamic models. Section 3 considers the estimation of the number of remaining software defects using empirical models. A new model is developed in Section 4 to estimate the number of remaining software defects. A real example is presented in Section 5 to test the new model. In Section 6 we explain why the new model coincides with the Mills model. In Section 7 we present Bayesian versions of the Mills model and the new model. Concluding remarks are contained in Section 8.
0164-1212/98/$19.00 PII S0164-1212(97)00003-4
2. ESTIMATION METHOD USING DYNAMIC MODELS
There are many dynamic models which take the number of remaining defects as one of their model parameters and use times between successive failures to predict software reliability behavior. The Jelinski-Moranda model [Jelinski and Moranda, 1972], the Schick-Wolverton model [Schick and Wolverton, 1978], and the Moranda Geometric model [Moranda, 1975] are some of them. In fact, there is a class of software reliability models referred to as defect-counting models [Ramamoorthy, et al., 1986], and the Jelinski-Moranda model may be the most famous among them. In theory, a defect-counting model can be used to estimate the number of remaining defects. Now let us consider the Jelinski-Moranda model and examine whether it can offer reasonable estimates for the number of remaining defects. Let us use Figure 1 to depict the software failure process. We suppose that the ith software failure occurs at time instant $T_i$ and $X_i = T_i - T_{i-1}$. We further suppose $\{T_i\}$ and $\{X_i\}$ are random variables and $\{t_i\}$ and $\{x_i\}$ are their realizations, respectively. Given that the (i - 1)th software failure occurs at time instant $t_{i-1}$ (i = 1, 2, ..., n), Jelinski and Moranda assume that the hazard rate of software is
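$$z(t) = \phi[N - (i-1)]$$

where N is the initial number of remaining defects (i.e., the number of defects remaining in software at the start time of software testing), and $\phi$ is a proportionality constant. Then the software MTTF posterior to time instant $t_{i-1}$ is determined by

$$MTTF_i = \frac{1}{\phi[N - (i-1)]}.$$

Figure 1. Software failure process.

With $\{t_i\}$ or $\{x_i\}$, there are three methods which are commonly used to estimate the model parameters and thus software reliability behavior is predicted. The three methods are the maximum likelihood method, least square method I and least square method II. Let us first consider the maximum likelihood method. Since the probability density function of $X_i$ is given by

$$f(x_i) = \phi[N - (i-1)]\exp\{-\phi[N - (i-1)]x_i\}, \quad i = 1, 2, \ldots, n$$

and $X_1, X_2, \ldots, X_n$ are postulated to be independent, the likelihood function becomes

$$L(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \phi[N - (i-1)]\exp\{-\phi[N - (i-1)]x_i\}.$$

So the estimates of $\phi$, N, i.e., $\hat\phi$, $\hat N$, are determined by the following equations

$$\hat\phi = \frac{n}{\hat N\sum_{i=1}^{n} x_i - \sum_{i=1}^{n}(i-1)x_i}$$

$$\sum_{i=1}^{n}\frac{1}{\hat N - i + 1} = \frac{n\sum_{i=1}^{n} x_i}{\hat N\sum_{i=1}^{n} x_i - \sum_{i=1}^{n}(i-1)x_i}.$$

For least square method I, we choose the loss function as

$$S(N, \phi) = \sum_{i=1}^{n}\left(x_i - MTTF_i\right)^2 = \sum_{i=1}^{n}\left(x_i - \frac{1}{\phi[N - (i-1)]}\right)^2.$$

Then we have

$$\hat\phi = \frac{\sum_{i=1}^{n} 1/(\hat N - i + 1)^2}{\sum_{i=1}^{n} x_i/(\hat N - i + 1)}$$

$$\left(\sum_{i=1}^{n}\frac{x_i}{(\hat N - i + 1)^2}\right)\left(\sum_{i=1}^{n}\frac{1}{(\hat N - i + 1)^2}\right) = \left(\sum_{i=1}^{n}\frac{x_i}{\hat N - i + 1}\right)\left(\sum_{i=1}^{n}\frac{1}{(\hat N - i + 1)^3}\right).$$

For least square method II, we choose the loss function as

$$S(N, \phi) = \sum_{i=1}^{n}\left(t_i - \sum_{j=1}^{i} MTTF_j\right)^2 = \sum_{i=1}^{n}\left(t_i - \sum_{j=1}^{i}\frac{1}{\phi(N - j + 1)}\right)^2.$$

Then we arrive at

$$\hat\phi = \frac{\sum_{i=1}^{n} A_i^2}{\sum_{i=1}^{n} t_i A_i}$$

$$\left(\sum_{i=1}^{n} t_i B_i\right)\left(\sum_{i=1}^{n} A_i^2\right) = \left(\sum_{i=1}^{n} t_i A_i\right)\left(\sum_{i=1}^{n} A_i B_i\right)$$

where

$$A_i = \sum_{j=1}^{i}\frac{1}{\hat N - j + 1}, \qquad B_i = \sum_{j=1}^{i}\frac{1}{(\hat N - j + 1)^2}.$$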
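As an illustration of the maximum likelihood method only, the following minimal Python sketch solves the likelihood equation for $\hat N$ by bisection and then computes $\hat\phi$. The function name `jm_mle`, the cap `max_n`, and the convention of returning `None` when no finite root is found are assumptions made for this sketch, not part of the paper. The early exit uses the known condition that a finite maximum likelihood estimate requires $\sum (i-1)x_i / \sum x_i > (n-1)/2$.

```python
def jm_mle(x, max_n=1e12, iters=200):
    """Sketch: maximum likelihood estimates (N_hat, phi_hat) for the
    Jelinski-Moranda model from times between failures x.

    Returns None when no finite root is found (a convention of this sketch).
    """
    n = len(x)
    sx = sum(x)
    # sum of (i-1)*x_i for 1-based i, i.e. k*x_k for 0-based k
    swx = sum(k * xk for k, xk in enumerate(x))
    # Without enough reliability growth the likelihood has no finite maximizer.
    if swx / sx <= (n - 1) / 2:
        return None

    def g(N):
        # Left side minus right side of the likelihood equation for N.
        lhs = sum(1.0 / (N - k) for k in range(n))
        return lhs - n * sx / (N * sx - swx)

    lo = (n - 1) + 1e-9      # g is large and positive just above n - 1
    hi = float(n)
    while g(hi) > 0:         # expand until the sign changes
        hi *= 2
        if hi > max_n:
            return None
    for _ in range(iters):   # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = n / (N_hat * sx - swx)
    return N_hat, phi_hat
```

Calling `jm_mle` on successively longer prefixes of a data set would mimic the recursive use of the model; very large roots are a hint that the data barely constrain N.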
In order to examine whether the Jelinski-Moranda model offers a reasonable estimate for the initial number of remaining software defects, we apply the model to a set of real software reliability data collected by Musa [Musa, et al., 1987]. The data set comprises 15 data, i.e., times between successive software failures. The model is applied in a recursive manner, that is, $\{x_1, \ldots, x_{i-1}\}$ are used to obtain estimates of the model parameters N, $\phi$, denoted by $\hat N_i$, $\hat\phi_i$, and then the estimate of $MTTF_i$, i.e., $\widehat{MTTF}_i$, is in turn determined. Similarly, $\{x_1, \ldots, x_{i-1}, x_i\}$ are used to obtain $\hat N_{i+1}$, $\hat\phi_{i+1}$ and $\widehat{MTTF}_{i+1}$. Tables 1 to 3 tabulate model results which correspond, respectively, to the maximum likelihood method, least square method I and least square method II, where

$$RE_i = \frac{|x_i - \widehat{MTTF}_i|}{x_i} \times 100, \qquad RE = \frac{1}{n}\sum_{i=1}^{n} RE_i.$$
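The measure RE can be checked against the tabulated results. The sketch below assumes that $RE_i$ is the percentage relative error of the predicted MTTF with respect to the observed $x_i$ and that RE is the mean of the $RE_i$; under that assumption the $x_i$ and predicted MTTF columns of Table 1 give back the tabulated RE of about 26.68. The function name `relative_errors` is hypothetical.

```python
# Observed times between failures and predicted MTTF values from Table 1
# (maximum likelihood method).
x = [10, 9, 13, 11, 15, 12, 18, 15, 22, 25, 19, 30, 32, 25, 40]
mttf_hat = [10.000, 9.000, 9.500, 10.667, 10.750, 11.600, 11.667, 12.571,
            12.875, 13.889, 15.000, 15.364, 16.583, 17.769, 18.286]


def relative_errors(actual, predicted):
    """Percentage relative error of each prediction (assumed definition of RE_i)."""
    return [abs(a - p) / a * 100.0 for a, p in zip(actual, predicted)]


re_i = relative_errors(x, mttf_hat)
re_mean = sum(re_i) / len(re_i)   # assumed definition of RE
```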
Figures 2 to 4 show the behavior of $MTTF_i$ versus that of $\widehat{MTTF}_i$, corresponding, respectively, to Tables 1 to 3, where the solid curves depict $MTTF_i$ and the dashed curves $\widehat{MTTF}_i$. From the above tables and figures (particularly the measure RE), we see that the maximum likelihood method and least square method I present relatively reasonable predictions for software MTTF, whereas least square method II does not perform well. However, for the estimate of the initial number of remaining software defects, $\hat N_i$, we have the following observations:
1. The estimates are unbelievably large, especially for the maximum likelihood method.
2. The estimates vary heavily with parameter estimation methods.
3. With increasing i, that is, with more defects being removed, the estimates tend to become larger.
With the above observations, can we be convinced that the Jelinski-Moranda model offers reasonable estimates for the initial number of remaining software defects? The answer is obviously negative. In fact, this was also noted by other researchers [Xie, 1991]. Then what can we say about the estimation method using dynamic models? Based on the example of the Jelinski-Moranda model, we at least have reason to doubt the theoretical claim that the number of remaining software defects can be reasonably estimated by using dynamic software reliability models. In fact, this should not be surprising. There are some common problems associated with dynamic models, as embodied in the Jelinski-Moranda model:
As pointed out in Section 1, dynamic software reliability behavior (modelled by dynamic models) is heavily dependent on software operational profiles (or the testing environment). That is, the realization of $\{X_1, \ldots, X_n\}$ can be highly diverse according to the testing environment. However, the number of remaining software defects is a static metric and should be, in principle, independent of the concrete realization of $\{X_1, \ldots, X_n\}$. Then how can one ensure that highly diverse realizations of $\{X_1, \ldots, X_n\}$ would always generate an identical estimate for the number of remaining software defects?
Dynamic models normally assume that $X_1, X_2, \ldots, X_n$ are independent. Obviously, this assumption is highly questionable.
The data used by dynamic models, i.e., the times between successive software failures, are not always precisely collected.
The dynamic models employ software failure data (i.e., the times between successive software failures), rather than software defect data directly. This means that data type transformation must be conducted to get an estimate for the number of remaining software defects. However, engineering experience suggests that data type transformation may incur substantial loss in precision of estimation.

Figure 2. $MTTF_i$ vs $\widehat{MTTF}_i$ of J-M model with the maximum likelihood method.

3. ESTIMATION METHOD USING EMPIRICAL MODELS

With an attempt to predict the number of defects contained in software prior to the software testing phase, there has been much relevant research from an empirical viewpoint [Shooman, 1983; Lipow, 1982; Gaffney, 1984]. An empirical viewpoint implies that the number of remaining defects is related to some static metric of software, e.g., lines of code or software complexity, and that the relationships between the number of remaining software defects and the static software metric are empirical and experimentally validated by a number of software projects, but not proved in theory or by models. Empirical models are attractive to software development personnel because they are applicable to the software design phase and require no testing data. Among empirical models, the most famous may be Halstead’s formula of the number of remaining defects [Halstead, 1977]. Halstead proposed the defect formula based on his “software science” and stimulated a number of validation efforts [Fitzsimmons and Love, 1978; Ottenstein, 1991; Schneider, 1981]. Let

$l_1$ = number of unique or distinct operators appearing in a program
$l_2$ = number of unique or distinct operands appearing in a program
$L_1$ = total usage of all of the operators appearing in a program
$L_2$ = total usage of all of the operands appearing in a program
$L = L_1 + L_2$, and is defined as program length
$V = L\log_2(l_1 + l_2)$, and is defined as program volume

$$E = V/(2l_2/(l_1 L_2)) = \frac{l_1 L_2 L\log_2(l_1 + l_2)}{2 l_2}.$$
Figure 3. $MTTF_i$ vs $\widehat{MTTF}_i$ of J-M model with least square method I.

Figure 4. $MTTF_i$ vs $\widehat{MTTF}_i$ of J-M model with least square method II.
Table 1. Jelinski-Moranda Model Results with the Maximum Likelihood Method

 i   x_i      MTTF_i (est.)   N_i (est.)   phi_i (est.)   ESS_i    RE_i
 1   10.000   10.000          0            0              0        0
 2    9.000    9.000          0            0              0        0
 3   13.000    9.500          2402002      4.38231e-08    3.5000   26.923
 4   11.000   10.667          5404520      1.73466e-08    1.7579    3.030
 5   15.000   10.750          8106795      1.14747e-08    1.8386   28.333
 6   12.000   11.600          12160201     7.08927e-09    1.3826    3.333
 7   18.000   11.667          12160201     7.04876e-09    1.6816   35.185
 8   15.000   12.571          18240308     4.36097e-09    1.4586   16.190
 9   22.000   12.875          18240308     4.25815e-09    1.8062   41.477
10   25.000   13.889          27360468     2.63153e-09    2.1040   44.444
11   19.000   15.000          41040708     1.62440e-09    1.9223   21.053
12   30.000   15.364          41040708     1.58596e-09    2.2661   48.788
13   32.000   16.583          41040708     1.46931e-09    2.4917   48.177
14   25.000   17.769          61561064     9.14167e-10    2.3622   28.923
15   40.000   18.286          61561064     8.88346e-10    2.7467   54.286
RE = 26.676

Table 2. Jelinski-Moranda Model Results with Least Square Method I

 i   x_i      MTTF_i (est.)   N_i (est.)   phi_i (est.)   ESS_i    RE_i
 1   10.000   10.000          0            0              0        0
 2    9.000    9.000          0            0              0        0
 3   13.000    9.519          728          1.44624e-04    3.4807   26.775
 4   11.000   10.688          1049         8.94532e-05    1.7473    2.836
 5   15.000   10.772          1259         7.39655e-05    1.8284   28.186
 6   12.000   11.625          1511         5.71262e-05    1.3745    3.128
 7   18.000   11.695          1511         5.68194e-05    1.6731   35.027
 8   15.000   12.597          2175         3.66134e-05    1.4506   16.022
 9   22.000   12.904          2175         3.57579e-05    1.7985   41.345
10   25.000   13.919          2610         2.76236e-05    2.0964   44.325
11   19.000   15.030          3131         2.13156e-05    1.9150   20.892
12   30.000   15.397          3131         2.08143e-05    2.2589   48.675
13   32.000   16.624          3131         1.92848e-05    2.4842   48.050
14   25.000   17.809          3757         1.49991e-05    2.3547   28.764
15   40.000   18.329          3757         1.45772e-05    2.7392   54.177
RE = 26.547

Table 3. Jelinski-Moranda Model Results with Least Square Method II

 i   x_i      MTTF_i (est.)   N_i (est.)   phi_i (est.)   ESS_i    RE_i
 1   10.000   10.000          0            0              0        0
 2    9.000    9.000          0            0              0        0
 3   13.000    6.711          1064         1.40283e-04    6.2894   48.380
 4   11.000    5.637          3810         8.05422e-05    4.1329   48.759
 5   15.000    4.723          2206         5.56238e-05    4.3962   68.513
 6   12.000    4.205          5486         4.33860e-05    3.8300   64.958
 7   18.000    3.733          6583         4.07316e-05    4.1869   79.262
 8   15.000    3.444          9478         3.06601e-05    3.9854   77.042
 9   22.000    3.183          11373        2.76410e-05    4.3469   85.531
10   25.000    3.017          13647        2.43054e-05    4.6923   87.933
11   19.000    2.907          16376        2.10200e-05    4.5381   84.701
12   30.000    2.785          19650        1.82862e-05    4.9080   90.718
13   32.000    2.708          23579        1.56715e-05    5.1960   91.539
14   25.000    2.655          28294        1.33168e-05    5.1141   89.379
15   40.000    2.592          33951        1.13686e-05    5.5286   93.520
RE = 67.349
With the assumption that there approximately holds the equation

$$L = l_1\log_2 l_1 + l_2\log_2 l_2$$

Halstead proposed two empirical formulas for estimates of the number of remaining defects:

$$\hat N = E^{2/3}/3000$$

$$\hat N = V/3000.$$

To examine whether Halstead’s software science can offer reasonable estimates for the number of remaining defects, let us consider Akiyama’s published data, as tabulated in Table 4, which were employed by Halstead to validate his formula [Halstead, 1977]. The software system comprises 9 modules and was written in assembly language. Assuming that each of the S machine language steps includes one operator and one operand, we have

$$L_1 = S, \qquad L_2 = S \qquad \text{and} \qquad L = 2S.$$

To compute the number of unique operators, $l_1$, Halstead assumed that it was equal to the sum of the number of machine language instruction types, the number of program calls, and the number of unique program decisions. He further assumed that there were 64 types of machine language instructions and that only one-third of the decisions were unique. So he proposed

$$l_1 = (D/3) + J + 64.$$

With $l_1$ and L known, the equation $L = l_1\log_2 l_1 + l_2\log_2 l_2$ can be used to compute $l_2$, and thus E and V can be in turn obtained. Table 5 shows the computational results.

Table 4. Akiyama’s Published Data

Program   Program     Decisions   SR Calls   Number of Defects
Module    Steps (S)   (D)         (J)        (Observed) (B)
MA        4032        372         283        102
MB        1329        215          49         18
MC        5453        552         362        146
MD        1674        111         130         26
ME        2051        315         197         71
MF        2513        217         186         37
MG         699        104          32         16
MH        3792        233         110         50
MX        3412        416         230         80

Table 5. Computational Results of Akiyama’s Data by Halstead’s Software Science

Program
Module    L       l_1   l_2   B     E (x10^6)   V (x10^3)   E^{2/3}/3000   V/B
MA        8064    471   442   102   170.3        79.3       102             777.5
MB        2658    180   176    18    15.3        22.5        21            1250.0
MC        10906   610   574   146   322.6       111.3       157             762.3
MD        3348    231   201    26    28.2        29.3        31            1126.9
ME        4102    366   138    71   100.2        36.8        72             518.3
MF        5026    322   287    37    65.5        46.5        54            1256.8
MG        1398    131    76    16     6.5        10.8        12             675.0
MH        7584    252   603    50    58.5        73.9        50            1478.0
MX        6824    433   357    80   135.9        65.7        88             821.2

From the computational results, as noted by [Halstead, 1977], $E^{2/3}/3000$ is rather close to B and the estimation equation $N = E^{2/3}/3000$ seems valid. However, were all the initial remaining defects observed? Or can the observed number of defects, B, be treated as the initial number of remaining defects, N? In fact, this is an open problem. A more serious problem is that V/B is far away from 3000. Then if N = B, a more reasonable relationship may be N = V/1000. At any rate Halstead claimed that there hold

$$N = V/E_0, \qquad V = \lambda^{1/3}E^{2/3}$$

where $\lambda$ represents language level. Halstead assumed $\lambda^{1/3} \approx 1$ and thus arrived at

$$N = E^{2/3}/E_0.$$

Then what is wrong? Are we convinced that $\lambda^{1/3} = 1$ and $E_0 = 3000$? Let us redo the computation for Akiyama’s data with $\lambda^{1/3} = V/E^{2/3}$ and $E_0 = 1000$ as shown in Table 6. We see that $\lambda^{1/3}$ is close to 0.3 and far away from 1, and setting $E_0 = 1000$ is appropriate.* This implies that $E_0$ or $\lambda$ is software project specific and Halstead’s defect relation cannot be directly applied. If we assume N = V/1000 ($E_0 = 1000$) and $\lambda^{1/3} = 0.3$, then $N = \lambda^{1/3}E^{2/3}/E_0 \approx E^{2/3}/3000$.

* We have no intention here to claim that $E_0 = 1000$ is universally applicable. From Table 5 we note that the V/B values have considerable variance. However, using $E_0 = 1000$ is more appropriate than using $E_0 = 3000$ in this particular case.

However, from Table 6 we observe that V is nearly linearly proportional to N and so is $E^{2/3}$ to N. Based on this observation we may assume

$$V = aN + b$$

and

$$E^{2/3} = cN + d.$$

In order to apply the above equations (we call them the modified Halstead’s formula), we should estimate the parameters {a, b} and {c, d}. Suppose there are values of V, $E^{2/3}$, N for program modules 1, ..., i, i.e., $\{V_1, E_1^{2/3}, N_1\}, \ldots, \{V_i, E_i^{2/3}, N_i\}$. Then we can obtain the estimates of a, b, denoted by $\hat a_{i+1}$, $\hat b_{i+1}$, by use of the least square method. That is, choose the loss function as

$$S(a, b) = \sum_{j=1}^{i}(V_j - aN_j - b)^2.$$

Let

$$\frac{\partial S}{\partial a} = -2\sum_{j=1}^{i} N_j(V_j - aN_j - b) = 0$$

$$\frac{\partial S}{\partial b} = -2\sum_{j=1}^{i}(V_j - aN_j - b) = 0.$$

So

$$\hat a_{i+1} = \frac{\sum_{j=1}^{i}(N_j - \bar N_i)(V_j - \bar V_i)}{\sum_{j=1}^{i}(N_j - \bar N_i)^2}, \qquad \hat b_{i+1} = \bar V_i - \hat a_{i+1}\bar N_i$$

where

$$\bar N_i = \frac{1}{i}\sum_{j=1}^{i} N_j, \qquad \bar V_i = \frac{1}{i}\sum_{j=1}^{i} V_j.$$
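The least square step for the relation V = aN + b can be sketched in a few lines of Python. The function name `fit_line` is hypothetical; the inputs are the first three modules of Akiyama's data, with N taken as the observed B and V taken from the tabulated volumes. Under these assumptions, fitting modules 1 to 2 and modules 1 to 3 gives estimates close to the first two (a, b) pairs reported in Table 7 (about 676.2, 10328.6 and 691.4, 9727.0).

```python
def fit_line(n_vals, v_vals):
    """Least-squares estimates (a, b) for the relation V = a*N + b."""
    count = len(n_vals)
    n_bar = sum(n_vals) / count
    v_bar = sum(v_vals) / count
    sxy = sum((n - n_bar) * (v - v_bar) for n, v in zip(n_vals, v_vals))
    sxx = sum((n - n_bar) ** 2 for n in n_vals)
    a = sxy / sxx
    b = v_bar - a * n_bar
    return a, b


# Modules MA, MB, MC: observed defects and program volume.
N = [102, 18, 146]
V = [79300.0, 22500.0, 111300.0]

a3, b3 = fit_line(N[:2], V[:2])   # estimates based on modules 1..2
a4, b4 = fit_line(N, V)           # estimates based on modules 1..3
```

The same function applies to the relation between $E^{2/3}$ and N by passing the $E^{2/3}$ values in place of V.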
Table 6. Second Collection of Computational Results of Akiyama’s Data by Halstead’s Software Science

Program
Module    B     E (x10^6)   V (x10^3)   E^{2/3}/1000   V/E^{2/3}
MA        102   170.3        79.3       307.2          0.258
MB         18    15.3        22.5        61.6          0.365
MC        146   322.6       111.3       470.4          0.237
MD         26    28.2        29.3        92.7          0.316
ME         71   100.2        36.8       215.7          0.171
MF         37    65.5        46.5       162.5          0.286
MG         16     6.5        10.8        34.8          0.310
MH         50    58.5        73.9       150.7          0.490
MX         80   135.9        65.7       264.3          0.249
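The entries of Tables 5 and 6 can be recomputed from the raw counts of Table 4. The sketch below is an illustration under stated assumptions: the function name `halstead_metrics` is hypothetical, $l_2$ is obtained by bisection on $L = l_1\log_2 l_1 + l_2\log_2 l_2$ (the paper does not say how the equation was solved), and the bracket used for the bisection assumes the remaining length $L - l_1\log_2 l_1$ is larger than 2. For module MA the results agree with the tabulated values up to rounding.

```python
import math


def halstead_metrics(S, D, J, instruction_types=64):
    """Sketch: (l1, l2, V, E) for a module with S steps, D decisions, J calls."""
    L1 = L2 = S                      # one operator and one operand per step
    L = L1 + L2                      # program length
    l1 = D / 3 + J + instruction_types
    # Solve l2*log2(l2) = L - l1*log2(l1) for l2 by bisection.
    target = L - l1 * math.log2(l1)
    lo, hi = 2.0, float(L)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.log2(mid) < target:
            lo = mid
        else:
            hi = mid
    l2 = 0.5 * (lo + hi)
    V = L * math.log2(l1 + l2)       # program volume
    E = l1 * L2 * V / (2 * l2)       # E = V / (2*l2 / (l1*L2))
    return l1, l2, V, E


# Module MA of Akiyama's data: S = 4032, D = 372, J = 283.
l1, l2, V, E = halstead_metrics(S=4032, D=372, J=283)
n_hat = E ** (2 / 3) / 3000          # Halstead's first defect formula
lam_cuberoot = V / E ** (2 / 3)      # the V/E^{2/3} column of Table 6
```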
Then $\hat a_{i+1}$, $\hat b_{i+1}$ can be used to predict or estimate the number of defects remaining in program module i + 1:

$$\hat N_{i+1} = (V_{i+1} - \hat b_{i+1})/\hat a_{i+1}.$$

Similarly, we have estimates for c, d, respectively, as follows

$$\hat c_{i+1} = \frac{\sum_{j=1}^{i}(N_j - \bar N_i)\left(E_j^{2/3} - \overline{E^{2/3}}_i\right)}{\sum_{j=1}^{i}(N_j - \bar N_i)^2}, \qquad \hat d_{i+1} = \overline{E^{2/3}}_i - \hat c_{i+1}\bar N_i$$

where $\overline{E^{2/3}}_i = \frac{1}{i}\sum_{j=1}^{i} E_j^{2/3}$. And the estimate of the number of defects remaining in module i + 1 is

$$\hat N_{i+1} = \left(E_{i+1}^{2/3} - \hat d_{i+1}\right)/\hat c_{i+1}.$$

Now let us apply the above procedure to Akiyama’s data. Let modules MA, MB, MC, MD, ME, MF, MG, MH, MX be, respectively, modules 1, 2, 3, 4, 5, 6, 7, 8, 9. Then we can obtain the resultant data as tabulated in Table 7, where N is assumed to be the observed B as tabulated in Table 4. Figure 5 shows the behavior of $N_i$, $\hat N_i$ by $(\hat a_i, \hat b_i)$ and $\hat N_i$ by $(\hat c_i, \hat d_i)$, with respect to module index (i), which correspond, respectively, to the solid, dashed, and dotted curves. From Table 7 and Figure 5 we can see that:

1. The estimates of a and c, i.e., $\{\hat a_i\}$ and $\{\hat c_i\}$, are rather stable, and the opposite occurs for the estimates of b and d, i.e., $\{\hat b_i\}$ and $\{\hat d_i\}$. This implies that there indeed exists some linear correlation between N and V and between N and $E^{2/3}$. And since $\{\hat b_i\}$ and $\{\hat d_i\}$ are far from zero and subject to large variations, they should not be ignored. Therefore the relationships N = aV and $N = cE^{2/3}$ don’t sound so reasonable. In response to the large variations of $\{\hat b_i\}$ and $\{\hat d_i\}$, we may have random models as follows

$$V = aN + b + \varepsilon_1, \qquad E^{2/3} = cN + d + \varepsilon_2$$

or

$$N = aV + b + \varepsilon_1, \qquad N = cE^{2/3} + d + \varepsilon_2$$

where $\varepsilon_1$, $\varepsilon_2$ are random variables.
2. By comparing $\{\hat N_i\}$ with $\{N_i\}$, it doesn’t seem that the models V = aN + b and $E^{2/3} = cN + d$ are so acceptable. We should also note that the results of V = aN + b are somewhat superior to those of $E^{2/3} = cN + d$.

In order to further demonstrate the preceding linear models between N and V and between N and $E^{2/3}$ (the modified Halstead’s formula), we renumber the modules and repeat the procedure used for obtaining Table 7. The resultant data are tabulated in Table 8. Figure 6 shows the behavior of $N_i$, $\hat N_i$ by $(\hat a_i, \hat b_i)$ and $\hat N_i$ by $(\hat c_i, \hat d_i)$, with respect to module index (i) on the horizontal axis. The solid, dashed and dotted curves correspond, respectively, to the behavior of $N_i$, $\hat N_i$ by $(\hat a_i, \hat b_i)$ and $\hat N_i$ by $(\hat c_i, \hat d_i)$. Then we can have the following observations:

1. The estimates of a, b (excluding c) are subject to large variations. $\{\hat b_i, \hat d_i\}$ are far away from zero and thus cannot be eliminated.
2. Neither of the models, V = aN + b and $E^{2/3} = cN + d$, offers reasonable estimates for $\{N_i\}$, and the former even presents negative estimates for N! Recalling the results of Table 7 and Figure 5, we see that the model behavior is subject to software projects, or to software development processes.
Table 7. Computational Results of Akiyama’s Data by Modified Halstead’s Formula

      Program         V         E^{2/3}
No.   Module    N_i   (x10^3)   (x10^4)   a_i (est.)   b_i (est.)   c_i (est.)   d_i (est.)
1     MA        102    79.3     30.724
2     MB         18    22.5      6.163
3     MC        146   111.3     47.037    676.2        10328.6      292.4         899.9
4     MD         26    29.3      9.268    691.4         9727.0      315.8         -22.9
5     ME         71    36.8     21.573    684.8        10606.1      311.3         572.6
6     MF         37    46.5     16.249    688.0         5891.7      311.5         340.9
7     MG         16    10.8      3.483    651.9        10821.6      301.0        1767.3
8     MH         50    73.9     15.070    682.9         7488.7      310.2         778.6
9     MX         80    65.7     26.433    664.8        12574.0      311.0         586.5

Columns "N_i (est.) by (a_i, b_i)" and "N_i (est.) by (c_i, d_i)", values as extracted (row alignment lost): 97 117 148 34 240 209 97; 52 9; 85 27
Figure 5. Behavior of the number of observed module defects vs that of the estimates of the number of remaining module defects.
Table 8. Second Collection of Computational Results of Akiyama’s Data by Halstead’s Modified Formula

      Program         V         E^{2/3}
No.   Module    N_i   (x10^3)   (x10^4)   a_i (est.)   b_i (est.)   c_i (est.)   d_i (est.)
1     MX         80    65.7     26.433
2     MH         50    73.9     15.070
3     MG         16    10.8      3.483    -273.3       87566.7      378.8        -3868.3
4     MF         37    46.5     16.249     879.9        7312.7      358.2        -2437.1
5     ME         71    36.8     21.573     852.9       10203.9      336.1          -68.7
6     MD         26    29.3      9.268     595.3       16499.2      319.3          343.3
7     MC        146   111.3     47.037     562.8       19404.9      302.6         1834.7
8     MB         18    22.5      6.163     694.4       11116.5      326.4          331.9
9     MA        102    79.3     30.724     734.5       12834.1      324.2          274.9

Columns "N_i (est.) by (a_i, b_i)" and "N_i (est.) by (c_i, d_i)", values as extracted (row alignment lost): 150; 133; 34 43 -10 97 79 19; 110; 139 33 238 250; 81

Figure 6. Behavior of the number of observed module defects vs that of the estimates of the number of remaining module defects.

Up to this point, what can we say about the Halstead model?
1. There is some rationale in the Halstead model. However, the model parameters are subject to software projects and are not universally constant. In this regard, can the Halstead model still be viewed as ‘empirical’?
2. The relations N = aV and $N = cE^{2/3}$ don’t seem so reasonable. More appealing relations may be $N = aV + b + \varepsilon_1$ and $N = cE^{2/3} + d + \varepsilon_2$, where a, b, c, d are parameters to be estimated and $\varepsilon_1$, $\varepsilon_2$ are random variables.
3. The value of the Halstead model may lie in the possibility that it presents a qualitative insight into the behavior of the number of defects remaining in software. One may not have much confidence in the quantitative information of the Halstead model.
4. The above assertion seems applicable to other empirical models.

4. A NEW MODEL

Let us develop a new static model to estimate the number of defects remaining in software. We have the following assumptions.
1. The software can be divided into two parts: part 0 and part 1.
2. There are N defects remaining in the software, where part 0 contains $N_0$ and part 1 has $N_1$ remaining defects. That is, $N = N_0 + N_1$.
3. At any time, that is, no matter how many remaining defects are contained in the software, each of the remaining defects has the same probability of being detected.
4. Upon being detected, the defect is removed.
5. Each time, one and only one remaining defect is removed and no new defects are introduced.
6. There are n remaining defects removed.

One may question whether these assumptions, in particular assumptions (3) and (5), are reasonable or applicable to real software. To answer this question, we have the following observations:
1. In a software reliability model, there is always some irrationality associated with its assumptions. For example, the Schick-Wolverton model assumes that the software hazard rate at the start time of testing is zero [Schick and Wolverton, 1978]. The Moranda Geometric model assumes that the initial number of remaining software defects is infinite [Moranda, 1975]. However, these irrationalities did not prevent them from being applicable to some real software.
2. The assumptions presented here are essentially identical to those of the Mills model (see Section 6). If the Mills model can make sense, it is reasonable to believe that the new model can also make sense.
3. In order to make assumption (3) more reasonable, two measures can be taken in practice:
A. Modules among a program can be randomly chosen for code review or testing, rather than in a predetermined order (see Section 5).
B. Modules among a program can be randomly chosen to make up part 0 and part 1. That is, the program is randomly divided into two parts.
4. New defects can be occasionally introduced while remaining defects are removed. However, the proportion of the number of introduced defects to the total number of defects is normally small. For example, in the example presented in Section 5 (see Table 10), there are only 2 introduced defects among the total 79 defects. Further, in order to make assumption (5) more reasonable, we may disregard introduced defects (if any) and treat them as non-defects.

Now with the above assumptions, we can try to estimate $N_0$, $N_1$ and thus N. First let us use Figure 7 to depict the defect detecting process, where $t_i$ represents the time instant of the ith remaining defect being removed. Let

$$Y_i = \begin{cases} 0 & \text{if the } i\text{th detected defect is contained in part 0} \\ 1 & \text{if the } i\text{th detected defect is contained in part 1.} \end{cases}$$

Figure 7. Defect detecting process.

Obviously $\{Y_i\}$ is a series of random variables. Suppose $\{y_i\}$ is a realization of $\{Y_i\}$. Let $N_j(i)$ be the number of defects remaining in part j in the time interval $(t_i, t_{i+1}]$, i = 0, 1, ..., n; j = 0, 1. Suppose $y_0 = 0$. Then we have

$$N_1(i) = N_1 - \sum_{j=0}^{i} y_j, \quad i = 0, 1, \ldots, n$$

$$N_0(i) = N_0 - i + \sum_{j=0}^{i} y_j, \quad i = 0, 1, \ldots, n.$$

Let $p_j(i)$ be the probability of having a defect remaining in part j detected during the time interval $(t_i, t_{i+1}]$; i = 0, 1, ..., n; j = 0, 1. Then

$$p_0(i) = \frac{N_0(i)}{N_0(i) + N_1(i)} = \frac{N_0 - i + \sum_{j=0}^{i} y_j}{N_0 + N_1 - i}, \quad i = 0, 1, \ldots, n$$

$$p_1(i) = \frac{N_1(i)}{N_0(i) + N_1(i)} = \frac{N_1 - \sum_{j=0}^{i} y_j}{N_0 + N_1 - i}, \quad i = 0, 1, \ldots, n.$$

With $\{y_i\}$, i = 1, ..., n, known, the likelihood function can be determined as follows

$$L(y_1, \ldots, y_n) = P\{Y_1 = y_1, \ldots, Y_n = y_n\}$$
$$= P\{Y_2 = y_2, \ldots, Y_n = y_n \mid Y_1 = y_1\}\,P\{Y_1 = y_1\}$$
$$= P\{Y_3 = y_3, \ldots, Y_n = y_n \mid Y_1 = y_1, Y_2 = y_2\}\,P\{Y_2 = y_2 \mid Y_1 = y_1\}\,P\{Y_1 = y_1\}$$
$$= P\{Y_n = y_n \mid Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}\}\,P\{Y_{n-1} = y_{n-1} \mid Y_1 = y_1, \ldots, Y_{n-2} = y_{n-2}\} \cdots P\{Y_2 = y_2 \mid Y_1 = y_1\}\,P\{Y_1 = y_1\}.$$

However we note

$$P\{Y_i = y_i \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}\} = \begin{cases} p_0(i-1) & \text{if } y_i = 0 \\ p_1(i-1) & \text{if } y_i = 1. \end{cases}$$

Or

$$P\{Y_i = y_i \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}\} = (p_0(i-1))^{1-y_i}(p_1(i-1))^{y_i}, \quad i = 1, \ldots, n.$$

Thus

$$L(y_1, \ldots, y_n) = \prod_{i=1}^{n}(p_0(i-1))^{1-y_i}(p_1(i-1))^{y_i}.$$

Up to this point we can estimate $N_0$, $N_1$. In fact,

$$\ln L(y_1, \ldots, y_n) = \sum_{i=1}^{n}\left[(1-y_i)\ln\left(N_0 - i + 1 + \sum_{j=0}^{i-1} y_j\right) + y_i\ln\left(N_1 - \sum_{j=0}^{i-1} y_j\right) - \ln(N_0 + N_1 - i + 1)\right]$$

$$\frac{\partial \ln L(y_1, \ldots, y_n)}{\partial N_0} = \sum_{i=1}^{n}\left[\frac{1-y_i}{N_0 - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{N_0 + N_1 - i + 1}\right]$$

$$\frac{\partial \ln L(y_1, \ldots, y_n)}{\partial N_1} = \sum_{i=1}^{n}\left[\frac{y_i}{N_1 - \sum_{j=0}^{i-1} y_j} - \frac{1}{N_0 + N_1 - i + 1}\right].$$

The estimates of $N_0$ and $N_1$, denoted by $\hat N_0$ and $\hat N_1$, respectively, are determined by the following equations

$$\sum_{i=1}^{n}\left[\frac{1-y_i}{\hat N_0 - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + \hat N_1 - i + 1}\right] = 0 \qquad (4\text{-}1)$$

$$\sum_{i=1}^{n}\left[\frac{y_i}{\hat N_1 - \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + \hat N_1 - i + 1}\right] = 0. \qquad (4\text{-}2)$$

The total number of defects remaining in the software is estimated as $\hat N = \hat N_0 + \hat N_1$. Here we should note that an important difference among different approaches to estimating the number of remaining software defects is that they use different input data. The dynamic models based approach uses the times between successive failures as input data. The empirical models based approach uses static software (internal) metrics as input data. The Mills model uses the numbers of seeded defects and detected defects as input data. The new model uses the information of detected defects. From this point we can see some advantages of the new model:
1. The input data of the new model are simple and can, normally, be precisely and easily collected.
2. From the input to the output (estimate of the number of remaining software defects), no data type transformation is required. In the dynamic models based approach, the times between successive failures are transformed into the number of remaining software defects. In the empirical models based approach, other types of static metrics are transformed into the number of remaining software defects.
3. The new model can be applicable to a broader scope of circumstances than the Mills model (see Section 6). It can also be applied to other statistical problems such as population estimation.
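To make the likelihood of the new model concrete, the following Python sketch evaluates $\ln L$ for a detection sequence $y_1, \ldots, y_n$ and searches a bounded integer grid for the maximizing pair. The function names `log_likelihood` and `grid_mle`, the integer grid, and the bound `max_extra` are assumptions made for this illustration; the paper itself determines the estimates from the likelihood equations. A maximizer that lands on the edge of the grid suggests that the data do not pin down a finite estimate within the chosen range.

```python
import math


def log_likelihood(n0, n1, y):
    """ln L(y_1..y_n) for initial defect counts n0 (part 0) and n1 (part 1).

    Returns -inf when (n0, n1) cannot have produced the sequence y.
    """
    total = 0.0
    found1 = 0                           # sum of y_j for j < i (y_0 = 0)
    for i, yi in enumerate(y, start=1):
        rem0 = n0 - (i - 1) + found1     # N_0(i-1)
        rem1 = n1 - found1               # N_1(i-1)
        rem = rem1 if yi == 1 else rem0
        if rem <= 0:
            return float("-inf")
        total += math.log(rem) - math.log(rem0 + rem1)
        found1 += yi
    return total


def grid_mle(y, max_extra=50):
    """Sketch: best integer (n0, n1) on a bounded grid, with its ln L."""
    ones = sum(y)
    zeros = len(y) - ones
    best = None
    for n0 in range(zeros, zeros + max_extra + 1):
        for n1 in range(ones, ones + max_extra + 1):
            ll = log_likelihood(n0, n1, y)
            if best is None or ll > best[0]:
                best = (ll, n0, n1)
    return best[1], best[2], best[0]
```

For example, `grid_mle([0, 1, 0, 0, 1])` returns a pair with at least three defects in part 0 and two in part 1, since that many were actually detected.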
5. A PRACTICAL EXAMPLE
In this section we use a practical example to show the utilization of the new model presented in the previous section and examine whether the new model can offer good estimates.
5.1 Background
Over the period of February and March 1995, the author carried out a statistical data analysis task for a European collaborative project.* This data analysis task is typical in statistics: given an observed sample of a system variable, find a parameterized probability distribution to fit the observed sample. In order to carry out the required data analysis, the author used the statistical package Minitab and wrote a Minitab Macro program. This program consists of 19 modules (routines) in total. For brevity, we denote them modules 1, 2, ..., 19. In the Appendix we present a brief description of the data analysis task and the corresponding Minitab Macro program.
5.2 Defect Data Collection
After the Minitab program was coded, it was put into static code review and dynamic testing. Normally, defects should be detected and removed from the program in both phases of static code review and dynamic testing. What we are interested in here is to collect the data of detected and removed defects, including the detection order of the defects, the module numbers of defects, and the nature of defects. For the purpose of defect data analysis, we distinguished two types of defects: type S and type D.³ By type S defects we mean those which can be or should be detected by static code review. They include syntactic defects and some apparent defects. However, we note that not every type S defect is necessarily detected in the phase of static code review, for various reasons, e.g., lack of patience or caution, and therefore some type S defects are left to the phase of dynamic testing. By type D defects we mean those which cannot be detected or are nearly impossible to detect in the phase of static code review, and thus whose detection is heavily dependent on dynamic testing. They include semantic defects, logical defects, wrong use of algorithms, and so on.
* PDCS2 (Predictably Dependable Computing Systems 2), Basic Research Action Project 6362 of the CEC ESPRIT Programme.
³ In some cases the boundary between the two types of defects may be blurred. However, the blurred boundary should normally be narrow, if not abrupt.
Obviously, if we reviewed the Minitab program in a predetermined order, e.g., the natural one in which modules were written in the program, or from module 1 to module 2, ..., and finally to module 19, then it would not be reasonable to believe that every defect remaining in the program has equal probability of being detected. The modules which receive earlier review should have higher probability of exposing their defects than those which receive later review. To overcome this problem, we followed a 'random' review strategy. That is, each time, we 'randomly' chose a module (module 1, 2, ..., or 19) from the program for static code review and ensured that each module had equal probability of being chosen. In this way we could reasonably assume that each remaining defect has equal probability of being detected. In order to follow the random review strategy, we first generated a series of samples of random data (integers) which varied from 1 to 19, as shown in Table 9,⁴ which consists of 9 samples. Then the modules were statically reviewed in this order: sample 1 → 2 → 3 → ..., i.e., module 11 → 15 → 13 → ... → 13 → 6 → 8 → 8 → 17 → 10 → ... . Table 10 tabulates the resulting defect data collected in the phase of static code review (and that of dynamic testing). In the column of Defect Description, a blank means that the detailed information of the corresponding defect was not recorded. In the column of Detection, static corresponds to static code review, and dynamic to dynamic testing. In the column of Module, the numbers denote the corresponding modules where defects were detected. From Table 10 we note that in the first round of static code review (corresponding to Sample 1 of Table 9) 14 defects (defects 1-14) were detected. In the second round of static code review (corresponding to Sample 2 of Table 9) 6 defects (defects 15-20) were detected.
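The random review strategy can be sketched in Python as follows. This is only an illustration under stated assumptions: the author generated the integers with Minitab, whereas here the standard `random` module is used, and the function names and the sample size of 19 integers per sample are hypothetical choices.

```python
import random


def random_review_sample(num_modules=19, sample_size=19, rng=None):
    """Draw one sample of module numbers, each module equally likely (assumed sample size)."""
    rng = rng or random.Random()
    return [rng.randint(1, num_modules) for _ in range(sample_size)]


def rounds_until_all_reviewed(samples, num_modules=19):
    """Return the first round after which every module has been chosen at least once, or None."""
    seen = set()
    for round_no, sample in enumerate(samples, start=1):
        seen.update(sample)
        if len(seen) == num_modules:
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(1995)  # fixed seed only to make the illustration repeatable
    samples = [random_review_sample(rng=rng) for _ in range(9)]
    print(samples[0])
    print(rounds_until_all_reviewed(samples))
```

The second helper mirrors the check that each module has been reviewed at least once before static review is considered complete.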
In the subsequent three rounds of static code review, each round detected one defect.⁵ Then the Minitab program was switched to dynamic testing. However, we note that in the phase of dynamic testing the program occasionally received static code reviews which were independent of dynamic testing, and some defects were consequently detected (defects 75 and 79). We also note that among the total of 79 defects detected there were 2 defects (defects 77 and 79) which were introduced while fixing a remaining defect or modifying a module. Table 10 displays the defect data collected from 2/3/95 to 13/3/95; no more defects were discovered afterwards.

⁴ These random integers were generated by use of the Minitab package.
⁵ Up to this point each module had been chosen for static code review at least once.

Table 9. Random Integers
Sample 1: 11 15 13 5 8 4 16 11 3 7 14 7 19 18 2 13 6 8
Sample 2: 8 17 10 10 10 18 18 19 5 14 7 15 2 7 15 12 6 6 8
Sample 3: 13 3 2 1 4 8 13 15 9 13 10 13 7 17 8 15 12 9 1
Sample 4: 16 7 2 15 18 2 10 8 13 7 3 16 8 1 13 10 6 10 7
Sample 5: 12 19 15 3 2 1 6 13 3 4 7 15 7 8 1 18 1 13 5
Sample 6: 3 13 16 3 16 10 12 8 15 13 1 : 12 14 3 17 17 16
Sample 7: 3 9 18 2 11 7 7 10 12 16 17 73 6 5 17 7 14 14
Sample 8: 8 2 15 6 15 4 3 16 18 10 1 15 1 3 12 17 14 17
Sample 9: 18 5 18 6 6 17 4 18 16 3 6 1: 3 15 14 8 10 2

5.3 Defect Data Analysis
One can hardly claim that a program is completely free of type D defects. The possibility of the presence of logical defects or other latent defects may always exist. This poses a problem for defect estimation methods, because the exact number of defects remaining in a program is not really known. However, one may reasonably believe that a program is free of type S defects if the program can function properly for a reasonably long period and no more defects are discovered. This offers a good alternative for testing whether a defect estimation method behaves well. Here we apply the new model proposed in Section 4 to type S defects. We use the defect data collected in the phase of static code review (corresponding to defects 1-24) to estimate and predict how many type S defects should be detected in the phase of dynamic testing. Since what we have are the defect data tabulated in Table 10, we note that the dynamic-model-based approach cannot be applied: we did not get any information about times between successive failures. Neither can the empirical-model-based approach help: we did not calculate static metrics of the program. Nor can the Mills model help: no defects were seeded over the program. However, the new model can be applied here. We divide the program into two parts: part 0 comprises modules 1-10, and part 1 modules 11-19. The corresponding values of $\{y_i\}$ are then determined as shown in Table 10. With $\{y_i\}$, we use the Newton algorithm with iteration of an error term [Arden and Astill, 1970] to solve (4-2).⁶ The results are

Using $\{y_1, \ldots, y_{22}\}$: $\hat N_0 = 10.0723$, $\hat N_1 = 10.4485$, $\hat N = 20.5208$
Using $\{y_1, \ldots, y_{23}\}$: $\hat N_0 = 10.0644$, $\hat N_1 = 10.3834$, $\hat N = 20.4478$
Using $\{y_1, \ldots, y_{24}\}$: $\hat N_0 = 11.0682$, $\hat N_1 = 10.3756$, $\hat N = 21.4438$.

This suggests that there were still about 21 type S defects remaining in the program after the phase of static code review. From Table 10 we see that 27 type S defects were actually detected in the phase of dynamic testing. If we discount the introduced type S defect (i.e., defect 79), then the actual value of $N$ should be 26. This means that the new model has generated a reasonably good estimate for the number of remaining defects.
6. ESTIMATION METHOD USING CAPTURE-RECAPTURE SAMPLING TECHNIQUES
As to estimating the number of defects remaining in software, the most popular and famous idea may belong to Mills [Mills, 1972; Schick and Wolverton, 1978; Duran and Wiorkowski, 1981; Farr, 1982]. Mills proposed a defect seeding mechanism based on the following assumptions: 1. There are $N_0$ indigenous defects in the program. 2. $N_1$ induced defects are randomly seeded over the program. That is, the distribution of seeded defects is the same as that of indigenous defects in the program. More accurately, this is to say that at any time, each of the indigenous and seeded
⁶ We note here that if $N_0$ or/and $N_1$ approach infinity (positive or negative), the left-hand sides of (4-2) approach zero. This suggests that there may be an estimation problem if an iterative algorithm is used to solve the equation. That is, one may get false estimates of $N_0$ or/and $N_1$ whose values are ridiculously large in absolute value. In order to avoid this possibility, variable transformation can be employed. For example, let $N_0 = 1000(\sin M_0)^2$, $N_1 = 1000(\sin M_1)^2$ and estimate $M_0$, $M_1$. In this way $N_0$ and $N_1$ are accordingly estimated. However, in this particular case the author did not try variable transformation.
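The variable transformation mentioned in this footnote can be illustrated with a short Python sketch. It only shows the mapping and its derivative (the factor that the chain rule would introduce in a Newton-type iteration over $M_0$, $M_1$); the function names are hypothetical, and the bound of 1000 is the one from the footnote's example.

```python
import math

UPPER = 1000.0  # bound from the footnote's example N = 1000 (sin M)^2


def bounded_count(m):
    """Map an unconstrained real m to N = 1000 * sin(m)^2, which always lies in [0, 1000]."""
    return UPPER * math.sin(m) ** 2


def bounded_count_derivative(m):
    """dN/dm = 1000 * sin(2m); multiplies the derivative with respect to N by the chain rule."""
    return UPPER * math.sin(2.0 * m)


if __name__ == "__main__":
    for m in (-50.0, 0.1, 3.0, 1e6):
        print(m, bounded_count(m))
```

Whatever value the iteration produces for `m`, the implied count stays bounded, so absurdly large estimates cannot occur.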
Table 10. Defect Data Collection of Minitab Macro Program No.
(i)
Dates
Module
yi
Detection
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 2/3/95 3/3/95 3/3/95 3/3/95 3/3/95 6/3/95 6/3/95 6/3/95 6/3/95
13 7 16 16 11 14 14 17 17 2 2 13 13 6 17 10 5 5 12 6 6 10 19 1 1 1 5 5 13
1 0 1 1 1 1 1 1 1 0 0 1 1 0 1 0 0 0
Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Static Dynamic Dynamic Dynamic Dynamic Dynamic
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95 6/3/95
6 1 11 1 12 1 13 1 14 1 15 1 16 1 17 1 18 1 19 19 1 2
52
6/3/95
1
Dynamic
53
6/3/95
3
Dynamic
54 55
6/3/95 6/3/95
1 4
Dynamic Dynamic
56
6/3/95
1
Dynamic
57 58 59
6/3/95 6/3/95 6/3/95
8 8 11
Dynamic Dynamic Dynamic
; 0 0 :,
Description of a Defect
Type
Remarks
S i
S S S
S S S S S S S S S S S S S S S S S S
pa-weib is syntactically illegal pa-weib is syntactically illegal
D S S D D
New round checking
New round checking New round checking New round checking
Similar as No. 28 defect
D Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dvnamic
Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic
fre-hm is syntactically illegal fre-hm is syntactically illegal plot-wb is syntactically illegal plot-wb is syntactically illegal pa-exva is syntactically illegal pa-exva is syntactically illegal plot-ev is syntactically illegal plot-ev is syntactically illegal pa-nor is syntactically illegal pa-nor is syntactically illegal plot-nor is syntactically illegal plot-nor is syntactically illegal pa-lnor is syntactically illegal pa-lnor is syntactically illegal plot-lnor is syntactically illegal plot-lnor is syntactically illegal r-test is syntactically illegal r-test is syntactically illegal % should be call to call a subroutine ~13~15 should be local constants, rather than global constants
: S S : S
; S S S S S ; S D S D D
to accommodate module 2
~14, cl5 should be local constants, rather than global constants to transfer c14, cl5 c16-cl9 should be local constants, rather than global constants to accommodate module 4 ) is missing ) is missing ~22, c23 should be local constants, rather than global constants
Table 10. (Continued) No.
(i)
Date’
60
6/3/95
61
Detection
Description of a Defect
Type
1
Dynamic
D
6/3/95
12
Dynamic
62
7/3/95
6
Dynamic
63 64 65
7/3/95 7/3/95 7/3/95
1 10 14
Dynamic Dynamic Dynamic
66
7/3/95
16
Dynamic
67
7/3/95
17
Dynamic
68
7/3/95
12
Dynamic
69
7/3/95
14
Dynamic
70
7/3/95
16
Dynamic
71
7/3/95
17
Dynamic
72
7/3/95
16
Dynamic
73
7/3/95
17
Dynamic
74 75
7/3/95 8/3/95
17 2
Dynamic Static
76 77
8/3/95 8/3/95
2 2
Dynamic Dynamic
~22, c23 should be local constants, rather than global constants ~99, cl00 should be local constants, rather than global constants iteration termination condition IfI < 10e6 is missing Note command is not correctly used Command Let no = 1 is missing ~99, cl00 should be local constants, rather than global constants ~99, cl00 should be local constants, rather than global constants ~99, cl00 should be local constants, rather than global constants Command Let y = uses wrong expression Command Let y = uses wrong expression Command Let y = uses wrong expression Command Let y = uses wrong expression klOO0 should be declared, rather than treated as an implicit constant klOO0 should be declared, rather than treated as an implicit constant x = 0 should be set Condition if i > count(cl3) go to 20 is wrong Wrong use of command sort
78 79
9/3/95 13/3/95
11 17
Dynamic Static
Module
yi
Algorithm used is wrong
‘The abbreviated notation here is written in British fashion. For example, 6/3/95
denotes March 6, 1995.
Remarks
D D S D D
D
D S D D D S
introduced while fixing No. 15 defect
induced while modifying Module 4

defects has the same probability of being detected. 3. Upon detection, the defect is removed. 4. Each time, one and only one defect is removed and no new defects are introduced. 5. There are $n$ defects removed. From the above assumptions we see that Mills' mechanism, in nature, follows the capture-recapture sampling techniques in probabilistic statistics. Let $\xi$ denote the random variable representing the number of seeded defects detected during testing. Mills found that the probability of $\xi = k$ is determined by

$$P\{\xi = k; N_0, N_1, n\} = \frac{\binom{N_1}{k}\binom{N_0}{n-k}}{\binom{N_0+N_1}{n}}.$$

This is just the hypergeometric distribution. So the Mills model is often referred to as the hypergeometric model. By maximizing the probability (likelihood), the estimate of $N_0$, denoted by $\hat N_0$, is

$$\hat N_0 = \mathrm{Int}\left[\frac{N_1 (n-k)}{k}\right],$$

where Int represents the function converting a real number to the largest integer not exceeding it. Referring to Section 4, it is easy to see that Mills' assumptions are essentially consistent with those of the new model. In fact we can divide the defects into two parts: part 0 containing the $N_0$ indigenous defects and part 1 containing the $N_1$ seeded defects. Then the two sets of model assumptions coincide. In this way we can recalculate the probability $P\{\xi = k; N_0, N_1, n\}$ as follows

$$P\{\xi = k; N_0, N_1, n\} = \sum_{\sum_{i=1}^{n} y_i = k} P\{Y_1 = y_1, \ldots, Y_n = y_n\} = \sum_{\sum_{i=1}^{n} y_i = k}\ \prod_{i=1}^{n}\left(\frac{N_0 - i + 1 + \sum_{j=0}^{i-1} y_j}{N_0 + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{N_0 + N_1 - i + 1}\right)^{y_i}.$$

Now let us consider a simple case to show whether the two expressions coincide with each other. Let $N_0 = 2$, $N_1 = 5$, $n = 3$, $k = 2$. Then

$$\binom{N_1}{k} = 10,\quad \binom{N_0}{n-k} = 2,\quad \binom{N_0+N_1}{n} = 35.$$

To make $y_1 + y_2 + y_3 = 2$, there are three cases:

$y_1 = 1, y_2 = 1, y_3 = 0$
$y_1 = 1, y_2 = 0, y_3 = 1$
$y_1 = 0, y_2 = 1, y_3 = 1$.

From (4-1) we note

$$L(1,1,0) = p_1(0)\,p_1(1)\,p_0(2)$$
$$L(1,0,1) = p_1(0)\,p_0(1)\,p_1(2)$$
$$L(0,1,1) = p_0(0)\,p_1(1)\,p_1(2).$$

For $y_1 = 1$, $y_2 = 1$, $y_3 = 0$ we have

$$p_1(0) = \frac{N_1}{N_1+N_0} = \frac{5}{7},\quad p_1(1) = \frac{N_1 - 1}{N_0+N_1-1} = \frac{2}{3},\quad p_0(2) = \frac{N_0 - 2 + 2}{N_0+N_1-2} = \frac{2}{5}.$$

For $y_1 = 1$, $y_2 = 0$, $y_3 = 1$, we have

$$p_1(0) = \frac{5}{7},\quad p_0(1) = \frac{N_0 - 1 + y_1}{N_0+N_1-1} = \frac{1}{3},\quad p_1(2) = \frac{N_1 - (y_1+y_2)}{N_0+N_1-2} = \frac{4}{5}.$$

For $y_1 = 0$, $y_2 = 1$, $y_3 = 1$, we have

$$p_0(0) = \frac{2}{7},\quad p_1(1) = \frac{N_1 - y_1}{N_0+N_1-1} = \frac{5}{6},\quad p_1(2) = \frac{N_1 - y_1 - y_2}{N_0+N_1-2} = \frac{4}{5},$$

so

$$L(1,1,0) + L(1,0,1) + L(0,1,1) = \frac{4}{7},$$

which equals $10 \times 2 / 35$, the value given by the Mills formula. This suggests that the new model coincides with the Mills model. In this way there holds the following equation

$$\sum_{\sum_{i=1}^{n} y_i = k}\ \prod_{i=1}^{n}\left(\frac{N_0 - i + 1 + \sum_{j=0}^{i-1} y_j}{N_0 + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{N_0 + N_1 - i + 1}\right)^{y_i} = \frac{\binom{N_1}{k}\binom{N_0}{n-k}}{\binom{N_0+N_1}{n}},$$

where $y_0 = 0$ and $\{y_i; i = 1, \ldots, n\}$ is confined to be zero or one, and $N_0$, $N_1$, $n$, $k$ ($k \le n \le N_0 + N_1$) are arbitrary positive integers. Although the new model coincides with the Mills model, it shows advantages over the latter in the fact that no defects are required to be seeded and that the new model is applicable to a broader scope of circumstances. Let us consider a case. Suppose that 15 defects are secretly seeded into the program by one programmer and the program is then tested by another programmer. After a rather long period of testing, the testing programmer detects 12 seeded defects and no indigenous defects are found. The testing programmer is then convinced that all of the seeded defects have been removed (in fact, he doesn't know how many defects were actually seeded beforehand). According to the Mills model, the estimate of the number of indigenous defects is zero. Obviously the estimate is not so persuasive. The testing programmer convinces himself that all of the seeded defects have been removed when there are in fact three seeded defects left undetected. (This was an actual case arising between the author and a Ph.D. student, where the author served as the seeding programmer and the Ph.D. student as the testing programmer.) Then how can we be convinced that all of the indigenous defects have been detected? To get another estimate of the number of indigenous defects, we may use the new model. We just divide the program into two parts: part 0 containing, say, 8 seeded defects, and part 1 containing the other 7 seeded defects. By determining $\{y_i\}$, the estimate of the number of indigenous defects is then obtained. Since $N_1$ represents the number of seeded defects and is known, the estimate of the number of indigenous defects, $\hat N_0$, is determined by the following equation

$$\sum_{i=1}^{n}\left(\frac{1 - y_i}{\hat N_0 - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + N_1 - i + 1}\right) = 0.$$

7. ESTIMATION USING BAYESIAN METHODOLOGY
Using Bayesian methodology to deal with reliability problems is not a new idea. There has been much Bayesian reliability research and engineering practice [Martz and Waller, 1982]. Bayesian methodology shows its attraction under the circumstances of limited availability of reliability data. Further, compared to the conventional (frequency) methodology, fewer reliability experiments are required. Bayesian methodology employs prior or empirical probability distributions to compensate for the shortage of reliability data available. The advantages and disadvantages of Bayesian methodology come from the use of prior distributions. In the field of software reliability engineering, Littlewood and Verrall may be the first ones to make use of Bayesian methodology. They developed a Bayesian (dynamic) software reliability model in 1973 [Littlewood and Verrall, 1973], and thus activated the advent of a class of Bayesian dynamic software reliability models. To estimate the number of defects remaining in the program, Jewell suggested series and parallel search strategies of defect inspection processes for $I$ inspectors [Jewell, 1986]. Although Bayesian methodology is controversial, and even the probabilistic methodology in itself is debated in software reliability modeling [Cai et al., 1991], it may be helpful to pay attention to the alternative treatment of software reliability problems from the Bayesian viewpoint. In this section we don't attempt to present a survey or review of existing Bayesian techniques for software reliability modeling. Instead, we confine ourselves to the development of Bayesian versions of the Mills model and of the new model developed in Section 4.

7.1 Bayesian Version of the Mills Model
In order to get a Bayesian extension to the Mills model, we take the prior distribution of the number of indigenous defects as follows

$$P\{N_0 = m\} = \frac{1}{2^{m+1}},\quad m = 0, 1, 2, \ldots$$

Given that $N_1$ and $n$ are known, according to the Mills model, the conditional probability distribution of the random variable $\xi$ given $N_0 = m$ is the hypergeometric distribution above, so the posterior distribution of $N_0$ is

$$P\{N_0 = m \mid \xi = k\} = \frac{P\{N_0 = m,\ \xi = k\}}{P\{\xi = k\}} = \frac{P\{\xi = k \mid N_0 = m\}\,P\{N_0 = m\}}{\sum_{m=0}^{\infty} P\{\xi = k \mid N_0 = m\}\,P\{N_0 = m\}} = \frac{\binom{N_1}{k}\binom{m}{n-k}\binom{N_1+m}{n}^{-1}\frac{1}{2^{m+1}}}{\sum_{m=0}^{\infty}\binom{N_1}{k}\binom{m}{n-k}\binom{N_1+m}{n}^{-1}\frac{1}{2^{m+1}}}.$$

Or

$$P\{N_0 = m \mid \xi = k\} = \frac{1}{C}\cdot\frac{1}{2^{m}}\cdot\frac{m!\,(N_1 + m - n)!}{(N_1 + m)!\,(m - n + k)!},\quad m \ge \max(n - k,\ n - N_1),$$

where $C$ is a normalizing constant that does not depend on $m$. With this posterior distribution, there may be three choices for the estimate of $N_0$: posterior mean, posterior median, or posterior mode (the generalized maximum likelihood estimator) [Musa et al., 1987]. Here we only consider the last.
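Because $N_0$ takes integer values, the posterior mode of the Bayesian Mills model can be located by direct enumeration over $m$. The Python sketch below does this under stated assumptions: the geometric prior $P\{N_0 = m\} = 1/2^{m+1}$ is used, the infinite sum is truncated at a hypothetical cutoff `m_max`, and all function names are illustrative rather than taken from the paper.

```python
from math import comb


def mills_probability(k, n0, n1, n):
    """Hypergeometric probability P{xi = k; N0, N1, n} of the Mills model."""
    if k < 0 or k > n1 or n - k < 0 or n - k > n0:
        return 0.0
    return comb(n1, k) * comb(n0, n - k) / comb(n0 + n1, n)


def mills_estimate(n1, n, k):
    """Classical Mills estimate Int[N1 (n - k) / k]; requires k > 0."""
    if k <= 0:
        raise ValueError("at least one seeded defect must have been detected")
    return (n1 * (n - k)) // k


def posterior(k, n1, n, m_max=200):
    """Posterior P{N0 = m | xi = k} for m = 0..m_max (support truncated at m_max, an assumption)."""
    weights = [mills_probability(k, m, n1, n) * 0.5 ** (m + 1) for m in range(m_max + 1)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed data have zero probability for every m considered")
    return [w / total for w in weights]


def posterior_mode(k, n1, n, m_max=200):
    """Value of m that maximizes the truncated posterior."""
    post = posterior(k, n1, n, m_max)
    return max(range(len(post)), key=lambda m: post[m])


if __name__ == "__main__":
    # Case N1 = 5, n = 3, k = 2
    print(mills_estimate(5, 3, 2), posterior_mode(2, 5, 3))
    # Seeding case: 15 seeded, 12 seeded defects found, no indigenous defect found
    print(mills_estimate(15, 12, 12), posterior_mode(12, 15, 12))
```

With a prior that decays as quickly as $1/2^{m+1}$, the posterior mode tends to sit at or below the classical Mills estimate; a different prior would shift it.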
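Let

$$ML(n, k) = \frac{1}{2^{m}}\cdot\frac{m!\,(N_1 + m - n)!}{(N_1 + m)!\,(m - n + k)!} = \frac{1}{2^{m}}\cdot\frac{m(m-1)\cdots(m - n + k + 1)}{(N_1 + m)(N_1 + m - 1)\cdots(N_1 + m - n + 1)}.$$

Then

$$\ln ML(n, k) = -m\ln 2 + \sum_{i=0}^{n-k-1}\ln(m - i) - \sum_{i=N_1-n+1}^{N_1}\ln(m + i).$$

Obviously, maximizing $P\{N_0 = m \mid \xi = k\}$ with respect to $m$ is equivalent to maximizing $\ln ML(n, k)$ with respect to $m$, and

$$\frac{d\ln ML(n, k)}{dm} = -\ln 2 + \sum_{i=0}^{n-k-1}\frac{1}{m - i} - \sum_{i=N_1-n+1}^{N_1}\frac{1}{m + i},$$

so the estimate of $N_0$, denoted by $\hat N_0$, is determined by the following equation

$$\sum_{i=0}^{n-k-1}\frac{1}{\hat N_0 - i} - \sum_{i=N_1-n+1}^{N_1}\frac{1}{\hat N_0 + i} - \ln 2 = 0$$

with $\hat N_0 \ge \max(n - k,\ n - N_1)$.

7.2 Bayesian Version of the New Model: Case 1
In this case we consider Bayesian estimation of $N_0$ with $N_1$ known. Bayesian estimation of $N_1$ with $N_0$ known is just the dual case to this and can be processed similarly. As in Section 7.1, we assume the prior distribution is

$$P\{N_0 = m\} = \frac{1}{2^{m+1}},\quad m = 0, 1, 2, \ldots$$

Given $N_0 = m$, according to the new model, the conditional joint probability distribution of $\{Y_1, \ldots, Y_n\}$ is

$$CL_1(y_1, \ldots, y_n) = P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m\} = \prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{y_i}.$$

By using Bayes' formula, we can get the posterior distribution as follows

$$PL_1(y_1, \ldots, y_n) = P\{N_0 = m \mid Y_1 = y_1, \ldots, Y_n = y_n\} = \frac{P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m\}\,P\{N_0 = m\}}{\sum_{m=0}^{\infty} P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m\}\,P\{N_0 = m\}}.$$

Or

$$PL_1(y_1, \ldots, y_n) = \frac{\frac{1}{2^{m+1}}\prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{y_i}}{\sum_{m=0}^{\infty}\frac{1}{2^{m+1}}\prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{y_i}}.$$

Let

$$NL_1(y_1, \ldots, y_n) = \frac{1}{2^{m}}\prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{1-y_i}\left(\frac{N_1 - \sum_{j=0}^{i-1} y_j}{m + N_1 - i + 1}\right)^{y_i}.$$

Obviously, maximizing $PL_1(y_1, \ldots, y_n)$ with respect to $m$ is equivalent to maximizing $\ln NL_1(y_1, \ldots, y_n)$ with respect to $m$, whereas

$$\ln NL_1(y_1, \ldots, y_n) = -m\ln 2 + \sum_{i=1}^{n}\left\{(1 - y_i)\ln\left(m - i + 1 + \sum_{j=0}^{i-1} y_j\right) + y_i\ln\left(N_1 - \sum_{j=0}^{i-1} y_j\right) - \ln(m + N_1 - i + 1)\right\}$$

and

$$\frac{d\ln NL_1(y_1, \ldots, y_n)}{dm} = -\ln 2 + \sum_{i=1}^{n}\left(\frac{1 - y_i}{m - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{m + N_1 - i + 1}\right).$$

So the posterior mode of $N_0$, denoted by $\hat N_0$, is determined by the following equation

$$\sum_{i=1}^{n}\left(\frac{1 - y_i}{\hat N_0 - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + N_1 - i + 1}\right) - \ln 2 = 0.$$

Of course, the posterior mean and posterior median of $N_0$ can also be obtained on the basis of $PL_1(y_1, \ldots, y_n)$.

7.3 Bayesian Version of the New Model: Case 2
In this case we assume both $N_0$ and $N_1$ are to be estimated and their prior distributions are

$$P\{N_0 = m\} = \frac{1}{2^{m+1}},\quad m = 0, 1, 2, \ldots$$

and

$$P\{N_1 = k\} = \frac{1}{2^{k+1}},\quad k = 0, 1, 2, \ldots$$

respectively. Further, $N_0$ and $N_1$ are mutually independent. Given $N_0 = m$ and $N_1 = k$, according to the new model, the conditional joint probability distribution of $\{Y_1, \ldots, Y_n\}$ is

$$CL_2(y_1, \ldots, y_n) = P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m, N_1 = k\} = \prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + k - i + 1}\right)^{1-y_i}\left(\frac{k - \sum_{j=0}^{i-1} y_j}{m + k - i + 1}\right)^{y_i}.$$

Then the posterior distribution is

$$PL_2(y_1, \ldots, y_n) = \frac{P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m, N_1 = k\}\,P\{N_0 = m, N_1 = k\}}{\sum_{m=0}^{\infty}\sum_{k=0}^{\infty} P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m, N_1 = k\}\,P\{N_0 = m, N_1 = k\}}.$$

Or

$$PL_2(y_1, \ldots, y_n) = \frac{P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m, N_1 = k\}\,P\{N_0 = m\}\,P\{N_1 = k\}}{\sum_{m=0}^{\infty}\sum_{k=0}^{\infty} P\{Y_1 = y_1, \ldots, Y_n = y_n \mid N_0 = m, N_1 = k\}\,P\{N_0 = m\}\,P\{N_1 = k\}}.$$

The posterior modes of $N_0$ and $N_1$ can be obtained as follows. Let

$$NL_2(y_1, \ldots, y_n) = \frac{1}{2^{m+k}}\prod_{i=1}^{n}\left(\frac{m - i + 1 + \sum_{j=0}^{i-1} y_j}{m + k - i + 1}\right)^{1-y_i}\left(\frac{k - \sum_{j=0}^{i-1} y_j}{m + k - i + 1}\right)^{y_i}.$$

Then

$$\ln NL_2(y_1, \ldots, y_n) = -(m + k)\ln 2 + \sum_{i=1}^{n}\left\{(1 - y_i)\ln\left(m - i + 1 + \sum_{j=0}^{i-1} y_j\right) + y_i\ln\left(k - \sum_{j=0}^{i-1} y_j\right) - \ln(m + k - i + 1)\right\},$$

$$\frac{\partial\ln NL_2(y_1, \ldots, y_n)}{\partial m} = -\ln 2 + \sum_{i=1}^{n}\left(\frac{1 - y_i}{m - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{m + k - i + 1}\right),$$

$$\frac{\partial\ln NL_2(y_1, \ldots, y_n)}{\partial k} = -\ln 2 + \sum_{i=1}^{n}\left(\frac{y_i}{k - \sum_{j=0}^{i-1} y_j} - \frac{1}{m + k - i + 1}\right).$$

So the posterior modes of $N_0$ and $N_1$, denoted by $\hat N_0$ and $\hat N_1$, respectively, are determined by the following equations

$$\sum_{i=1}^{n}\left(\frac{1 - y_i}{\hat N_0 - i + 1 + \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + \hat N_1 - i + 1}\right) - \ln 2 = 0$$

$$\sum_{i=1}^{n}\left(\frac{y_i}{\hat N_1 - \sum_{j=0}^{i-1} y_j} - \frac{1}{\hat N_0 + \hat N_1 - i + 1}\right) - \ln 2 = 0.$$

7.4 Bayesian Version of the New Model: Case 3
In this case we assume that $N_0$ and $N_1$ are independent and identically distributed random variables, and the prior distribution of their sum, i.e., $N = N_0 + N_1$, is as follows

$$P\{N = r\} = \frac{1}{2^{r+1}},\quad r = 0, 1, 2, \ldots$$

In order to obtain Bayesian estimates of $N_0$ and $N_1$, we should first try to attain the prior distributions of $N_0$ and $N_1$ from that of $N$. Suppose the prior distributions of $N_0$ and $N_1$ are

$$P\{N_0 = m\} = p_0(m),\quad m = 0, 1, 2, \ldots$$

and

$$P\{N_1 = k\} = p_1(k),\quad k = 0, 1, 2, \ldots$$

Since $N_0$ and $N_1$ are identically distributed, we may signify

$$p_0(m) = p_1(m) = p(m),\quad m = 0, 1, 2, \ldots$$

Then the prior distribution of $N$ can be expressed in terms of $\{p(m); m = 0, 1, 2, \ldots\}$ as follows

$$P\{N = r\} = P\{N_0 + N_1 = r\} = \sum_{m=0}^{r} P\{N_0 = m,\ N_0 + N_1 = r\} = \sum_{m=0}^{r} P\{N_0 = m,\ N_1 = r - m\} = \sum_{m=0}^{r} p(m)\,p(r - m).$$

So the prior distribution of $N_0$ and $N_1$, i.e., $p(m)$, can be determined in a recursive way. With the prior distribution $p(m)$, the posterior distributions of $N_0$ and $N_1$ and, consequently, the estimates of $N_0$ and $N_1$ can, at least in principle, be determined in a similar way as that in Section 7.3.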
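The recursive determination of $p(m)$ can be sketched in Python as follows. The sketch assumes the prior of the sum is $P\{N = r\} = 1/2^{r+1}$ and uses the convolution identity $P\{N = r\} = \sum_{m=0}^{r} p(m)p(r-m)$: the term $r = 0$ gives $p(0)$, and each later term is solved for $p(r)$. The function name is hypothetical.

```python
import math


def iid_prior_from_sum(sum_prior):
    """Recover p(0), ..., p(R) from P{N = r} = sum_{m=0}^{r} p(m) p(r - m), r = 0..R."""
    if sum_prior[0] <= 0.0:
        raise ValueError("P{N = 0} must be positive for the recursion to start")
    p = [math.sqrt(sum_prior[0])]  # r = 0: p(0)^2 = P{N = 0}
    for r in range(1, len(sum_prior)):
        # r >= 1: P{N = r} = 2 p(0) p(r) + sum_{m=1}^{r-1} p(m) p(r - m)
        cross = sum(p[m] * p[r - m] for m in range(1, r))
        p.append((sum_prior[r] - cross) / (2.0 * p[0]))
    return p


if __name__ == "__main__":
    sum_prior = [0.5 ** (r + 1) for r in range(8)]
    for m, value in enumerate(iid_prior_from_sum(sum_prior)):
        print(m, value)
```

Only the first R + 1 values of $p(m)$ are produced; how far the recursion must be carried in practice depends on how much prior mass is needed for the posterior computation.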
8. CONCLUDING REMARKS
One may employ a number of methods to estimate the number of defects remaining in software. In this article we cover the dynamic model based method, the empirical model based method, the method by use of the new model, and the Bayesian method. Although no systematic survey or review is presented, analysis shows that the dynamic model based method doesn't seem applicable to estimating the number of remaining defects, and the empirical model based method, particularly that by use of the Halstead model, seems to convey more qualitative information than quantitative information about the number of remaining defects. A practical example shows that the new model can offer good estimates for the number of remaining software defects. The input data required by the new model are simple and can be precisely collected. In fact the new model exploits more accurate information than the Mills model. The new model shows its attraction in its applicability to a broader scope of circumstances. It is also applicable to statistical problems other than software reliability modeling. The Bayesian method is controversial; however, it presents an alternative viewpoint and thus deserves attention and discussion. Obviously, much further work is required for the existing methods. For example, how do we verify the validity of an estimation method? If one method claims that there are 5 defects remaining in the program and, in the meantime, another method claims that the number of remaining defects is 9, then which claim is persuasive? This is a big open problem. Further, is an estimation method capable of reflecting the defect severity? The most serious challenge to all the estimation methods may be the unfortunate fact that the actual number of defects remaining in the program seems never to be known. Then how do we meet the challenge?

ACKNOWLEDGMENTS
Quite a few people, including anonymous referees, read draft versions of the paper. Their careful comments helped the author greatly improve the paper. This paper was once revised while the author worked with the Centre for Software Reliability, City University, London, UK.
REFERENCES
Arden, B. W. and Astill, K. N., Numerical algorithms: origins and applications, Addison-Wesley Publishing Company, 1970.
Cai, K. Y., Elements of software reliability engineering (in Chinese), Tsinghua University Press, Beijing, September 1995.
Cai, K. Y., Wen, C. Y., and Zhang, M. L., A critical review on software reliability modeling. Reliability Engineering and System Safety, 357-371 (1991).
Duran, J. W. and Wiorkowski, J. J., Capture-recapture sampling for estimating error content. IEEE Transactions on Software Engineering, SE-7, 141-148 (1981).
Farr, W. H., A survey of software reliability modeling and estimation. NSWC-TR-82-171 (1982).
Fitzsimmons, A., Love, T., A review and evaluation of software science. Computing Survey, 10, 3-18 (1978).
Gaffney, J. E., Jr., Estimating the number of faults in code. IEEE Transactions on Software Engineering, SE-10, 459-464 (1984).
Halstead, M. H., Elements of software science, Elsevier, 1977.
Jelinski, Z. and Moranda, P. B., Software reliability research, in Statistical computer performance evaluation, (W. Freiberger, ed.), Academic Press, 1972, 464-484.
Jewell, W. S., Bayesian estimation of undetected error, in Theory of Reliability, (A. Serra, R. E. Barlow, eds.), North-Holland, 1986, 405-425.
Lipow, M., Number of faults per line of code. IEEE Transactions on Software Engineering, SE-8, 4 (1982).
Littlewood, B. and Verrall, J., A Bayesian reliability growth model for computer software. Applied Statistics, 22, 332-346 (1973).
Lyu, M. R. (ed.), The McGraw-Hill handbook of software reliability engineering, McGraw-Hill, 1996.
Martz, H. F. and Waller, R. A., Bayesian reliability analysis, John Wiley & Sons, 1982.
Mills, H. D., On the statistical validation of computer program, FCS-72-6015, IBM Federal System Division, 1972.
Moranda, P. B., Prediction of software reliability during debugging. Proc. Annual Reliability and Maintainability Symposium, 1975.
Musa, J. D., Iannino, A., and Okumoto, K., Software reliability: measurement, prediction, application, McGraw-Hill, 1987.
Musa, J. D. and Everett, W. W., Software reliability engineering: technology for 1990s. IEEE Software, 7, 36-43 (1990).
Ottenstein, L., Predicting number of error using software science. Performance Evaluation Review, 10, 157-167 (1991).
Ramamoorthy, C. V., Prakashi, A., Tsai, W. T., and Usuda, Y., Software reliability: its nature, models and improvement, in Theory of Reliability, (A. Serra, R. E. Barlow, eds.), North-Holland, 1986, 287-320.
Schick, G. J. and Wolverton, R. W., An analysis of competing software reliability models. IEEE Transactions on Software Engineering, SE-4, 104-120 (1978).
Schneider, V., Some experimental estimators for developmental and delivered errors in software development projects. Performance Evaluation Review, 10, 169-172 (1981).
Shanthikuman, J. G., Software reliability models: a review, Microelectronics and Reliability, 23, 903-943 (1983).
Shooman, M. L., Software engineering: design, reliability and management, McGraw-Hill, 1983.
Xie, M., Software reliability modeling, World Scientific, 1991.

APPENDIX
In a real-time computer controlled system, the computer needs to periodically sample responses of the controlled object and environment signals (e.g., noises) of concern, and then deliver appropriate control signals to the controlled object according to the predetermined objectives. From sampling signals of the controlled object and environment to delivering control signals, the computer consumes some execution time. Obviously, the consumed execution time may vary with the sampled signals in the process of system operation, but must be less than a predetermined time bound. From a statistical viewpoint we can treat the consumed execution time as a random variable. Then a question arises: what is the underlying probability distribution? To answer this question, we need a series of observed values of the consumed execution time. Our statistical data analysis task was to analyze a series of consumed execution times (15,884 data in total) observed on the MARS system for the rolling ball problem.⁷

⁷ For information about the rolling ball problem for MARS, see: H. Kopetz, et al., "Distributed Fault-Tolerant Real-Time Systems: The MARS Approach," IEEE Micro, Vol. 9, No. 1, 1989, pp. 25-40. H. Kopetz, et al., "The Rolling Ball on MARS," Institut für Technische Informatik, Technische Universität Wien, Treitlstrasse 3/182.1, A-1040 Wien, Austria, 1991.

Figure 8. Relative frequency density.

Figure 9. Structure of the Minitab Macro program.

Figure 8 shows the relative frequency density of the execution time variable. The data analysis task employed a Minitab Macro program with the structure as depicted in Figure 9. The main functions of the program modules are as follows:

Module 1: Main procedure
Module 2: Sort execution times in an appropriate order
Module 3: Display various curves of the execution times
Module 4: Split the execution times into two subsets and calculate the mixture probability
Module 5: Estimate Weibull distribution parameters
Module 6: Implement Newton algorithm for solving a single variable equation
Module 7: Calculate values of Weibull parameter function
Module 8: Calculate derivative of Weibull parameter function
Module 9: Calculate values of Extreme Value parameter function
Module 10: Calculate derivative of Extreme Value parameter function
Module 11: Get the relative frequency density
Module 12: Display fitted Weibull distribution
Module 13: Estimate Extreme Value distribution parameters
Module 14: Display fitted Extreme Value distribution
Module 15: Estimate Normal distribution parameters
Module 16: Display fitted Normal distribution
Module 17: Estimate Log-Normal distribution parameters
Module 18: Display fitted Log-Normal distribution
Module 19: Test randomness of a sample
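As an illustration of the kind of routine that the Newton-algorithm module provides, the following Python sketch implements Newton's method for a single-variable equation. It is not the author's Minitab macro: the function name, tolerance, iteration limit and the test equation $x^2 - 2 = 0$ are all assumptions chosen for the example.

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Solve f(x) = 0 by Newton's method, starting from x0; df is the derivative of f."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("derivative vanished; choose another starting point")
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    # Hypothetical test equation: x^2 - 2 = 0
    print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
```

The result depends on the starting point, and the iteration can wander to very large values when the function flattens out, which is why a bounded reparameterization is sometimes applied first.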