Volume 2, Number 6
OPERATIONS RESEARCH LETTERS
March 1984
ON A SOFTWARE AVAILABILITY MODEL WITH IMPERFECT MAINTENANCE
J.G. SHANTHIKUMAR
Systems and Industrial Engineering, University of Arizona, Tucson, AZ 85721, USA
Received April 1983
Revised July 1983

In this paper we consider a general software availability model and derive compound availability measures, such as the joint probability of software availability and the remaining number of errors. The results given here generalize the results given in Kim et al. (1982).

Keywords: Software availability, imperfect debugging, compound availability measures.
1. Introduction

During the last fifteen years, several studies have been carried out to study software failure phenomena and to develop and apply software reliability/availability models to predict software system performance. In a recent paper with over one hundred and thirty references, Shanthikumar [8] has given a review of these software reliability models.

Recently Kim et al. [2] have reported an explicit expression for the compound availability measures for the following software maintenance model: the software system alternates between up- and down-states over time. The up times are exponentially distributed with mean $1/(r\lambda)$ when there are $r$ software errors remaining in the system. The down periods are exponentially distributed with mean $1/(\mu_0 + \mu_1)$. During a down period the number of errors in the software is reduced by one with probability $\mu_0/(\mu_0 + \mu_1)$, and with probability $\mu_1/(\mu_0 + \mu_1)$ the number of errors in the software remains the same.

The purpose of this note is to bring to the attention of the readers a general software availability model available in the literature and to present results for a further extension of this general software availability model. In 1978, Okumoto and Goel [5] analysed a software availability model which is a generalization of the model proposed by Trivedi and Shooman [11] (and also of the model of Kim et al. [2]). The model considered by Okumoto and Goel is: the software system alternates between up- and down-states over time. The up periods are exponentially distributed with mean $1/\lambda_r$ and the down periods are exponentially distributed with mean $1/\mu_r$, when there are $r$ software errors remaining in the system. During the down periods the number of errors in the software is reduced by one (perfect debugging) with probability $p$, and with probability $(1 - p)$ the number of errors in the software remains the same (imperfect debugging).
Compound availability measures for this model are obtained using a semi-Markov analysis. Clearly this model is more general than the model considered by Kim et al. [2].

In this paper we consider a general software availability model which generalizes that of Okumoto and Goel [5], which is itself more general than that of Kim et al. [2]. In particular the nature of our generalization is: (i) to allow a variable probability $p_r$ of perfect debugging as a function of the number of remaining errors ($r$), and (ii) to allow for general time to failure and repair time distributions. For such an extension we derive the compound availability measures. Specifically we consider the joint probability distribution of the number of software failures up to time $t$ and the availability (i.e. working) or the unavailability (i.e. under repair) of the software.

In the next section we will describe and analyze a general software availability model. Expressions for several compound availability measures will also be given. In Section 3 we obtain the results of Kim et al. [2], Okumoto and Goel [5] and Shanthikumar [6] as special cases of the results presented in Section 2.

0167-6377/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
2. A general software availability model

Consider a software system subjected to failures and repairs. This software system alternates between up- and down-periods over time. The up- and down-periods are random and stochastically independent of one another, with cumulative distribution functions $F_r$ and $G_r$, respectively, when there are $r$ errors remaining in the software. Naturally we set $\bar F_0(x) = 1 - F_0(x) = 1$, $\forall x \ge 0$, i.e., the software does not fail after all the errors have been removed. During a down period started with $r$ remaining errors, the number of errors in the software is reduced by one with probability $p_r$ and, with probability $(1 - p_r)$, the number of errors remains the same. It is assumed that the software just came up at time zero with $n$ remaining errors.

Let the pair $(I(t), N(t))$ represent the state of the software system at time $t$, $\forall t \ge 0$. $I(t)$ takes the value 1 if the system is up at time $t$, and 0 otherwise. $N(t)$ is the number of errors remaining in the software at time $t$. Then $I(0) = 1$ and $N(0) = n$. Now let $\{T_k\}_{k=0}^{\infty}$, with $0 = T_0 < T_1 < T_2 < \cdots$, be the epochs at which the software is brought to the up condition after the down periods. We assume that $F_r(0) = 0$, $r = 1, 2, \ldots, n$, so that the probability of two such epochs occurring at the same time is zero. Let $P(k, x; r)$ be the probability that the system is up and the number of errors remaining is $k$, at $x$ time units after the system became up with $r$ errors remaining. Let $Q(k, x; r)$ be similarly defined but representing the down state of the system. Then the Palm probabilities with respect to time $t$ and epoch $T_i$ are:
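$$P(k, x; r) \triangleq P\{I(t + x) = 1,\ N(t + x) = k \mid T_i = t,\ N(t) = r\}, \tag{1}$$

$$Q(k, x; r) \triangleq P\{I(t + x) = 0,\ N(t + x) = k \mid T_i = t,\ N(t) = r\}, \tag{2}$$

for $x \ge 0$ and $0 \le k \le r \le n$. Conditioning on the length of the first up-down cycle, one has

$$P(k, x; r) = p_r \int_0^x P(k, x - \tau; r - 1)\, dH_r(\tau) + (1 - p_r) \int_0^x P(k, x - \tau; r)\, dH_r(\tau), \quad x \ge 0,\ k = 0, 1, 2, \ldots, r - 1, \tag{3}$$

and

$$P(r, x; r) = \bar F_r(x) + (1 - p_r) \int_0^x P(r, x - \tau; r)\, dH_r(\tau), \quad x \ge 0,\ r = 1, 2, \ldots, n, \tag{4}$$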
where $H_r(x) = \int_0^x F_r(x - \tau)\, dG_r(\tau) = F_r * G_r(x)$, $x \ge 0$, is the convolution of $F_r$ and $G_r$, and $\bar F_r = 1 - F_r$. Now taking the Laplace transform on both sides of (3) and (4), and solving for $P(k, x; r)$ (or using the well-known general form of the solution of renewal equations), one finds

$$P(k, x; r) = \int_0^x P(k, x - \tau; r - 1)\, dH_r^*(\tau), \quad x \ge 0,\ k = 0, 1, 2, \ldots, r - 1, \tag{5}$$

and

$$P(r, x; r) = \bar F_r^*(x), \quad x \ge 0,\ r = 1, 2, \ldots, n, \tag{6}$$

where

$$H_r^* = \sum_{l=0}^{\infty} p_r (1 - p_r)^l H_r^{(l+1)}, \tag{7}$$

$$\bar F_r^* = \bar F_r * \sum_{l=0}^{\infty} (1 - p_r)^l H_r^{(l)}, \tag{8}$$

and $H_r^{(l)}$ is the $l$-fold convolution of $H_r$ with itself, with $H_r^{(0)}(x) = 1$, $\forall x \ge 0$. Then using the iterative nature of (5), one finds that the compound availability measure $P(k, t; n)$ for the software system is

$$P(k, t; n) = \int_0^t \bar F_k^*(t - \tau)\, d\Phi_{n,k+1}(\tau), \quad t \ge 0,\ k = 0, 1, 2, \ldots, n - 1, \tag{9}$$

and

$$P(n, t; n) = \bar F_n^*(t), \quad t \ge 0, \tag{10}$$

where

$$\Phi_{n,i} = H_n^* * H_{n-1}^* * \cdots * H_i^*, \tag{11}$$

and $\bar F_0^*(x) = 1$, $x \ge 0$. A similar analysis for the compound unavailability measure $Q(k, t; n)$ gives

$$Q(k, t; n) = \Phi_{n,k+1}(t) - \Phi_{n,k}(t) - P(k, t; n), \quad t \ge 0,\ k = 1, 2, \ldots, n - 1, \tag{12}$$

$$Q(n, t; n) = 1 - \bar F_n^*(t) - H_n^*(t), \quad t \ge 0. \tag{13}$$
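The passage from the renewal equation (3) to the solution (5) can be spelled out in the transform domain. What follows is a sketch only: the lowercase symbols $\pi_{k,r}$, $h_r$ and $h_r^*$ are notation introduced here for illustration and do not appear in the paper.

```latex
% Assumed notation (not from the paper):
%   \pi_{k,r}(s) = \int_0^\infty e^{-sx} P(k,x;r)\,dx   (Laplace transform)
%   h_r(s)       = \int_0^\infty e^{-sx}\,dH_r(x)        (Laplace-Stieltjes transform)
\begin{align*}
\pi_{k,r}(s)
  &= p_r\,\pi_{k,r-1}(s)\,h_r(s) + (1-p_r)\,\pi_{k,r}(s)\,h_r(s)
  && \text{(transform of the renewal equation for } k < r\text{)} \\
\pi_{k,r}(s)
  &= \pi_{k,r-1}(s)\,\frac{p_r\,h_r(s)}{1-(1-p_r)\,h_r(s)} \\
  &= \pi_{k,r-1}(s)\sum_{l=0}^{\infty} p_r(1-p_r)^{l}\,h_r(s)^{l+1}
  && \bigl(|(1-p_r)\,h_r(s)| < 1 \text{ for } \mathrm{Re}(s) > 0\bigr) \\
  &= \pi_{k,r-1}(s)\,h_r^*(s),
\end{align*}
% where h_r^*(s) is the transform of
%   H_r^* = \sum_{l \ge 0} p_r (1-p_r)^l H_r^{(l+1)};
% inverting the last line gives the convolution form
%   P(k,x;r) = \int_0^x P(k,x-\tau;r-1)\,dH_r^*(\tau).
```

Read this way, $H_r^*$ is the distribution of the time needed for the error count to drop from $r$ to $r - 1$: a geometric number of up-down cycles, each with distribution $H_r$.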
Now let $\Phi_k$ be the cumulative distribution function of the first passage time for the software system to reach $k$ remaining errors, i.e., if

$$S_k = \inf\{t : N(t) = k\}, \tag{14}$$

then

$$\Phi_k(t) = P\{S_k \le t\}, \quad t \ge 0,\ k = 0, 1, \ldots, n - 1. \tag{15}$$

Using a similar analysis as above it is easily verified that

$$\Phi_k(t) = \Phi_{n,k+1}(t), \quad t \ge 0,\ k = 0, 1, 2, \ldots, n - 1. \tag{16}$$

Now the $\Phi_k$'s can be used to find the marginal probability

$$P\{N(t) = k\} = \Phi_k(t) - \Phi_{k-1}(t), \quad t \ge 0,\ k = 1, 2, \ldots, n, \tag{17}$$

and

$$P\{N(t) = 0\} = \Phi_0(t), \quad t \ge 0, \tag{18}$$

where we set $\Phi_n(t) = 1$, $\forall t \ge 0$. Note that $P(0, t; n) = P\{N(t) = 0\}$ and equation (17) is the same as the sum of (9) and (12), and the sum of (10) and (13). The probabilistic interpretations of these equalities are immediate. Further, the probability that the software is working (i.e., software availability $A(t)$) at time $t$ is

$$A(t) \triangleq P\{I(t) = 1\} = \sum_{k=0}^{n} P(k, t; n), \quad t \ge 0, \tag{19}$$

and the probability that the software is not working (software unavailability $U(t)$) at time $t$ is

$$U(t) = P\{I(t) = 0\} = \sum_{k=1}^{n} Q(k, t; n), \quad t \ge 0. \tag{20}$$
Note that $A(t) + U(t) = 1$, $\forall t \ge 0$. The above results cannot be simplified any further unless specific forms for $\{G_r\}_1^n$ and $\{F_r\}_1^n$ are specified. However, it is worth noting that if $G_r$ and $F_r$ are of the phase type (see Neuts [4]), or of the generalized phase type (see Shanthikumar [9]), or possess Laguerre transforms (see Keilson and Nunn [1] and Sumita [10]), one may compute the required convolutions and hence the compound availability measures very efficiently. In the next section we will consider some special cases.
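When $F_r$ and $G_r$ have no convenient closed form, the measures above can also be estimated by simulating the model directly. The following Python sketch is not part of the paper: the function names, the sampler interface (`sample_up`, `sample_down`) and the dictionary `p` are assumptions made for illustration. It follows the model as described: the error count can only change at the end of a down period, and error-free software never fails.

```python
import random


def simulate_state(t, n, p, sample_up, sample_down, rng):
    """Return (I(t), N(t)) for one sample path that starts up at time 0 with n errors.

    p[r] is the probability that a repair started with r errors removes one error.
    sample_up(r, rng) and sample_down(r, rng) draw up- and down-period lengths
    (hypothetical interface, chosen for this sketch).
    """
    clock, r = 0.0, n
    while True:
        if r == 0:
            return 1, 0          # no errors left: the software stays up
        clock += sample_up(r, rng)
        if clock > t:
            return 1, r          # time t falls inside this up period
        clock += sample_down(r, rng)
        if clock > t:
            return 0, r          # time t falls inside this down period
        if rng.random() < p[r]:
            r -= 1               # perfect debugging: one error removed


def estimate(t, n, p, sample_up, sample_down, runs=20000, seed=1):
    """Monte Carlo estimate of the joint distribution of (I(t), N(t))."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        state = simulate_state(t, n, p, sample_up, sample_down, rng)
        counts[state] = counts.get(state, 0) + 1
    return {state: c / runs for state, c in counts.items()}


if __name__ == "__main__":
    # Illustrative parameters only: exponential up times, uniform repair times.
    p = {1: 0.6, 2: 0.8, 3: 0.9}
    est = estimate(
        2.0, 3, p,
        sample_up=lambda r, rng: rng.expovariate(0.5 * r),
        sample_down=lambda r, rng: rng.uniform(0.0, 0.4),
    )
    availability = sum(v for (up, _), v in est.items() if up == 1)
    print("estimated A(2.0):", availability)
```

In the returned dictionary, the entries keyed `(1, k)` estimate $P(k, t; n)$ and those keyed `(0, k)` estimate $Q(k, t; n)$. Summing the former over $k$ gives an estimate of $A(t)$.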
3. Special cases
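Case 1. Suppose $F_r$ is exponentially distributed with mean $1/(r\lambda)$, $G_r(x) = 1$, $\forall x \ge 0$, and $p_r = p$, $r = 1, 2, \ldots, n$. Then after straightforward algebraic manipulations, one obtains

$$P(k, t; n) = \binom{n}{k} e^{-k p \lambda t} \left(1 - e^{-p \lambda t}\right)^{n-k}, \quad t \ge 0,\ k = 0, 1, 2, \ldots, n,$$

and

$$Q(k, t; n) = 0, \quad t \ge 0,\ k = 0, 1, 2, \ldots, n - 1.$$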
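Under the Case 1 assumptions a failure occurs at rate $r\lambda$ and removes an error with probability $p$, so each remaining error is in effect removed independently at rate $p\lambda$. On that reading $P(k, t; n)$ is a binomial probability, which can be evaluated as in the sketch below. The function name and argument order are chosen here for illustration only.

```python
import math


def p_case1(k, t, n, p, lam):
    """P(k, t; n) under Case 1 (instantaneous repair, constant p).

    Assumes each of the n errors survives to time t independently with
    probability exp(-p * lam * t), which gives a binomial distribution
    for the number of remaining errors.
    """
    survive = math.exp(-p * lam * t)
    return math.comb(n, k) * survive ** k * (1.0 - survive) ** (n - k)


def expected_remaining(t, n, p, lam):
    """Mean number of remaining errors at time t under the same assumption."""
    return sum(k * p_case1(k, t, n, p, lam) for k in range(n + 1))
```

Because repairs are instantaneous in this case, these probabilities sum to one over $k$ and the availability is identically one.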
These and other related results using alternate derivations are given in Shanthikumar [6]. Note that in this case we have instantaneous repairs. One may therefore use an alternative derivation to obtain $P(k, t; n)$, based on the fact that $N(t)$ is a pure death process with independent organisms and exponential lifetimes with mean $1/(p\lambda)$. The results obtained this way agree with the above.

Case 2. Suppose $F_r$ and $G_r$ are exponentially distributed with means $1/\lambda_r$ and $1/\mu_r$, respectively. Then taking the Laplace transform on both sides of (9) one finds that
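$$\hat P(k, s; n) \triangleq \int_0^\infty e^{-st} P(k, t; n)\, dt = \frac{\prod_{i=k+1}^{n} p_i \lambda_i \mu_i}{\prod_{j=k}^{n} (s - A_j)(s - B_j)}\, (s + \mu_k), \quad \mathrm{Re}(s) > 0,\ k = 1, 2, \ldots, n - 1, \tag{21}$$

and

$$\hat P(0, s; n) = \frac{\prod_{i=1}^{n} p_i \lambda_i \mu_i}{s \prod_{j=1}^{n} (s - A_j)(s - B_j)}, \quad \mathrm{Re}(s) > 0, \tag{22}$$

where $A_j$ and $B_j$ are the unique solutions to $A_j B_j = p_j \lambda_j \mu_j$ and $A_j + B_j = -(\lambda_j + \mu_j)$, for $j = 1, 2, \ldots, n$. Now taking the Heaviside expansion of (21) and (22) and assuming that $A_j \ne B_j \ne A_l \ne B_l$, $\forall j \ne l \ge k$, one has (see page 656 of Kuipers and Tillman [3])

$$P(k, t; n) = \prod_{i=k+1}^{n} p_i \lambda_i \mu_i \sum_{j=k}^{n} \left[ \frac{(A_j + \mu_k)\, e^{A_j t}}{C_A(j, k)} + \frac{(B_j + \mu_k)\, e^{B_j t}}{C_B(j, k)} \right], \quad t \ge 0,\ k = 1, 2, \ldots, n - 1, \tag{23}$$

and

$$P(0, t; n) = 1 + \prod_{i=1}^{n} p_i \lambda_i \mu_i \sum_{j=1}^{n} \left[ \frac{e^{A_j t}}{A_j C_A(j, 1)} + \frac{e^{B_j t}}{B_j C_B(j, 1)} \right], \quad t \ge 0, \tag{24}$$

where

$$C_A(j, k) = (A_j - B_j) \prod_{m=k,\ m \ne j}^{n} \left\{ (A_j - A_m)(A_j - B_m) \right\}, \tag{25}$$

and

$$C_B(j, k) = (B_j - A_j) \prod_{m=k,\ m \ne j}^{n} \left\{ (B_j - A_m)(B_j - B_m) \right\}, \tag{26}$$

for $j, k = 1, 2, \ldots, n$. Similarly from (10) one finds that, if $A_n \ne B_n$,

$$P(n, t; n) = \frac{(A_n + \mu_n)\, e^{A_n t}}{A_n - B_n} + \frac{(B_n + \mu_n)\, e^{B_n t}}{B_n - A_n}, \quad t \ge 0. \tag{27}$$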
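As a numerical illustration of the expression for $P(n, t; n)$ when $A_n \ne B_n$, the sketch below computes the roots $A_n$, $B_n$ from their sum $-(\lambda_n + \mu_n)$ and product $p_n \lambda_n \mu_n$ and evaluates the closed form. The function names are assumptions of this sketch. As an independent check, `p_nn_ode` integrates the two-state Markov equations for "up with $n$ errors" and "down with $n$ errors" with a classical Runge-Kutta scheme. That check is added here and is not a method used in the paper.

```python
import math


def roots(lam, mu, p):
    """Return (A, B), the roots of s^2 + (lam + mu) s + p lam mu = 0.

    The discriminant equals (lam - mu)^2 + 4 (1 - p) lam mu, so both roots
    are real; they coincide only when p = 1 and lam = mu.
    """
    disc = (lam + mu) ** 2 - 4.0 * p * lam * mu
    root = math.sqrt(disc)
    return (-(lam + mu) + root) / 2.0, (-(lam + mu) - root) / 2.0


def p_nn(t, lam, mu, p):
    """Closed form for P(n, t; n); requires A != B."""
    a, b = roots(lam, mu, p)
    return ((a + mu) * math.exp(a * t) - (b + mu) * math.exp(b * t)) / (a - b)


def p_nn_ode(t, lam, mu, p, steps=2000):
    """Same quantity by RK4 integration of u' = -lam u + (1-p) mu d, d' = lam u - mu d."""
    def deriv(u, d):
        return -lam * u + (1.0 - p) * mu * d, lam * u - mu * d

    h = t / steps
    u, d = 1.0, 0.0              # start up with n errors
    for _ in range(steps):
        k1 = deriv(u, d)
        k2 = deriv(u + 0.5 * h * k1[0], d + 0.5 * h * k1[1])
        k3 = deriv(u + 0.5 * h * k2[0], d + 0.5 * h * k2[1])
        k4 = deriv(u + h * k3[0], d + h * k3[1])
        u += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        d += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return u
```

Here `lam`, `mu` and `p` stand for $\lambda_n$, $\mu_n$ and $p_n$. The two functions should agree to within the integration error for any $t \ge 0$.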
If at least two of the coefficients among $\{A_j, B_j\}_1^n$ are the same, then the above expressions (23), (24) and (27) should be appropriately modified using the corresponding Heaviside expansions (see page 656 of Kuipers and Tillman [3]). The expressions for $Q(k, t; n)$ and $\Phi_k(t)$ are obtained similarly. They are:

$$Q(k, t; n) = \lambda_k \prod_{i=k+1}^{n} p_i \lambda_i \mu_i \sum_{j=k}^{n} \left[ \frac{e^{A_j t}}{C_A(j, k)} + \frac{e^{B_j t}}{C_B(j, k)} \right], \quad t \ge 0,\ k = 1, 2, \ldots, n - 1, \tag{28}$$

$$Q(n, t; n) = \lambda_n \left[ \frac{e^{A_n t}}{A_n - B_n} + \frac{e^{B_n t}}{B_n - A_n} \right], \quad t \ge 0, \tag{29}$$

and

$$\Phi_k(t) = 1 + \prod_{i=k+1}^{n} p_i \lambda_i \mu_i \sum_{j=k+1}^{n} \left[ \frac{e^{A_j t}}{A_j C_A(j, k + 1)} + \frac{e^{B_j t}}{B_j C_B(j, k + 1)} \right], \quad t \ge 0,\ k = 0, 1, 2, \ldots, n - 1, \tag{30}$$
where $p_0 \lambda_0 \mu_0 = 1$.

Case 2.1 (Okumoto and Goel's [5] model). When $p_r = p$, $r = 1, 2, \ldots, n$, one sees that (3) is a simplified version of the results (equation (11)) of Okumoto and Goel [5]. Further, equations (23), (24) and (27) give the real domain results corresponding to the transform result of Okumoto and Goel. It is worth pointing out that the forms of the results given here are more easily computable than the forms given in Okumoto and Goel.

Case 2.2 (Kim et al.'s [2] model). When $\lambda_r = r\lambda$, $\mu_r = \mu_0 + \mu_1$ and $p_r = \mu_0/(\mu_0 + \mu_1)$, $r = 1, 2, \ldots, n$, one can verify that equations (23) and (27) agree with equation (6) of Kim et al. [2]. Furthermore equation (24) agrees with their equation (6) for $p_0(t)$. Though their expression for $q_{n-k}(t)$ (equation (8) there) seems different from our results (28) and (29), their equivalence is immediate after substituting the relationship +
-
1
]-0, (31)
into their equation (8).
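The Case 2 closed forms lend themselves to direct computation when all the roots $A_j$, $B_j$ are distinct. The sketch below is one possible arrangement, not the author's code: the function `case2_measures`, its helpers and the dictionary-based parameter layout are assumptions made here. It evaluates the up-state probabilities $P(k, t; n)$ for $k = 0, \ldots, n$ and the down-state probabilities $Q(k, t; n)$ for $k = 1, \ldots, n$ from the partial-fraction (Heaviside) expansions, using the coefficients $C_A(j, k)$ and $C_B(j, k)$ built from products of root differences.

```python
import math


def _roots(lam, mu, p):
    """Roots of s^2 + (lam + mu) s + p lam mu = 0 (both real)."""
    root = math.sqrt((lam + mu) ** 2 - 4.0 * p * lam * mu)
    return (-(lam + mu) + root) / 2.0, (-(lam + mu) - root) / 2.0


def _coef(x, partner, j, k, A, B):
    """C_A(j, k) when x = A[j], partner = B[j]; C_B(j, k) with the roles swapped."""
    out = x - partner
    for m in range(k, len(A) + 1):
        if m != j:
            out *= (x - A[m]) * (x - B[m])
    return out


def case2_measures(t, lam, mu, p):
    """Return (P, Q) with P[k] = P(k, t; n), k = 0..n and Q[k] = Q(k, t; n), k = 1..n.

    lam, mu, p are dicts keyed by r = 1..n. All roots are assumed distinct.
    """
    n = len(lam)
    A, B = {}, {}
    for j in range(1, n + 1):
        A[j], B[j] = _roots(lam[j], mu[j], p[j])

    def prod(k):
        # Product of p_i lam_i mu_i over i = k+1..n (empty product is 1).
        out = 1.0
        for i in range(k + 1, n + 1):
            out *= p[i] * lam[i] * mu[i]
        return out

    P, Q = {}, {}
    for k in range(1, n + 1):
        sp = sq = 0.0
        for j in range(k, n + 1):
            ea = math.exp(A[j] * t) / _coef(A[j], B[j], j, k, A, B)
            eb = math.exp(B[j] * t) / _coef(B[j], A[j], j, k, A, B)
            sp += (A[j] + mu[k]) * ea + (B[j] + mu[k]) * eb
            sq += ea + eb
        P[k] = prod(k) * sp
        Q[k] = lam[k] * prod(k) * sq
    s0 = 0.0
    for j in range(1, n + 1):
        s0 += math.exp(A[j] * t) / (A[j] * _coef(A[j], B[j], j, 1, A, B))
        s0 += math.exp(B[j] * t) / (B[j] * _coef(B[j], A[j], j, 1, A, B))
    P[0] = 1.0 + prod(0) * s0
    return P, Q
```

A quick consistency check is that `sum(P.values()) + sum(Q.values())` stays equal to one for every $t$, which mirrors $A(t) + U(t) = 1$. With nearly coincident roots the divisions become ill-conditioned, and the modified expansions mentioned in the text would be needed instead.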
Conclusion

In this note we have formulated and solved a general software availability model and shown that the results given by Kim et al. [2] and Okumoto and Goel [5] are special cases. The analysis presented here can be extended to the study of hardware/software availability models. However, alternate approaches should be used for such combined modelling if time dependent failures are postulated for the software system (e.g., Shanthikumar [8] for time dependent failure models).

In our general model, the probability of perfect debugging is allowed to depend on the number of remaining errors. When there are several errors in the software, it is often easier to detect and correct the error that causes software failures. But as the software system becomes relatively error free, it becomes harder to detect and correct an error that causes a software failure. In such a case one may use different values for the probability of perfect debugging when the number of remaining errors is different.

The analysis in this paper is restricted to software systems where multiple error generation or multiple error removals are not present. It will be worthwhile to consider the extension of this model incorporating this aspect.
Acknowledgement

The author would like to thank the referee for his helpful comments.
References

[1] J. Keilson and W.R. Nunn, "Laguerre transformation as a tool for the numerical solution of integral equations of convolution type", Appl. Math. and Comp. 5, 313-359 (1979).
[2] J.H. Kim, Y.H. Kim and C.J. Park, "A modified Markov model for the estimation of computer software performance", Operations Research Letters 1, 253-257 (1982).
[3] L. Kuipers and R. Tillman, Handbook of Mathematics, International Series of Monographs in Pure and Applied Mathematics, Vol. 99, Pergamon Press (1969).
[4] M.F. Neuts, "Probability distribution of phase type", in: Liber Amicorum Professor Emeritus H. Florin (Department of Mathematics, University of Louvain, Belgium) 173-206 (1975).
[5] K. Okumoto and A.L. Goel, "Availability and other performance measures of software system under imperfect maintenance", COMPSAC, 66-70 (1978).
[6] J.G. Shanthikumar, "Software performance prediction using a state-dependent error-occurrence-rate model", in: Proc. Nineteenth Annual Technical Symposium, 67-73 (1980).
[7] J.G. Shanthikumar, "A general software reliability model for performance prediction", Microelectronics and Reliability 21, 671-682 (1981).
[8] J.G. Shanthikumar, "Software reliability models: A review", Microelectronics and Reliability 23 (1983) to appear.
[9] J.G. Shanthikumar, "Generalized phase type distributions", Working paper, Systems and Industrial Engineering, University of Arizona, Tucson (1983).
[10] U. Sumita, Development of the Laguerre transform method for numerical exploration of applied probability models, Ph.D. Dissertation, University of Rochester, Rochester, NY (1981).
[11] A.K. Trivedi and M.L. Shooman, "A many state Markov model for the estimation of computer software performance parameters", in: Proc. International Conference on Reliable Software, 208-220 (1975).