Microelectronics and Reliability, Vol. 16, pp. 135 to 141. Pergamon Press. 1977. Printed in Great Britain
AVAILABILITY PREDICTION BY USING A METHOD OF SIMULATION

B. A. BASKER
Department of Engineering Production, University of Technology, Loughborough, Leicestershire LE11 3TU, U.K.

and

P. MARTIN
Department of Mechanical Engineering, Brownlow Hill, University of Liverpool, Liverpool L69 3BX, U.K.

Abstract--The work described in the literature on the subject of Availability is based mainly on the assumption of exponential distribution of failure and repair times. If this assumption is not valid, evaluation of Availability itself is a complex problem and has rarely been attempted. This paper describes a method based on simulation to evaluate Availability of units in a system, be it an electrical or a production system. The failure and repair rates of units in the system need not necessarily follow the exponential distribution.
1. INTRODUCTION

Availability is the combined effect of reliability and maintenance. It is defined as follows:

\[ \text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}, \tag{1} \]

where Uptime is the time the system has been working without failure and Downtime is the time necessary for maintenance, which is usually assumed to include the following:
(i) the actual repair time
(ii) the waiting time for repair (due to an insufficient number of repairmen employed)
(iii) the time waiting for spares, tools etc.
(iv) the administrative Downtime etc.

In this paper Downtime is restricted to that arising from (i) and (ii). The Uptime, the actual repair time and the waiting time for repair can all be described by probability distributions. In the case of exponential distributions Availability can be represented as follows:

\[ A = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR} + \text{MTWR}}, \tag{2} \]

where
A = Availability
MTBF = mean time between failures, reflecting reliability R
MTTR = mean time to repair, reflecting repairability R_M
MTWR = mean time waiting for repair.

It has been common practice in Availability studies to assume that there is an unlimited supply of repairmen [1] to keep the system running. This assumption is made in order to make the Availability of each unit statistically independent of all others. An unlimited number of repairmen cannot be employed because of the cost involved in employing them. An optimum number of repairmen must be determined to minimize the total cost. The waiting time for repair will change according to the number of repairmen employed, so it is also necessary to include the waiting time for repair in the Downtime.

In the case of exponential distribution of the variables R and R_M, the distribution of waiting time can be determined by applying analytical methods available in queueing theory. In practice the assumption of exponential distribution is not always valid [2]. If the variables R and R_M are governed by, say, normal distributions, then an analytical method of determining the distribution of waiting time becomes extremely complex. The problem becomes much more complex when there are several different units in the system, each governed by a different distribution; hence evaluation of the Availability of each unit becomes difficult. This can be done in a simple and elegant way by developing a numerical method. The numerical method described in this paper is based on a simulation technique.

2. A BRIEF SURVEY OF LITERATURE

Much of the study on the subject of Availability has been made mainly to help the designer to improve the design of components so that a better performance is achieved. This Availability is called the intrinsic (or inherent) Availability and it includes only the built-in reliability and repairability characteristics of the units. If these two are known (i.e. R and R_M) then evaluation of intrinsic Availability is straightforward. Evaluation of Availability itself becomes complex when the variables R and R_M do not follow the exponential distribution and when waiting time is also taken into consideration. To evaluate the Availability of one unit in the system, the reliability and repairability characteristics of the other units also have to be considered. The literature contains only a limited amount of information on this aspect of Availability. However, the work of the following authors has been included.

Virene [3], in his paper entitled "Waiting line (queueing) effects on Availability", describes how the variability of time between failures and the variability of maintenance time can cause delays in the case of exponential failure and repair rates.

Kenneth Grace Jr. [1] has obtained a method for the evaluation of system Availability. He uses a general topological approach based on the concept of set theory. He makes the assumption that there is an unlimited supply of repairmen to keep the system running. Because of this assumption, the steady state Availability of each unit becomes statistically independent of all others.

McNichols and Messer Jr. [4] have developed a cost based procedure for allocating the Availability parameters (repair times and failure rates) to the various components that make up a system. The allocation was considered as a cost minimization problem, subject to the constraint of meeting a system Availability requirement. The problem was solved using the Lagrange multipliers approach. The assumption of constant failure rates of components is made. With this assumption, evaluation of the Availability of each unit becomes easier and the problem is to optimize system Availability. They conclude that without the assumption of constant failure rate analytic solutions are not usually feasible and often impossible.
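For the exponential case that these analytical studies rely on, equations (1) and (2) reduce Availability to a simple ratio. The following minimal Python sketch evaluates both forms; the function names and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def availability(uptime, downtime):
    """Equation (1): fraction of the total time during which the unit is working."""
    return uptime / (uptime + downtime)


def availability_from_means(mtbf, mttr, mtwr=0.0):
    """Equation (2): the same ratio expressed with mean times (hr)."""
    return mtbf / (mtbf + mttr + mtwr)


# Illustrative numbers only (hr); they are not results from the paper.
a_no_wait = availability_from_means(mtbf=100.0, mttr=6.0)
a_waiting = availability_from_means(mtbf=100.0, mttr=6.0, mtwr=4.0)
print(f"no waiting: {a_no_wait:.3f}, with waiting: {a_waiting:.3f}")
```

Any waiting time for repair enters the denominator in the same way as repair time, which is why the number of repairmen matters once it is limited.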
3. A STATEMENT OF THE PROBLEM

A system consisting of n units is considered. The problem is to evaluate the Availability of each unit when the reliability and repairability functions of each unit are not necessarily exponentially distributed. Availability is evaluated using the formula given in equation (1). The actual repair time and the time waiting for repair are included in the Downtime. The time waiting for repair will vary depending on the number of repairmen working. Hence the Availability will also vary for differing numbers of repairmen and this is evaluated.

4. DETERMINATION OF AVAILABILITY BY SIMULATION

4.1 Assumptions
(1) The time for a breakdown to occur is a random variable and the time to repair a breakdown is also a random variable.
(2) Only one repairman can repair a failed unit at any one time.
(3) Repairs are conducted on a first come, first served principle.
(4) The total time which is used in evaluating Availability is assumed to include only the following:
    (i) Operating time
    (ii) Actual repair time
    (iii) Waiting time that arises due to an insufficient number of repairmen working.
(5) For the purpose of simplification each unit in the system is assumed to be independent of the others. That is, the failure of one unit does not affect the operation of other units in the system.

4.2 Data requirements
The density functions of reliability and repairability of each unit in the system.

4.3 Description of the method
A flowchart describing the method is given in Fig. 1. The logic of the method is first explained by performing a hand simulation for a system consisting of two units. Simulation is performed using the event increment method. The reliability (R) and repairability (R_M) functions of each unit are assumed to be normally distributed for the purpose of describing the method. The values of the parameters describing the distribution are as given in Table 1.

Table 1. Reliability and repairability data of two units considered for describing the hand simulation

Unit    Reliability R                       Repairability R_M
        Mean μ (hr)   Std deviation σ (hr)  Mean μ (hr)   Std deviation σ (hr)
1       100           10                    6             1
2       120           10                    7             1

[Fig. 1. Flowchart showing the structure of the simulation process. Legible box labels, in order: read mean, variance etc. of reliability and repairability functions; select a random number; determine a breakdown time; breakdown time is added to cumulative breakdown time; select a random number to obtain a corresponding repair time for that particular breakdown; determine repair time; repeat n times; select the unit which has failed first and is awaiting repair; repair time + waiting time; cumulative downtime; evaluate Availability = Uptime / (Uptime + cumulative Downtime).]

(a) The density functions of reliability and repairability of each unit are converted to cumulative distributions. The graphs are shown in Figs. 2-5.

[Fig. 2. Cumulative distribution of time to breakdown of Unit 1 (μ = 100, σ = 10). Axes: t = time to breakdown (hr); F1(t) = cumulative probability of t.]
[Fig. 3. Cumulative distribution of time to repair Unit 1 (μ = 6, σ = 1). Axes: τ = time required to repair a breakdown (hr); G1(τ) = cumulative probability of τ.]
[Fig. 4. Cumulative distribution of time to breakdown of Unit 2 (μ = 120, σ = 10). Axes: t = time to breakdown (hr); F2(t) = cumulative probability of t.]
[Fig. 5. Cumulative distribution of time to repair Unit 2 (μ = 7, σ = 1). Axes: τ = time required to repair a breakdown (hr); G2(τ) = cumulative probability of τ.]

(b) Two different sequences of random numbers between 0 and 1 are obtained for each unit to determine the breakdown and repair times.

Example. The following is a sequence of ten random numbers between 0 and 1 obtained from a list of random numbers.
Serial    Sequence 1 to determine         Sequence 2 to determine
number    breakdown times t for unit 1    repair times τ for unit 1
1         0.292                           0.105
2         0.500                           0.170
3         0.548                           0.345
4         0.598                           0.160
5         0.205                           0.078
6         0.670                           0.798
7         0.748                           0.819
8         0.576                           0.428
9         0.089                           0.512
10        0.221                           0.003
Table 2. Simulated sample of ten breakdown and repair times for unit 1 from Figures 2 and 3

Specific breakdown times                  Repair times
Random    Specific breakdown time         Random    Repair time
number    from Figure 2 (hr)              number    from Figure 3 (hr)
0.292     94.75                           0.105     4.80
0.500     100.75                          0.170     5.10
0.548     101.50                          0.345     5.62
0.593     103.00                          0.160     5.02
0.205     92.50                           0.078     4.65
0.670     104.50                          0.798     6.90
0.748     106.75                          0.819     6.97
0.576     102.25                          0.428     5.85
0.089     87.25                           0.512     6.07
0.221     92.50                           0.003     3.45
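Reading a time off a cumulative curve at a given random number is equivalent to inverting the cumulative distribution at that probability. The following minimal Python sketch shows that step for unit 1, assuming the normal parameters of Table 1 and using `statistics.NormalDist.inv_cdf` from the standard library; the function name `sample_time` is illustrative and not from the paper. Values computed this way differ slightly from the graph-read entries of Table 2.

```python
from statistics import NormalDist


def sample_time(u, mean, sd):
    """Return the time whose cumulative probability is u (inverse-CDF lookup)."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(u)


# First three random numbers of sequences 1 and 2 for unit 1.
breakdown_u = [0.292, 0.500, 0.548]
repair_u = [0.105, 0.170, 0.345]

# Unit 1 parameters from Table 1: breakdown N(100, 10), repair N(6, 1), in hr.
breakdown_times = [sample_time(u, 100, 10) for u in breakdown_u]
repair_times = [sample_time(u, 6, 1) for u in repair_u]

for t, r in zip(breakdown_times, repair_times):
    print(f"time to breakdown = {t:7.2f} hr, repair time = {r:5.2f} hr")
```

The same function applies to any distribution for which an inverse cumulative function is available, which is what frees the method from the exponential assumption.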
The breakdown and repair times obtained for unit 1 using these random numbers from Figs. 2 and 3 are shown in Table 2. A different set of random numbers is used to determine the breakdown and repair times for unit 2. Table 3 shows a simulated sample of ten breakdown and repair times obtained for unit 2 from Figs. 4 and 5.

Table 3. Simulated sample of ten breakdown and repair times for unit 2 from Figures 4 and 5

Specific breakdown times                  Repair times
Random    Specific breakdown time         Random    Repair time
number    from Figure 4 (hr)              number    from Figure 5 (hr)
0.219     112.50                          0.003     4.45
0.428     118.50                          0.385     6.77
0.562     122.25                          0.195     6.17
0.150     110.25                          0.492     7.00
0.816     129.75                          0.523     7.08
0.045     103.50                          0.412     6.85
0.696     125.25                          0.500     7.07
0.009     97.50                           0.608     7.30
0.132     109.50                          0.025     5.12
0.592     123.00                          0.199     6.17

(c) The breakdown times obtained are cumulated. This is to determine the length of time the unit has been operating since the beginning of simulation. This cumulated time is in fact the Uptime which is used to evaluate Availability. Tables 4 and 5 give the cumulated Uptime for units 1 and 2 for the data shown in Tables 2 and 3.

Table 4. Cumulated Uptime for unit 1

No.    Specific breakdown time (hr)    Cumulated Uptime (hr)
1      94.75                           94.75
2      100.75                          195.50
3      101.50                          297.00
4      103.00                          400.00
5      92.50                           492.50
6      104.50                          597.00
7      106.75                          703.75
8      102.25                          806.00
9      87.25                           893.25
10     92.50                           985.75

Table 5. Cumulated Uptime for unit 2

No.    Specific breakdown time (hr)    Cumulated Uptime (hr)
1      112.50                          112.50
2      118.50                          231.00
3      122.25                          353.25
4      110.25                          463.50
5      129.75                          593.25
6      103.50                          696.75
7      125.25                          822.00
8      97.50                           919.50
9      109.50                          1029.00
10     123.00                          1152.00

(d) Consider the case of one repairman. Since it is assumed that the repair team operates on a first come, first served principle, the breakdowns are repaired in the order in which they occur. A tabulated summary of the procedure is presented in Table 6 and the history of each unit as the clock time proceeds from 0 hours is shown in Fig. 6.

The repairman starts his operation at the end of the first breakdown time. Unit 1 fails at the end of 94.75 hr. The repair takes 4.80 hr to complete. The total time the unit has been working (Uptime) is 94.75 hr. Since the repairman is available there is no waiting time, so the Downtime is just the repair time. The Availability of unit 1 at the end of that cycle is

\[ \frac{94.75}{94.75 + 4.80} = 0.952. \]

Thus at the end of the first failure:
Clock time CLT = 99.55 hr
Availability of unit 1 = 0.952
Unit 1 commences its operation at CLT = 99.55 hr
Repairman will be free at CLT = 99.55 hr
Unit 2 is still operating.

The second failure of unit 1 would occur at CLT = (99.55 + 100.75) = 200.30 hr. But unit 2 would have already failed at CLT = 112.50 hr. Again the repairman is available, as he finished the first repair at CLT = 99.55 hr, and there is no waiting time. From Table 3 the repair time of this failure is 4.45 hr. Thus the Uptime of unit 2 is 112.50 hr and the Downtime is 4.45 hr. The Availability of unit 2 at the end of the cycle is

\[ \frac{112.50}{112.50 + 4.45} = 0.962. \]

At the end of the second failure:
Clock time CLT = 116.95 hr
Availability of unit 2 = 0.962
Unit 2 commences its operation at CLT = 116.95 hr
Repairman will be free at CLT = 116.95 hr
Unit 1 is still operating.

This process continues until all the breakdowns are repaired. Table 6 shows the hand simulated results for the first ten breakdowns.

[Table 6. Hand simulated results for the first ten breakdowns. The tabulated entries are not legible.]

[Fig. 6. History of Units 1 and 2 as the Clock Time proceeds from 0. Axis: clock time (hr). Legend: operating; under repair; waiting time (nil in the example).]

The Availability of each unit, evaluated at the end of each cycle, is grouped as follows.

Availability of unit 1 as CLT proceeds from 0 to 624.72 hr

CLT (hr)    Availability
99.55       0.952
205.40      0.952
312.52      0.950
420.54      0.951
517.69      0.951

Availability of unit 2 as CLT proceeds from 0 to 624.72 hr

CLT (hr)    Availability
116.95      0.962
242.22      0.954
370.64      0.953
487.89      0.950
624.72      0.949

The simulation is repeated for 2, 3, ..., n repairmen and in each case the Availability of each unit is evaluated. The hand simulation is extremely time-consuming even for a total of ten failures. By using a computer the simulation can be performed for a desired number of failures, depending on the available computer time.
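The hand simulation can be expressed compactly in code. The following Python sketch is one possible rendering of the event increment procedure under the assumptions of Section 4.1, not the authors' FORTRAN programme; the function name `simulate` and its arguments are hypothetical. It is fed the first five breakdown and repair times of each unit from Tables 2 and 3.

```python
import heapq


def simulate(up_times, repair_times, n_repairmen=1):
    """Event increment simulation with first come, first served repairs.

    up_times[u] and repair_times[u] hold the pre-drawn times to breakdown and
    repair times (hr) of unit u. Returns, for each unit, a list of
    (clock time at end of repair, Availability so far), one per cycle.
    """
    n_units = len(up_times)
    free_at = [0.0] * n_repairmen        # min-heap of times each repairman is free
    cycle = [0] * n_units                # index of the next breakdown of each unit
    next_fail = [up_times[u][0] for u in range(n_units)]
    uptime = [0.0] * n_units
    downtime = [0.0] * n_units
    history = [[] for _ in range(n_units)]
    pending = set(range(n_units))        # units that still have breakdowns left

    while pending:
        # The unit that fails first is repaired first.
        u = min(pending, key=lambda k: next_fail[k])
        failed = next_fail[u]
        start = max(failed, heapq.heappop(free_at))  # wait if nobody is free
        done = start + repair_times[u][cycle[u]]
        heapq.heappush(free_at, done)

        uptime[u] += up_times[u][cycle[u]]
        downtime[u] += done - failed     # waiting time + actual repair time
        history[u].append((done, uptime[u] / (uptime[u] + downtime[u])))

        cycle[u] += 1
        if cycle[u] < len(up_times[u]):
            next_fail[u] = done + up_times[u][cycle[u]]
        else:
            pending.discard(u)
    return history


# First five breakdown and repair times (hr) from Tables 2 and 3.
unit1_up = [94.75, 100.75, 101.50, 103.00, 92.50]
unit1_rep = [4.80, 5.10, 5.62, 5.02, 4.65]
unit2_up = [112.50, 118.50, 122.25, 110.25, 129.75]
unit2_rep = [4.45, 6.77, 6.17, 7.00, 7.08]

history = simulate([unit1_up, unit2_up], [unit1_rep, unit2_rep], n_repairmen=1)
for unit, cycles in enumerate(history, start=1):
    for clt, avail in cycles:
        print(f"unit {unit}: CLT = {clt:7.2f} hr, Availability = {avail:.3f}")
```

With one repairman this should reproduce the clock times listed above, with Availabilities that agree to within rounding in the third decimal place. Passing a larger `n_repairmen` repeats the calculation for 2, 3, ..., n repairmen.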
5. RESULTS

A computer programme was written in FORTRAN for the CDC 7600 computer to perform the simulation. A hypothetical system consisting of eight units was considered. The density functions of reliability and repairability of each unit were assumed to be normally distributed. The means and standard deviations are as given in Table 7. Simulation was performed until a total of 100 failures of any one unit had been completed. The number 100 was chosen to ensure that the Availability stabilizes within 100 failures. Any other number may be chosen if necessary.

Table 7. Reliability and Repairability data of eight units considered in the hypothetical system

Unit    Reliability R                       Repairability R_M
        Mean μ (hr)   Std deviation σ (hr)  Mean μ (hr)   Std deviation σ (hr)
1       100           10                    6             ...
2       120           10                    7             ...
3       110           10                    8             ...
4       115           10                    9             ...
5       80            10                    10            ...
6       70            10                    11            ...
7       105           10                    12            ...
8       125           10                    13            ...

The following results were obtained. The Availability evaluated tends to stabilize after an initial oscillation. In some cases, when the waiting time was greater, the Availability deviated slightly from this stabilized value. But this deviation was very small and significant to only 1%. This stabilized value was taken as the Availability (for a detailed explanation see reference 6). The Availability results are given in Table 8.

Table 8. Availability results for the hypothetical system considered

Number of    Availability of unit
repairmen    1      2      3      4      5      6      7      8
1            0.90   0.90   0.88   0.89   0.84   0.81   0.85   0.88
2            0.94   0.94   0.93   0.92   0.88   0.86   0.89   0.91
3            0.94   0.95   0.93   0.93   0.89   0.86   0.89   0.91

The Availability of each unit increases with the number of repairmen. The Availability of unit 1 increases from 0.90 when there is only one repairman to 0.94 when there are two or three repairmen. Also the Availability of unit 4 increases from 0.89 when there is only one repairman to 0.92 when there are two repairmen and to 0.93 when there are three repairmen. This is because of the reduction in waiting time for repair when there are more repairmen, and Availability increases while waiting time reduces. The increase in Availability for an increase in the number of repairmen holds only until the waiting time reduces to zero. Thereafter an increase in the number of repairmen does not increase Availability. In the present case the Availability remains the same for four or more repairmen.

6. CONCLUSION

The numerical method developed in this paper provides a means of predicting the Availability of each unit in a production or an electrical system. Such predictions are possible if the reliability and repairability characteristics are known for individual units in the system. Existing analytical methods are based on the assumptions of exponential distribution and/or of zero waiting time between repairs. By using the proposed method such assumptions need not be made and the model is more representative of the real situation.

REFERENCES
1. K. Grace Jr., Approximate system Availability models, Proc. IEEE Symp. Reliab. 2 (1), 146-52 (1969).
2. H. Ascher, Development of systems reliability models from basic physical/mathematical concept, NATO Advanced Study Institute, Generic techniques in systems reliability assessment, University of Liverpool, 17-27 July (1973).
3. E. P. Virene, Waiting line (queueing) effects on Availability, Proc. IEEE Symp. Reliab. 2 (1), 162-7 (1969).
4. R. J. McNichols and G. H. Messer Jr., A cost based Availability Allocation Algorithm, IEEE Trans. Reliab. R-20 (3) (Aug. 1971).
5. T. H. Naylor, J. L. Balintfy, S. Burdick and K. Chu, Computer Simulation Techniques, Wiley and Sons (1966).
6. B. A. Basker, Optimum Availability of production plant, Ph.D. Thesis, University of Liverpool (Sept. 1974).
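As a supplementary illustration of the experiment in Section 5, the following Python sketch runs the eight-unit system for several numbers of repairmen. It is a sketch under stated assumptions and not the original FORTRAN programme: the repair standard deviation `REPAIR_SD` of 1 hr is borrowed from Table 1, the seed is arbitrary, and the names `draw` and `run` are hypothetical. Its output will therefore not match Table 8 digit for digit.

```python
import heapq
import random

# Means (hr) for the eight units; reliability standard deviation is 10 hr.
UP_MEAN = [100, 120, 110, 115, 80, 70, 105, 125]
REPAIR_MEAN = [6, 7, 8, 9, 10, 11, 12, 13]
UP_SD = 10.0
REPAIR_SD = 1.0   # ASSUMPTION: value borrowed from Table 1


def draw(rng, mean, sd):
    """Draw a normally distributed time, redrawing any non-positive value."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0:
            return x


def run(n_repairmen, max_failures=100, seed=1):
    """Simulate until any one unit has completed max_failures repair cycles."""
    rng = random.Random(seed)
    n = len(UP_MEAN)
    free_at = [0.0] * n_repairmen        # min-heap of repairman release times
    up = [draw(rng, UP_MEAN[u], UP_SD) for u in range(n)]  # current running period
    next_fail = up[:]                    # clock time of each unit's next breakdown
    uptime, downtime, failures = [0.0] * n, [0.0] * n, [0] * n
    while max(failures) < max_failures:
        u = min(range(n), key=lambda k: next_fail[k])      # first come, first served
        start = max(next_fail[u], heapq.heappop(free_at))  # wait for a free repairman
        done = start + draw(rng, REPAIR_MEAN[u], REPAIR_SD)
        heapq.heappush(free_at, done)
        uptime[u] += up[u]
        downtime[u] += done - next_fail[u]   # waiting time + actual repair time
        failures[u] += 1
        up[u] = draw(rng, UP_MEAN[u], UP_SD)
        next_fail[u] = done + up[u]
    return [uptime[u] / (uptime[u] + downtime[u]) if failures[u] else 1.0
            for u in range(n)]


results = {m: run(m) for m in (1, 2, 3, 8)}
for m, avail in results.items():
    print(f"{m} repairmen:", " ".join(f"{a:.2f}" for a in avail))
```

With as many repairmen as units there is never any waiting, so each Availability should approach the ratio of mean Uptime to mean Uptime plus mean repair time; with a single repairman the waiting time lowers it.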