Mathl Comput. Modelling, Vol. 13, No. 7, pp. 119-125, 1990. Printed in Great Britain. All rights reserved
0895-7177/90 $3.00 + 0.00 Copyright © 1990 Pergamon Press plc
BRIEF NOTE
MARKOV PROCESS MODELLING OF A MAINTENANCE SYSTEM WITH SPARES, REPAIR, CANNIBALIZATION AND MANPOWER CONSTRAINTS
W. W. FISHER
The University of Mississippi, University, MS 38677, U.S.A.
Abstract-Maintenance systems serving fleets of aircraft or vehicles are quite complex, often including spares, repair, cannibalization, manpower constraints and other factors. Analytic models of such systems in the literature, however, rarely include manpower constraints and usually incorporate only complete cannibalization if any at all. This paper presents a continuous Markov process model suitable for evaluating the performance of a maintenance system with spares, repair, cannibalization and manpower constraints. Several cannibalization and repair options are included, and the model’s usefulness is demonstrated by a sequence of experiments identifying desirable cannibalization policies under various failure, repair and cannibalization rates.
INTRODUCTION
In a system maintaining a group of similar machines, spares of machine components are often maintained to speed the repair of the machines when they fail. Further, the components themselves are often repairable. In such a system, it often happens that one machine is inoperative due to the (perhaps only temporary) lack of a component at the same time that one or more other machines are inoperative due to the lack of different component(s). If the restoration of machines is important enough, maintenance personnel may “cannibalize” operative components from one or more machines to repair the other(s). Cannibalization actions produce an immediate benefit by increasing the number of operable machines, but at the cost of expending resources (particularly maintenance man-hours) that could be used for other purposes, including repair of the parts that were originally needed. This practice is particularly common in systems in which the machines are aircraft, but it is also possible in vehicle fleets and other systems where the machines are composed of sufficiently identical component parts.

Most past research on maintenance systems has excluded the effect of manpower constraints and is restricted to the case of complete (restoring as many machines as possible), instantaneous cannibalization, if any at all. Sherbrooke [1], for example, uses the assumptions of complete, instantaneous cannibalization in the presence of spares and repair to calculate the expected number of operational machines. Silver [2] uses Sherbrooke’s result to derive a heuristic for setting spares levels subject to a spares budget. Khalifa et al. [3] do consider other than complete cannibalization in a simulation model, but exclude spares and repair. None of the three papers listed here (representative of most of the available literature) incorporates manpower constraints.
In one of the few papers in the literature that do incorporate manpower constraints along with spares, repair and cannibalization, Fisher and Brennan [4] use a simulation model of an aircraft maintenance system to compare the relative performance (in terms of average inoperative aircraft) of several cannibalization policies. The authors conclude that neither complete cannibalization nor no cannibalization is necessarily the best policy, and further find that the desirability of cannibalization is influenced by both the availability of maintenance personnel and the ratio of mean repair time to mean cannibalization time.

Like all simulation models, the Fisher and Brennan model produces only approximate results that are subject to experimental error. If certain simplifying assumptions are made, however, a maintenance system with spares, repair, cannibalization and manpower constraints can be modeled analytically and performance measures calculated exactly. The remainder of this paper is devoted to the presentation of such an analytic model. In addition, a sequence of experiments identifying desirable cannibalization policies under various failure, repair and cannibalization rates is presented to illustrate the model’s usefulness. The source code for the model in FORTRAN is much too lengthy to be included here but is available from the author.

MODEL CONSTRUCTION
Under the following general conditions, it is possible to analytically model even a relatively complex maintenance system:
(a) The system is always in one of a finite number of states, where a “state” is a possible condition of the system (as determined by the number and condition of the machines, spare components and maintenance personnel, for example).
(b) All transitions from one state to another occur by Markov processes with known means. This is true in a maintenance system if, for example, time between failures, repair time and cannibalization time follow exponential probability distributions (assumptions common in the literature).

Under these conditions, the maintenance system can be modeled as a continuous Markov process. If the system does not tend to gravitate toward any particular state, it is usually reasonable to assume that there exists a long run “steady-state” probability that the system will be found in any one state at any particular time. It is well-known that such a system can be solved for the probability of each state by solving the linear equations formed by equating the mean transition rates into and out of each state. See Heyman and Sobel [5] or an equivalent stochastic processes text for a rigorous mathematical discussion of continuous Markov processes.

Managers of maintenance systems frequently seek to minimize the average number of inoperative machines (or maximize the number of operative ones) in the system. If a maintenance system is modeled as a continuous Markov process and the state structure includes information about the number of inoperative machines, the solution for the probability of each state implies the probability of each number of inoperative machines. Once these probabilities are known, the expected number of inoperative machines (as well as other system performance measures) can easily be computed.

State structure
The first step in the construction of a continuous Markov process is the definition of the states involved. The following simplifying assumptions are made here:
(a) The system consists of identical machines, each of which consists of two non-identical parts or components. Component spares are permitted.
(b) Each machine failure consists of exactly one component failure. Replacement with a spare is instantaneous if one is available, and failed components are always placed in an awaiting-repair pool. (Operative spares and operative parts of failed machines do not risk failure.)
(c) All failed components are repairable at a single repair facility.
(d) Server availability and assignments affect the rate of transition between states, but not the state definitions.

Although it is not theoretically necessary (and not always realistic) to limit the system to two part types per machine, that constraint is used here to keep the state structure readable and the number of states reasonable. The other assumptions clarify the situation being modeled and are common in the literature. Future verification of model conclusions against a more complex simulation model is planned.

With the above assumptions in place, any one state i is uniquely defined by the vector (N1_i, N2_i, NB_i, L1_i, L2_i), where (dropping the subscripts):

N1 = the number of machines requiring part type 1;
N2 = the number of machines requiring part type 2;
NB = the number of machines requiring both parts;
L1 = the number of type 1 parts awaiting repair; and
L2 = the number of type 2 parts awaiting repair.
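As a rough illustration of how a state list over this vector might be enumerated, the following Python sketch generates the feasible (N1, N2, NB, L1, L2) tuples in lexicographic order. It is not the author's FORTRAN subroutine. The function name `enumerate_states` and the feasibility rule are assumptions: the rule is inferred from assumption (b) by conservation of parts (if any machine is waiting for a part type, every spare of that type plus every missing part must be awaiting repair). Under that assumed rule the sketch reproduces the 69-state and 309-state problem sizes quoted for Table 1.

```python
from itertools import product


def enumerate_states(machines, spares1, spares2):
    """Return feasible (N1, N2, NB, L1, L2) tuples in lexicographic order.

    Hypothetical helper: the feasibility rule is an assumption inferred
    from assumption (b), not a transcription of the original program.
    """

    def awaiting_repair_options(needed, spares):
        # Assumed rule: a machine waits for a part only after all spares
        # are used up, so spares + needed parts must be awaiting repair.
        if needed > 0:
            return [spares + needed]
        # No machine is waiting: between 0 and all spares may have failed.
        return list(range(spares + 1))

    states = []
    for n1, n2, nb in product(range(machines + 1), repeat=3):
        if n1 + n2 + nb > machines:
            continue  # cannot have more inoperative machines than machines
        for l1 in awaiting_repair_options(n1 + nb, spares1):
            for l2 in awaiting_repair_options(n2 + nb, spares2):
                states.append((n1, n2, nb, l1, l2))
    return states  # loops run in ascending order, so the list is sorted


if __name__ == "__main__":
    states = enumerate_states(machines=5, spares1=1, spares2=1)
    # A dictionary gives the state-number look-up needed later when
    # transition rates are filed under source and destination states.
    index = {state: number for number, state in enumerate(states, start=1)}
    print(len(states), index[(0, 0, 0, 0, 0)])
```

Generating the states in sorted order mirrors the ordering described in the text and keeps later look-ups simple; a dictionary is used here only for convenience.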
Other required information, such as the numbers of operative machines and spares, can be calculated from assumed initial values and the state definition vector. Note that assumption (b) in the preceding paragraph limits the number of possible states in two ways: (1) NB can be non-zero only if cannibalization is done; and (2) N1, N2 and NB can be non-zero only if previous failures have exhausted all operative spares of the required type(s). If there are two type 1 spares, for example, L1 must be at least two more than N1 + NB if N1 + NB ≠ 0.

Since even small numbers of machines and spares result in many possible states, construction of the state list is accomplished by filling in arrays with a FORTRAN subroutine. The procedure employed for this paper begins by saving (N1 = 0, N2 = 0, NB = 0, L1 = 0, L2 = 0) as state 1, then increments L2 and checks whether a new feasible state has been formed. If so, it is saved and L2 is incremented again. If the new state is not valid but a higher value of L2 might be, L2 is incremented without saving the state. For example, if one spare is available of part type 2, state (0, 1, 0, 0, 0) is not feasible, but state (0, 1, 0, 0, 2) is (incrementation of N2 is described below). The process continues until a vector (N1, N2, NB, L1, L2) is reached such that no higher L2 values could be feasible. L1 is then incremented, L2 is set to zero and the process repeated. NB, N2 and N1 are each incremented in turn as required, so that the states are constructed in order of (N1, N2, NB, L1, L2). This facilitates later state number look-ups when filling in the transition rate matrix.

Transition rate matrix construction and solution
Solution of a continuous Markov process, of course, requires the construction and solution of the transition equations, which can be viewed together as a matrix. As with the list of states, the equations are actually constructed by filling in arrays with a FORTRAN subroutine. Let:

S = the matrix of transition rates such that:
    S(ij) = the mean transition rate from state j to state i, i ≠ j, and
    S(ii) = the negative of the total transition rate out of state i;
p = the vector of steady state probabilities (to be found); and
0 = a vector of zeros.
The system Sp = 0 now represents the set of transition equations. Solution of Sp = 0 for p by Gaussian elimination will not ordinarily work, unfortunately, since one of the equations is redundant. Since the sum of the elements of p must equal 1, however, any one row of S can be replaced by all 1s and the corresponding element of 0 by 1, and the system Sp = 0 (as modified) then solved for p. Once p is known, it can easily be used in conjunction with the state description arrays (N1, N2, NB, L1, L2) to calculate the expected number of inoperative machines and other performance measures.

Construction of the Sp = 0 system is straightforward in the case of full-matrix storage of S. One can initialize S to all zeros, then examine each state in turn for transitions (details are described in the next section). The mean rate of each transition from state i to state j is then added to the ji element of S and subtracted from the ii element.

In problems with more than 100 or so states, however, S rapidly becomes too large to manage. Fortunately, S contains relatively few non-zero elements for the maintenance system model considered here, allowing the application of sparse matrix solution methods. The algorithm described by Sauer and Chandy [6, p. 42] and Stewart [7] has been found to work quite well, allowing systems of 1000 or so states to be solved in reasonable time periods even on a microcomputer. The details of the implementation of that algorithm are described in Ref. [8].
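The full-matrix construction and the row-replacement trick described above can be sketched in a few lines of Python. This is a minimal dense illustration for very small problems, not the sparse algorithm of refs [6, 7] used in the paper; the function names and the two-state toy example (one machine, one part type, no spares, failure rate 1, repair rate 4) are hypothetical.

```python
def build_rate_matrix(n_states, transitions):
    """Build S from (i, j, rate) triples meaning 'state i -> state j'."""
    S = [[0.0] * n_states for _ in range(n_states)]
    for i, j, rate in transitions:
        S[j][i] += rate  # inflow to j from i is stored in row j, column i
        S[i][i] -= rate  # total outflow from i accumulates on the diagonal
    return S


def steady_state(S):
    """Solve S p = 0 subject to sum(p) = 1 by dense Gauss-Jordan elimination."""
    n = len(S)
    # Augmented matrix [S | 0]; the last (redundant) balance equation is
    # replaced by the normalisation row 1*p_1 + ... + 1*p_n = 1.
    A = [list(map(float, row)) + [0.0] for row in S]
    A[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def expected_inoperative(p, states):
    """Expected number of inoperative machines, sum of p_i * (N1 + N2 + NB)."""
    return sum(prob * (n1 + n2 + nb)
               for prob, (n1, n2, nb, _l1, _l2) in zip(p, states))


if __name__ == "__main__":
    # Hypothetical toy system: state 0 = machine up, state 1 = machine down.
    states = [(0, 0, 0, 0, 0), (1, 0, 0, 1, 0)]
    S = build_rate_matrix(2, [(0, 1, 1.0), (1, 0, 4.0)])
    p = steady_state(S)
    print(p, expected_inoperative(p, states))
```

For the toy system the balance equation gives p = (0.8, 0.2), so the machine is inoperative 20% of the time. A dense solver like this one scales poorly, which is the reason the text turns to sparse methods for larger state lists.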
Transition rates and destinations
In the model, transition from one state to another occurs by either failure, repair or cannibalization. The mean rates for part type I are denoted F(I), R(I) and C(I), respectively, and the corresponding events are all assumed to follow the Poisson distribution (implying that time between failures, repair time and cannibalization time follow the exponential distribution).

For any state (N1, N2, NB, L1, L2), failure is possible if N1 + N2 + NB is less than the number of machines in the system (recalling that spares and inoperative machines are assumed to not risk failure). The destination state for a part type 1 failure [at mean transition rate F(1)] is (N1, N2, NB, L1 + 1, L2) if an operative spare is available; (N1 + 1, N2, NB, L1 + 1, L2) otherwise (part type 2 failures are handled similarly). Two failure rate options have been implemented and are described below.

Repair and cannibalization are conducted by M identical servers in a single facility. Each server is assigned to either repair or cannibalize a single part, or do nothing. Multiple servers are never assigned to the same part. The model assigns servers by first checking whether cannibalization is possible (N1 and N2 both > 0), and, if so, permissible under the policy in effect (alternate policies which have been implemented are described below). For any state (N1, N2, NB, L1, L2), cannibalization causes transition to the state (N1 - 1, N2 - 1, NB + 1, L1, L2) at a rate of either C(1) or C(2), depending on the policy in effect. If N1, N2, M and the current policy permit, transition may occur at multiples of C(1) or C(2) (representing multiple cannibalizations), but the destination state is unchanged. Any remaining servers are assigned one at a time to do repairs at a rate R(1) or R(2) according to the repair policies in effect (also described below). Subsequent repair assignments may alternate between part types 1 and 2, but all repairs for a given part type have the same destination state (destination options are given below).

In addition to the numbers of machines, servers, spares, and failure, repair and cannibalization rates, the following policies and options may be specified in the model as currently implemented (numbers are assigned to the policies for later reference):

(a) Cannibalization policy. Cannibalize only if the number of servers left after cannibalization is:
    (0) N/A (no cannibalization);
    (1) ≥ total inoperative parts + 1;
    (2) ≥ total inoperative parts;
    (3) ≥ total parts needed for inoperative machines + 1;
    (4) ≥ total parts needed for inoperative machines; or
    (5) always cannibalize if possible (complete cannibalization).
(b) Cannibalization priority when cannibalization is elected:
    (1) always cannibalize part type 1 [rate = C(1)]; or
    (2) always cannibalize part type 2.
(c) Repair priority. Repair type 1 vs type 2 if:
    (0) never (always repair type 2 first);
    (1) net inventory (defined below) for type 1 < 2;
    (2) net inventory for type 1 < type 2; or
    (3) always repair type 1 first.
(d) Net inventory (for repair priority) calculation method:
    (1) net = on hand - number needed for inoperative machines; or
    (2) net = on hand - number needed for inoperative machines + number already in repair.
(e) Repair destination code. From (N1, N2, NB, L1, L2), repair of a type 1 part (type 2 is done in a similar manner) yields destination:
    (1) (N1 - 1, N2, NB, L1 - 1, L2); or
    (2) (N1, N2 + 1, NB - 1, L1 - 1, L2).
(f) Failure rate code. Input rates [F(1) and F(2)] are:
    (1) the total rates per time period when at least one machine is operative; or
    (2) the rates per operative machine.
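The failure and cannibalization destinations, and the cannibalization policy test, might be coded along the following lines. This Python sketch is an illustration only and all function names are hypothetical. Three points are assumptions rather than statements from the paper: the number of operative spares is derived by conservation of parts; the policy thresholds are read as "greater than or equal to", in line with the later description of policy (4); and "total inoperative parts" is taken to be L1 + L2 while "total parts needed for inoperative machines" is taken to be N1 + N2 + 2·NB.

```python
def operative_spares(state, spares):
    """Operative spares on hand per part type (assumed parts conservation)."""
    n1, n2, nb, l1, l2 = state
    return (spares[0] - l1 + n1 + nb, spares[1] - l2 + n2 + nb)


def failure_destination(state, part, spares, machines):
    """Destination after a failure of part type 1 or 2, or None if impossible."""
    n1, n2, nb, l1, l2 = state
    if n1 + n2 + nb >= machines:
        return None  # no operative machine is left to fail
    have_spare = operative_spares(state, spares)[part - 1] > 0
    grounded = 0 if have_spare else 1  # machine is grounded only without a spare
    if part == 1:
        return (n1 + grounded, n2, nb, l1 + 1, l2)
    return (n1, n2 + grounded, nb, l1, l2 + 1)


def cannibalization_destination(state):
    """Destination after one cannibalization, or None if it is not possible."""
    n1, n2, nb, l1, l2 = state
    if n1 > 0 and n2 > 0:
        return (n1 - 1, n2 - 1, nb + 1, l1, l2)
    return None


def cannibalization_allowed(policy, servers_left, state):
    """Policy test (a); servers_left counts servers free after cannibalizing."""
    n1, n2, nb, l1, l2 = state
    inoperative_parts = l1 + l2      # assumed meaning of "inoperative parts"
    parts_needed = n1 + n2 + 2 * nb  # assumed meaning of "parts needed"
    if policy == 0:
        return False
    if policy == 5:
        return True
    threshold = {1: inoperative_parts + 1, 2: inoperative_parts,
                 3: parts_needed + 1, 4: parts_needed}[policy]
    return servers_left >= threshold


if __name__ == "__main__":
    spares = (1, 1)
    print(failure_destination((0, 0, 0, 0, 0), 1, spares, machines=5))
    print(failure_destination((0, 0, 0, 1, 0), 1, spares, machines=5))
    print(cannibalization_destination((1, 1, 0, 2, 2)))
```

Note that a cannibalization leaves L1 + L2 and N1 + N2 + 2·NB unchanged, so under these assumed readings the policy thresholds are the same whether they are evaluated before or after the move.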
The simulation experiments reported by Fisher and Brennan [4] indicate that cannibalization decisions should sometimes depend on server availability and the relationship between mean repair and mean cannibalization time. In addition to allowing the model to emulate a variety of different operating conditions, the above options allow cannibalization decisions in the model to be made based on those factors.

MODEL VERIFICATION AND PERFORMANCE
The model has been subjected to the usual verification process to ensure that it produces states and transition rates consistent with the assumptions, parameters and policies in effect. Because closed-form results are available in the literature for the cases of no cannibalization [9] and complete, instantaneous cannibalization [1], it was possible to test the expected number of inoperative machines produced by the model under those two situations. For both tests, the assumptions of an infinite number of servers and an infinite number of machines are required. Using total failure rates of 9 and 14, repair rates of 10 and 16 (each for part types 1 and 2, respectively), 1 spare of each part type, and appropriate model options, a set of experiments was run with the model to test the expected number of inoperative machines produced. Muckstadt’s [9] procedure [no cannibalization, applicable to cannibalization policy (0)] produced an answer of 0.598, while Sherbrooke’s [1] model [complete, instantaneous cannibalization, applicable to cannibalization policy (5)] yielded 0.545. As the results in Table 1 illustrate, the model converges very closely to the desired numbers of inoperative machines as the number of servers, machines and cannibalization rate are increased (to match the assumptions for the two alternate models).

Using sparse matrix methods for solution, the model performs well even on a microcomputer. With both construction and solution programmed in IBM (trademark International Business Machines Corp.) FORTRAN 2.0 running on an IBM PC with an 8087 (trademark Intel Corp.) math co-processor, the 69-state problems for Table 1 took about 1 s to generate and < 8 s to solve. The 309-state problems took about 20 s to generate and from 2 to 8 min to solve (the faster cannibalization rates produce a matrix structure that converges slowly, causing the longer solution times).
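The two benchmark values quoted in this section (0.598 and 0.545) can be checked independently with a short calculation. The Python sketch below is not a transcription of the procedures in refs [1] or [9]. It assumes that, with infinitely many servers and machines, the number of parts of each type in repair is Poisson with mean equal to failure rate divided by repair rate, that without cannibalization the shortages of the two part types add, and that with complete, instantaneous cannibalization the number of inoperative machines equals the larger of the two shortages. The function names are hypothetical.

```python
from math import exp


def poisson_cdf(mean, kmax):
    """Return [P(X <= 0), ..., P(X <= kmax)] for X ~ Poisson(mean)."""
    term, total, cdf = exp(-mean), 0.0, []
    for k in range(kmax + 1):
        total += term
        cdf.append(total)
        term *= mean / (k + 1)  # next Poisson probability from the last one
    return cdf


def benchmarks(fail, repair, spares, kmax=60):
    """Expected inoperative machines: (no cannibalization, complete)."""
    cdfs = [poisson_cdf(f / r, kmax) for f, r in zip(fail, repair)]

    def tail(k, n):
        # P(shortage of part type k exceeds n) = P(X_k > spares_k + n)
        return 1.0 - cdfs[k][spares[k] + n]

    span = range(kmax - max(spares))  # truncation point for the infinite sums
    # No cannibalization: each missing part grounds its own machine.
    none = sum(tail(0, n) + tail(1, n) for n in span)
    # Complete cannibalization: missing parts are consolidated onto as few
    # machines as possible, so only the larger shortage grounds machines.
    complete = sum(1.0 - (1.0 - tail(0, n)) * (1.0 - tail(1, n)) for n in span)
    return none, complete


if __name__ == "__main__":
    none, complete = benchmarks(fail=(9, 14), repair=(10, 16), spares=(1, 1))
    print(round(none, 3), round(complete, 3))
```

With the parameters from the text (failure rates 9 and 14, repair rates 10 and 16, one spare of each type), the sketch yields values that round to 0.598 and 0.545, matching the figures quoted above under the stated assumptions.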
Problems as large as 884 states have been solved in as little as 2-3 min, depending on the parameters and options in effect.

CANNIBALIZATION POLICY TESTS
In order to illustrate the model’s use, experiments were conducted with various failure, repair and cannibalization rates to: (a) confirm the results from Fisher and Brennan [4]; and (b) identify critical failure, repair and cannibalization rates where the desirability of cannibalization changes. To simplify the experiments, only the cannibalization policy among the model options was varied, leaving options (b)-(f) set at 1, 2, 2, 1 and 1, respectively. In addition, part types 1 and 2 were set to have equal failure, repair and cannibalization rates, and all experiments except where noted used a system with 10 machines and 2 spares for each part type. The best cannibalization policies found and the resulting numbers of inoperative machines are reported in Table 2. Note that the term “optimal” is avoided, since the 6 cannibalization policies tested in the model certainly do not encompass all those possible.

Table 1. Verification tests

                                           Expected inoperative machines
Servers  Machines  Cannib. rate  States    Cannib. policy (0)  Cannib. policy (5)
   1         5          20          69           3.816               4.335
   5         5          20          69           0.604               0.600
  50         5          20          69           0.595               0.577
  50        10          20         309           0.598               0.579
  50        10          50         309           0.598               0.567
  50        10         500         309           0.598               0.549

Table 2. Best cannibalization policies and resulting inoperative machines by failure rate and ratio of cannibalization to repair rate^a

Total failure    Ratio of mean cannibalization rate to repair rate
rate^b              2         3         4         8
  8                 0         0         0         4
                  2.211     2.211     2.211     2.126
  7                 0         0         4         ?
                  1.362     1.362     1.352     1.274
  6                 0         0         5         ?
                  0.795     0.795     0.781     0.?36
  5                 0         5         5         ?
                  0.443     0.442     0.434     0.419
  4                 0         5         5         ?
                  0.229     0.228     0.226     0.222
  3                 0         5         5         ?
                  0.102     0.102     0.102     0.101
  2                 0         5         5         5
                  0.033     0.033     0.033     0.033
  1                 0         5         5         ?
                  0.005     0.005     0.005     0.005
  0.5               5         5         5         ?
                  0.001     0.001     0.001     0.001

^a All experiments with 5 servers; total mean repair rate of 10 (2 per server). Each cell contains the number of the best cannibalization policy above the resulting number of inoperative machines. See text for additional specifications.
^b Rate per part type is half the indicated rate.

As indicated in Table 2, policy (5) (complete cannibalization) is indeed best when the total failure rate is low relative to the total repair rate (implying low server utilization). With faster cannibalization (relative to repair), policy (5) is best even at relatively high failure rates. Note, however, that the best policy does not always jump from complete to no cannibalization. At higher cannibalization rates, policy (4) (cannibalize only when servers left after cannibalization equal or exceed total parts needed for inoperative machines) is best even at very high failure rates. These results are consistent with those of Fisher and Brennan [4].

To test whether the desirability of cannibalization might depend on the ratio of the total failure and repair rates rather than the specific values used, additional experiments were conducted with a total repair rate of 20 instead of 10. With a cannibalization to repair ratio of 2, policy (5) was found to cease to be best above a total failure to repair rate of about 0.075 under both total repair rates. Similarly, with a cannibalization to repair ratio of 3, policy (5) was found to cease to be best above a failure to repair rate of about 0.53 under both total repair rates.

Finally, experiments were conducted varying the number of spares and servers (holding the total repair rate constant) to determine whether those factors would affect the desirability of cannibalization. Although not repeated here due to space limitations, those experiments indicated that more servers (even with the same total repair rate) and fewer spares tend to make cannibalization more attractive. Under those circumstances, cannibalization policy (2) out-performs policy (0) even under extremely high failure rates.

The interaction of spares and numbers of servers on cannibalization policy performance is explained by the presence of repair. Under high failure rates, cannibalization consumes server resources that are heavily needed for repair, making complete cannibalization undesirable. However, with many servers and/or few spares, the system will sometimes be in a state such that servers would be idle if cannibalization were not done. Relative to having them do nothing, the system is obviously better off by allowing the excess servers to do cannibalization. Policy (2) apparently performs well even under high failure rates by allowing such cannibalization only when there would otherwise be idle servers.
CONCLUSIONS AND FUTURE RESEARCH NEEDS
A model has been presented that will analytically calculate the expected number of inoperative machines in a maintenance system with spares, repair, cannibalization and resource constraints, such as a system maintaining a fleet of airplanes or vehicles. The model can be used to test the impact of various cannibalization and repair policies as well as the effect of system parameters such as numbers of spares and servers. Preliminary results from experiments with the model are consistent with those of simulation studies in the literature.
Although the model has been used to identify the ratios of failure to repair rate where cannibalization becomes less attractive for two ratios of cannibalization to repair rate, experimental results indicate that the ratios are not insensitive to changes in the numbers of servers and spares. Additional experiments exploring the desirability of cannibalization under diverse conditions would allow preparation of tables and rules of thumb to aid practitioners who do not have access to the model. Other possible uses of the model include testing repair and spares allocation policies and their interactions both with each other and with the desirability of cannibalization.

Finally, it should be noted that the assumptions necessary for this model are fairly restrictive, and more complex situations and policies often require the use of simulation models. Models such as the one presented here are still useful, however, for verifying the performance of such simulation models and for helping focus simulation experiments.
REFERENCES
1. C. C. Sherbrooke, An evaluator for the number of operationally ready aircraft in a multilevel supply system. Ops Res. 19, 618-635 (1971).
2. E. A. Silver, Inventory allocation among an assembly and its repairable subassemblies. Nav. Res. Logist. Q. 19, 261-280 (1972).
3. D. Khalifa, M. Hottenstein and S. Aggarwal, Cannibalization policies for multistate systems. Ops Res. 25, 1032-1039 (1977).
4. W. W. Fisher and J. J. Brennan, The performance of cannibalization policies in a maintenance system with spares, repair and resource constraints. Nav. Res. Logist. Q. 33, 1-15 (1986).
5. D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research, Vol. I. McGraw-Hill, New York (1982).
6. C. H. Sauer and K. M. Chandy, Computer Systems Performance Modeling. Prentice-Hall, Englewood Cliffs, N.J. (1981).
7. W. J. Stewart, A comparison of numerical techniques in Markov modeling. Communs ACM 21, 144-152 (1978).
8. W. W. Fisher, Solution of large, sparse, continuous Markov processes on math co-processor-equipped microcomputers. In Decision Sciences Theory and Applications, Proc. 18th A. Conf. Southwest Region, Decision Sciences Institute, Houston, Texas, pp. 43-45 (1987).
9. J. A. Muckstadt, Some approximations in multi-item, multi-echelon inventory systems for recoverable items. Nav. Res. Logist. Q. 25, 377-394 (1978).