Computers in Industry 21 (1993) 93-99 Elsevier
Short Note
Power plant boiler feed system reliability: A case study
D. Sculli and S.K. Choy
Department of Industrial and Manufacturing Engineering, University of Hong Kong
Received January 2, 1992
Accepted March 6, 1992
This paper briefly considers the historical reliability of pump sets that feed water to the boilers of large power plants. Two pump sets out of three are required in order to achieve full output power load. The study excludes major overhauls and confines itself to unpredictable defects in pump sets that can cause disruptions. Failure and repair rates are estimated from historical data, and the system is modeled as a Markov process in order to compute its reliability. The availability of sufficient and reliable data appears to be a major obstacle in solving problems of this nature. The use of computers is required to solve the resulting set of first-order linear differential equations.
Keywords: Reliability; Failures; Markov process; Feed pumps; Power plants; Stand-by
Correspondence to: Dr. D. Sculli, Department of Industrial and Manufacturing Engineering, University of Hong Kong, Pokfulam Road, Hong Kong. Fax: (852) 8586535.
0166-3615/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved
1. Introduction
The boiler feed system of large power plants, 200 MW and above, usually incorporates three pump sets in parallel. Only two pump sets are required to satisfy full load conditions; the additional pump set is redundant and, in effect, gives the system a 50% stand-by capacity. The stand-by pump normally cuts in automatically when either of the running pumps fails. Each pump set (see Fig. 1) for the power plant under study consists of five major components: the suction strainer, low-speed booster pump, driving motor, speed-increasing gear turbo-coupling, and the main pump. Feed water passes through the suction strainer before it enters the booster pump. The discharge of the booster pump is connected to the suction side of the main pump, which in turn pumps the feed water into the boiler. The booster pump is needed to provide a net positive suction head to the main pump to suppress cavitation.
A failure does not necessarily mean that the pump fails to deliver feed water against a certain head; it means the occurrence of any defect in a pump set that requires that pump set to be shut down for repair. Examples include a failed thrust bearing, which requires immediate shut down for repair, and a pin-hole in a pipe-line, which may not cause immediate trouble to the pump set but still requires shut down for repair.
It is assumed that failures are random and are, thus, described by exponential life distributions. This is justified by the fact that the pump sets have been designed for long life, of the order of three or more years. Complete failure of a major internal or rotating part of the pump set is very rare. Conditions leading to such wear-type failures can often be observed well before actual failure, and repair work is often planned in advance. These major overhauls are excluded from this study, since work is often performed during low-demand
periods, and will not cause significant disruption. The study confines itself to the unpredictable defects that can cause disruptions. Such defects or failures are often minor, requiring only a few hours for repair, and are adequately described by exponential life distributions. While this a priori assumption of negative exponential distributions may seem rather severe, the limited data available make it difficult to explore alternative distributions properly. However, visual plots of the data did indicate that the random failure or negative exponential distribution could not be rejected. It is assumed that the repair work is arranged as soon as possible, so that there will only be one defect to be cleared in one shut down, which is in line with operational practice. It is also assumed that defects are independent; there are no operational reasons to suspect otherwise.
To determine the reliability of the pump sets (and other similar systems), the system behaviour is usually formulated in terms of a Markov process [1], and the resulting set of simultaneous linear first-order differential equations is then solved. The literature is, in the main, concerned with the solution of the set of simultaneous differential equations when the matrix of constants has some recognizable pattern. References often cited in the literature include Billinton and Bollinger's [2] solution to a two-state fluctuating environment system, Dharmadhikari and Gupta's [3] analysis of a one-out-of-two system, and Ensign-Johnson's [4] solutions to several generalized models. The literature also includes Brown's [5] solution to systems with independent exponential components, and Murari and Marathachalam's [6] solution to an interlinked two-unit system.
The pump set problem appears to differ from other problems presented in the literature because in this system a pump can fail while on stand-by, with the rate of failure while on stand-by being lower than when it is running. However, this in itself is not significant, and the main value of this paper lies in the experience gained in using mathematics to model a case problem. The literature is useful to the extent that it can help with the general formulation of the Markov process. However, the set of first-order linear simultaneous equations can be solved using many mathematical techniques, and the general mathematical literature on simultaneous linear differential equations will, in this respect, prove more useful than the literature on reliability.

D. Sculli is a senior lecturer in the Department of Industrial and Manufacturing Systems at the University of Hong Kong. He was previously employed by Australia Consolidated Industries as an operations research/management science analyst. His current research interests include the development of management skills through the use of business games, and the application of stochastic and simulation techniques to operations planning problems.
S.K. Choy graduated from the Mechanical and Marine Engineering Department of the Hong Kong Polytechnic in 1981. He later completed the MSc programme in Industrial Engineering at the University of Hong Kong. He has been employed in several electric power generating plants as a maintenance engineer for many years.
2. Failure data and reliability parameters
All the defects that occurred within a two-year period were recorded, and they form the basis of the analysis (see Table 1). The repair times (simply called "down time" in the table) are estimated from experience, and are based on the principle of "least possible amount of man power to achieve the shortest repair time". The shut down for repair time includes times for isolation, the actual repair, and de-isolation. Failure data were collected over the whole two-year period, with failures being able to occur during running or while on stand-by. It is assumed that failures do not occur when the pump is under isolation, de-isolation, or under repair. The total of running hours for the two-year period was 16,560 h. The actual running time of each pump set, $t_1$, was found from the plant running-hours records, and the down times, $t_3$, are calculated from Table 1. There were no scheduled overhauls nor any major pump failures within this period. Forced outages due to boiler tube leaks, turbine faults, generator electrical problems, etc., did occur. However, the outage times were so short that the pumps were maintained on stand-by. The stand-by time of each pump set, $t_2$, can thus be calculated directly, and equals the total time (16,560 h) less running time, less down time (see Table 2). The repair rate of each pump set can be simply calculated by dividing the total number of failures, $n$, of that pump set by the corresponding down time, $t_3$.
It can be seen (from Table 1) that 21 of the 110 defects can occur only when the pumps are running; the other 89 defects could occur when the pumps were either running or on stand-by. However, when the nature of these 89 defects was examined closely, it was found that the defects were caused by the pressure developed during running, the heat carried by the flow, the flow itself, the vibration set up by the motor, and also the water hammer effect caused by the feed regulating valve. Thus, among these 89 defects, a large proportion should have occurred when the
Fig. 1. Diagram representation of the three-boiler pump system. Legible labels: suction strainer, booster pump, driving motor, Voith coupling.
Table 1. Summary of failure data
Columns: Component; Defect description; Mean down time (h); No. of occurrences (Pump Set 1, Pump Set 2, Pump Set 3). (Numeric entries not recoverable.)

Suction strainers: strainer mesh choked up (a); drain valve passing; air release valve passing.
Booster pumps: inlet pipe thermal pocket water leak; outlet pipe flange leak and air release valve passing; casing drain valve passing and pipe joint leak; casing vent valve passing and pipe joint leak; gland leak (a).
Driving motors: oil leak through bearing oil wiper (a); cooling water flow sight glass flapper damage.
Turbo couplings: shaft seal oil leak (a); air breather dirty; control valve adjustment (a); working oil and lubrication oil cooler dirty.
Main pumps: bearing (thrust and journal) failures (a); mechanical seal (a); casing drain valve passing; reheat spray connection joint leak; balance water pipe leak; mechanical seal cooler, magnetic separator and pipe joints; lubricating oil pipe and accessories.

(a) Occurs only when pumps running.
pumps were running, and a smaller proportion when the pumps were on stand-by. If we assume that half of the 89 defects occurred during running, we obtain a minimum ratio of defects occurring during running to defects occurring during stand-by of $(21 + \tfrac{1}{2} \cdot 89)/(\tfrac{1}{2} \cdot 89) = 1.47$. We can, therefore, expect the ratio to be above this figure. Further qualitative
Table 2. Summary of operating characteristics

Pump set                                       1             2             3
Running time, t1 (h)                           11,441        11,955        8,642
Stand-by time, t2 (h)                          4,935         4,408         7,767
Down time, t3 (h)                              184           197           151
Ratio t2/t1                                    0.4313        0.3687        0.8986
Total no. of failures (n)                      37            41            32
Estimated no. of failures while running (m)    30.436        34.618        22.078
μ = n/t3 (h^-1)                                0.2011        0.2082        0.2120
λ_running = m/t1 (h^-1)                        2.660 × 10^-3 2.896 × 10^-3 2.555 × 10^-3
λ_standby = ½ λ_running (h^-1)                 1.330 × 10^-3 1.448 × 10^-3 1.278 × 10^-3
analysis and consultation with the operations personnel suggested that this ratio should be adjusted to 2. In other words, the failure rate during the running period was taken to be equal to twice the failure rate during the stand-by period. It follows that the estimated number of failures during running is equal to $2nt_1/(2t_1 + t_2)$, see Table 2. The estimated failure rates and the repair rates can now be computed directly, and are also shown in Table 2. The three pump sets are mechanically identical, and variations in the estimated failure rates for each pump can only be ascribed to random causes; the highest rate (pump set 2) is 13% higher than the lowest rate (pump set 3). The data for the three pump sets can thus be pooled, and a common average failure rate computed. This rate, $\lambda$, is 0.002704 failures per hour while running, and half of this figure while on stand-by. The average repair rate, $\mu$, is 0.2071 repairs per hour.
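The parameter estimates of Table 2 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the running-to-stand-by failure-rate ratio of 2 adopted above. With $\lambda_{\text{running}} = 2\lambda_{\text{standby}}$, the total failure count is $n = \lambda_{\text{running}} t_1 + \lambda_{\text{standby}} t_2$, which leads to $m = 2nt_1/(2t_1 + t_2)$.

```python
# Table 2 inputs as reported in the paper: (running h, stand-by h, down h, failures).
PUMP_DATA = {
    1: (11441, 4935, 184, 37),
    2: (11955, 4408, 197, 41),
    3: (8642, 7767, 151, 32),
}

RUNNING_TO_STANDBY_RATIO = 2.0  # failure-rate ratio adopted in the paper


def estimate_rates(t1, t2, t3, n, ratio=RUNNING_TO_STANDBY_RATIO):
    """Return (m, mu, lam_running, lam_standby) for one pump set."""
    # n = lam_r*t1 + lam_s*t2 with lam_r = ratio*lam_s gives
    # m = lam_r*t1 = ratio*n*t1 / (ratio*t1 + t2).
    m = ratio * n * t1 / (ratio * t1 + t2)
    mu = n / t3                      # repairs per hour of down time
    lam_running = m / t1             # failures per running hour
    lam_standby = lam_running / ratio
    return m, mu, lam_running, lam_standby


def pooled_rates(data=PUMP_DATA):
    """Simple average of the per-pump running failure rates and repair rates."""
    rates = [estimate_rates(*row) for row in data.values()]
    lam = sum(r[2] for r in rates) / len(rates)
    mu = sum(r[1] for r in rates) / len(rates)
    return lam, mu


def minimum_ratio(running_only=21, either=89):
    """Running/stand-by defect ratio if half of the 'either' defects occur while running."""
    return (running_only + either / 2) / (either / 2)
```

Calling `pooled_rates()` gives values that agree with the pooled figures quoted above to within rounding; the last digit of some per-pump entries may differ slightly from the printed table.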
3. System reliability model
The reliability of the system under study is the probability that at least two of the three pump sets are running for an entire time interval $t$, given that all three pump sets are as good as new at the start of the time interval. Since the objective is to have 100% feed flow to the boiler, the system is considered as having failed if this 100% flow is not achieved. For the purpose of assessing the reliability of the system, there are only nine possible states, $i = 1, 2, \ldots, 9$, see Fig. 2. The first three states are those in which two pumps are running and the other pump, $i$, is on stand-by. The next three states are those in which two pumps are running and the other pump, $i - 3$, is down. The final three states are those in which two pumps are down and the other pump, $i - 6$, is supplying half the required flow. Note that the state of the three pumps being down is never reached, since the system is considered as having failed when only two pumps are down. Note that the first six states are non-absorbing, while the last three are absorbing. The absorbing states in this system are the states in which the system is considered as having failed [7,8].
In order to complete the Markov reliability model, the following additional, and perhaps obvious, assumptions are required. First is the assumption that if a pump is running properly, it will not be switched off, and that it will be available until a defect occurs. Second is the assumption that if any pump fails, the pump on stand-by will cut in automatically. Third is the assumption that once a pump has been repaired it will be immediately available.
Using the concepts of pure birth/death models [9], and by letting $P_i(t)$ be the probability that the system is in state $i$ at time $t$, a set of simultaneous linear first-order differential equations is obtained. A more general model can be derived when the failure rates, $\lambda_1$, $\lambda_2$ and $\lambda_3$, and the repair rates, $\mu_1$, $\mu_2$ and $\mu_3$, of the three pumps are different. The relations are as follows:

$$P_1(t+dt) = P_1(t)\left[1 - \left(\tfrac{1}{2}\lambda_1 + \lambda_2 + \lambda_3\right)dt\right] + P_4(t)\,\mu_1\,dt,$$
$$\dot P_1(t) = \mu_1 P_4(t) - \left(\tfrac{1}{2}\lambda_1 + \lambda_2 + \lambda_3\right)P_1(t); \tag{1}$$

$$P_2(t+dt) = P_2(t)\left[1 - \left(\lambda_1 + \tfrac{1}{2}\lambda_2 + \lambda_3\right)dt\right] + P_5(t)\,\mu_2\,dt,$$
$$\dot P_2(t) = \mu_2 P_5(t) - \left(\lambda_1 + \tfrac{1}{2}\lambda_2 + \lambda_3\right)P_2(t); \tag{2}$$

$$P_3(t+dt) = P_3(t)\left[1 - \left(\lambda_1 + \lambda_2 + \tfrac{1}{2}\lambda_3\right)dt\right] + P_6(t)\,\mu_3\,dt,$$
$$\dot P_3(t) = \mu_3 P_6(t) - \left(\lambda_1 + \lambda_2 + \tfrac{1}{2}\lambda_3\right)P_3(t); \tag{3}$$

$$P_4(t+dt) = P_4(t)\left[1 - (\lambda_2 + \lambda_3 + \mu_1)\,dt\right] + \tfrac{1}{2}\lambda_1 P_1(t)\,dt + \lambda_1 P_2(t)\,dt + \lambda_1 P_3(t)\,dt,$$
$$\dot P_4(t) = \lambda_1\left[\tfrac{1}{2}P_1(t) + P_2(t) + P_3(t)\right] - (\lambda_2 + \lambda_3 + \mu_1)P_4(t); \tag{4}$$

$$P_5(t+dt) = P_5(t)\left[1 - (\lambda_1 + \lambda_3 + \mu_2)\,dt\right] + \lambda_2 P_1(t)\,dt + \tfrac{1}{2}\lambda_2 P_2(t)\,dt + \lambda_2 P_3(t)\,dt,$$
$$\dot P_5(t) = \lambda_2\left[P_1(t) + \tfrac{1}{2}P_2(t) + P_3(t)\right] - (\lambda_1 + \lambda_3 + \mu_2)P_5(t); \tag{5}$$

$$P_6(t+dt) = P_6(t)\left[1 - (\lambda_1 + \lambda_2 + \mu_3)\,dt\right] + \lambda_3 P_1(t)\,dt + \lambda_3 P_2(t)\,dt + \tfrac{1}{2}\lambda_3 P_3(t)\,dt,$$
$$\dot P_6(t) = \lambda_3\left[P_1(t) + P_2(t) + \tfrac{1}{2}P_3(t)\right] - (\lambda_1 + \lambda_2 + \mu_3)P_6(t). \tag{6}$$
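As an illustration of how equations (1)-(6) might be solved numerically when the six rates differ, the sketch below integrates them with a classical fourth-order Runge-Kutta scheme in plain Python. This is not the authors' program: the function names, the step size, and the choice to start in state 1 (pump 1 on stand-by, pumps 2 and 3 running) are assumptions made for the example.

```python
def derivatives(p, lam, mu):
    """Right-hand sides of eqs. (1)-(6); p = [P1..P6], lam and mu are per-pump rates."""
    l1, l2, l3 = lam
    m1, m2, m3 = mu
    p1, p2, p3, p4, p5, p6 = p
    return [
        m1 * p4 - (0.5 * l1 + l2 + l3) * p1,
        m2 * p5 - (l1 + 0.5 * l2 + l3) * p2,
        m3 * p6 - (l1 + l2 + 0.5 * l3) * p3,
        l1 * (0.5 * p1 + p2 + p3) - (l2 + l3 + m1) * p4,
        l2 * (p1 + 0.5 * p2 + p3) - (l1 + l3 + m2) * p5,
        l3 * (p1 + p2 + 0.5 * p3) - (l1 + l2 + m3) * p6,
    ]


def reliability(t_end, lam, mu, p0=(1.0, 0.0, 0.0, 0.0, 0.0, 0.0), step=0.5):
    """Integrate eqs. (1)-(6) with RK4 and return P1 + ... + P6 at t_end (hours)."""
    p = list(p0)  # assumed start: state 1, all pumps as good as new
    n_steps = max(1, round(t_end / step))
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = derivatives(p, lam, mu)
        k2 = derivatives([x + 0.5 * h * k for x, k in zip(p, k1)], lam, mu)
        k3 = derivatives([x + 0.5 * h * k for x, k in zip(p, k2)], lam, mu)
        k4 = derivatives([x + h * k for x, k in zip(p, k3)], lam, mu)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    # The probability of not yet being absorbed is the mass left in states 1 to 6.
    return sum(p)
```

For example, `reliability(1000.0, (0.002704,) * 3, (0.2071,) * 3)` evaluates the pooled-rate case at 1000 hours. A library ODE solver or a matrix exponential would serve equally well; RK4 is used here only to keep the example free of dependencies.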
The set of simultaneous equations can be further simplified because each pump has the same constant failure rate $\lambda$ and repair rate $\mu$, giving:

$$\dot P_i(t) = \mu P_{i+3}(t) - \tfrac{5}{2}\lambda P_i(t), \quad i = 1, 2, 3, \tag{7-9}$$
$$\dot P_4(t) = \tfrac{1}{2}\lambda P_1(t) + \lambda P_2(t) + \lambda P_3(t) - (\mu + 2\lambda)P_4(t), \tag{10}$$
$$\dot P_5(t) = \lambda P_1(t) + \tfrac{1}{2}\lambda P_2(t) + \lambda P_3(t) - (\mu + 2\lambda)P_5(t), \tag{11}$$
$$\dot P_6(t) = \lambda P_1(t) + \lambda P_2(t) + \tfrac{1}{2}\lambda P_3(t) - (\mu + 2\lambda)P_6(t). \tag{12}$$

The probability that all pumps are available at time $t$, $S_0(t)$, is simply the sum of the probabilities that the system is in states 1 to 3, $S_0(t) = P_1(t) + P_2(t) + P_3(t)$. The probability that any one pump is down at time $t$, $S_1(t)$, is the sum of the probabilities that the system is in states 4 to 6, $S_1(t) = P_4(t) + P_5(t) + P_6(t)$. This gives:

$$\dot S_0(t) = -\tfrac{5}{2}\lambda S_0(t) + \mu S_1(t), \tag{13}$$
$$\dot S_1(t) = \tfrac{5}{2}\lambda S_0(t) - (\mu + 2\lambda)S_1(t). \tag{14}$$

The system reliability, $R(t)$, is the probability that the system has not failed at all up to a time $t$, given that all pumps were as good as new at time zero. This is given by:

$$R(t) = S_0(t) + S_1(t). \tag{15}$$

This definition of $R(t)$ also sets the initial conditions as:

$$S_0(0) = 1 \quad \text{and} \quad S_1(0) = 0. \tag{16}$$

There are many ways of solving systems of first-order linear differential equations, and many programmed computer subroutines are available for this purpose. When the system of equations is large, it may prove difficult to obtain exact analytical solutions, and it may be necessary to use approximate numerical methods [10]. The following solution is obtained for the case problem when $\lambda = 0.002704$ and $\mu = 0.2071$:

$$S_0(t) = 0.9698853\,e^{-0.0001667t} + 0.0301147\,e^{-0.2191015t},$$
$$S_1(t) = 0.0308764\left(e^{-0.0001667t} - e^{-0.2191015t}\right),$$

giving

$$R(t) = S_0(t) + S_1(t) = 1.0007617\,e^{-t/5998.801} - 0.0007617\,e^{-0.2191015t}. \tag{17}$$

Fig. 2. Markov reliability process of the 2-out-of-3 × 50% boiler feed pump system.
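Because equations (13) and (14) form a two-variable linear system with constant coefficients, the solution can be written in closed form from the roots of its characteristic polynomial. The sketch below shows one way to do this under the initial conditions (16); the function names are hypothetical, and the results agree with the printed coefficients of equation (17) only to about three or four significant figures, presumably because of rounding in the inputs.

```python
import math


def two_state_solution(lam, mu):
    """Solve eqs. (13)-(14) with S0(0) = 1 and S1(0) = 0.

    Returns (r_slow, r_fast, a, b, c) such that
    S0(t) = a*exp(r_slow*t) + b*exp(r_fast*t) and
    S1(t) = c*(exp(r_slow*t) - exp(r_fast*t)).
    """
    out0 = 2.5 * lam        # total rate out of the "all pumps available" group
    out1 = mu + 2.0 * lam   # total rate out of the "one pump down" group
    # Characteristic polynomial: r^2 + (out0 + out1) r + (out0*out1 - out0*mu) = 0
    trace = out0 + out1
    det = out0 * out1 - out0 * mu
    root = math.sqrt(trace * trace - 4.0 * det)
    r_fast = -(trace + root) / 2.0
    r_slow = det / r_fast   # product of the roots equals det; avoids cancellation
    a = (-out0 - r_fast) / (r_slow - r_fast)  # from S0(0) = 1, S0'(0) = -out0
    b = 1.0 - a
    c = out0 / (r_slow - r_fast)              # from S1(0) = 0, S1'(0) = out0
    return r_slow, r_fast, a, b, c


def reliability_closed_form(t, lam, mu):
    """R(t) = S0(t) + S1(t), eq. (15)."""
    r_slow, r_fast, a, b, c = two_state_solution(lam, mu)
    return (a + c) * math.exp(r_slow * t) + (b - c) * math.exp(r_fast * t)
```

With the pooled values, `two_state_solution(0.002704, 0.2071)` returns the two decay rates and the three coefficients in the same form as the printed solution, which makes a term-by-term comparison straightforward.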
4. Results and concluding comments
The results for $R(t)$ given in eqn. (17) are closely approximated by the function $\exp(-t/6000)$. This gives a mean time between system failures, that is, the mean time until at least two pumps are down at the same time, of about 6000 hours. This figure is within the bounds of normal operating experience inasmuch as operational staff were not surprised by it. The system can be compared with a hypothetical system consisting of only two feed pumps in parallel, each coping with 50% of the boiler capacity. Here the system is considered as having failed when either pump fails, and its mean time between failures will be 180 hours. Thus, by fitting an additional pump set, the system mean time to failure is increased from 180 hours to about 6000 hours.
Our experience suggests that the success of a study of this nature will depend very much on the quality and availability of failure data, and on the acceptability of the various assumptions that will need to be made. It will probably be necessary to involve staff with operational experience to confirm that the values of parameters estimated from the historical data are in line with operational expectations. It may also be necessary to involve them in the subjective estimation of the parameters themselves when failure data are scarce or do not exist. It was rather fortunate that a reasonable amount of failure data was available for the three pump sets used in this study, and that the pump sets were identical, enabling the data to be pooled. However, it was still necessary to rely on subjective operational experience to decide what proportion of the failures can occur while the pump sets are on stand-by. The theory leading to the Markov formulation and the solution of the linear first-order simultaneous differential equations appear to be well documented and can be readily applied.
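The comparison between the two configurations can be checked with a short calculation. The sketch below uses the standard identity that the mean time to failure is the integral of $R(t)$ from zero to infinity, applied to the printed terms of eqn. (17), and takes the two-pump system to fail at rate $2\lambda$. The names are hypothetical. With the pooled $\lambda$, $1/(2\lambda)$ evaluates to roughly 185 hours, close to the rounded figure of 180 hours quoted above.

```python
LAMBDA = 0.002704  # pooled running failure rate from the paper (per hour)

# Terms of eq. (17) as printed: (coefficient, decay rate per hour).
EQ17_TERMS = [(1.0007617, 1.0 / 5998.801), (-0.0007617, 0.2191015)]


def mean_time_to_failure(terms):
    """Integral of sum(c * exp(-k * t)) over t from 0 to infinity, i.e. sum(c / k)."""
    return sum(c / k for c, k in terms)


def two_pump_mttf(lam):
    """Two running pumps with no stand-by: the first failure of either one fails the system."""
    return 1.0 / (2.0 * lam)


three_pump_hours = mean_time_to_failure(EQ17_TERMS)
two_pump_hours = two_pump_mttf(LAMBDA)
improvement_factor = three_pump_hours / two_pump_hours
```

The ratio `improvement_factor` summarizes the benefit of the stand-by pump set under these assumptions.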
References
[1] G. Apostolakis, S. Garribba and G. Volta (eds.), Synthesis and Analysis Methods for Safety and Reliability Studies, Plenum, New York, 1980.
[2] R. Billinton and K.E. Bollinger, "Transmission system reliability evaluation using Markov processes", IEEE Trans. Power Appar. Syst., Vol. PAS-87, 1968, pp. 538-547.
[3] A. Dharmadhikari and Y. Gupta, "Analysis of 1-out-of-2: G warm-standby repairable system", IEEE Trans. Reliab., Vol. R-34, 1985, pp. 550-553.
[4] L. Ensign-Johnson, "Dynamic and steady-state solutions for a general availability model", IEEE Trans. Reliab., Vol. R-34, 1985, pp. 539-544.
[5] M. Brown, "On the reliability of repairable systems", Oper. Res., Vol. 32, 1984, pp. 607-615.
[6] K. Murari and C. Marathachalam, "The reliability of a two-unit system with two different interlinkings in two different periods", J. Oper. Res., 1984, pp. 835-845.
[7] H. Wenzelburger, "How to use the minimal cuts of fault tree analysis in Markov modelling", in: G. Apostolakis, S. Garribba and G. Volta (eds.), Synthesis and Analysis Methods for Safety and Reliability Studies, Plenum, New York, 1980.
[8] A. Blin, A. Carnino, J.P. Georgin and J.P. Signoret, "Use of Markov processes for reliability problems", in: G. Apostolakis, S. Garribba and G. Volta (eds.), Synthesis and Analysis Methods for Safety and Reliability Studies, Plenum, New York, 1980.
[9] H.A. Taha, Operations Research: An Introduction, Collier MacMillan International Editions, 1971.
[10] E.J. Henley and H. Kumamoto, Reliability Engineering and Risk Assessment, Prentice-Hall, Englewood Cliffs, NJ, 1981.