Reliability Engineering and System Safety 76 (2002) 149-154
www.elsevier.com/locate/ress
Determination of preventive maintenance periodicities of standby devices
Sergio Brandão da Motta a,*, Enrico Antônio Colosimo b
a Departamento de Engenharia de Manutenção da Transmissão, Centrais Elétricas de Minas Gerais, 30.123-970 Belo Horizonte-MG, Brazil
b Departamento de Estatística, Universidade Federal de Minas Gerais, 31.270-901 Belo Horizonte-MG, Brazil
Received 11 September 2000; accepted 22 October 2001
Abstract
This article presents a statistical approach to analysis and decision-making that uses reliability techniques to define the best periodicity for preventive maintenance of power system protective relays. Relays are standby devices and may remain in a hidden failure state while they are not operating. This hidden state makes it difficult to determine preventive maintenance periodicities. The case study presented in this work deals specifically with the reliability of the transmission and distribution system protective relays of CEMIG (the state-owned electrical power company of Minas Gerais, Brazil). Preventive maintenance data on protective relays, collected over a 4-year period, were used in the proposed method. The periodicities, distinguished by groups of similar relays and by the operating voltage levels of the protected systems, are chosen according to the failure risk level that the company is willing to take. The main result obtained with this method is a substantial reduction of 62% in the preventive maintenance workload for the relays of the distribution system. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Exponential models; Inspection interval; Protective relays; Reliability model
1. Introduction

It is important to reduce the occurrence of failures in a system during operation when such an event is dangerous and costly. The purpose of this study is to find a way to better determine when system replacement becomes necessary. Age replacement is a well-known maintenance policy for dealing with these kinds of situations [2,3]. Other maintenance policies, such as age replacement with minimal repair and block replacement, are also considered in the literature [1]. However, some equipment types have special features that make the implementation of these policies difficult or even impossible.

This article deals with two main issues related to preventive maintenance: (1) what is the best preventive maintenance periodicity for power system protective relays, and (2) what is the link between the preventive maintenance periodicity and the failure patterns of the several operating equipment types. Relays are standby devices, and a failed relay usually remains in a hidden failure state while it is not called upon to operate. This hidden state makes it difficult to determine the preventive maintenance periodicity. In general, there is a great deal of subjectivity in the usual way the frequency of preventive maintenance of these devices is defined, since it is usually based on the experience and judgment of technicians and maintenance engineers. This problem is directly related to the inability of the several methodological strategies to answer objectively a question that is crucial to the effectiveness of maintenance: when should the preventive tasks be done? It can be assumed that the approach developed for relays may serve as a basis for the development of similar models for other types of equipment.

A protection system can be defined as a set of components with electrical characteristics that are specific and compatible with each other, intended to automatically determine the operative condition limits of power systems or parts of those systems. Usually, a protection system is made up of four subsystems: measurement transformers, protection relays, circuit breakers and communication channels. The main parts are the relays (the intelligence units) and the circuit breakers (the opening units). Fig. 1 presents a typical general scheme of a protection system. Relays are standby devices, that is, they should only be active in the presence of an operational demand. Basically, these devices have two failure modes:
* Corresponding author.
E-mail addresses: [email protected] (S. Brandão da Motta), [email protected] (E. Antônio Colosimo).
0951-8320/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(01)00134-X
(a) failure to operate, in the presence of an operational demand, also known as operational failure; (b) unnecessary operation, in the absence of operational demand, also known as safety failure.
Fig. 1. General scheme of a protection system [8].
A failed relay usually stays in the hidden failure state until the failure is discovered, either through a preventive inspection or through the occurrence of an operational failure. The latter case is also called a multiple failure, that is, a failure of the protected function while the protection device is in a hidden failure state. Reliability studies of protection systems usually compute only the operational failures, thus overestimating the reliability of these systems. Hidden failures must be included in reliability studies because the majority of failures that occur in protective devices tend to stay hidden for a long period if these devices are not periodically inspected. In general, the anomalies found through these periodic inspections are fixed and the device once again becomes, in reliability terms, `as good as new'. This assumption is supported by the fact that the periodic relay inspections are complete, meaning that all operational features are verified and adjusted if necessary.

In this study, the operational failures of the relays were identified as well as their hidden failures, so that the inspection interval could be properly determined. Safety failures (improper operation) were not included, because they are usually caused by external random factors and are not registered by the computerized maintenance management systems. It is clear that an incorrect determination of inspection intervals may restrict the availability of the protection and thus put the electric system at risk of operational failure. The reliability of protection systems plays a crucial role in the reliability of power systems, especially nowadays, when transmission lines are increasingly loaded and working almost at their capacity limit. Several blackouts that have occurred in the interconnected electrical system of the Southern, Southeastern and Central-Western areas of Brazil in the past 15 years may confirm that this system is currently overloaded.
This paper proposes a statistical approach to analysis and decision-making that uses reliability techniques to define the best periodicity for preventive maintenance of power system protective relays. The periodicities, distinguished by groups of similar relays and by the operating voltage levels of the protected systems, are chosen according to the failure risk level that the organization is willing to accept.

The paper is organized as follows. Section 2 presents the probability expression of a multiple failure. A brief description of the case study is presented in Section 3, followed by a reliability analysis in Section 4. The model is applied in Section 5 and the results obtained are presented in Section 6.

2. Preventive maintenance approach for relays

The model is based on the multiple failure probability, that is, the probability that the protected equipment will fail between the relay hidden failure and the next relay inspection. To express it, the following events are defined:

$A_1$: relay is in hidden failure;
$A_2$: protected equipment fails in the period that the relay is in hidden failure.

These two events are independent, and therefore:

$$P_{MF} = P(A_1 \cap A_2) = P(A_1)\,P(A_2). \qquad (1)$$

That is, protected equipment is damaged only if a relay is in hidden failure and there is an operational request. Assuming an exponential distribution for the times to failure of relays ($T_R$) and protected equipment ($T_{PE}$), it is possible to derive the multiple failure probability at the inspection time $T_I$. That is, $T_R \sim \exp(\lambda_R)$, which means:

$$P(A_1) = P(T_R < T_I) = 1 - e^{-\lambda_R T_I}$$

and $T_{PE} \sim \exp(\lambda_{PE})$, which means:

$$P(A_2) = P(T_I > T_{PE} > T_R).$$

The latter probability is given by:

$$P(T_I > T_{PE} > T_R) = \int_0^{T_I} \int_{t_R}^{T_I} \lambda_R \lambda_{PE}\, e^{-\lambda_R t_R}\, e^{-\lambda_{PE} t_{PE}}\, dt_{PE}\, dt_R = \frac{\lambda_R + \lambda_{PE}\, e^{-T_I(\lambda_R + \lambda_{PE})} - (\lambda_R + \lambda_{PE})\, e^{-\lambda_{PE} T_I}}{\lambda_R + \lambda_{PE}}.$$
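The closed-form expression above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the rates used in any call are illustrative values chosen by the reader. The inner integral over the equipment failure time is evaluated analytically and the outer integral by a midpoint rule, which should agree with the closed form.

```python
import math


def p_relay_hidden_failure(lam_r: float, t_i: float) -> float:
    """P(A1) = P(T_R < T_I) for an exponentially distributed relay lifetime."""
    return 1.0 - math.exp(-lam_r * t_i)


def p_equipment_fails_after_relay(lam_r: float, lam_pe: float, t_i: float) -> float:
    """P(A2) = P(T_R < T_PE < T_I): closed form of the double integral."""
    s = lam_r + lam_pe
    numerator = lam_r + lam_pe * math.exp(-s * t_i) - s * math.exp(-lam_pe * t_i)
    return numerator / s


def p_multiple_failure(lam_r: float, lam_pe: float, t_i: float) -> float:
    """P_MF as the product P(A1) * P(A2), following Eq. (1)."""
    return (p_relay_hidden_failure(lam_r, t_i)
            * p_equipment_fails_after_relay(lam_r, lam_pe, t_i))


def p_a2_numeric(lam_r: float, lam_pe: float, t_i: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of the same double integral.

    The inner integral over t_PE from t_R to T_I equals
    exp(-lam_pe * t_R) - exp(-lam_pe * T_I), so only the outer
    integral over t_R is approximated numerically.
    """
    h = t_i / steps
    total = 0.0
    for k in range(steps):
        t_r = (k + 0.5) * h
        inner = math.exp(-lam_pe * t_r) - math.exp(-lam_pe * t_i)
        total += lam_r * math.exp(-lam_r * t_r) * inner * h
    return total
```

Calling `p_equipment_fails_after_relay` and `p_a2_numeric` with the same arguments gives a quick consistency check on the reconstruction of the formula; `p_multiple_failure` then returns the quantity that is compared against the accepted risk level.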
By plugging these two probabilities into Eq. (1), the multiple failure probability at time $T_I$ is obtained. Estimates of $\lambda_{PE}$ were obtained from historical information available in the company. The exponential distribution assumption can be justified in theoretical terms by a renewal process [6]. This type of process takes place when a number of individual processes combine to form an overall process, which is called a superimposed process. The overall process tends to be a
Table 1
Amount of relays by groups (source: database of RME System, CEMIG, 1998)

Groups    RAUX    RDIF    RDIR    RDIS    RELE    RSOC    RTPO
Amount    380     958     1046    598     1872    5647    1197
homogeneous Poisson process (HPP) even if the individual variables are not necessarily independent and identically distributed as exponential.

3. Data set description

The study was developed using the set of relays of CEMIG, the electrical power company of Minas Gerais, Brazil. The company's protection relays are spread throughout the state of Minas Gerais, and 11,698 of them were on record. The relays differ in characteristics such as age, technology and species. In terms of technology, the relays are classified as electromechanical, analog electronic or digital electronic. Approximately 77% of the company's relays are electromechanical, 22% are analog electronic and 1% are digital. Stratifying the relays by age range showed that about 78% were in the range of 0-20 years and 13% in the range of 21-30 years, while 9% had no age record. Protection relays are classified into seven groups: RDIF (differential), RDIR (directional), RDIS (distance), RSOC (overcurrent), RTPO (time), RAUX (auxiliary) and RELE (generic, for relays that do not fit in any of the previous groups). Table 1 shows the number of relays in each group.

A sample of 539 relays that had undergone preventive maintenance in 4 consecutive years was extracted from the total of 11,698 relays controlled by the system named RME (results of measurements and tests), in which the maintenance results are registered. The preventive maintenance of a protection relay basically consists of a set of tests that verify whether the various adjustment values are within the tolerance range specified by the manufacturer. A relay is either readjusted or repaired, depending on whether a discrepancy or some other anomaly is found during the maintenance process. Usually, no element is replaced when the relay is repaired, and the maintenance duration can be considered negligible compared with the relay failure time. The number of tests performed during each preventive maintenance step depends on the relay model. If the relay is good, the inspection has no negative effects on it. For each preventive maintenance step for a given relay, the device was considered to be in a failure status if one or more tests presented adjustment values outside of the
manufacturer's tolerance range. Moreover, if that test was related to a critical function, the relay would probably not perform its required function in the event of a disturbance or failure of the protected system. Such a relay would therefore be considered to be in a hidden failure state.

Initially, descriptive statistics were used to describe the behavior of the relay sample data in terms of failure occurrences in the period under study. The box plots in Fig. 2, which show the relationship between relay age and technology, indicated that the electromechanical relays had a higher mean age (5290 days) than the electronic ones (2130 days). A t-test confirmed this difference, with a p value of less than 0.0001. The main indications of this stage of the analysis were:
(a) the newer relays performed better than the older ones;
(b) the electronic relays performed better than the electromechanical ones;
(c) there were at least three groups of relay species differentiated in terms of failure rates: the first group (Ga), with a high failure proportion, consisted of the species RDIR and RDIS; the second group (Gb), with a medium failure proportion, was formed by relays of the RELE species; and the third (Gc), with a low failure proportion, consisted of the species RSOC, RTPO and RDIF.

Age and technology are strongly associated in terms of failure proportions, as indicated by the descriptive analysis above, so only one of them needs to be used in the reliability analysis. Species and technology were chosen as the factors to be considered in the subsequent analysis.

4. Reliability analysis

The Kaplan-Meier estimator [4] was used for the reliability function of the time-to-failure variable. For the case under study, most of the failures were hidden and were identified through annual inspections.
Thus the number of intervals for the estimator construction was practically restricted to the three preventive maintenance steps accomplished in each
Fig. 2. Box plots of relay age by technology.
relay in the period under study, since very few operational failures occurred in that period. It was assumed that each relay was restored to the as-good-as-new condition after each preventive maintenance. In this way, a new data set was generated from the first one, in which each failed relay was treated as a new one after repair and could therefore contribute more than one observation. The new sample comprised 878 observations: 475 failures registered over the 3 years and 403 observations censored at the end of the period. Censored observations carry partial information on the response variable; they correspond to relays that had not failed by the end of the observation period. An important additional assumption in this analysis was that failures occurring in the same relay were considered independent of one another. Since the exact times of failure were unknown, the failures were taken as if they had occurred in the middle of their respective intervals.

The time-to-failure analysis built on the descriptive analysis presented in Section 3, so it initially focused on the three relay groups defined there. Kaplan-Meier curves were built using the sample of 878 observations. These curves were built for all possible combinations of relay characteristics, and the log-rank test [5] was used to identify the different classes. An example of the Kaplan-Meier curves is presented in Fig. 3, which depicts the curves for the three relay groups: Ga (RDIR and RDIS), Gb (RELE) and Gc (RSOC, RTPO and RDIF). This graph seems to confirm that groups Ga, Gb and Gc have low, average and high reliability, respectively, as had already been indicated in Section 3. The results of the former analyses indicated that the reliability characteristics of the protection relays could be influenced by technology and species groupings.
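The construction of the expanded data set and the Kaplan-Meier estimate can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually coded by the authors: the record layout (one boolean per annual inspection), the function names and the four example relays are all hypothetical. The sketch applies the as-good-as-new renewal after each detected failure, places each failure at the midpoint of its inspection interval, and treats the time since the last renewal as censored at the end of the study.

```python
from collections import Counter


def expand_inspections(found_failed, interval=1.0):
    """Turn one relay's inspection outcomes into (time, event) observations.

    found_failed[i] is True when inspection i found the relay failed.
    Each failure renews the relay (as good as new) and is placed at the
    midpoint of the interval in which it was detected.
    """
    observations = []
    age = 0.0  # time elapsed since the last renewal
    for failed in found_failed:
        if failed:
            observations.append((age + interval / 2.0, True))
            age = 0.0
        else:
            age += interval
    if age > 0.0:
        observations.append((age, False))  # right-censored at end of study
    return observations


def kaplan_meier(observations):
    """Return [(t, S(t))] evaluated at each distinct failure time."""
    deaths = Counter(t for t, event in observations if event)
    exits = Counter(t for t, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # failures and censored items leave the risk set
    return curve


# Hypothetical inspection histories for four relays over three annual inspections.
relays = [
    [True, False, False],
    [False, False, False],
    [False, True, False],
    [True, True, True],
]
observations = [obs for history in relays for obs in expand_inspections(history)]
curve = kaplan_meier(observations)
```

With this encoding a relay that fails at its last inspection contributes no censored observation, which is why the number of censored observations can be smaller than the number of relays.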
Therefore, each of the three species groups was also stratified by technology, so that the reliability function could be analysed with both factors considered simultaneously. Six relay groups, which could differ in terms of their respective reliability functions, were thus defined.
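Comparisons between two such strata rely on the log-rank test. A minimal two-sample sketch is given below, assuming observations stored as (time, event) pairs; the function name and the data layout are hypothetical and the paper does not state which software was used. The statistic is compared with a chi-square distribution with one degree of freedom, whose upper tail is computed here through the complementary error function.

```python
import math


def logrank_test(group_a, group_b):
    """Two-sample log-rank test on lists of (time, event) pairs.

    Returns (chi-square statistic, p value) with one degree of freedom.
    """
    pooled = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    failure_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in failure_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)          # failures at t
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p value suggests that the two strata have different reliability functions and should be kept as separate groups; a large one suggests they can be pooled.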
According to the graphical indications it could be said that:
(a) the electronic relays of the species RDIR and RDIS (Ga) seemed to perform better than the electromechanical relays of the same species;
(b) the electronic relays of the species RELE (Gb) seemed to perform better than the electromechanical relays of the same species;
(c) the relays of the species RSOC, RTPO and RDIF (Gc) seemed to present similar performance for both technologies.

The indications from the Kaplan-Meier curves were confirmed by the log-rank test results shown in Table 2. The results confirmed that technology affects the reliability functions of the proposed groupings, except for group Gc, which was composed of the species RSOC, RTPO and RDIF. Therefore, both the species grouping and the technology factor were taken into account in improving the protection relay inspection intervals. The final groupings of relays for the purpose of inspection periodicity were formed as follows:
G1: electromechanical relays of RDIR and RDIS species;
G2: electronic relays of RDIR and RDIS species;
G3: electromechanical relays of RELE species;
G4: electronic relays of RELE species;
G5: electromechanical and electronic relays of RSOC, RTPO and RDIF species.

The values of $\lambda_R$ were estimated for each group assuming an exponential distribution [7]. Estimates of $\lambda_{PE}$ were obtained from historical information available in the company, as shown in Table 3. The source of those data is the statistical reports on lines and transformers issued by the GCOI, the Coordinator Group of the Brazilian Linked Power System.

Fig. 3. Kaplan-Meier curves for the relay species.

Table 2
Log-rank tests for relays divided into the following groups: GT1 electromechanical; GT2 analog electronic; G1T1 electromechanical directional and distance; G1T2 analog electronic directional and distance; G2T1 electromechanical generic; G2T2 analog electronic generic; G3T1 electromechanical overcurrent, time and differential; G3T2 analog electronic overcurrent, time and differential (source: sample of 878 observations originated from the 539-relay sample)

Comparisons      Species             Log-rank (p value)
GT1 and GT2      General             0.0004
G1T1 and G1T2    RDIR, RDIS          0.0000
G2T1 and G2T2    RELE                0.0033
G3T1 and G3T2    RSOC, RTPO, RDIF    0.2306

Table 3
Protected equipment failure rates (source: statistical reports of GCOI (1998))

Equipment               Operation voltage (kV)    Failure rate
Subtransmission line    69                        0.710
Subtransmission line    138                       0.170
Subtransmission line    230                       0.070
Transformer             69                        0.061
Transformer             138                       0.029
Transformer             230                       0.018

5. Model implementation

An estimate of the multiple failure probability related to each inspection periodicity is obtained at the end of the analysis. Therefore, the choice of the best interval can be made objectively, so that both technical and economic factors can be related to the risk of operational failure that the organization is willing to accept. Inspection intervals ranging from 0.5 to 6 years in increments of 0.5 years were considered. In other words, we tried this set of inspection intervals and chose the value that minimized the multiple failure function. Figs. 4 and 5 present the multiple failure probability estimates based on expression (1) for 138 kV lines and 138 kV transformers, respectively. Inspection intervals were recommended for each relay grouping according to the voltage levels (69, 138 and 230 kV) of the protected systems, that is, transformers and subtransmission lines. The voltage level criterion was chosen because protected systems of higher voltage are more critical than those of lower voltage. The inspection intervals defined by the company were those whose multiple failure risks did not exceed 1% for transformer protection relays and 5% for line protection relays. The following was observed regarding the final inspection intervals adopted: (a) the inspection intervals of some line protection relays remained the same, mainly for the 69 kV lines; and (b) a marked tendency to lengthen the inspection interval was verified for most relays, especially transformer protection relays, because of the low failure rates of transformers.
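The selection procedure described in this section can be sketched as follows. Several points are assumptions made for illustration only: the function names are hypothetical; the relay rate of 0.2 is an invented value, not an estimate from the paper; the Table 3 rate is read as failures per year, a unit the table does not state; and the acceptance rule is taken to be the longest candidate interval whose multiple failure probability does not exceed the accepted risk, which is one reading of the 1% and 5% criteria given in the text. The exponential rate is estimated by the usual maximum likelihood formula for censored data, the number of failures divided by the total observed time.

```python
import math


def exponential_rate_mle(observations):
    """MLE of an exponential failure rate from (time, event) pairs:
    number of failures divided by the total observed time."""
    failures = sum(1 for _, event in observations if event)
    exposure = sum(t for t, _ in observations)
    return failures / exposure


def multiple_failure_probability(lam_r, lam_pe, t_i):
    """P_MF = P(A1) * P(A2), following Eq. (1) and the closed form of P(A2)."""
    s = lam_r + lam_pe
    p_a1 = 1.0 - math.exp(-lam_r * t_i)
    p_a2 = (lam_r + lam_pe * math.exp(-s * t_i) - s * math.exp(-lam_pe * t_i)) / s
    return p_a1 * p_a2


def longest_acceptable_interval(lam_r, lam_pe, max_risk):
    """Scan 0.5, 1.0, ..., 6.0 years and return the longest interval whose
    multiple failure probability does not exceed max_risk (None if none does)."""
    candidates = [0.5 * k for k in range(1, 13)]
    acceptable = [t for t in candidates
                  if multiple_failure_probability(lam_r, lam_pe, t) <= max_risk]
    return max(acceptable) if acceptable else None


# Illustrative call: hypothetical relay rate, 138 kV transformer rate from Table 3
# (per-year unit assumed), 1% accepted risk for transformer protection relays.
lam_r = 0.2
lam_pe = 0.029
interval = longest_acceptable_interval(lam_r, lam_pe, max_risk=0.01)
```

Because the multiple failure probability grows with the inspection interval, scanning the candidate grid and keeping the longest acceptable value gives the least frequent inspection schedule that still respects the chosen risk level.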
Fig. 4. Multiple failure probability estimates of relays and 138 kV lines.
Fig. 5. Multiple failure probability estimates of relays and 138 kV transformers.
6. Main results

After this study, the workload required for preventive maintenance of relays, which used to be about 63,000 man-hours over a 5-year period, fell to only 24,000 man-hours over the same 5-year period. This is a reduction of 62% in the preventive maintenance workload needed to maintain the required reliability of the protective relays of the company's distribution system. In addition, and probably more importantly, the company is now in control of the level of operational failure risk for its equipment.

7. Conclusion

The company will achieve significant savings by adopting the inspection intervals defined by taking into account the reliability characteristics of both the protection devices and the protected systems, since the prior inspection intervals of most relays were too short. In fact, an excess of preventive maintenance may cause either a waste of resources or an increase in the probability of failure. The developed model is innovative and, in addition, possesses a very important characteristic: flexibility. It is innovative because it uses hidden failure data to quantify the reliability of power system protective devices. Usually, only the operational failures are considered in most of the studies that deal with power system protection reliability. Relay hidden failure data are usually not registered by the computerized maintenance management systems at many electrical power companies. The model is flexible because it allows the company to decide which risk level it is willing to accept for the unprotected operation of the electric system, since technical, economic, operational and safety criteria can all be considered in the choice of the best maintenance interval. Existing models usually search for a single inspection interval that optimizes one previously chosen parameter, such as availability or a cost function. The completed study provides a foundation for the
development of other similar analysis models, so that it will be possible to define the best intervals for other preventive maintenance tasks, such as periodic repairs and component changes, for other types of power system equipment, including transformers and circuit breakers.
Acknowledgements

The authors wish to thank an editor and two referees for their valuable comments and constructive suggestions on an earlier version of the paper. The work of Enrico Colosimo is partially supported by the Brazilian agencies Conselho Nacional de Pesquisa (CNPq) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).
References

[1] Ascher H, Feingold H. Repairable systems reliability. New York: Marcel Dekker, 1984.
[2] Barlow RE, Hunter LC. Optimum preventive maintenance policies. Oper Res 1960;8:90-100.
[3] Barlow RE, Proschan F. Mathematical theory of reliability. New York: Wiley, 1965.
[4] Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457-81.
[5] Lawless JF. Statistical models and methods for lifetime data. New York: Wiley, 1982.
[6] Meeuwsen JJ, Kling WL, Ploem WAGA. The influence of protection system failures and preventive maintenance on protection systems in distribution systems. IEEE Trans Power Delivery 1997;12(1):125-31.
[7] Nelson W. Accelerated testing: statistical models, test plans, and data analyses. New York: Wiley, 1990.
[8] O'Connor PDT. Practical reliability engineering. 3rd ed (rev). London: Wiley, 1995. 431 p.