Electric Power Systems Research 81 (2011) 347–356
Contents lists available at ScienceDirect
Electric Power Systems Research journal homepage: www.elsevier.com/locate/epsr
Parallel Monte Carlo simulation for reliability and cost evaluation of equipment and systems
Haifeng Ge a, Sohrab Asgarpoor b,∗
a ABB Inc., 940 Main Campus Drive, Suite 300, Raleigh, NC 27606, USA
b Department of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
Article info
Article history: Received 6 July 2010; Received in revised form 23 September 2010; Accepted 27 September 2010; Available online 23 October 2010
Keywords: Sequential Monte Carlo simulation; Parallel computing; Maintenance; Substation reliability
Abstract
An algorithm for reliability simulation of equipment and systems using a parallel computing environment is developed. A sequential Monte Carlo simulation-based method is applied for generating reliability and maintainability history charts for equipment, from which the reliability indices, as well as their probability distributions, are calculated. A parallel methodology is developed to discretize the simulation into periods, and the simulation of individual periods is dispatched to different processors for efficient computing. Case studies utilizing the proposed method for reliability evaluation of equipment and simple systems are presented. Simulation results are compared with the analytical approach results for the same cases. The study provides the basis for using a parallel computing environment in power system reliability and cost evaluations. The environment used in this paper is Rock-131 from the San Diego Supercomputer Center in La Jolla, CA. © 2010 Elsevier B.V. All rights reserved.
1. Introduction
In substation reliability evaluation studies, two main approaches are applied: the analytical approach and the Monte Carlo simulation (MCS) approach. The analytical approach, such as Markov processes, is frequently utilized for reliability modeling of aging equipment and small substations [1–5]. Methods extending the analytical approach to incorporate uncertainty calculations have also been developed [6,7]. The advantages of the analytical approach include high accuracy and relatively fast computation time; the disadvantages are the limited number of states that can be considered and the inability to provide more detailed reliability information. Moreover, in some situations, transitions between some states do not have Markovian characteristics and, therefore, cannot be modeled by standard Markov processes [8,9].
Compared to analytical methods, the MCS approach is a powerful tool that can handle more conditions related to reliability evaluation of systems (e.g., impact of severe weather, load variation) [3,10]. An uncertainty calculation can also be included [9]. Moreover, the MCS approach is suitable for large-scale systems and is capable of providing more comprehensive results than analytical methods. Consequently, the MCS approach is broadly applied
∗ Corresponding author. Tel.: +1 402 472 6852; fax: +1 402 472 4732.
0378-7796/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.epsr.2010.09.012
for reliability evaluations of transmission [11], distribution [12,13], and renewable energy systems [10,14]. In addition to the high computation burden, several other limitations exist when applying MCS to reliability evaluation of aging equipment or systems while considering the effect of maintenance:
(1) Most studies use a binary-state model to represent each component in a system in order to simplify the model and increase the convergence speed. Because they lack states other than success and failure, those models mask the impact of equipment deterioration, maintenance, or other conditions that are common in the operation of aging equipment or systems.
(2) Algorithms are designed to be executed on a single processor, where the computation capacity and memory are limited. Consequently, the size or scale of studies using MCS for reliability evaluation is limited, and the speed of execution becomes relatively slow.
(3) Few stochastic simulation approaches incorporate cost, which is critical and needed by asset managers to compare different strategies and make decisions.
With the rapid development of computer technologies, parallel computing and supercomputers are becoming more readily available [15], with their large memory and fast computing power providing more effective platforms for performing reliability evaluations by simulation. Several pioneering studies have been reported for using parallel computing in reliability studies [16] and for conducting the composite reliability evaluation for the IEEE Reliability Test System (RTS) using distributed computers [17]. In these applications, traditional MCS algorithms with binary-state component models were parallelized; the scale-down strategies and their efficiencies were examined. However, simulating multi-state Markov processes was not studied, and the application to system reliability evaluation with detailed equipment modeling was not included.
For reliability assessment applications, in order to simplify the simulation problem and focus on developing generic approaches, the paper chooses relatively simple case studies for illustration. This paper first describes a new method for reliability evaluation using sequential MCS with multi-state models and provides validation studies by comparing the simulation results with analytical results, for both equipment and simple systems. Secondly, a parallel methodology to perform MCS on different computers simultaneously is presented, and the performance and efficiency are discussed.
2. Methodology
2.1. Sequential MCS for equipment reliability evaluation
There are two main approaches to MCS: state sampling and sequential sampling [11,18]. In state sampling, the system states are randomly sampled based on the probability distributions of the component states; in sequential sampling, the chronological behavior of the system is simulated by sampling sequences of system operating states. In reliability evaluations of substations where failures of aging equipment account for a large portion of outages, sequential sampling outperforms state sampling, because equipment is frequently modeled with multiple states and the time-to-transition among states may follow non-exponential probability distributions (e.g., Weibull) [4,8,19].
Fig. 1. Equipment reliability and maintainability history chart through simulation.
Fig. 2. Diagram of the generation of a reliability and maintainability history chart of equipment.
For example, when considering equipment that is modeled by three states, working ("UP"), failed ("DN"), and maintenance ("M"), the random transitions among these states can be described by a Markov process, assuming the Markovian characteristics are met. The stochastic process of this model can be visualized by a set of continuously connected rectangles, where the color of each rectangle represents the state and its length indicates the duration of time spent in this state (referred to as the reliability and maintainability history chart for this equipment). The reliability and maintainability history chart naturally visualizes the stochastic transitions among the states with different holding times and can be adopted for reliability evaluation. Fig. 1 is a reliability and maintainability history chart of a three-state machine. In Fig. 1, a blue rectangle indicates that the equipment currently resides in a working state, and the lengths of these rectangles, tUP-i (i = 1, 2, ...), are the holding times in the working state before a transition to another state is made. The destination of a transition and the holding time are random and determined by the probabilistic characteristics among those states; the holding time sets ({tUP-i}, {tDN-j}, and {tM-k}) have specific probability distributions that can be determined by analysis of historical reliability and maintainability data. Given a reliability and maintainability history chart, the reliability indices, such as availability, A (percentage of time staying in a working state), frequency of failure, f (average number of failures
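per unit of time), and the expected duration between failures, d, can be calculated from the reliability and maintainability history chart above by the following equations:

$$A = \frac{\sum_i t_{UP\text{-}i}}{\sum_i t_{UP\text{-}i} + \sum_j t_{DN\text{-}j} + \sum_k t_{M\text{-}k}} \tag{1}$$

$$f = \frac{N_{UP\text{–}DN}}{\sum_i t_{UP\text{-}i} + \sum_j t_{DN\text{-}j} + \sum_k t_{M\text{-}k}} \tag{2}$$

$$d = \frac{\sum_j t_{DN\text{-}j}}{N_{UP\text{–}DN}} \tag{3}$$
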
where NUP–DN is the total number of transitions from UP states to DN states in Fig. 1. The reliability and maintainability history chart shown in Fig. 1 can be generated from a sequential MCS. Any update of this history chart (such as the addition of a new state) will result in a new set of reliability indices. The final reliability indices are the mean values of all the sets. For example, if the set of availability values calculated during the simulation is {Ai}, where Ai is the availability value computed after the ith iteration, the estimated availability from the simulation is

$$A = E[A_i] = \frac{1}{N} \sum_{i=1}^{N} A_i \tag{4}$$

where N is the total number of iterations. For the purpose of checking the convergence and terminating the iteration process, there are several different types of stop criteria, such as the maximum number of iterations, the maximum execution time, or the coefficient of variation. Among these criteria, the coefficient of variation is widely utilized in MCS for reliability evaluation [18]. The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation, σ, to the mean, μ. The CV value of the availability sets during simulation is

$$CV = \frac{\sqrt{\sum_{i=1}^{N} (A_i - E[A_i])^2 / N}}{E[A_i]} \tag{5}$$

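As a minimal sketch of how Eqs. (1)–(3) and (5) could be evaluated on a recorded history chart, the Python functions below take the chart as a list of (state, duration) pairs. The data layout and the function names are assumptions made for illustration, and Eq. (3) is read here as the total DN time divided by the number of UP-to-DN transitions.

```python
import math


def reliability_indices(chart):
    """Return (A, f, d) for a chart given as [(state, duration), ...].

    States are the labels "UP", "DN" and "M" (a hypothetical encoding).
    """
    totals = {"UP": 0.0, "DN": 0.0, "M": 0.0}
    for state, duration in chart:
        totals[state] += duration
    total_time = sum(totals.values())

    # N_UP-DN: number of UP blocks that are immediately followed by a DN block.
    n_up_dn = sum(
        1
        for (s0, _), (s1, _) in zip(chart, chart[1:])
        if s0 == "UP" and s1 == "DN"
    )

    availability = totals["UP"] / total_time                          # Eq. (1)
    frequency = n_up_dn / total_time                                  # Eq. (2)
    duration = totals["DN"] / n_up_dn if n_up_dn else float("nan")    # Eq. (3)
    return availability, frequency, duration


def coefficient_of_variation(samples):
    """Eq. (5): population standard deviation divided by the mean."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(variance) / mean


if __name__ == "__main__":
    demo_chart = [("UP", 8.0), ("DN", 1.0), ("UP", 6.0),
                  ("M", 1.0), ("UP", 3.0), ("DN", 1.0)]
    print(reliability_indices(demo_chart))
    print(coefficient_of_variation([0.93, 0.94, 0.95, 0.94]))
```

A simulation loop would typically append each new Ai to a list and stop once `coefficient_of_variation` falls below a chosen tolerance.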
The above simulation process and stop criteria enable generation of a reliability and maintainability history chart and calculation of reliability indices for equipment. However, in practice, recording reliability and maintainability history chart data and calculation of reliability indices after the addition of new states is both time consuming and memory intensive. Thus, the simulation process is modified to separate the reliability and maintainability history chart into different periods, where calculation of reliability indices and checking for stop criteria are only activated at the end of each period. Then the original reliability history chart is discarded, and only reliability-related data (such as total simulation time in this period, total transition to failure states, and total duration of being in a failure state) is recorded. Fig. 2 illustrates a modified reliability and maintainability history chart generated by sequential MCS. In Fig. 2, in order to decrease the massive flow of simulated raw data, the program performs the following procedures after a predefined period: (1) calculate reliability indices; (2) check the stop
criteria; (3) record reliability-related information; and (4) discard raw simulation data. Compared to Fig. 1, this modification reduces computation of stop criteria and the required memory, thereby improving the computation efficiency. Also, because raw simulation data will be released periodically, the modification also reduces the program’s dependency on memory size. For the program where historical reliability data is insufficient and uncertainty exists, fuzzy values can be used to quantify the uncertainty degrees by fuzzy membership functions. The corresponding fuzzy reliability indices can be calculated accordingly by fuzzy reliability assessment approaches. The details can be found in [6] for the analytical approach and [9] for the simulation approach.
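The period-wise bookkeeping described above can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB code: the class name `RunningTotals` and its fields are hypothetical, and only aggregate quantities survive after each period while the raw chart is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class RunningTotals:
    """Reliability-related data kept across periods (hypothetical layout)."""
    total_time: float = 0.0
    up_time: float = 0.0
    down_time: float = 0.0          # total time spent in DN
    failures: int = 0               # UP -> DN transitions seen inside periods
    period_availability: list = field(default_factory=list)

    def add_period(self, chart):
        """Fold one period's raw chart into the totals; the chart is not stored."""
        period_total = 0.0
        period_up = 0.0
        previous = None
        for state, duration in chart:
            period_total += duration
            if state == "UP":
                period_up += duration
            elif state == "DN":
                self.down_time += duration
                if previous == "UP":
                    self.failures += 1
            previous = state
        self.total_time += period_total
        self.up_time += period_up
        # One availability sample per period, usable for a CV-based stop check.
        self.period_availability.append(period_up / period_total)

    def availability(self):
        return self.up_time / self.total_time


if __name__ == "__main__":
    totals = RunningTotals()
    totals.add_period([("UP", 9.0), ("DN", 1.0)])
    totals.add_period([("UP", 7.0), ("M", 1.0), ("UP", 1.0), ("DN", 1.0)])
    print(totals.availability(), totals.failures)
```

Memory use then grows with the number of periods rather than with the number of simulated transitions, which is the point of the modification.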
2.2. Sequential MCS for equipment cost evaluation
Economic evaluation is crucial in power system reliability evaluation and maintenance engineering analysis. Maintenance decision-support programs have to incorporate economic or cost-benefit analysis to determine the optimal maintenance schedules. In power system reliability and maintenance analysis, not only the cost of maintenance but also the cost penalty of unreliability due to equipment failure or maintenance outage should be included in calculations. Typically, during the life cycle cost (LCC) analysis of critical equipment, investment cost, operation and maintenance cost, penalty cost, and salvage value are considered [20]. However, from a day-to-day operations standpoint, only the following costs are considered in this paper [13]:

$$\text{Total cost} = \text{Maintenance cost} + \text{Outage cost} = \text{Maintenance cost} + \text{Cost of interruption} + \text{Cost of lost energy} \tag{6}$$

The cost incurred during interruption can be due to either failures or maintenance. In this paper, the cost of maintenance is added into "Cost of Interruption," since in practice, the expense for repairing or replacing equipment may not be the primary expense as compared to the costly failure consequence/penalty cost. The cost of lost energy is assumed to be the energy not served multiplied by the cost of lost energy per kWh of electricity. Therefore, analytical Eq. (6) can be applied to calculate the annual cost of equipment. Here, the annual expected cost, CAnnual, is given by Eq. (7),

$$C_{Annual} = C_{Maintenance} + C_{Outage} = f_M \cdot C_{per\,M} + f_D \cdot C_{per\,D} + L \cdot T \cdot C_{kWh} = A \cdot \lambda_M \cdot C_{per\,M} + A \cdot \lambda_D \cdot C_{per\,D} + 365 \cdot (1 - A) \cdot L \cdot C_{kWh} \tag{7}$$

where CMaintenance and COutage are the annual maintenance and outage costs ($/yr); fM and fD are the frequencies of maintenance and failure (unit: occurrences/year, abbreviated as occ./yr hereafter); λM and λD are the maintenance and failure rates; A is availability; L is the expected load delivered through the equipment (kW); T is the duration of the power interruption (hour); Cper M is the cost per maintenance occurrence ($/occ.); Cper D is the interruption cost per failure ($/occ.); and CkWh is the cost of losing one kWh of energy ($/kWh). For simplicity and emphasis on methodology illustration and validation, the discount rate is assumed to be zero.
Since the cost incurred is closely related to the equipment state (UP, DN, and M) during the operation history, the cost can also be simulated during generation of the reliability history chart. Fig. 3 presents a diagram for simulating a cost history chart, similar to Fig. 2.
Fig. 3 illustrates the cost incurred during a change of equipment state, where P is the power (kW) that the equipment delivered. For example, when the equipment state changes from UP to M, a maintenance cost, Cper M, is incurred, and another cost, P · tM1 · CkWh, is incurred over the duration of the maintenance outage. In other words, Fig. 3 is the visualization of the process described by Eq. (7). It should be noted that this paper focuses on long-term cost assessment of operating equipment/systems, for reliability assessment purposes only, not for the purpose of electric market design. When conducting a practical cost analysis, additional factors such as production cost, congestion cost, market roles, etc., should be considered.
2.3. Parallel computing
Parallel computing has been developed as a technique to overcome limitations in memory and computation capacity. According to [21], main memory in a parallel computer can either be shared memory, which is shared between all processing elements in a single address space, or distributed memory, in which each processing element has its own local address space. The supercomputer on which the subject program is executed is based on a distributed memory architecture. The programming model adopted in the application is a master-slave model with communication among different processors. For communication among different processors, several message passing techniques are available, with Message Passing Interface (MPI) widely utilized as it supports both shared memory and distributed memory architectures [15]. The objective in parallel programming is to maximize the utilization of processors and minimize the communication among different processors; scaling down a sequential code to parallel code is the primary work. The dispatching of jobs to each processor and the communication among those processors are the key factors in achieving high performance in parallel computing. In this paper, the On Demand (Rocks-131) Cluster in the San Diego Supercomputer Center (SDSC) is utilized [22]. The cluster has 32 nodes, with each node having two processors and 8 GB of memory. The StarP server was installed on this supercomputer; StarP for MATLAB [23] was used to parallelize existing serial MATLAB code, where the users could manage the distribution of parallel tasks and the gathering of results, according to the diagram in Fig. 4. The disadvantage of choosing a parallel computing platform such as StarP is that the speed-up values may not be as good as those obtained by coding the program directly, without such a platform.
The reason is that the detailed variance definition and communications among processors will be managed by StarP, instead of the users. However, if no parallel computing platform is used, one has to balance the cost of rewriting the code to fit parallel computing against the performance improvement. Therefore, StarP was adopted for task dispatching and communication coordination among processors.
2.4. Parallel sequential MCS of equipment
2.4.1. Dependency on initial state
Theoretically, the sequential MCS discussed in Section 2.1 is capable of modeling equipment with any number of states. In practice, the memory needed to record reliability-related data and the computation needed to check the stop criteria cannot always be provided when running in a single processor environment. Moreover, the generation of a reliability history chart in Fig. 1 cannot be directly scaled for parallel computing because of its nature, i.e., in sequential simulation, the determination of every state depends on its previous state.
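The state-to-state dependency can be made concrete with a short sketch of one sequential step: a holding time is sampled for every transition leaving the current state, and the smallest one decides both the holding time and the next state. The rate table below is hypothetical. Its values are borrowed from the case study in Section 3.1, and the transition structure (UP to M, UP to DN, and both back to UP) as well as the exponential distributions are assumptions of this example.

```python
import random

# Hypothetical transition-rate table: RATES[current][target] = rate.
RATES = {
    "UP": {"M": 1 / 365, "DN": 1 / 1095},
    "M": {"UP": 1 / 10},
    "DN": {"UP": 1 / 40},
}


def next_transition(state, rng):
    """Sample one duration per outgoing transition and keep the minimum."""
    candidates = [
        (rng.expovariate(rate), target)
        for target, rate in RATES[state].items()
    ]
    return min(candidates)  # (holding_time, next_state)


def simulate_chart(n_transitions, initial_state, seed):
    """Generate a history chart; each block depends on the previous state."""
    rng = random.Random(seed)
    state = initial_state
    chart = []
    for _ in range(n_transitions):
        holding_time, target = next_transition(state, rng)
        chart.append((state, holding_time))
        state = target
    return chart


if __name__ == "__main__":
    for block in simulate_chart(6, "UP", seed=1):
        print(block)
```

Non-exponential holding times (for example Weibull) could be handled by replacing `rng.expovariate` with another sampler such as `rng.weibullvariate`, at the cost of losing the memoryless property.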
Fig. 3. Diagram of the generation of a cost history chart of equipment.
However, since reliability indices are calculated as the mean value of the total indices in the simulation and they are steady-state measures, the selection of an initial state at every period in Fig. 2 will not have an explicit influence on the results, as long as the number of states in a period is not too small. This hypothesis will be verified later by performing case studies. With this hypothesis, the generation of a reliability history chart for each period is independent of other periods. This independence indicates that the generation process can be separated into different periods and simulated independently, which is a typical example of a task-parallel application. For this task-parallel application, the task (reliability history generation in each period) can be dispatched to a worker processor, and overall task coordination can be assigned to a master processor. Based on the description above, Fig. 4 shows the reliability history chart generation in a parallel computing environment, with CPUs 1, 2, and 3 as workers and CPU 0 as the master. Fig. 5 provides a flowchart for using parallel sequential MCS in reliability evaluation of equipment.
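The master/worker arrangement can be sketched in Python as below. This is not the StarP/MATLAB implementation used in the paper: a thread pool from the standard library stands in for the worker CPUs purely to show the task structure (a process pool or MPI would be needed for real speed-up), the rate table is the hypothetical one built from the Section 3.1 values, and each period is deliberately started from a randomly chosen state to exercise the initial-state hypothesis.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rate table (values from Section 3.1; structure assumed).
RATES = {
    "UP": {"M": 1 / 365, "DN": 1 / 1095},
    "M": {"UP": 1 / 10},
    "DN": {"UP": 1 / 40},
}


def simulate_period(task):
    """Worker: simulate one period and return only (up_time, total_time)."""
    seed, state, n_transitions = task
    rng = random.Random(seed)
    up_time = total_time = 0.0
    for _ in range(n_transitions):
        holding_time, target = min(
            (rng.expovariate(rate), nxt) for nxt, rate in RATES[state].items()
        )
        total_time += holding_time
        if state == "UP":
            up_time += holding_time
        state = target
    return up_time, total_time


def parallel_availability(n_periods, n_transitions, n_workers=3, base_seed=2010):
    """Master: build the tasks, dispatch them, and pool the results."""
    picker = random.Random(base_seed)
    tasks = [
        (base_seed + 1 + k, picker.choice(sorted(RATES)), n_transitions)
        for k in range(n_periods)  # distinct seed and arbitrary start per period
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(simulate_period, tasks))
    up = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return up / total


if __name__ == "__main__":
    print(parallel_availability(n_periods=60, n_transitions=200))
```

Because every task carries its own seed and initial state, the periods share no data, which is what allows them to be simulated in any order and on any worker.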
Fig. 4. Parallel generation of reliability history chart.
Fig. 5. Flowchart of parallel sequential MCS for reliability evaluation. Steps shown in the flowchart: the master (CPU 0) initializes (1) the model, (2) the number of transitions per period, (3) the convergence criteria, and (4) the initial state, and the task of simulating one period is distributed to one processor. Within a period, the simulation (1) generates random vectors as the random durations between transitions (exponential or non-exponential distributions), (2) samples the duration of all possible transitions that depart from the current state, and (3) chooses as the next state the one whose duration is the minimum among all possible durations; the current state and the duration before it transfers to another state are recorded, and the steps repeat until the end of the period is reached. The reliability indices are then calculated from the current period and previous periods. If the convergence criteria are not met, the next three periods are simulated; otherwise the results are displayed.
2.5. Parallel MCS for systems
2.5.1. Strategy of parallelizing
In a system where different components are interconnected, the components are operated either independently or dependently (for example, protection control). For example, for a critical transformer that has a spare one connected to it, usually the spare transformer will only be loaded when the primary one fails or needs to be de-energized due to maintenance; in this case, the two transformers are operated dependently. This characteristic should be reflected during the simulation in order to improve the accuracy of the model. To focus on illustrating the approach, however, it is assumed here that the components are operated independently.
Because of this independence assumption, there are two strategies available for parallel simulation. First, the simulation of a reliability history diagram of each component in the system can be dispatched to different processors and simulated simultaneously. Second, similar to the parallel simulation of equipment, the simulation can be separated into different periods, and the generation of the reliability history of each period is executed on one processor. Comparing these two strategies, the second strategy limits the communication among different states to within the same processor, while the first strategy requires communication among various processors. Therefore, the second strategy has less communication cost than the first and is adopted in this paper. Fig. 6 shows the parallel simulation of a system with independently operated components.
Fig. 6. Parallel sequential MCS for system reliability evaluation.
3. Case studies
3.1. Validation of sequential MCS for reliability evaluation
An analytical method, such as Markov processes, is selected as a reference to validate the results of sequential MCS for reliability evaluation of multi-state equipment. Fig. 7 is a Markov process for equipment with UP, M, and DN states. In Fig. 7, the transition times for all transitions are assumed to be exponentially distributed. Calculation of availability, A, frequency of failure, f, and expected duration between failures, d, from analytical approaches is available from [3]. The values of the parameters of this model are chosen to be λD = 1/1095 failures/yr, μ = 1/40 replacement/yr, λM = 1/365 maintenance/yr, and μM = 1/10 repair/yr. In this model, it is assumed that after a failure, the equipment will either be fully restored to a success state or be replaced by a new one. Following the procedures described above, the reliability indices, as well as the probability distributions of reliability indices calculated from parallel computing, are presented in Figs. 8 and 9. The analytical results are also presented as a reference. From Figs. 8 and 9, it can be observed that the sequential MCS provides results close to those of the analytical methods. The relative errors for A, f, and d are 0.0357%, 0.41%, and 0.193%, respectively. Fig. 8 validates the hypothesis that the selection of the initial state in every period does not have an explicit impact on the final result. Moreover, this result validates the accuracy of the modified sequential MCS in equipment reliability evaluation. However, it should be emphasized that, for equipment and small system reliability evaluations, analytical methods such as Markov processes will still be the first choice. The simulation of equipment with multiple states is proposed only to provide a foundation for large-scale system reliability evaluation.
Fig. 7. State-space diagram of a three-state Markov process (states UP, M, and DN; transition rates λM, μM, λD, and μ).
3.2. Validation of sequential MCS for cost evaluation
Following the procedure described in Section 2.2, the cost analysis for equipment, including the annual expected cost, is calculated. The input values for this analysis are: Cper M = $65,000/maintenance; Cper D = $1,000,000/failure; CkWh = $0.07/kWh; and P = 1 MW. The same indices were also calculated analytically by Eq. (7), and the results were taken as a reference. Fig. 10 compares the simulation results with the analytical reference. The probability distributions of the results are also presented. Similar to Fig. 8, Fig. 10 validates the accuracy of the cost analysis. The relative error of the total annual cost between simulation and analytical methods is less than 0.38%. In addition, the simulation results can provide more information than analytical results, such as the distribution of possible cost values presented in Fig. 11.
3.3. Validation of parallel MCS for reliability evaluation
The same model in Fig. 7 is scaled and executed on four nodes on the Rock-131 supercomputers in SDSC [24]. In order to examine the improvement of the computation efficiency, a maximum iteration number of 2,000,000 (10,000 periods with 200 transitions within each period) is selected as the stop criterion, rather than the coefficient of variation (CV). The purpose of choosing a fixed number of iterations over the CV is to control the iteration length and keep the computation task the same when comparing the performance of parallel computing (multiple CPUs) and serial computing (1 CPU). Since the random number generation algorithms in the local MATLAB program and the StarP server on the supercomputers are different, the simulated reliability history diagrams may not be the same. If the CV were used as the stop criterion, the iteration length needed to achieve the same accuracy in serial and parallel computing might differ, and so would the computation burden. Therefore, in order to control the iteration length and keep the computation burden comparable, the fixed number of iterations is used as the stop criterion. Figs. 12 and 13 give the results of parallel MCS. The parallel MCS results (labeled as "Parallel" in the legend) are compared with results obtained from both the single processor simulation (labeled as "Serial") and the analytical reference (labeled as "Reference"). Again, the parallel simulation achieves results very close to those of the analytical method. However, the execution time is much less than when using a single processor, as demonstrated in Case A in Section 3.3. It should be noted that there are differences between the reliability indices calculated from a single processor (serial) and parallel processors (parallel), especially during the time period of 0–0.5. The difference is caused by the breakdown of the initial sequential MCS presented in Fig. 3 and by random number generation differences between the local MATLAB program and the StarP server on the supercomputers. Firstly, the breakdown of the initial sequential MCS will result in differences in the history diagram as well as in the resulting values. However, in the long run it will not affect the accuracy of the reliability indices.
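The analytical reference against which both curves are compared can be illustrated with a short worked calculation. It assumes, which the text does not state explicitly, that the three-state model of Fig. 7 has only the transitions UP to M (rate λM), M to UP (rate μM), UP to DN (rate λD), and DN to UP (rate μ). Under that assumption the steady-state balance equations give

$$\lambda_M P_{UP} = \mu_M P_M, \qquad \lambda_D P_{UP} = \mu P_{DN}, \qquad P_{UP} + P_M + P_{DN} = 1$$

$$A = P_{UP} = \frac{1}{1 + \lambda_M/\mu_M + \lambda_D/\mu} = \frac{1}{1 + \frac{10}{365} + \frac{40}{1095}} = \frac{1095}{1165} \approx 0.9399$$

Because this value is a steady-state probability, it does not depend on the state in which a simulation run or a period is started, which is consistent with the serial and parallel estimates approaching the same reference in the long run despite their early differences.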
Fig. 8. Simulation results of the reliability indices of three-state equipment (sequential MCS versus analytical, plotted against time in years): (a) availability, A (relative error: 0.00035721); (b) frequency of failure, f (occ./yr) (relative error: 0.0041017); (c) duration between failures, d (day) (relative error: 0.0019285).
In a parallel computing environment, the simulation of each period presented in Fig. 4 is independent of other periods; in contrast, in a serial computing environment the generation of the next state always depends on the previous state. Therefore, the reliability history charts generated in serial and parallel computing environments for the same case will not be the same. However, because in parallel MCS the reliability indices are average values calculated from all simulated reliability history charts on all processors according to Eqs. (1)–(3), there is no distinct difference between the results calculated in the parallel and serial computing environments. Secondly, the random number generation variations will also result in differences in the simulated reliability history diagram presented in Fig. 3. Again, in the long run, this difference only causes acceptable errors between serial computing results and parallel computing results. This is illustrated by comparing the reliability index values computed in serial and parallel computing environments in Figs. 12 and 14.
3.4. Parallel MCS for a simple parallel connected system
A simple parallel connected system is used as a demonstration of using parallel simulation for a system reliability evaluation. Suppose in the parallel connected system that each component is modeled by the three-state Markov process in Fig. 7. The parameters for each model are chosen as λ1 = 1/1095, μ1 = 1/40, λM1 = 1/365, μM1 = 1/10; λ2 = 1/543, μ2 = 1/20, λM2 = 1/180, μM2 = 1/5. Following the procedure to generate a reliability history chart described in Fig. 4, the reliability indices are calculated. Figs. 14 and 15 illustrate the system reliability results achieved by the parallel simulation, with the analytical results given as a reference. Again, Figs. 14 and 15 validate the accuracy of using a parallel simulation method for system-level reliability studies.
3.5. Parallel MCS for substations
A case for reliability evaluation of a substation is presented to validate the accuracy and computation efficiency of the proposed method. Fig. 16 shows the simplified one-line diagram of the substation to be studied. In Fig. 16, equipment (transformers and circuit breakers) within this substation is modeled by a three-state model, and the algorithm will study the Load-Point 1 availability through the parallel MCS method. For simplicity, it is assumed that the availability of sub-transmission lines is 100% (no failure). The Load-Point 1 availability acquired by the parallel MCS method is presented in Fig. 17. For validation purposes, the availability values calculated by the analytical approach and the traditional sequential MCS method are also presented for comparison.
Fig. 9. Probability distributions of the reliability indices of three-state equipment (histograms of availability, frequency of failure, and duration between failures).
Fig. 10. Simulation results of the annual cost of three-state equipment (sequential MCS versus analytical, plotted against time in years): (a) annual maintenance cost ($/yr) (relative error: 0.0047554); (b) annual outage cost ($/yr) (relative error: 0.0035015); (c) total annual cost ($/yr) (relative error: 0.003762).
Again, Fig. 17 shows that the result obtained by the parallel MCS method is very close to the results calculated by the analytical and traditional MCS methods, while the execution time is significantly reduced. Fig. 18 gives the execution time (s) of the parallel MCS method when utilizing 8, 16, 32, 64, and 96 CPUs, which demonstrates the advantage of parallel MCS in reducing execution time. It should be noted that, because the main purpose of this paper is to illustrate the methodology, speed-up curves and computational efficiency studies are not examined here. The algorithm could, however, be rewritten in FORTRAN or C with MPI for computational efficiency analysis, which is not the focus of this paper. The difference between the availability values calculated by parallel MCS and traditional MCS is caused by the different random number generation approaches adopted on Star-P and MATLAB. This is similar to the difference in the results shown in Figs. 12 and 14.
4. Discussion
• A performance study (speedup values versus the number of processors) is necessary when developing a methodology to scale down an existing algorithm. Such a study is preferably performed by rewriting the algorithm in C or FORTRAN rather than in a high-level environment such as Star-P, because the lower-level implementation allows manual control of the distribution of tasks to different processors and of the communication among the processors. However, this paper is aimed at developing methodologies and algorithms rather than scaling down an existing algorithm; thus, studies of the speed-up and efficiency gained from a parallelizing strategy are not presented.
• Variance reduction techniques, such as importance sampling [18] and the antithetic variates method [2], may be necessary when the number of states increases. This is because, if too few rare events are sampled, both the accuracy of the reliability indices and the convergence speed are reduced.
• In parallel simulation, the random numbers generated on different processors must be uncorrelated. Ensuring low correlation among the random numbers generated on different processors is essential for accurate results. This can be achieved by selecting a different seed for each processor or by utilizing a toolbox that generates random numbers simultaneously on different processors.
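The last point, together with the period-splitting idea of the parallel method, can be illustrated with a small sketch. It is not the Star-P/MATLAB code used in the paper. For brevity it assumes a two-state (up/down) unit with exponential dwell times instead of the three-state model, starts every period in the up state, and gives each period its own generator seeded with a distinct integer, which is the "different seed" option mentioned above. All function and parameter names are hypothetical.

```python
import random


def simulate_period(seed, years, lam, mu):
    """Sequential MCS of one simulation period for a two-state unit.

    lam and mu are the failure and repair rates per day (assumed units).
    Returns (up_time, total_time) in days.
    """
    rng = random.Random(seed)      # private generator for this period
    horizon = years * 365.0
    t, up_time, is_up = 0.0, 0.0, True
    while True:
        dwell = rng.expovariate(lam if is_up else mu)
        if t + dwell >= horizon:   # truncate the last dwell at the horizon
            if is_up:
                up_time += horizon - t
            break
        if is_up:
            up_time += dwell
        t += dwell
        is_up = not is_up
    return up_time, horizon


def availability_from_periods(n_periods, years_per_period, lam, mu,
                              base_seed=12345):
    """Combine independently simulated periods into one availability."""
    seeds = [base_seed + k for k in range(n_periods)]   # one seed per period
    # This loop is the part that would be dispatched to separate processors.
    results = [simulate_period(s, years_per_period, lam, mu) for s in seeds]
    up = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return up / total


estimate = availability_from_periods(8, 2000, 1 / 1095, 1 / 40)
analytical = (1 / 40) / (1 / 1095 + 1 / 40)   # mu / (lam + mu)
print(f"simulated = {estimate:.4f}, analytical = {analytical:.4f}")
```

Because each period depends only on its own seed, the list comprehension could be handed to a process pool (for example `multiprocessing.Pool.map`) without changing the result. Distinct integer seeds give different streams but no formal guarantee of independence, so a generator designed for parallel streams may be preferable in production use.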
Fig. 11. Probability distributions of the annual maintenance, outage, and total cost. Mean values: cost of maintenance 61415.4692 $/yr; cost of outage 351375.4006 $/yr; total cost 412790.8697 $/yr.
Fig. 12. Parallel simulation results of the reliability indices of a three-state equipment, comparing serial, parallel, and analytical results against simulation time (year): (a) availability A; (b) frequency of failure f (occ./yr); (c) duration between failures d (day).
Fig. 13. Probability distributions of the reliability indices of three-state equipment from parallel simulation (panels: availability; frequency of failure; duration between failures).
Fig. 14. Parallel simulation results of the reliability indices of a parallel connected system, comparing serial, parallel, and analytical results against simulation time (year): (a) availability Asys; (b) frequency of failure fsys (occ./yr); (c) expected duration between failures dsys (day).
Fig. 15. The probability distributions of the reliability indices of a parallel connected system (panels: availability; frequency of failure; duration between failures).
Fig. 16. Simplified one-line diagram of a substation.
Fig. 17. Comparison of load point availability conducted by parallel MCS and other methods (availability versus total simulation period in years; legend: 1 CPU, time = 42.5787; parallel 96 NP, time = 3.1207; analytical).
Fig. 18. Comparison of execution times under different number of CPUs.
5. Conclusion
A sequential MCS method to simulate the multi-state stochastic processes of equipment and systems was developed based on generating the reliability and maintainability history charts. A parallel Monte Carlo simulation algorithm was also developed to separate the simulation tasks and execute them simultaneously on different CPUs. Compared with traditional MCS, the parallel Monte Carlo simulation algorithm developed here has the following advantages:
• It reduces the total simulation execution time while maintaining high accuracy and simulation detail.
• It efficiently utilizes the multiprocessors and large memory resources that will be widely adopted in personal computers in the coming decades.
• It extends traditional sequential MCS with the capability to simulate and study the impact of equipment maintenance on system-level reliability.
Acknowledgments
This work is supported by National Science Foundation (NSF) Grant ECS-0523498 and the Cyber Infrastructure Experiences for Graduate Students (CIEG) Program. The authors are grateful to Dr. Amitava Majumdar, Dr. Mahidhar Tatineni, and Dr. Yifeng Cui of the San Diego Supercomputer Center (SDSC) for their helpful advice.
References
[1] J. Endrenyi, G.J. Anders, A.M. Leite da Silva, Probabilistic evaluation of the effect of maintenance on reliability – an application, IEEE Transactions on Power Systems 13 (May (2)) (1998) 576–583.
[2] G.J. Anders, Probability Concepts in Electric Power Systems, Wiley, New York, 1990.
[3] R. Billinton, R.N. Allan, Reliability Evaluation of Power Systems, second ed., Plenum Press, London, 1996.
[4] H. Ge, C.L. Tomasevicz, S. Asgarpoor, Optimum maintenance policy with inspection by semi-Markov decision processes, in: 39th North American Power Symposium, vol. 1, no. 1, September 2007, pp. 541–546.
[5] H. Ge, S. Asgarpoor, An analytical method for optimum maintenance of substation, in: IEEE Transmission and Distribution Conference and Exposition 2008, Chicago, IL, April 2008, pp. 1–6.
[6] H. Ge, S. Asgarpoor, Reliability evaluation of equipment and substations with fuzzy Markov processes, IEEE Transactions on Power Systems 25 (3) (2010) 1319–1328.
[7] H. Ge, S. Asgarpoor, Markov processes with fuzzy parameters – a case study, in: Probabilistic Methods Applied to Power Systems (PMAPS), 25–29 May 2008.
[8] C. Singh, R. Billinton, System Reliability Modeling and Evaluation, Hutchinson Educational Publishers, London, UK, 1977 (Online) http://www.ece.tamu.edu/People/bios/singh/sysreliability.
[9] X. Bai, S. Asgarpoor, Fuzzy-based approaches to substation reliability evaluation, Electric Power Systems Research 69 (May (2–3)) (2004) 197–204.
[10] R. Billinton, H. Chen, R. Ghajar, A sequential simulation technique for adequacy evaluation of generating systems including wind energy, IEEE Transactions on Energy Conversion 11 (1996) 728–734.
[11] W. Li, Risk Assessment of Power Systems, Wiley-IEEE, New York, 2005.
[12] R.E. Brown, Electric Power Distribution Reliability, Marcel Dekker, New York, 2002.
[13] R.E. Brown, S.S. Venkata, Predictive distribution reliability and risk assessment, IEEE Tutorial Course Probabilistic T&D System Reliability Planning, 07TP182 (2007) 29–36.
[14] H. Ge, L. Ni, S. Asgarpoor, Reliability-based stand-alone photovoltaic system sizing design – a case study, in: Probabilistic Methods Applied to Power Systems (PMAPS), May 2008.
[15] W. Gropp, E. Lusk, A. Skjellum, Using MPI – 2nd Edition: Portable Parallel Programming with the Message Passing Interface, MIT Press, Cambridge, Massachusetts, 1999.
[16] N. Gubbala, C. Singh, Models and considerations for parallel implementation of Monte Carlo simulation methods for power system reliability evaluation, IEEE Transactions on Power Systems 10 (May (2)) (1995) 779–787.
[17] C.L.T. Borges, D.M. Falcao, J.C.O. Mello, A.C.G. Melo, Composite reliability evaluation by sequential Monte Carlo simulation on parallel and distributed processing environments, IEEE Transactions on Power Systems 16 (May (2)) (2001) 203–209.
[18] R. Billinton, W. Li, Reliability Assessment of Electric Power Systems Using Monte Carlo Methods, Plenum Press, New York, 1994.
[19] S. Ross, Stochastic Processes, second ed., Pearson Education, Inc., 1996.
[20] C.E. Ebeling, An Introduction to Reliability and Maintainability Engineering, McGraw Hill, New York, 1997.
[21] J.L. Hennessy, D.A. Patterson, Computer Architecture: A Quantitative Approach, third ed., Morgan Kaufmann, New York, 2002.
[22] On Demand (Rocks-131) Cluster User Guide (Online) http://www.sdsc.edu/us/resources/ondemand/.
[23] Star-P for MATLAB Users (now acquired by Microsoft, Inc.) (Online) http://www.microsoft.com/pathways/star-p/.
[24] Installing and Running Star-P on the On Demand Cluster (Online) http://www.sdsc.edu/us/resources/ondemand/Star-P.html.

Haifeng Ge received the Bachelor and Master degrees from Xi’an Jiaotong University, Xi’an, China, in 2003 and 2006, and the Ph.D. degree from the University of Nebraska-Lincoln, Lincoln, NE, U.S.A., in 2010, all in Electrical Engineering. He worked at the Electric Power Research Institute (EPRI), Palo Alto, CA, in 2008 as an Intern Student, and at the San Diego Supercomputer Center (SDSC), La Jolla, CA, in 2008 as a Visiting Student. He joined ABB Inc. in 2009 as a consulting engineer. His research focuses on Reliability, Asset Management, and Parallel Computing.

Sohrab Asgarpoor received his B.S. (1978), M.S. (1981), and Ph.D. (1986), all in Electrical Engineering, from Texas A&M University. From 1986 to 1989, he was with ABB Network Management Inc. (formerly Ferranti International Controls Corp.) as a Lead Engineer, where he designed and developed advanced power system applications software for energy management systems. Since September 1989 he has been with the University of Nebraska-Lincoln, where he is an Associate Professor in the Department of Electrical Engineering. His areas of interest include reliability evaluation, maintenance optimization, and advanced computer applications in security and optimization of electric power systems.