Reliability Engineering and System Safety 83 (2004) 301–309 www.elsevier.com/locate/ress
Empirical models based on machine learning techniques for determining approximate reliability expressions

Claudio M. Rocco S. a,*, Marco Muselli b

a Facultad de Ingeniería, Universidad Central de Venezuela, Caracas, Venezuela
b Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni, Consiglio Nazionale delle Ricerche, Genova, Italy

Received 7 May 2003; accepted 5 October 2003

* Corresponding author. E-mail addresses: [email protected] (C.M. Rocco S.), [email protected] (M. Muselli).
Abstract

In this paper two machine learning algorithms, decision trees (DT) and Hamming clustering (HC), are compared in building approximate reliability expressions (RE). The main idea is to employ a classification technique, trained on a restricted subset of data, to produce an estimate of the RE that provides reasonably accurate values of the reliability. The experiments show that, although both methods yield excellent predictions, the HC procedure achieves better results than the DT algorithm.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Network reliability evaluation; Reliability expression; Rule generation; Decision tree; Hamming clustering
1. Introduction

The reliability of a network represents its ability to continue performing a specified operation despite malfunctions and damage [1]. For example, the reliability of an s–t network is given by the probability that at least one working path exists from a source node s to a terminal node t. Its exact numerical evaluation requires the computation of the symbolic reliability expression (RE) for the system at hand, which is also needed to solve several related problems, such as reliability allocation and optimisation [2].

In general, the connectivity of the network can be ascertained by knowing all the cut sets (or, equivalently, all the path sets) of the network [2] or by using a depth-first procedure [3,4]. Most of the methods proposed in the literature produce a compact RE by employing Boolean concepts to obtain a sum-of-product expression and, subsequently, to transform it into an equivalent sum of disjoint products (SDP). For example, once the set of minimal s–t paths of a network has been determined, it can be encoded as a sum-of-product and then rewritten as an equivalent and compact SDP [5].

To make the evaluation of the network reliability affordable, several methods have been proposed with two main goals: (a) simple and fast enumeration of all minimal path or cut sets and (b) efficient extraction of a compact RE from the minimal sets [6]. For this second goal, it has been assumed that the successful operation of all the elements of any path ensures system success [2]. However, in several situations, such as telecommunication networks, pipeline systems, computer networks and transport systems, connectivity itself is not a sufficient condition to ensure reliability. In these cases, the success of the network also requires that a sufficient flow is guaranteed, which depends on the capacity of the elements involved. Thus, a network performs well if and only if it is possible to transmit the required capacity successfully from the source node s to the terminal node t [2].

To evaluate whether a given network configuration is capable of transporting a required flow, several techniques can be used, such as the max-flow min-cut algorithm [3,4,7] or procedures based on the concept of composite paths [2,8]. In other networks, such as electric power systems, the assessment is more complicated, since the success of a given state requires additional computation, such as load flow studies [9]. In these cases the RE can be difficult to retrieve. Nevertheless, all the proposed methods to obtain REs are computationally hard, and efficient approximation algorithms have therefore been suggested to reduce the computational burden [5,6].
At least in principle, standard classification techniques, such as neural networks, radial basis function networks, decision trees, etc., can be employed to retrieve the RE of a network; it is sufficient to examine different states of the network and to establish whether the prescribed operation can be performed in each of them. This produces a collection of examples (training set) for the network at hand, which can be used by any classification method to reconstruct a binary function corresponding to an approximation of the desired RE. Unfortunately, the analytical expression of the binary function produced is difficult to understand, since it generally involves non-linear operators whose meaning is not directly comprehensible.

A possible solution to this problem consists in adopting rule generation methods [10], a particular kind of classification technique able to generate a set of intelligible rules describing the binary function to be reconstructed. Usually, the rules assume the following if-then form: 'If X is true and Y is false, then conclude class 1' [23].

In this paper two machine learning algorithms, belonging to the family of rule generation methods [10], are employed to obtain the approximate reliability expression (ARE) for a network. The first of them develops a decision tree (DT) by recursively dividing the training data on the basis of the state (operating or failed) of network components. This produces a classifier represented by a tree, which can be easily transformed into a set of intelligible rules [11]. The second rule generation algorithm considered is Hamming clustering (HC) [10]; it aims at reconstructing the AND–OR expression associated with any Boolean function starting from a given collection of examples. The basic kernel of the method is the generation of clusters of input patterns that belong to the same class and are close to each other according to the Hamming distance.

DT algorithms have been widely applied in different areas [12–15]. In reliability studies, Bevilacqua et al. [16] proposed the analysis of pump failures using a DT approach. Recently, Rocco [17] employed a DT approach to speed up a Monte Carlo simulation for system reliability assessment. HC has been successfully used in the synthesis of digital circuits [18] and tested on real-world classification problems [10]. To the best of our knowledge, neither of the two methods has yet been used to obtain approximate reliability expressions.

The paper is organised as follows: Section 2 presents some definitions. Sections 3 and 4 introduce the methods considered in this paper, while Section 5 compares the results obtained through DT and HC on two different examples. Finally, Section 6 contains the conclusions.
2. Definitions

It is assumed that system components have two states (operating and failed) and that component failures are independent events. The state $x_i$ of the $i$th component is defined as [19]:

$x_i = \begin{cases} 1 \text{ (operating state)} & \text{with probability } P_i \\ 0 \text{ (failed state)} & \text{with probability } Q_i = 1 - P_i \end{cases}$   (1)

where $P_i$ is the probability of success of component $i$. The state of a system containing $N$ components is expressed by the vector $x = (x_1, x_2, \ldots, x_N)$.

To establish whether $x$ is an operating or a failed state for the network, we define a proper evaluation function (EF):

$y = EF(x) = \begin{cases} 1 & \text{if the system is operating in state } x \\ 0 & \text{if the system is failed in state } x \end{cases}$   (2)

A depth-first procedure [3,4] can be employed as the EF if the criterion used for establishing reliability is simple connectivity. In the case of capacity requirements, the EF could be given by the max-flow min-cut algorithm. In other systems, special EFs may be used [9,20].

For a system of $N$ components, the performance of the whole system is described by the structure function (SF) [1]. The SF is a Boolean function that can be written as a sum-of-product involving the component states $x_i$ or their complements $\bar{x}_i$; a proper transformation can then be applied to produce an SDP expression. From this SDP expression the value of the RE for the network at hand can be readily obtained by replacing logical sums and products with standard sums and products between real numbers, and by substituting every component $x_i$ with the corresponding probability $P_i$ and every complement $\bar{x}_i$ with the probability $Q_i$.

As an example, consider the network in Fig. 1, whose component and system states are shown in Table 1. A sum-of-product expression for the SF of this network is readily obtained by direct inspection: $SF(x) = x_1 x_3 + x_2 x_4$, which is equivalent to the SDP expression $SF(x) = x_1 x_3 + \bar{x}_1 x_2 x_4 + x_1 x_2 \bar{x}_3 x_4$. Consequently, the RE for this system is given by $P_1 P_3 + Q_1 P_2 P_4 + P_1 P_2 Q_3 P_4$.

Fig. 1. A 4-component network.
Table 1
Component and system states for the network shown in Fig. 1

x1  x2  x3  x4  y = EF(x)
0   0   0   0   0
0   0   0   1   0
0   0   1   0   0
0   0   1   1   0
0   1   0   0   0
0   1   0   1   1
0   1   1   0   0
0   1   1   1   1
1   0   0   0   0
1   0   0   1   0
1   0   1   0   1
1   0   1   1   1
1   1   0   0   0
1   1   0   1   1
1   1   1   0   1
1   1   1   1   1
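To make these definitions concrete, the following minimal Python sketch (an illustration, not code from the paper; the path list is inferred from the SF above, and all names are assumptions) reproduces Table 1 with a connectivity EF and evaluates the RE derived from the SDP expression.

```python
from itertools import product

# Minimal s-t paths of the Fig. 1 network, inferred from SF(x) = x1 x3 + x2 x4
PATHS = [(1, 3), (2, 4)]

def EF(x):
    """Connectivity EF of Eq. (2): the system operates iff every component
    of at least one minimal path is operating (x maps index -> 0/1)."""
    return int(any(all(x[i] for i in path) for path in PATHS))

# Reproduce Table 1 by enumerating all 2**4 component states
for bits in product((0, 1), repeat=4):
    x = dict(zip((1, 2, 3, 4), bits))
    print(*bits, EF(x))                      # one row: x1 x2 x3 x4 y

def reliability(P):
    """RE obtained from the SDP: P1 P3 + Q1 P2 P4 + P1 P2 Q3 P4."""
    Q = {i: 1 - p for i, p in P.items()}
    return P[1] * P[3] + Q[1] * P[2] * P[4] + P[1] * P[2] * Q[3] * P[4]

print(reliability({i: 0.9 for i in (1, 2, 3, 4)}))   # 0.9639
```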
3. Decision trees

Decision-tree-based methods represent a non-parametric approach that turns out to be useful in the analysis of large data sets for which complex data structures may be present [21]. A DT uses a divide-and-conquer strategy: it attacks a complex problem by dividing it into simpler sub-problems and recursively applying the same strategy to each of them [22]. Like any classification method, the task of a DT is to obtain a model of the system at hand starting from a finite collection of examples generated according to an unknown function describing the behaviour of the system.

Every node in a DT is associated with a component of the network, whose current state is to be examined. Two branches start from each node, corresponding to the two different states of that component. Every terminal node (or leaf node) is associated with a class, determining the network state: operating or failed. Conventionally, the false branch (failed state of the component) is positioned on the left and the true branch (operating state of the component) on the right.

As an example, consider the DT shown in Fig. 2. When a new case is presented, the DT checks whether the component x2 is
operating or failed. If x2 is failed, the left branch is chosen and a new test on component x1 is performed. If x1 is failed, the left branch is chosen and y = 0 is concluded.

Formally, a DT is a directed acyclic graph in which each node is either a decision node, with two or more successors, or a leaf node. Every leaf node is labelled with a class, whereas every decision node is associated with a test on attribute values (component states) and gives rise to one branch for each possible outcome of the test [22].

When constructing the DT associated with a given set of examples, it may seem reasonable to search for the smallest tree (in terms of number of nodes) that perfectly classifies the training data. Although it is not guaranteed that this tree performs well on a new test sample [23], its generation requires the solution of an NP-hard problem. For this reason DT methods usually exploit heuristics that locally perform a one-step look-ahead search; once a decision is taken, it is never reconsidered. This hill-climbing search without backtracking is susceptible to the usual risk of converging to locally optimal solutions that are not globally optimal. On the other hand, this strategy allows building decision trees in a computation time that increases linearly with the number of examples [22].

Any decision tree can be easily transformed into a collection of rules by following the different paths that connect the root to the leaves. Every node encountered produces a condition to be added to the if part of the rule; the final leaf contains the output value to be selected when all the conditions in the if part are satisfied. Since the tree is a directed acyclic graph, there are as many rules as leaves [22]. As an example, for the tree shown in Fig. 2, three rules give the output y = 0, while two rules give the output y = 1.

It can be easily seen that a decision tree yields a set of conjunctive decision rules. The conditions in the if part are connected through AND operations, whereas different rules are grouped in an if-then-else structure. For example, the following set of rules solves the problem in Fig. 1:

if x1 = 0 AND x2 = 0 then y = 0
else if x1 = 0 AND x4 = 0 then y = 0
else if x2 = 0 AND x3 = 0 then y = 0
else if x3 = 0 AND x4 = 0 then y = 0
else y = 1

which is equivalent to the SF for this network, SF(x) = x1 x3 + x2 x4.
Fig. 2. Example of a decision tree.
Since rules are generated for all the possible output values, a large decision tree is often produced, which is difficult to understand. To overcome this problem, proper optimisation procedures can be adopted, which aim at simplifying the resulting set of rules. Empirical studies have shown that in many situations the resulting set of rules is more accurate than the decision tree [22].
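As an illustration of this path-following conversion, the short Python sketch below enumerates one rule per leaf. The tuple encoding of the tree and the function name are assumptions made here for illustration; the structure of the Fig. 2 tree is inferred from its description in the text.

```python
def tree_to_rules(node, conditions=()):
    """Follow every root-to-leaf path; each path yields one rule."""
    if node in (0, 1):                       # leaf: the class is the output
        return [(list(conditions), node)]
    component, left, right = node            # left = failed, right = operating
    return (tree_to_rules(left, conditions + ((component, 0),)) +
            tree_to_rules(right, conditions + ((component, 1),)))

# The tree of Fig. 2: root x2; the x2 = 0 branch tests x1 and then x3,
# while the x2 = 1 branch tests x4 (structure inferred from the text).
fig2_tree = ("x2", ("x1", 0, ("x3", 0, 1)), ("x4", 0, 1))

for conds, y in tree_to_rules(fig2_tree):
    print("if " + " AND ".join(f"{c} = {v}" for c, v in conds)
          + f" then y = {y}")
# Five rules: three with y = 0 and two with y = 1, as noted above.
```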
3.1. Building the tree

Different algorithms for constructing decision trees essentially follow a common approach, called top-down induction; the basic outline is as follows [24]:

1. Begin with a set of examples (training set) T. If all the examples in T belong to one class, then halt.
2. Consider all the possible tests that divide T into two or more subsets. Score each test according to how well it splits up the examples.
3. Choose ('greedily') the test that achieves the highest score.
4. Divide the examples into subsets and run this procedure recursively on each subset.
Table 2
Available subset of states for the network shown in Fig. 1

x1  x2  x3  x4  y = EF(x)
0   0   0   1   0
0   0   1   1   0
0   1   1   1   1
1   0   0   1   0
1   0   1   1   1
1   1   0   0   0
1   1   0   1   1
1   1   1   1   1
For the problem at hand, only attributes with symbolic values are considered, so a test at a node splits an attribute into all of its values. Thus, a test on a component state with two possible values will produce at most two child nodes, each of which corresponds to a different value. The algorithm considers all the possible tests and chooses the one that optimises a pre-defined goodness measure. Ideally, small trees are sought, since they lead to simpler sets of rules and to an increase in performance. To achieve this goal, splits should be based upon the most predictive components [23].

3.2. Splitting rules [22]

A splitting rule is guided by a measure of the 'goodness of split', which indicates how well a given attribute discriminates the classes. Several methods have been described in the literature:

1. Measures of the difference between the parent and the split subsets on some function of the class proportions (such as the entropy).
2. Measures of the difference between the split subsets on some function of the class proportions (typically a distance or an angle).
3. Statistical measures of independence (typically a χ²) between the class proportions and the split subsets.

In this paper the method used by C4.5 [25] is selected. A decision tree can be considered as an information source that transmits a message concerning the classification of a system state. Let $p$ be the number of operating states and $n$ the number of failed states included in the training set. The expected information content (or entropy) $I(p,n)$ of the message is:

$I(p,n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$   (3)

If component $A$ is selected, the resulting tree has an expected information content (or mutual information) $E(A,p,n)$ of:

$E(A,p,n) = \sum_{i=1}^{k} \frac{p_i + n_i}{p+n}\, I(p_i, n_i)$   (4)

where $p_i$ and $n_i$ are the numbers of instances from each class in the sub-tree associated with the partition $i$ based on the state of component $A$. The information gain $\mathrm{Gain}(A,p,n)$ achieved is given by:

$\mathrm{Gain}(A,p,n) = I(p,n) - E(A,p,n)$   (5)
The heuristic then selects the component that yields the maximum information gain at that step; a test on that component will divide the training set into two subsets.

For example, consider the system shown in Fig. 1; as is usually the case in practical applications, suppose that only a subset of the whole collection of possible states (shown in Table 2) is available. Table 2 lists the state x_i of each component and the corresponding system state y when a continuity criterion is adopted. There are p = 4 operating states and n = 4 failed states; thus, the expected information content I(4,4) assumes the value:

I(4,4) = -(4/8) log_2(4/8) - (4/8) log_2(4/8) = 1

For x1 = 0 there are one system operating state and two system failed states, whereas for x1 = 1 there are three system operating states and two system failed states. Then:

I(1,2) = 0.918296,   I(3,2) = 0.970951
E(x1; 4,4) = (3/8) I(1,2) + (5/8) I(3,2) = 0.951205
Gain(x1; 4,4) = I(4,4) - E(x1; 4,4) = 1 - 0.951205 = 0.048795

Now, for x2 = 0 and for x3 = 0 there are one system operating state and three system failed states, whereas for x2 = 1 and for x3 = 1 there are three system operating states and one system failed state. Consequently, we have:

I(1,3) = I(3,1) = 0.811278
E(x2; 4,4) = E(x3; 4,4) = (4/8) I(1,3) + (4/8) I(3,1) = 0.811278
Gain(x2; 4,4) = Gain(x3; 4,4) = 1 - 0.811278 = 0.188722

Finally, the last component, x4, shows only one system failed state for x4 = 0, whereas for x4 = 1 we have four system operating states and three system failed states. Thus, we obtain:

I(0,1) = 0,   I(4,3) = 0.985228
E(x4; 4,4) = (1/8) I(0,1) + (7/8) I(4,3) = 0.862075
Gain(x4; 4,4) = 1 - 0.862075 = 0.137925

Both x2 and x3 score the maximum information gain and must therefore be considered for the first choice; suppose that the second component is selected as the root node. It is worth noting that, in general, the component chosen as the root node holds a primary importance [16]. After this selection, the training set in Table 2 is subdivided into the following two subsets:
x1  x2  x3  x4  y = EF(x)
0   0   0   1   0
0   0   1   1   0
1   0   0   1   0
1   0   1   1   1

x1  x2  x3  x4  y = EF(x)
0   1   1   1   1
1   1   0   0   0
1   1   0   1   1
1   1   1   1   1
The first subset contains the examples with x2 = 0 and the second those with x2 = 1. If we repeat the previous analysis for the first subset, which has one system operating state and three failed states, we obtain Gain(x1; 1,3) = Gain(x3; 1,3) = 0.311278 and Gain(x4; 1,3) = 0, which leads to an equivalent choice between x1 and x3. A similar procedure allows choosing the component x4 for the second subset and produces, after a further splitting, the decision tree in Fig. 2. A direct inspection allows generating the following set of rules:

if x2 = 1 AND x4 = 0 then y = 0
else if x1 = 0 AND x2 = 0 then y = 0
else if x1 = 1 AND x2 = 0 AND x3 = 0 then y = 0
else y = 1
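The gain computations above can be verified with a few lines of Python. This is a minimal sketch of the C4.5-style measures of Eqs. (3)–(5) applied to the Table 2 training set; it is not the authors' implementation, and the names info and gain are illustrative.

```python
from math import log2

# Table 2 as (x1, x2, x3, x4) -> y
DATA = [((0, 0, 0, 1), 0), ((0, 0, 1, 1), 0), ((0, 1, 1, 1), 1),
        ((1, 0, 0, 1), 0), ((1, 0, 1, 1), 1), ((1, 1, 0, 0), 0),
        ((1, 1, 0, 1), 1), ((1, 1, 1, 1), 1)]

def info(p, n):
    """Expected information content I(p, n) of Eq. (3)."""
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

def gain(data, attr):
    """Information gain of Eq. (5) for a split on component `attr`."""
    p = sum(y for _, y in data)
    n = len(data) - p
    e = 0.0                                  # E(A, p, n) of Eq. (4)
    for value in (0, 1):
        subset = [(x, y) for x, y in data if x[attr] == value]
        if subset:
            pi = sum(y for _, y in subset)
            e += len(subset) / len(data) * info(pi, len(subset) - pi)
    return info(p, n) - e

for a in range(4):
    print(f"Gain(x{a + 1}; 4, 4) = {gain(DATA, a):.6f}")
# 0.048795, 0.188722, 0.188722, 0.137925: x2 (or x3) becomes the root
```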
Although the set of rules given above perfectly matches all the system configurations in Table 2, it is not equivalent to the structure function SF(x) = x1 x3 + x2 x4 that generated Table 2. This is due to the lack of information caused by the eight feasible system configurations that are shown in Table 1 but not included in the training set.

3.3. Shrinking the tree

The information measures previously described are estimated on the basis of the available training data, and the splits are evaluated using sets of examples of smaller and smaller size. However, estimates based on small samples will not produce good results for unseen cases, thus leading to models with poor predictive accuracy. This is usually known as the overfitting problem [11]. On the other hand, according to Occam's Razor, small decision trees consistent with the training data tend to perform better than large trees [23].

The most straightforward way to take these considerations into account is to prune branches off the decision tree. Available pruning methods can be divided into two main groups: (1) pre-pruning methods, which stop building the tree when some criterion is satisfied, and (2) post-pruning methods, which build a full tree and prune it back. All the techniques take into consideration the size of the tree and an estimate of the error; for implementation details, the reader can refer to [22].

Looking at the example developed in the previous subsection, it is evident that neither nodes nor branches can be removed from the final decision tree in Fig. 2 without misclassifying examples of the training set (Table 2). Since the DT previously generated perfectly matches all the system configurations in Table 2, the pruning phase has no effect in this case.
4. Hamming clustering

As previously observed, the SF of a network can be written as a Boolean function or, equivalently, as a sum-of-product expression; thus, at least in principle, any method for the synthesis of digital circuits is able to retrieve the desired SF from a sufficiently large training set. Unfortunately, classical methods for Boolean function reconstruction do not care about the output assigned to cases not belonging to the given training set: their target is to obtain the simplest sum-of-product expression that correctly classifies all the examples provided.

Better results can be obtained by adopting a new logical synthesis technique, called Hamming clustering (HC) [10,18], which is able to achieve performances comparable to those of the best classification methods, in terms of both efficiency and efficacy. The Boolean function generated by HC can then be directly converted into a set of intelligible rules underlying the problem at hand.
It can be easily seen that every system state x can be associated with a binary string of length N; it is sufficient to write the component states in the same order as they appear within the vector x. In this way the system state x = (0,1,1,0,1) for N = 5 corresponds to the binary string '01101'. As usual, a simple metric, called Hamming distance, can be introduced in the space of binary strings of the same length N; it is defined as the number $d_H(x,z)$ of differing bits in the two strings x and z:

$d_H(x,z) = \sum_{i=1}^{N} |x_i - z_i|$
According to this definition, if x = '01101' and z = '11001', we obtain d_H(x,z) = 2. HC proceeds by grouping together binary strings that belong to the same class and are close to each other according to the Hamming distance.

A basic concept in the procedure followed by HC is the notion of cluster, which shares the definition of implicant in the classic theory of logical synthesis. A cluster is the collection of all the binary strings having the same values in a fixed subset of components; as an example, the four binary strings '01001', '01101', '11001', '11101' form a cluster, since all of them have the values 1, 0 and 1 in the second, fourth and fifth components, respectively. This cluster is usually written as '*1*01', by placing a don't care symbol '*' in the positions that are not fixed, and the cluster '*1*01' is said to cover the four binary strings above. Every cluster can be associated with a logical product among the components of x, which gives output 1 for all and only the binary strings covered by that cluster. For example, the cluster '*1*01' corresponds to the logical product $x_2 \bar{x}_4 x_5$, where $\bar{x}_4$ is the complement of the component $x_4$.

The desired Boolean function can then be constructed by generating a valid collection of clusters for the binary strings belonging to a selected class. This collection is consistent if none of its elements covers binary strings of the training set associated with the opposite class. The procedure employed by HC consists of the following four steps:

1. Choose at random an example (x,y) in the training set.
2. Build a cluster of points including x and associate that cluster with the class y.
3. Remove the example (x,y) from the training set. If the construction is not complete, go to Step 1.
4. Simplify the set of clusters generated and build the corresponding Boolean function.

4.1. Building clusters

Once the example (x,y) in the training set has been randomly chosen at Step 1, a cluster of points including x is to be generated and associated with the class y. The only prescription to be satisfied in constructing this cluster is that
it cannot cover binary strings belonging to examples of the training set having the opposite class 1 - y. As suggested by Occam's Razor, smaller sum-of-product expressions for the Boolean function to be retrieved perform better; this leads to preferring clusters that cover as many training examples (belonging to class y) as possible and that contain more don't care symbols '*'. However, searching for the optimal cluster in this sense leads to an NP-hard problem; consequently, greedy alternatives must be employed to avoid excessive computation times. One possible choice is to apply the Maximum covering Cube (MC) criterion [10], which sequentially introduces a don't care symbol in the position that reduces the Hamming distance from the highest number of training examples belonging to class y, while avoiding covering training examples associated with the opposite class. Several trials on artificial and real-world classification problems have established the good properties of the MC criterion.

As an example, consider the network in Fig. 1, whose training set is shown in Table 2. Suppose the binary string '1101', belonging to class y = 1, is selected at Step 1 of HC; as one can readily observe, both the first and the third bit can be substituted by a don't care symbol without generating a cluster that conflicts with the examples belonging to the opposite class. This substitution leads to the cluster '*1*1', which is associated with the logical product x2 x4. By repeating the same consideration for the binary string '0111', the cluster '*11*' and the logical product x2 x3 are produced. Finally, the analysis of '1011' and '1111' produces a third cluster, '1*1*', associated with the AND operation x1 x3.

4.2. Pruning the set of clusters

Usually, the repeated execution of Steps 2–3 leads to a redundant set of clusters, whose simplification can improve the prediction accuracy of the corresponding Boolean function. In analogy with methods for decision trees, the techniques employed to reduce the complexity of the resulting sum-of-product expressions are frequently called pruning algorithms. The simplest effective way of simplifying the set of clusters produced by HC is to apply minimal pruning [10]: according to this greedy technique, the clusters that cover the maximum number of examples in the training set are extracted one at a time. At each extraction, only the examples not included in the clusters already selected are considered. Ties are broken by examining the whole covering.

The application of minimal pruning to the example analysed in Section 4.1 begins with the computation of the covering associated with each of the three clusters generated in the training phase. It can be readily observed that '*1*1' covers three examples of Table 2 (precisely the binary strings '0111', '1101' and '1111'), whereas the covering of '*11*' and '1*1*' is equal to 2. Consequently, the cluster '*1*1' is selected first.
Table 3
Performance indices for the training and the test phase in example 5.1

              DT                           HC
Index         Training (%)  Testing (%)    Training (%)  Testing (%)
Sensitivity   99.15         96.35          100           99.78
Specificity   99.15         95.58          100           100
Accuracy      99.15         96.00          100           99.90
Rules         45                           16^a

^a Not disjoint.
Fig. 3. Network for examples 5.1 and 5.2 [27].
After this choice, only the binary string '1011' remains uncovered, which leads to the selection of the last cluster '1*1*' (since '*11*' does not cover '1011'). The final expression for the structure function is therefore SF(x) = x1 x3 + x2 x4, i.e. the correct SF for the network in Fig. 1.
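The two phases just described can be made concrete with a short Python sketch, run here on the Table 2 training set. This is an illustrative simplification, not the implementation of [10]: in particular, the don't care position is chosen by directly maximising the covering, a greedy surrogate for the MC criterion, and all function names are assumptions.

```python
# Table 2 training set: binary string -> class
DATA = {"0001": 0, "0011": 0, "0111": 1, "1001": 0,
        "1011": 1, "1100": 0, "1101": 1, "1111": 1}

def covers(cluster, string):
    """A cluster covers a string if every fixed position matches."""
    return all(c in ("*", b) for c, b in zip(cluster, string))

def build_cluster(seed, y):
    """Step 2: greedily replace bits of the seed with don't cares; a
    replacement is valid only if no example of class 1 - y is covered."""
    same = [s for s, cls in DATA.items() if cls == y]
    opposite = [s for s, cls in DATA.items() if cls != y]
    cluster = list(seed)
    while True:
        best, best_cov = None, -1
        for i, bit in enumerate(cluster):
            if bit == "*":
                continue
            trial = cluster[:i] + ["*"] + cluster[i + 1:]
            if any(covers(trial, s) for s in opposite):
                continue                     # would cover the opposite class
            cov = sum(covers(trial, s) for s in same)
            if cov > best_cov:
                best, best_cov = i, cov
        if best is None:
            return "".join(cluster)
        cluster[best] = "*"

def minimal_pruning(clusters, examples):
    """Step 4: pick the cluster covering most still-uncovered examples;
    ties are broken by the whole covering."""
    selected, uncovered = [], set(examples)
    while uncovered:
        best = max(clusters,
                   key=lambda cl: (sum(covers(cl, s) for s in uncovered),
                                   sum(covers(cl, s) for s in examples)))
        if not any(covers(best, s) for s in uncovered):
            break                            # nothing left can be covered
        selected.append(best)
        uncovered -= {s for s in uncovered if covers(best, s)}
    return selected

ones = [s for s, y in DATA.items() if y == 1]
print(build_cluster("1101", 1))              # '*1*1', as in Section 4.1
print(minimal_pruning(["*1*1", "*11*", "1*1*"], ones))
# ['*1*1', '1*1*']: '*1*1' covers three examples, then '1011' forces '1*1*'
```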
5. Application to an example

To evaluate the quality of the ARE produced by the classification methods presented in the previous sections, the network shown in Fig. 3 has been considered [27]. It is assumed that each link has reliability ri and a capacity of 100 units. The goal is to obtain an ARE that evaluates the reliability between the source node s and the terminal node t.

In order to apply a classification method, such as DT or HC, to generate an ARE, it is first necessary to collect a set of examples (x,y), where y = EF(x), to be used in the training phase and in the subsequent performance evaluation of the resulting set of rules. To this aim, NT + NE system states are randomly selected without replacement, and for each of them the corresponding value of the EF is evaluated. The first NT examples are then used to form the training set, whereas the remaining NE are employed to evaluate performance, according to the standard measures of sensitivity, specificity and accuracy [26]:

sensitivity = TP / (TP + FN),   specificity = TN / (TN + FP),
accuracy = (TP + TN) / (TP + TN + FP + FN)

where

- TP (resp. TN) is the number of examples belonging to the class y = 1 (resp. y = 0) for which the classifier gives the correct output;
- FP (resp. FN) is the number of examples belonging to the class y = 0 (resp. y = 1) for which the classifier gives the wrong output.

For reliability evaluation, the sensitivity gives the percentage of correctly classified operating states and the specificity the percentage of correctly classified failed states. As suggested by Torgo [11], the proportion NT = 2NE has been chosen.

Two cases are analysed. In the first one, only connectivity is checked to assess whether a selected state x corresponds to an operating or to a failed state; thus, the EF is given by a depth-first procedure [3,4]. In the second case, a system failure occurs when the flow at the terminal node t falls below a specified threshold; consequently, a max-flow min-cut algorithm is used to establish the value of the EF [3,4].

5.1. Connectivity evaluation

The RE was obtained using the NEtwork REliability Assessment (NEREA) software [28] and a depth-first procedure [3,4]. The software determines the minimal path sets (18 paths in this network) and then generates the RE; in this case, the RE consists of 105 terms. The state space (2^21 possible states) is randomly sampled and a data set with 3000 different (x,y) pairs is generated. The first NT = 2000 pairs are used for training the classifier and the remaining NE = 1000 examples to test its behaviour.

Table 3 shows the performance indices during the training and the test phase for DT and HC, together with the number of rules generated for the operating system state y = 1. DT produces rules for both operating and failed system states at the same time, while the execution of HC provides rules for only one class.
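The experimental protocol just described can be summarised in a short sketch. This is a hypothetical pipeline, not the authors' code: the function names are illustrative, the EF is left abstract, and any trained classifier returning a 0/1 prediction can be plugged in.

```python
import random

def sample_states(n_components, n_samples, ef, seed=0):
    """Draw n_samples distinct random states and label them with the EF."""
    rng = random.Random(seed)
    states = set()
    while len(states) < n_samples:           # sampling without replacement
        states.add(tuple(rng.randint(0, 1) for _ in range(n_components)))
    return [(x, ef(x)) for x in states]

def performance_indices(examples, classify):
    """Sensitivity, specificity and accuracy of a classifier on examples."""
    tp = tn = fp = fn = 0
    for x, y in examples:
        y_hat = classify(x)
        if y == 1 and y_hat == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif y_hat == 0:
            tn += 1
        else:
            fp += 1
    return (tp / (tp + fn), tn / (tn + fp),
            (tp + tn) / (tp + tn + fp + fn))

# data = sample_states(21, 3000, ef)         # Section 5.1: 2**21 state space
# train, test = data[:2000], data[2000:]     # NT = 2 NE, as suggested in [11]
# sens, spec, acc = performance_indices(test, trained_classifier)
```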
Table 4
System function generated by HC for example 5.1: every logical product corresponds to a minimal path

SF(x) = x15 x17 x21 + x7 x8 x19 + x8 x10 x20 + x8 x12 x14 x21 + x7 x8 x9 x20 + x8 x10 x13 x21 + x8 x15 x16 x21 + x8 x11 x12 x20 + x1 x2 x4 x18 + x2 x4 x6 x19 + x7 x8 x9 x13 x21 + x8 x11 x12 x13 x21 + x2 x4 x6 x9 x20 + x1 x2 x4 x5 x19 + x1 x2 x3 x7 x8 x18 + x2 x4 x6 x9 x13 x21
Table 5
ARE generated by DT for example 5.1

P8 P10 P20 + P3 Q7 P8 P9 P12 Q15 P19 Q20 P21 + Q7 P8 Q9 P12 Q15 P19 Q20 P21 + Q7 P8 Q12 Q15 P17 P19 Q20 P21 + P8 P10 P13 Q15 Q19 Q20 P21 + P8 Q10 P11 P12 P13 Q15 Q19 Q20 P21 + P7 P8 Q10 P11 Q12 P13 Q15 Q19 Q20 P21 + P8 P12 Q13 P14 Q15 Q19 Q20 P21 + P7 P8 P19 Q20 Q21 + P1 P2 P4 Q7 P8 P19 Q20 Q21 + Q1 P2 P4 P6 Q7 P8 P19 Q20 Q21 + P1 P2 P4 P8 P18 Q19 Q20 Q21 + P1 P2 Q4 P7 P8 P18 Q19 Q20 Q21 + Q8 P15 P17 P21 + P1 P2 P4 P5 Q8 P13 P15 P16 P17 Q21 + P1 P2 P4 Q8 Q13 P15 P16 P17 Q21 + P1 P2 P4 Q8 P15 Q16 P17 P19 Q21 + P1 P2 P4 Q8 Q15 P17 P18 + P1 P2 P4 P5 Q8 Q15 P17 Q18 P19 + Q1 P2 P4 P6 Q8 Q15 P17 P19 + Q1 P2 P4 P6 Q8 Q15 P17 Q19 P20 + P1 P2 P4 Q8 Q17 P18 + P1 P2 P4 P5 Q8 Q17 Q18 P19 + P1 P2 Q3 P4 Q5 Q8 Q17 Q18 P19 + Q1 P2 P4 P6 Q8 Q17 P19 + Q1 P2 P4 P6 Q8 P9 P11 Q17 Q19 + Q1 P2 P4 P6 Q8 P9 Q11 P13 Q17 Q19 + P7 P8 Q15 P19 Q20 P21 + P1 P2 P7 P8 Q9 Q10 Q12 Q15 Q19 P20 + P2 P4 Q7 P8 P9 Q10 Q12 P20 Q21 + Q7 P8 Q10 Q12 P15 Q16 P17 P20 P21 + P7 P8 Q9 Q10 P12 Q15 Q19 P20 + Q7 P8 Q10 Q12 P15 P16 P20 P21 + P7 P8 Q9 Q10 P15 Q19 P20 + P7 P8 Q9 Q10 P19 P20 + P7 P8 P9 Q10 P20 + Q7 P8 Q10 P11 P12 P20 + P8 P15 P17 Q20 P21 + P8 P15 P16 Q17 Q20 P21 + P8 P14 P15 Q16 Q17 Q20 P21 + Q7 P8 Q10 Q11 P12 P14 P20 P21 + P7 P8 Q14 P15 Q16 Q17 P19 Q20 P21 + P5 Q7 P8 Q10 Q11 P12 Q14 P20 P21 + Q7 P8 P11 Q14 P15 Q16 Q17 Q20 P21
Table 6 Network reliability results obtained for example 5.1 by using RE and AREs ri
RE
ARE-DT
Relative error (%)
ARE-HC
Relative error (%)
0.7 0.8 0.9
0.85166 0.95362 0.99407
0.83747 0.94559 0.99254
1.666 0.841 0.154
0.851251 0.953519 0.994074
0.048 0.010 0.0002
It is interesting to note that the 16 rules extracted by HC for the operating system state correspond to minimal paths; they give rise to the sum-of-product expression for the SF shown in Table 4.

DT rules are in disjoint form, so the ARE can be determined easily (Table 5). On the other hand, the rules generated by HC are not disjoint, so an additional procedure, such as the algorithm KDH88 [29], has to be used to perform this task. In this way, the 16 non-disjoint rules extracted by HC are converted into 80 disjoint logical products. It is also interesting to note that if the HC procedure is trained on the class of failed system states, it generates rules that can correspond to minimal cuts: in this example, 49 rules are produced and 26 of them correspond to minimal cuts.

Once DT and HC are trained, the resulting AREs are used to evaluate the network reliability for different values ri of the component reliability. Table 6 shows the network reliability evaluated using the correct RE and the ARE obtained by both models; the relative errors are also included for completeness. Both models produce excellent results, but the HC errors are significantly lower.

5.2. Flow evaluation

In this case the network is considered to be in the operating state if a flow of at least 200 units can be transmitted between the source node s and the terminal node t. The RE is obtained using the NEREA software [28], a composite-paths procedure [2,8] and a max-flow min-cut algorithm [3,4]. In this case, NEREA produces 43 valid paths and the RE consists of 101 terms.

Through a random sampling of the state space, a data set with 7500 different examples is generated, with NT = 5000 and NE = 2500. Table 7 shows the performance indices obtained during the training and the test phase, together with the number of rules generated. The HC procedure trained for the operating system state y = 1 produces 21 minimal paths, while the set of rules generated for the failed system state includes 39 minimal cuts. Again, Table 8 shows the network reliability evaluated using the correct RE and the ARE obtained by both models, together with the corresponding relative errors. As in the previous example, the HC errors are lower.

Table 7
Performance indices for the training and the test phase in example 5.2

              DT                           HC
Index         Training (%)  Testing (%)    Training (%)  Testing (%)
Sensitivity   97.28         85.88          100           96.97
Specificity   99.83         99.44          100           99.27
Accuracy      99.70         98.48          100           99.12
Rules         37                           21^a

^a Not disjoint.

Table 8
Network reliability results obtained for example 5.2 by using the RE and the AREs

ri    RE        ARE-DT    Relative error (%)    ARE-HC    Relative error (%)
0.7   0.40433   0.39947   1.20                  0.40604   -0.42
0.8   0.66843   0.65932   1.36                  0.66883   -0.06
0.9   0.90190   0.89821   0.41                  0.90190   0.00
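For completeness, the sketch below shows how an ARE in sum-of-disjoint-products form is evaluated for a common component reliability r, as done to produce Tables 6 and 8. The signed-integer encoding of the products is an assumption made for illustration; the expression used is the exact RE of the small Fig. 1 network, so the result can be checked against 1 - (1 - r^2)^2.

```python
# Each disjoint product is encoded as a list of signed indices: +i stands
# for P_i and -i for Q_i (an assumed representation, for illustration only).
SDP = [[1, 3], [-1, 2, 4], [1, 2, -3, 4]]    # exact RE of the Fig. 1 network

def evaluate_are(sdp, r):
    """Terms of an SDP are mutually exclusive, so their probabilities add."""
    total = 0.0
    for term in sdp:
        prob = 1.0
        for literal in term:
            prob *= r if literal > 0 else 1.0 - r
        total += prob
    return total

for r in (0.7, 0.8, 0.9):
    print(r, evaluate_are(SDP, r))   # r = 0.9 gives 0.9639 = 1 - (1 - 0.81)**2
```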
6. Conclusions

This paper has evaluated the performance of two machine learning techniques (DT and HC) in generating the approximate reliability expression of a network. The excellent results obtained in the experiments show the potential of both methods for evaluating the reliability of a system through an approximation of its reliability expression.

In the examples presented, the approximate RE, built from a very small fraction of the total state space, produces very close reliability estimates, and the HC models outperform the corresponding DT models. Rules extracted using DT are in disjoint form, which makes it easy to obtain the corresponding ARE. On the other hand, rules generated by HC need to be converted into disjoint form to produce the desired ARE. However, the HC procedure provides information about minimal paths and cuts.
References

[1] Colbourn Ch. The combinatorics of network reliability. New York: Oxford University Press; 1987.
[2] Aggarwal KK, Chopra YC, Bajwa JS. Capacity consideration in reliability analysis of communication systems. IEEE Trans Reliab 1982;31(2).
[3] Reingold E, Nievergelt J, Deo N. Combinatorial algorithms: theory and practice. New Jersey: Prentice Hall; 1977.
[4] Papadimitriou CH, Steiglitz K. Combinatorial optimisation: algorithms and complexity. New Jersey: Prentice Hall; 1982.
[5] Rauzy A, Châtelet E, Dutuit Y, Bérenguer C. A practical comparison of methods to assess sum-of-products. Reliab Engng Syst Safety 2003;79(1):33–42.
[6] Chaturvedi SK, Misra KB. An efficient multi-variable algorithm for reliability evaluation of complex systems using path sets. Int J Reliab Quality Safety Engng 2002;3(3):237–59.
[7] Rueger WJ. Reliability analysis of networks with capacity-constraints and failures at branches and nodes. IEEE Trans Reliab 1986;R-19:523–8.
[8] Rai S, Soh S. A computer approach for reliability evaluation of telecommunication networks with heterogeneous link-capacities. IEEE Trans Reliab 1991;40(4).
[9] Billinton R, Li W. Reliability assessment of electric power systems using Monte Carlo methods. New York: Plenum Press; 1994.
[10] Muselli M, Liberati D. Binary rule generation via Hamming clustering. IEEE Trans Knowledge Data Engng 2002;14:1258–68.
[11] Torgo LF. Inductive learning of tree-based regression models. PhD Thesis, Faculdade de Ciências da Universidade do Porto; 1999.
[12] Hatziargyriou N, Papathanassiou S, Papadopulos M. Decision trees for fast security assessment of autonomous power systems with large penetration from renewables. IEEE Trans Energy Conversion 1995;10(2).
[13] Zhou Z, Chen Z. Hybrid decision tree. Knowledge-Based Syst 2002;15(8):515–28.
[14] Aha DW, Breslow LA. Comparing simplification procedures for decision trees on an economics classification task. Naval Research Laboratory, Technical Report NRL/FR/5510-98-9881; 1998.
[15] Peco J, Sánchez-Úbeda EF, Gómez T. Enhancing optimal transmission or sub-transmission planning by using decision trees. Paper BPT99-304-16, IEEE Power Tech'99 Conference, Budapest, Hungary; August 29–September 2, 1999.
[16] Bevilacqua M, Braglia M, Montanari R. The classification and regression tree approach to pump failure rate analysis. Reliab Engng Syst Safety 2002;79(1):59–67.
[17] Rocco CM. A rule induction approach to improve Monte Carlo system reliability assessment. Reliab Engng Syst Safety 2003;82(1):87–94.
[18] Muselli M, Liberati D. Training digital circuits with Hamming clustering. IEEE Trans Circuits Syst 2000;47:513–27.
[19] Billinton R, Allan R. Reliability evaluation of engineering systems: concepts and techniques, 2nd ed. New York: Plenum Press; 1992.
[20] Rocco CM, Moreno JM. Reliability evaluation using Monte Carlo simulation and support vector machine. Lecture Notes in Computer Science, vol. 2329. Springer; 2002.
[21] Cappelli C, Mola F, Siciliano R. A statistical approach to growing a reliable honest tree. Comput Statistics Data Anal 2002;38:285–99.
[22] Portela da Gama JM. Combining classification algorithms. PhD Thesis, Faculdade de Ciências da Universidade do Porto; 1999.
[23] Mock K. Lecture notes on machine learning. www.math.uaa.alaska.edu/~afkjm/
[24] Murthy S, Kasif S, Salzberg S. A system for induction of oblique decision trees. J Artif Intell Res 1994;2:1–32.
[25] Quinlan JR. C4.5: programs for machine learning. Los Altos, CA: Morgan Kaufmann; 1993.
[26] Veropoulos K, Campbell C, Cristianini N. Controlling the sensitivity of support vector machines. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI99), Stockholm, Sweden, Workshop ML3; 1999. p. 55–60.
[27] Yoo YB, Deo N. A comparison of algorithms for terminal-pair reliability. IEEE Trans Reliab 1988;37(2).
[28] Martínez L. NEREA: a network reliability assessment software. MSc Thesis, Facultad de Ingeniería, Universidad Central de Venezuela; June 2002 (in Spanish).
[29] Heidtmann KD. Smaller sums of disjoint products by subproducts inversion. IEEE Trans Reliab 1989;38(4).