Decision support system for water distribution systems based on neural networks and graphs theory for leakage detection



Expert Systems with Applications 39 (2012) 13214–13224

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Corneliu T.C. Arsene ⇑, Bogdan Gabrys 1, David Al-Dabass 2

School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom

Keywords: Decision support system; Operational control of water distribution systems; Loop corrective flows equations; Modeling and simulation; Neural network; Graph theory

Abstract

This paper presents an efficient and effective decision support system (DSS) for operational monitoring and control of water distribution systems based on a three layer General Fuzzy Min–Max Neural Network (GFMMNN) and graph theory. The operational monitoring and control involves detection of pipe leakages. The training data for the GFMMNN is obtained through simulation of leakages in a water network for a 24 h operational period. The training data generation scheme includes a simulator algorithm based on loop corrective flows equations, a Least Squares (LS) loop flows state estimator and a Confidence Limit Analysis (CLA) algorithm for uncertainty quantification entitled the Error Maximization (EM) algorithm. These three numerical algorithms for modeling and simulation of water networks are based on loop corrective flows equations and graph theory. It is shown that the detection of leakages based on the training and testing of the GFMMNN with patterns of variation of nodal consumptions, with or without confidence limits, produces better recognition rates than training based on patterns of nodal heads and pipe flows state estimates, with or without confidence limits. It also produces recognition rates comparable to the original recognition system trained with patterns of data obtained with the LS nodal heads state estimator, while being computationally superior: it requires a single architecture of the GFMMNN type and uses a small number of pattern recognition hyperbox fuzzy sets built by the same GFMMNN architecture. In this case the GFMMNN relies on the ability of the LS loop flows state estimator to make full use of the pressure/nodal heads measurements existing in a water network. © 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Address: School of Computing and Informatics, Nottingham Trent University, Nottingham NG11 8NS, UK. Tel.: +44 (0)1202 965298; fax: +44 (0) 1202 965314. E-mail addresses: [email protected] (C.T.C. Arsene), [email protected] (B. Gabrys), [email protected] (D. Al-Dabass).
1 Computational Intelligence Research Group, School of Design, Engineering & Computing, Bournemouth University, Poole House, Talbot Campus, Fern Barrow, Poole, BH12 5BB, United Kingdom. Tel.: +44 (0)1202 965298; fax: +44 (0) 1202 965314.
2 School of Computing and Informatics, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2012.05.080

1. Introduction

Two broad categories of faults occurring in water distribution systems are considered in this work. Faults caused by malfunctioning transducers and telecommunication equipment are referred to as measurement errors. Faults due to leakages and wrong status of valves, which invalidate the system model used in the state estimation, are referred to as topological errors (Arsene & Bargiela, 2001; Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011; Carpentier & Cohen, 1993). This paper addresses the topological errors introduced by a leakage in a pipe, while the recognition of the wrong status of a valve can be dealt with in a similar way by using the concepts described herein. In the absence of accurate real measurements, the topological errors not only pose a much greater danger to the safety of water network operation but are also more difficult to locate and eradicate, even when reliable and efficient state estimators are available. Depending on the topology of the distribution network and the state estimator used (Gabrys, 1997; Arsene, 2004; Arsene, Bargiela, & Al-Dabass, 2004a; Arsene, Bargiela, & Al-Dabass, 2004b), the topological class of errors forms characteristic patterns that can be utilized to classify the state of the water network.

The classification of the state of the water network has been investigated (Gabrys, 1997; Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000) in the context of the Least Squares (LS) state estimator based on the nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). That approach to the diagnosis of leakages and other operational faults occurring in water networks was based on the examination of patterns of state estimates (i.e. nodal pressures, pipe flows) or LS nodal heads residuals by a General Fuzzy Min–Max Neural Network (GFMMNN) (Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000). It was shown that both the LS nodal heads state estimates with their confidence limits and the LS nodal heads residuals with their confidence limits can be successfully used to train the GFMMNN recognition system (Arsene & Bargiela, 2001; Arsene, 2004; Belsito, Lombardi, Andreussi, & Banerjee, 1998).

This paper presents the application of the GFMMNN to the classification of the state of the water distribution system based on patterns of data obtained with an LS loop flows state estimator (Arsene and Bargiela, 2001; Arsene et al., 2004a; Arsene et al., 2004b) and Confidence Limits Analysis (CLA) implemented with the same state estimator (Arsene, 2004; Arsene, 2011; Arsene et al., 2011). The investigation has two aims. The first is to build an effective and efficient Decision Support System (DSS) (Bargiela et al., 2002) for fault detection and identification in water networks by using (a) the LS loop flows state estimator, (b) the CLA algorithm based on the same LS loop flows state estimator and (c) the GFMMNN system (Bargiela et al., 2002). The second aim is to compare the novel DSS with the initial system described in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000), which is based on the LS nodal heads equations. Recently, other works have tackled fault detection and identification in water networks by using machine learning techniques such as various other types of neural networks (Caputo and Pelagagge, 2003; Shinozuka et al., 2005; Belsito et al., 1998; Feng and Zhang, 2006; Izquierdo et al., 2007) or Support Vector Machines (SVM) (Mashford et al., 2009). However, the use of the same pattern recognition architecture, such as the GFMMNN, with different simulation, state estimation and CLA algorithms is very scarce.

2. Numerical algorithms

Three main numerical algorithms are used in this paper to generate the training and testing data for the GFMMNN recognition system: a simulator algorithm, a state estimator and a CLA algorithm. The GFMMNN pattern recognition system was initially developed in the context of the simulation and state estimation of water networks based on the nodal heads equations, the Newton–Raphson numerical technique (Jeppson & Davis, 1976) and the LS optimization criterion.

2.1. Simulator algorithm

Modeling and simulation of a water distribution system consists of two main ingredients: the set of independent equations that describe the water network and the numerical optimization method used to calculate the nodal heads and the pipe flows. Fig. 1 shows a water network where the edges are the pipes that distribute the water to the consumers, which are represented by the nodes (e.g. 1, 2, 3, etc.). The nodes represented with a square in Fig. 1 are nodes with fixed head/pressure. A simulator algorithm is defined as a solution of the water network equations for a given set of nodal demands.

Fig. 1. Example of water network.

The simulator algorithm used here is based on the loop corrective flow algorithm defined for a water distribution system with n nodes, l loops and p pipes. The continuity equation must be satisfied, that is, the flow entering a node equals the nodal consumption plus the flow exiting the respective node. Therefore, an initial pipe flows solution $Q_i$ that satisfies the continuity equation is calculated as:

$$A_{np} Q_i = d \qquad (1)$$

where $d$ are the nodal demands and $A_{np}$ is the topological incidence matrix. The topological incidence matrix $A_{np}$ has a row for every node and a column for every pipe of the water network. The entries +1 and −1 in each row indicate that the flow in a pipe enters or leaves the node (Arsene et al., 2004a; Arsene et al., 2004b). The energy equation is solved with the Newton–Raphson method and states that the sum of the pipe head losses around each loop equals 0:

$$M_{lp} h = 0 \qquad (2)$$

where $h$ represents the pipe head losses calculated by the Hazen–Williams equation (Epp & Fowler, 1970) and $M_{lp}$ is the loop incidence matrix. The entries +1 and −1 of the loop incidence matrix correspond to the flow in a pipe being, for example, clockwise or anti-clockwise in a loop, while 0 means that the pipe does not belong to the loop (Arsene et al., 2004a; Arsene et al., 2004b). The loop corrective flows $\Delta Q_l$ at step $k + 1$ of the Newton–Raphson iteration method which solves (2) are:

$$\Delta Q_l^{k+1} = \Delta Q_l^{k} - \left[ \frac{\partial \Delta H}{\partial \Delta Q_l^{k}} \right]^{-1} \Delta H \qquad (3)$$

where $\Delta H$ are the residual loop head losses (i.e. $\Delta H = M_{lp} h$). The Jacobian matrix $\partial \Delta H / \partial \Delta Q_l$ in (3) can be expressed as:

$$J = M_{lp} A M_{pl} \qquad (4)$$

where $M_{pl}$ is the transpose of the loop incidence matrix and $A$ is a diagonal matrix of the form:

$$A = \begin{pmatrix} u s_1 |Q_1|^{u-1} & 0 & \cdots & 0 \\ 0 & u s_2 |Q_2|^{u-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u s_p |Q_p|^{u-1} \end{pmatrix} \qquad (5)$$

where $s_{1,2,\ldots,p}$ are the pipe head loss coefficients and $u$ is the exponent in the Hazen–Williams equation (Arsene, 2011; Carpentier & Cohen, 1993; Arsene et al., 2004a; Arsene et al., 2004b; Kumar, Narasimhan, & Bhadllamudi, 2008; Jeppson & Davis, 1976). The final pipe flow solution $\tilde{Q}$ for each pipe is:

$$\tilde{Q} = Q_i + M_{lp}^{T} \Delta Q_l \qquad (6)$$

where $\tilde{Q}$ are the final pipe flows calculated at the end of the Newton–Raphson method (Arsene et al., 2004a; Arsene et al., 2004b; Jeppson & Davis, 1976).

The loop simulator requires the computation of the loop incidence matrix $M_{lp}$ and the initial pipe flows $Q_i$. This computation is in general based on the decomposition of the water network into a spanning tree (e.g. Fig. 5) starting from a node which becomes the main root node with fixed pressure. A spanning tree contains all the vertices and the edges of a connected and undirected graph except for the edges which form the cycles (i.e. loops) of the graph (Arsene et al., 2004a; Arsene et al., 2004b). Different strategies can be employed to search the water network. The Depth First (DF) search from graph theory is one of the possible choices for finding the loops in a water network. The DF search has the property that a pipe that does not belong to the spanning tree, called a chord pipe or a co-tree pipe, always connects a node with one of its predecessors in the tree. Based on the spanning tree, the topological incidence matrix $A_{np}$ can be split into a tree incidence matrix $T$ $(n \times n)$, which defines the incidence of the tree pipes (the pipes situated in the spanning tree), and a co-tree incidence matrix $C$ $(n \times l)$, which contains the co-tree pipes that are not in the spanning tree and form the loops (i.e. $A_{np} = [T \; C]$). In this case the loop corrective flows can also be referred to as the co-tree pipe flows (Andersen and Powell, 1999; Epp and Fowler, 1970; Rahal, 1995).

The handling of non-linear controlling hydraulic elements such as pumps or Pressure Reducing Valves (PRV) in the water network simulations shown in this paper is based on algorithms described in Kumar et al. (2008), Andersen and Powell (1999), Arsene, Al-Dabass, and Hartley (2012).

2.2. State estimation

An additional set of independent variables is introduced, the variation of nodal demands $\Delta d$, in which case the pipe flows are written with respect to the loop corrective flows $\Delta Q_l$ and the variation of nodal demands:

$$\tilde{Q} = Q_i - A^{*} \Delta d + M_{pl} \Delta Q_l \qquad (7)$$

where $\tilde{Q}$ are the pipe flows in tree and co-tree pipes, $Q_i = \begin{bmatrix} Q_T \\ 0_{ln} \end{bmatrix}$ are the initial flows, with $Q_T$ the initial flows in the tree pipes and zero initial flows in the co-tree pipes (i.e. $0_{ln}$ is a zero vector of size $(l \times n)$), and $A^{*}$ is the matrix with the property $A^{*} = \begin{bmatrix} T^{-1} \\ 0_{ln} \end{bmatrix}$.

There are two sets of equations which are used to describe the hydraulics of the water network. The first set of equations states that the head losses around the loops equal zero:

$$\Delta H(\Delta Q_l, \Delta d) = 0 \qquad (8)$$

where the loop head loss residuals $\Delta H$ are a function of the loop corrective flows $\Delta Q_l$ and the variation of nodal demands $\Delta d$. The second set of equations states that the total amount of inflow/outflow from the water network carried out through the fixed-head nodes (i.e. nodes with fixed pressure) equals the variation of nodal demands:

$$\Delta d = B_{nl} \Delta Q_l \qquad (9)$$

The matrix $B_{nl}$ $(n \times l)$ from the previous equation has a non-zero element equal to 1 which corresponds to the main root node and 1 for each of the fixed-head nodes. Pseudo-loops are added between the main root node and each of the fixed-head nodes, and a loop corrective flow is considered for each such pseudo-loop. Eqs. (8) and (9) represent the hydraulic function that describes the water network. They can be written as a system of equations:

$$\begin{aligned} \Delta H(\Delta Q_l, \Delta d) &= 0 \\ B_{nl} \Delta Q_l - \Delta d &= 0 \end{aligned} \qquad (10)$$

The system of equations can be augmented with the equations corresponding to real pressure and pipe flow measurements (Arsene, 2004; Arsene, 2011). It can be shown that for the system of equations (10) the Hessian matrix is positive semidefinite (Arsene, 2004; Arsene, 2011) and that the problem has a global minimum point. The system of equations (10) can be solved by using the Newton–Raphson iterative method and minimizing the LS criterion. The Jacobian matrix of the system of equations is:

$$J = \begin{bmatrix} \dfrac{\partial \Delta H}{\partial \Delta d} & \dfrac{\partial \Delta H}{\partial \Delta Q_l} \\ -\dfrac{\partial (\Delta d)}{\partial \Delta d} & B_{nl} \dfrac{\partial (\Delta Q_l)}{\partial \Delta Q_l} \end{bmatrix} \qquad (11)$$

It is possible to calculate the inflows/outflows in the water network at the end of the Newton–Raphson method by subtracting the variation of the nodal demands at the fixed-head nodes from the loop corrective flows corresponding to the respective pseudo-loops, in which case matrix $B_{nl}$ can be taken out. The Jacobian matrix then becomes:

$$J = \begin{bmatrix} \dfrac{\partial \Delta H}{\partial \Delta d} & \dfrac{\partial \Delta H}{\partial \Delta Q_l} \\ I_{nn} & 0 \end{bmatrix} \qquad (12)$$

where $I_{nn}$ is the $(n \times n)$ identity matrix. The Jacobian matrix $J$ resembles the one presented in Arsene and Bargiela (2001), Arsene et al. (2004a), Arsene et al. (2004b).

2.3. Confidence Limit Analysis

It has been shown that in water network state estimation, for a given set of input data and estimation criterion, there is one optimal solution. However, due to the inaccuracies in the input data, there are many possible combinations of such input data and therefore many feasible state estimate vectors. As a result, uncertainty analysis becomes an inevitable part of the operation of water distribution systems, since it is very important, from the point of view of safe operational control, to know how the inaccuracies can affect the estimated solution. Extensive work on quantifying the influence of measurement and pseudo-measurement uncertainties in water distribution systems has been done (Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene et al., 2011) and is based on the principle of unknown-but-bounded errors for the set of measurements:

$$z = g(x) + r, \qquad |r_i| \le |e_i|, \qquad i = 1, \ldots, m \qquad (13)$$

where $e$ is the vector of maximum expected measurement errors, $z$ is the measurement vector, $g$ is the hydraulic network function, $x$ is the set of independent variables, which in this case are the loop corrective flows and the variation of nodal demands, $r$ is the vector of residuals that cannot be accounted for by the state estimator and the measurement data, and $m$ is the number of measurements. Knowledge of the statistical properties of the errors is not required; the only restriction imposed is that the errors fall within a range bounded by $e$.

Several CLA algorithms have been proposed, but the most successful ones in terms of computational complexity were based on the linearized model of the water network. The linearized model was used to obtain a sensitivity matrix $S$. The sensitivity matrix was the pseudo-inverse of the Jacobian matrix calculated for the state estimates, which were the nodal heads, the pipe flows and the inflows/outflows obtained with the LS nodal heads state estimator. A state estimate was produced on the assumption that the measurement vector $z_t$ is correct, and the possible error of the measurement set $\Delta z$ was then used together with the sensitivity matrix $S$ to predict the resulting error in the state estimates. This approach was facilitated by the use of the nodal heads equations in the LS state estimator. Because of this, the $(i,j)$th element $s_{ij}$ of the pseudo-inverse of the Jacobian matrix relates the sensitivity of the $i$th element, $x_i$, of the state vector/estimates, $x_t$, to the $j$th element, $z_j$, of the measurement vector.

In the context of the LS loop flows state estimator, the Error Maximization (EM) method obtains suitable confidence limits for the nodal heads and the pipe flows for a maximum level of the uncertainties/errors in the input measurement data of the water distribution system, hence the term EM.
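The linearized approach described above can be illustrated with a minimal sketch. It assumes a small hypothetical Jacobian and takes the worst case of the linear error propagation over the bounded errors of Eq. (13); the function name, the toy matrix and the worst-case rule are assumptions made for illustration, not the implementation used in the cited works.

```python
import numpy as np

def linearized_confidence_limits(jacobian, max_errors):
    """Worst-case bounds on state errors for |dz_j| <= max_errors[j].

    jacobian: (m, n) Jacobian of the measurement function at the state estimate.
    max_errors: (m,) maximum expected absolute measurement errors e.
    """
    sensitivity = np.linalg.pinv(jacobian)  # sensitivity matrix S, shape (n, m)
    # |sum_j s_ij * dz_j| is largest when each dz_j equals sign(s_ij) * e_j.
    return np.abs(sensitivity) @ np.asarray(max_errors, dtype=float)

# Hypothetical toy model: two states observed directly plus their sum.
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
e = np.array([0.1, 0.1, 0.2])
limits = linearized_confidence_limits(J, e)
print(limits)
```

Each row of the sensitivity matrix weights the measurement error bounds, so a state variable that depends on many inaccurate measurements receives a wider limit.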
This method is suitable for use only with the LS loop flows state estimator (Arsene, 2004; Arsene, 2011; Arsene et al., 2011). The EM method considers the maximum variability of water consumptions and the accuracy of real meters, which form the state estimated measurement vector $\hat{z}$. This vector is calculated from the observed state vector $\hat{x}$ (i.e. nodal heads and pipe flows), which in turn was obtained from the observed measurement vector $z_o$ with the LS loop flows state estimator. The state estimated measurement vector $\hat{z}$ is used instead of the observed measurement vector $z_o$, which is the measurement vector initially given to the human operator.

Fig. 2. CLA based on EM method.

The upper or the lower measurement limits $[\hat{z}_l \; \hat{z}_u]$ of the estimated measurement vector $\hat{z}$, which is modified with the measurement accuracy for water consumptions and real meters, are then used again in the LS loop flows state estimator. The result is a new state vector $x_1$, which is used for determining the confidence limits on the state variables (nodal heads, in/out flows, pipe flows) with the equation:

$$x_{cl_i} = \operatorname{abs}(x_1 - \hat{x}) \qquad (14)$$

where $x_{cl_i}$ is the confidence limit on the $i$-th state variable, $\hat{x}$ is the state vector obtained for the observed measurement vector $z_o$, $x_1$ is the state vector obtained for the maximum level of errors (i.e. $\hat{z}_l$ or $\hat{z}_u$) in the estimated measurement vector $\hat{z}$, and abs is the absolute value. The EM method is shown in Fig. 2.

Calculating the confidence limits with the EM method is possible only with the LS loop flows state estimator; it does not work with state estimators that are not based on the loop corrective flows and the variation of nodal demands, such as the LS nodal heads state estimator. The LS loop flows state estimator modifies the inflows/outflows at the fixed-head nodes so as to match the sum of the estimated nodal demands $\hat{z}$ obtained with the LS loop flows state estimator. This means that if the estimated nodal water consumptions and real meters $\hat{z}$ are moved to their lower limit $\hat{z}_l$ or upper limit $\hat{z}_u$, the mass balance of the water network will still be satisfied by the in/out flows at the fixed-head nodes, which are modified during the Newton–Raphson optimization method. In this case the fixed-head nodes are part of the measurement data (i.e. nodal heads, pipe flows) and are used to form pseudo-loops together with the main source/root node.

The CLA is realized with the EM method (Arsene, 2004; Arsene, 2011; Arsene et al., 2011), which is based on the LS loop flows state estimator. It provides realistic confidence bounds for the nodal heads and the pipe flows obtained with the LS loop flows state estimator. Confidence bounds are also obtained for the variation of nodal consumptions, in effect the state estimated nodal consumptions in vector $x_1$ (Arsene, 2004; Arsene, 2011; Arsene et al., 2011).
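The sequence of steps in the EM method (estimate, recompute the measurements, shift them to a limit, estimate again, apply Eq. (14)) can be sketched as follows. This is only an illustration under stated assumptions: the real method uses the non-linear LS loop flows state estimator, whereas here a hypothetical linear measurement model `G` with an ordinary least-squares solve stands in for it, and all names are invented for the example.

```python
import numpy as np

def em_confidence_limits(estimate, g, z_observed, max_errors, direction=1.0):
    """Confidence limits following the EM steps and Eq. (14).

    estimate: callable mapping a measurement vector to a state vector.
    g: callable mapping a state vector to the estimated measurements.
    direction: +1.0 uses the upper limit z_u, -1.0 the lower limit z_l.
    """
    x_hat = estimate(z_observed)            # state for the observed vector z_o
    z_hat = g(x_hat)                        # state estimated measurement vector
    z_limit = z_hat + direction * np.asarray(max_errors, dtype=float)
    x_1 = estimate(z_limit)                 # state for the shifted measurements
    return np.abs(x_1 - x_hat)              # Eq. (14)

# Hypothetical linear stand-in for the hydraulic model and the LS estimator.
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def ls_estimate(z):
    return np.linalg.lstsq(G, z, rcond=None)[0]

def measurement_model(x):
    return G @ x

z_o = np.array([1.0, 2.0, 3.1])
e = np.array([0.1, 0.1, 0.2])
x_cl = em_confidence_limits(ls_estimate, measurement_model, z_o, e)
print(x_cl)
```

Because the second estimation starts from the estimated measurements rather than the observed ones, the difference in Eq. (14) reflects only the imposed error level and not the residual of the first estimation.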

Fig. 3. 34-node water network for generation of the training data.

Fig. 4. Graphical representation of the training patterns generation scheme.

3. Generation of the training data for the GFMMNN

While for well maintained water distribution systems the normal operating state data can be found in abundance, instances of abnormal events are not as readily available. In order to observe the effects of abnormal events in the physical system, it is possible to resort to deliberate closing of valves or opening of hydrants in order to simulate leakages (Carpentier & Cohen, 1993). Although such experiments can be very useful to confirm the agreement between the behavior of the physical system and the mathematical model, it is not feasible to carry out such experiments for all pipes and valves in the system during the whole day or days as might be required in order to obtain a representative set of labeled data. It is an accepted practice that, for processes where physical interference is not recommended or is even dangerous, mathematical models and computer simulations are used to predict the consequences of some emergencies so that one might be prepared for a quick response. In our case the computer simulations are used to generate data covering a 24 h period for the water network depicted in Fig. 3. Such simulations that stretch over longer periods of time are called extended time simulations (Rao, Markel, & Bree, 1977; Rossman, 1994). The water network from Fig. 3 was chosen because it makes it possible to compare the results of training the recognition system with patterns obtained with the LS loop flows state estimator and confidence limits against the results reported in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000) for the same water network and operational testing conditions in the context of the LS nodal heads equations.

The process of generating the training data is shown in the form of a block diagram in Fig. 4. It consists of three major blocks. The first


module is the co-tree flows simulator (i.e. loop simulator). The simulator is used as a substitute for the physical water distribution network. It is in this module that the leakages are simulated, by updating the topology information rather than opening hydrants. In the second module, the LS loop flows state estimation is carried out for accurate measurements taken from the simulation module but without knowledge of any anomalous event that might have happened, as would be the case in the real water network. A telemetry module can be introduced between the first two modules to simulate the measurement noise caused by the transducers in real water networks. In the third module, the confidence limits are found for the state estimates (nodal heads, pipe flows) and the variation of nodal demands calculated at the estimation stage. The CLA is realized with the EM technique, which provides the confidence bounds for the estimated nodal heads and pipe flows as well as for the variation of nodal consumptions. In addition to the state estimates with their confidence limits, the system's status, or label, of the current pattern is stored.

In the physical system simulator, the leakage is modeled as an additional demand lying midway between the two end nodes of a pipe. The additional demand is not modeled as a pressure dependent variable and thus can be set to any desired value. The spanning tree for the 34-node water network is shown in Fig. 5. The main root node is node 30 and a pseudo-loop is added between the fixed-head node 31 and the main source node. The inflows to the other fixed-head nodes 27, 28, 29, 32, 33 and 34 are kept constant. This means that the pumping stations represented by links 32–20, 27–29, 28–4, 33–29, 29–19 and 34–1 are assumed to produce a constant inflow and are not affected by leakage. Therefore the inflows at the reservoirs 30 and 31 are adjusted during the Newton–Raphson method so as to cover the additional demand resulting from the leakage. Since the lower part of the water network, which includes nodes 26, 29, 1, 33 and 34, has no bearing on the upper part, only the upper part will be used for fault detection and identification (Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011).
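The loop simulator used as the first module can be sketched on a toy network by following Eqs. (1)–(6) of Section 2.1. The sketch below rests on several assumptions that are not taken from the paper: a hypothetical network with three demand nodes, three tree pipes and one co-tree pipe, invented head loss coefficients, a loop incidence matrix built as $[-(T^{-1}C)^{T} \;\; I]$ (one loop per co-tree pipe, which satisfies $A_{np} M_{lp}^{T} = 0$), and, for brevity, a leak added to the demand of an existing node rather than to a new node midway along a pipe.

```python
import numpy as np

U = 1.852  # Hazen-Williams flow exponent

def loop_incidence(T, C):
    """Loop incidence matrix with one loop per co-tree pipe (assumed construction)."""
    return np.hstack([-np.linalg.solve(T, C).T, np.eye(C.shape[1])])

def loop_simulator(T, C, s, d, tol=1e-10, max_iter=50):
    """Newton-Raphson on the loop corrective flows, Eqs. (1)-(6).

    T: (n, n) tree incidence matrix, C: (n, l) co-tree incidence matrix,
    s: (n + l,) pipe head loss coefficients, d: (n,) nodal demands.
    Returns the pipe flows, tree pipes first, then co-tree pipes.
    """
    M = loop_incidence(T, C)
    # Eq. (1): tree pipes carry all the demand, co-tree pipes start at zero.
    Q_i = np.concatenate([np.linalg.solve(T, d), np.zeros(C.shape[1])])
    dQ = np.zeros(C.shape[1])
    for _ in range(max_iter):
        Q = Q_i + M.T @ dQ                          # Eq. (6)
        dH = M @ (s * Q * np.abs(Q) ** (U - 1))     # loop residuals, Eq. (2)
        if np.max(np.abs(dH)) < tol:
            break
        A = np.diag(U * s * np.abs(Q) ** (U - 1))   # Eq. (5)
        dQ = dQ - np.linalg.solve(M @ A @ M.T, dH)  # Eqs. (3) and (4)
    return Q_i + M.T @ dQ

# Hypothetical network: root -> 1 -> 2 -> 3 (tree) plus co-tree pipe 1 -> 3.
T = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]])
C = np.array([[-1.0], [0.0], [1.0]])
s = np.array([100.0, 200.0, 200.0, 300.0])
d = np.array([0.01, 0.02, 0.03])   # nodal demands (m^3/s)

Q_normal = loop_simulator(T, C, s, d)
leak = 0.005                        # additional demand modelling the leak
d_leak = d.copy()
d_leak[2] += leak
Q_leak = loop_simulator(T, C, s, d_leak)
print(Q_normal, Q_leak)
```

In this toy case the flow in the pipe leaving the root rises by exactly the leak level, which mirrors the statement above that the reservoir inflows are adjusted to cover the additional demand.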


By systematically working through the network, ten levels of leaks are introduced, one at a time, in every single pipe for every hour of the 24 h period. With 38 pipes, 10 levels of leakage and the normal operating status, this gives 38 × 10 + 1 = 381 patterns of state estimates for each hour. For a full day this becomes a training set of 24 × 381 = 9144 labeled patterns of state estimates computed for accurate measurements and leakages ranging from 2 (l/s) to 29 (l/s).

However, since an additional consumption is used to simulate the leak, the incidence matrixes and the initial pipe flows for the loop algorithms have to be modified. Rebuilding the spanning tree for each of the 9144 patterns of data would therefore be a computational drawback for the training patterns generation scheme shown in Fig. 4. It would also be a disadvantage when compared to the implementation based on the nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). The simulation of a leakage based on the loop corrective flows equations and graph theory (Arsene, 2004; Arsene et al., 2011; Arsene et al., 2004a; Arsene et al., 2004b) instead modifies the initial spanning tree built for the normal operating state of the water network so as to account for the additional water consumption that models the leak. Following this, new topological and loop incidence matrixes and initial pipe flows are determined as input information for the simulator algorithm. This procedure avoids the time consuming process of rebuilding the spanning tree. Assuming that $M_{lp}$, $T$ and $Q_i$ are the loop incidence matrix, the tree incidence matrix and the initial pipe flows obtained from the spanning tree for the normal operating status, and $d$ are the initial nodal demands, the training pattern generation scheme from Fig. 4 is pursued once.
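The size of the training set follows directly from enumerating the scenarios. A minimal sketch, assuming the pipe count and leak levels stated in the text and in Table 1, with a hypothetical tuple layout `(hour, pipe, leak_level)` for each scenario:

```python
from itertools import product

N_PIPES = 38
# Ten leak levels from 0.002 to 0.029 m^3/s in steps of 0.003 (Table 1).
LEAK_LEVELS = [round(0.002 + 0.003 * i, 3) for i in range(10)]
HOURS = range(24)

def scenarios_for_hour(hour):
    """Yield the normal state plus one leak per pipe and leak level."""
    yield (hour, None, 0.0)  # normal operating status
    for pipe, level in product(range(1, N_PIPES + 1), LEAK_LEVELS):
        yield (hour, pipe, level)

all_scenarios = [s for hour in HOURS for s in scenarios_for_hour(hour)]
print(len(all_scenarios))
```

Each tuple would be passed once through the three modules of Fig. 4 to produce one labeled pattern.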
The nodal heads and the pipe flows state estimates with the confidence limits, the variation of nodal consumptions with confidence limits and the status of the water network are stored for subsequent use in the pattern classification module. In the original 34-node water network a leakage is modeled as an additional water consumption, for example between nodes 17 and 18 (using the new label notation in Fig. 5), and then the incidence matrixes and the initial pipe flows are recalculated. The procedure that avoids rebuilding the spanning tree for each of the 9144 patterns works as follows: the labels of the nodes and pipes that are situated in the spanning tree below the leakage location are incremented by one so as to preserve an upper triangular form for the tree incidence matrix. The new vector of nodal demands $d'$ comprises the initial nodal demands $d$ plus the leakage, which is introduced as a distinct element in the vector of water consumptions. One column and one row are introduced in the incidence matrixes (i.e. loop and topological) to take into account the incidence of the two half-pipes resulting from the additional demand. Following this, the new initial pipe flows and the loop and tree incidence matrixes are obtained through simple matrix operations (Arsene, 2004; Arsene, 2011; Arsene et al., 2004a; Arsene et al., 2004b), which are more efficient in terms of computational time than reconstructing a spanning tree for the 36-node water network. Therefore, instead of carrying out the time consuming process of rebuilding the spanning tree, the new set of initial conditions (i.e. initial pipe flows, incidence matrixes) is determined:

$$Q'_i = T^{-1} d' \qquad (15)$$

where $Q'_i$ are the initial tree pipe flows used in the next extended time simulation or leakage simulation. The new loop and tree incidence matrixes are obtained with respect to the new initial solution of the tree pipe flows. Therefore, for the pipes in which the direction of the initial tree pipe flows $Q'_i$ changes due to the new set of nodal demands $d'$, the loop and the tree incidence matrixes are updated as follows:

$$M'_{lp}(:,k) = (-1)\, M_{lp}(:,k) \qquad (16)$$

$$T'(:,k) = (-1)\, T(:,k) \qquad (17)$$

where $k$ is the pipe with the reversed flow, and $M'_{lp}$ and $T'$ are the new loop and tree incidence matrixes used in the next extended time simulation or leakage simulation.

By means of Eqs. (15)–(17), the block diagram shown in Fig. 4 was successfully run for a 24 h extended time simulation and leakage simulation. The Central Processing Unit (CPU) times were similar (i.e. 40 s) to the times obtained for the implementation based on the LS nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). The 24 h profiles of consumptions and inflows that characterize the normal operating states throughout the day are reported elsewhere (Gabrys, 1997; Arsene, 2004; Arsene, 2011; Arsene et al., 2004a). Furthermore, the computational time required to rebuild the spanning tree and assign new labels for each of the 9144 labeled patterns of data would have been 15 min. This would have been unfavorable when compared to the less than 40 s obtained by using the graph and matrix operations described above (Arsene, 2004; Arsene et al., 2011; Arsene et al., 2004a; Arsene et al., 2004b). The computational time required to rebuild the spanning tree would increase steadily with the size of the water network: the larger the network in terms of pipes and nodes, the more time is required to build the spanning tree and assign new labels. By contrast, the solution adopted here is based on a couple of basic matrix operations that are almost insensitive to the size of the network. Finally, the whole set of parameters used during the generation of the training set is shown in Table 1.

Fig. 5. Spanning tree for water network from Fig. 3; new labels for nodes and pipes are added, which produces an upper triangular incidence matrix T; the co-tree pipes which close loops are shown with dashed lines.

Table 1
Parameters used during generation of the training data set as in Arsene and Bargiela (2001), Arsene (2004), Arsene et al. (2004a), Arsene et al. (2004b).

Head measurements: 1, 2, 4, 8, 11, 15, 17, 19, 22, 29, 30, 31
Fixed-head inflow measurements: 27, 28, 29, 30, 31, 32, 33, 34
Water consumptions: All nodes
Fixed-head measurements: 27, 28, 29, 30, 31, 32, 33, 34
Leak levels: 0.002, 0.005, 0.008, 0.011, 0.014, 0.017, 0.020, 0.023, 0.026, 0.029 (m³/s)
Parameters used in CLA
Accuracy of head measurements at load nodes: +0.1 (m)
Accuracy of inflow measurements: +1%
Variability of consumptions: +10%

4. Classification of the state of the water network based on patterns of state estimates with confidence limits

In order to design the recognition system based on state estimates, the set of 9144 training patterns representing 37 categories is used. The training data spanned a 24 h period of water network operation.
The 37 categories stand for the normal operating state and leakages in 36 pipes of the upper part (Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011) of the network shown in Fig. 3. The class indexes $d_h$ were chosen to be the same as in the original algorithm (Arsene, 2004; Arsene et al., 2011): $d_h = 1$, normal operating state; $d_h = 2$, leakage in the pipe between nodes 3 and 4; $d_h = 3$, leakage in the pipe between nodes 4 and 20; etc. The training data is first scaled to the range [0, 1], as required by the pattern recognition system. The range of values for the nodal heads state variables was chosen to be between 2 and 50 (m), and for inflows/outflows between −0.2 and 0.2 (m³/s).
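The scaling step can be illustrated with a short Python sketch. It is a minimal min-max mapping under stated assumptions rather than the authors' code: the function name `scale_to_unit` is hypothetical, the lower inflow/outflow bound is assumed to be −0.2 m³/s, and the clipping of out-of-range values is an added assumption.

```python
import numpy as np

# Ranges quoted in the text. The negative lower flow bound is an assumption.
HEAD_RANGE = (2.0, 50.0)   # nodal heads, m
FLOW_RANGE = (-0.2, 0.2)   # inflows/outflows, m^3/s


def scale_to_unit(values, value_range):
    """Linearly map values from [lo, hi] onto [0, 1].

    Values outside the range are clipped (an assumption of this sketch).
    """
    lo, hi = value_range
    scaled = (np.asarray(values, dtype=float) - lo) / (hi - lo)
    return np.clip(scaled, 0.0, 1.0)


# Illustrative values only, not data from the paper.
heads = scale_to_unit([2.0, 26.0, 50.0], HEAD_RANGE)
flows = scale_to_unit([-0.2, 0.0, 0.2], FLOW_RANGE)
```

Each group of state variables is scaled with its own range, so that a nodal head and a flow end up on comparable [0, 1] axes of the pattern space.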


A three layer GFMMNN (Gabrys, 1997; Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000; Gabrys, 2002a; Gabrys, 2002b) is built using hyperbox fuzzy sets. A hyperbox defines a region of the n-dimensional pattern space, where n is, for example, the number of nodal heads plus the number of pipe flows plus the number of in/out flows. All patterns contained within a hyperbox have full cluster/class membership. A user specified value H is introduced to control the size of the hyperbox, measured as the difference between the max and min values in each dimension. The combination of the min–max points and the hyperbox membership function defines a fuzzy set. In clustering, which corresponds to fault detection, each such fuzzy set is a cluster. In classification, which corresponds to identifying a leakage in a particular pipe, hyperbox fuzzy sets are aggregated to form a single fuzzy set class. Learning in the fuzzy min–max clustering and classification neural networks consists of creating and adjusting hyperboxes in pattern space as input patterns are received. It is an expansion/contraction process. The learning process begins by selecting an input pattern and finding the closest hyperbox that can expand, if necessary, to include the pattern. If no hyperbox meets the expansion criteria, a new hyperbox is formed and added to the system. This growth process allows existing clusters/classes to be refined over time, and it allows new clusters/classes to be added without retraining. Hyperbox expansion can produce overlapping hyperboxes, which can cause ambiguity. A pattern may have the same partial membership in more than one cluster/class, but it cannot fully belong to more than one cluster/class.
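The expansion step described above can be sketched in Python as follows. This is an illustrative reading, not the published implementation: it assumes the expansion criterion is that the expanded hyperbox stays within the size limit H in every dimension, and the names `can_expand` and `expand` are hypothetical.

```python
import numpy as np


def can_expand(v, w, x_lo, x_hi, size_limit):
    """Return True if hyperbox [v, w] can grow to include input [x_lo, x_hi].

    Assumed criterion: after expansion, the side length in every dimension
    must not exceed size_limit (the parameter H of the text).
    """
    new_v = np.minimum(v, x_lo)
    new_w = np.maximum(w, x_hi)
    return bool(np.all(new_w - new_v <= size_limit))


def expand(v, w, x_lo, x_hi):
    """Return the min and max points of the hyperbox expanded to cover the input."""
    return np.minimum(v, x_lo), np.maximum(w, x_hi)


# Illustrative 2-D hyperbox and two point inputs (lower == upper limits).
v = np.array([0.20, 0.20])
w = np.array([0.30, 0.30])
near = np.array([0.35, 0.25])
far = np.array([0.60, 0.25])

if can_expand(v, w, near, near, size_limit=0.2):
    v, w = expand(v, w, near, near)
```

When `can_expand` is false for every existing hyperbox of the right class, the text's rule applies: a new hyperbox is created with the input's lower and upper limits as its min and max points.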
For completeness we include the description of the GFMMNN algorithm originally developed by Gabrys and Bargiela (2000).

Initialization

The GFMMNN pattern recognition algorithm was intended to be used for the water distribution system state classification task. Therefore the information obtained from CLA, namely the confidence limits for each state variable, had to be accommodated by this classification procedure. This requirement was met by specifying the input to the classification/clustering algorithm as a pair of vectors $X_h = [X_h^l \; X_h^u]$, the lower and upper limits of the state vector. In other words, instead of a point in n-dimensional space, the pattern to be classified is a hyperbox with the min point given by the vector $X_h^l$ and the max point given by the vector $X_h^u$. When the min and max points are equal, the hyperbox shrinks to a point. Consequently, the algorithm can classify or cluster inputs given as ordinary n-dimensional vectors without any changes, because a point in n-dimensional space is simply the special case of a hyperbox with equal min and max points. Gabrys and Bargiela observed that, because of the size of a modern water distribution system, it is impossible to predict and cover all possible combinations of consumption-inflow patterns and anomalies that can occur in the network during day to day operations. Therefore, in order to allow both labelled (i.e. normal operating state etc.) and unlabelled inputs to be processed, an additional index $d_h = 0$, meaning that the input pattern is not labelled, was introduced. The result was a hybrid neural network, supervised for labelled inputs (classification) and unsupervised for unlabelled inputs (clustering).

Hyperbox membership function

The fuzzy hyperbox membership function plays a crucial role in the Fuzzy Min-Max Classification and Clustering algorithms. The decisions whether the presented input pattern belongs to a particular class or cluster, and whether a particular hyperbox is to be expanded, depend mainly on the membership value describing the degree to which an input pattern fits within the hyperbox. The j-th hyperbox fuzzy set, $B_j$, can be defined by the ordered set:


$$B_j = \{X_h, V_j, W_j, b_j(X_h, V_j, W_j)\} \qquad (18)$$

for all h = 1, 2, ..., m, where $X_h = [X_h^l \; X_h^u]$ is the h-th input pattern, $V_j = (v_{j1}, v_{j2}, \ldots, v_{jn})$ is the min point for the j-th hyperbox, $W_j = (w_{j1}, w_{j2}, \ldots, w_{jn})$ is the max point for the j-th hyperbox, and the membership function for the j-th hyperbox satisfies $0 \le b_j(X_h, V_j, W_j) \le 1$. The min points are initialized with 1 and the max points with 0. An investigation was carried out in (Gabrys & Bargiela, 1999) in order to decide the most appropriate form for the membership function. The function was chosen so that the membership values of the patterns decrease steadily with increasing distance from the hyperbox. The reason for doing so is to eliminate the cases when hyperboxes that represent different classes overlap. The chosen function is shown below; it can be described as the minimum, over all dimensions, of the values corresponding to the maximum violations of the min–max hyperbox points:

$$b_j(X_h) = \min_{i=1..n}\Big(\min\big(\,[1 - f(x^u_{hi} - w_{ji}, \gamma_i)],\; [1 - f(v_{ji} - x^l_{hi}, \gamma_i)]\,\big)\Big) \qquad (19)$$

where $x^l_{hi}$ and $x^u_{hi}$ are the lower and upper limits of the h-th input pattern in dimension i, $w_{ji}$ and $v_{ji}$ are the max and min points of the j-th hyperbox, and $f(x, \gamma)$ is a two parameter ramp threshold function which can be written as:

$$f(x, \gamma) = \begin{cases} 1 & \text{if } x\gamma > 1 \\ x\gamma & \text{if } 0 \le x\gamma \le 1 \\ 0 & \text{if } x\gamma < 0 \end{cases} \qquad (20)$$

The membership function also contains the parameter $\gamma = [\gamma_1, \gamma_2, \ldots, \gamma_n]$, which controls how fast the membership values decrease and has to be specified for each dimension (i.e. nodal pressures, pipe flows, outflows/inflows).

Hyperbox expansion

This process can be described briefly as follows: identify the hyperbox closest to the input pattern that can be expanded, and expand it. If an expandable hyperbox cannot be found, add a new hyperbox. With respect to the user specified value H introduced to control the size of the hyperbox, it has been observed (Gabrys, 1997) that keeping the parameter H constant during the learning process can have undesired effects on performance or on the number of created hyperboxes. A large H can cause too many misclassifications, especially when there are complex, overlapping classes. On the other hand, a small H can create too many unnecessary hyperboxes, especially for concentrated, isolated groups of data forming one class, even though a small H might be needed to resolve other overlapping classes. These problems were addressed by introducing an adaptive maximum size of the hyperbox. We shall take these observations into account when testing the recognition system with the loop-based state estimates and confidence limits.

Hyperbox overlap test

Determine whether the recent expansion caused any undesired overlap between hyperboxes.

Hyperbox contraction

If the overlap test identified overlapping hyperboxes, then contract the hyperboxes to eliminate the overlap. More details about each of these steps can be found in (Gabrys, 1997; Gabrys and Bargiela, 2000). However, we will mention here that the training process is completed when, after presentation of all training patterns, there have been no misclassifications for the training data or the minimum, user specified value of the parameter H has been reached. The topology of this neural network grows to meet the demands of the problem.
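The membership function of Eqs. (19) and (20) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names `ramp` and `membership` and the numerical values are hypothetical, and the per-dimension sensitivity parameter of Eq. (20) is passed in as `sensitivity`.

```python
import numpy as np


def ramp(x, sensitivity):
    """Two parameter ramp threshold function of Eq. (20): clip x * sensitivity to [0, 1]."""
    return np.clip(x * sensitivity, 0.0, 1.0)


def membership(x_lo, x_hi, v, w, sensitivity):
    """Membership of the input hyperbox [x_lo, x_hi] in the hyperbox [v, w], Eq. (19).

    A value of 1 means the input lies entirely inside the hyperbox; the value
    falls towards 0 as the input moves away from it in any dimension.
    """
    above_max = 1.0 - ramp(x_hi - w, sensitivity)  # violation of the max point
    below_min = 1.0 - ramp(v - x_lo, sensitivity)  # violation of the min point
    return float(np.min(np.minimum(above_max, below_min)))


# Illustrative 2-D hyperbox, not data from the paper.
v = np.array([0.2, 0.2])
w = np.array([0.4, 0.4])
sensitivity = np.array([4.0, 4.0])

inside = np.array([0.3, 0.3])
outside = np.array([0.5, 0.3])
b_inside = membership(inside, inside, v, w, sensitivity)
b_outside = membership(outside, outside, v, w, sensitivity)
```

With these values, a point inside the hyperbox has full membership, while a point 0.1 beyond the max point in one dimension loses 0.1 × 4 = 0.4 of its membership. A larger sensitivity makes the membership fall off faster.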
The input layer has 2n processing elements, two for each of the n dimensions of the input pattern $X_h = [X_h^l \; X_h^u]$. Each second layer node of this three-layer neural network represents a hyperbox fuzzy set, where the connections between the first and second layers are the min–max points and the transfer function is the hyperbox membership function. The min points are stored in the matrix V and the max points are stored in the matrix W. The way these connections are adjusted is described in (Gabrys, 1997; Gabrys and Bargiela, 2000). The connections between the second and third layer nodes are binary values. They are stored in a matrix U. The equation for assigning the values of U is:

$$u_{jk} = \begin{cases} 1 & \text{if } b_j \text{ is a hyperbox for class } c_k \\ 0 & \text{otherwise} \end{cases} \qquad (21)$$

where k is the index of the output node. Each of the third layer nodes represents a pattern class. The output node $c_0$ of the third layer represents all unlabelled hyperboxes from the second layer, while the other output nodes $c_1, c_2, \ldots, c_p$ represent the normal operating state, leakage in pipe 1, etc. The classification and clustering algorithm briefly described here has been tested extensively on different sets of data (both data points and fuzzy labelled and labelled input patterns) and compared to other existing classification algorithms (Gabrys, 1997). We will just say here that the GFMMNN algorithm dealt successfully with both labelled and unlabelled patterns and in most cases resolved all the overlaps between hyperboxes from different classes, which resulted in fewer misclassifications than several other neural, fuzzy and traditional classifiers (Gabrys & Bargiela, 2000; Gabrys, 1997). In the original work, based on patterns of state estimates calculated with the LS nodal heads state estimator, multiple classes with full membership were observed for a large number of testing patterns (i.e. patterns belonged to classes representing leakages in different pipes). A two level recognition system was proposed in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000) as a means of solving this problem (Fig. 6). The first level of the recognition system can

Fig. 6. Two level recognition system proposed in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000). First level consists of one Neural Network (NN) and its purpose is to select one of the n second level ‘‘experts’’. Input to the ﬁrst level NN, XI, comprises all the variables not affected by occurrence of anomaly. Second level consists of n NNs. They are called ‘‘experts’’ since each of them is trained using only a part of training set and covers a distinctive part of 24 h operational period. Input to the second level NNs, XII, comprises all the variables sensitive to occurrence of anomaly. The output of the second level NNs is the classiﬁcation of the water network state.

Table 2. Misclassification rates for a test set consisting of 9144 examples of LS loop flows state estimates computed for accurate measurements.

| Training set | Parameter H | Misclassification rate: highest membership | Misclassification rate: top 2 alternatives | Misclassification rate: top 3 alternatives | Misclassification rate: top 5 alternatives |
|---|---|---|---|---|---|
| Loop flows state estimates computed for accurate measurements without confidence limits | 0.2 | 33.41 | 21.63 | 15.63 | 8.69 |
| | 0.1 | 10.12 | 4.62 | 2.6 | 2.39 |
| Loop flows state estimates computed for accurate measurements including confidence limits | 0.2 | 7.83 | 6.51 | 5.39 | 3.71 |
| | 0.1 | 1.01 | 0.73 | 0.70 | 0.32 |
| | Variable* | 0.002 | 0.001 | 0 | 0 |

* Parameter H was determined separately for each dimension of each of the six subsets of the training set and was set to the value of the largest input hyperbox for each of these six subsets.

distinguish between different typical behaviors of the water network (e.g. night load), while the second level is responsible for the detection of anomalies for some characteristic load patterns. The second level networks are viewed as "experts". In this way, the distinctive variations in the typical network behavior for different days of the week or seasons of the year can be accommodated without the need to retrain the existing networks. Instead, a new expert network is added to the second level and the size of the first level network is increased accordingly. Thus the GFMMNN is able to grow to meet the demands of the problem. It was noticed in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000) that if the input patterns obtained with the LS nodal heads state estimator are processed by the two level neural networks, the dimension of the training data can be reduced in comparison to the full training data. Also, since only one of the "experts" is selected for further processing, the other n−1 "experts" are not active. This achieves a further dimensionality reduction, since each of the second level networks covers only a part of the day rather than the whole 24 h period. The two level recognition system is trained with the 9144 labeled patterns of data obtained with the LS loop flows state estimator and the training data generation scheme from Fig. 4. As in the original system, six characteristic inflow patterns (i.e. subsets) can be found for six periods during the 24 h water network operation (Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011; Arsene et al., 2004a; Gabrys & Bargiela, 2000): 1–5, 6–8, 9–12, 13–17, 18–20 and 21–24. The misclassification rates for the testing set consisting of the 9144 examples of LS loop flows state estimates with and without confidence limits computed for accurate measurements are shown in Table 2.

The first interesting result is the comparison of the performance of the recognition system trained for patterns of LS loop flows state estimates with the performance of the recognition system trained for patterns of LS nodal heads state estimates: the misclassification rates obtained here are slightly higher, by 2–4% on average, than the corresponding misclassification rates reported in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000). This is due to the high sensitivity of the state estimates calculated with the LS loop flows state estimator to the available pressure measurements. Hence the LS loop flows state estimates used for training the two level neural network define a space of overlapping patterns, which makes the classification task more difficult. The second observation is that, when training with the patterns of LS loop flows state estimates and confidence limits, the misclassification rates compare well with those reported in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000) for the training set consisting of LS nodal heads state estimates with confidence limits. However, an examination of the number of hyperboxes obtained during the training process showed that, in order to resolve all the overlaps, the number of hyperboxes required was almost equal to the number of patterns of LS loop flows state estimates and confidence limits (Table 3). The attempt to resolve all the overlaps for the training set resulted in an unacceptable number of hyperboxes representing identical classes of operation (i.e. leakages). Next, the GFMMNN training is performed for patterns of variation of nodal consumptions and confidence limits.

Table 3. Number of hyperboxes and misclassification rates for different values of parameter H. Training set: loop flows state estimates computed for accurate measurements including confidence limits.

| Parameter H | Misclassification rate (top 5 alternatives) | Number of hyperboxes | Patterns of data |
|---|---|---|---|
| 0.2 | 3.71 | 6411 | 9144 |
| 0.1 | 3.2 | 7062 | 9144 |
| 0.009 | 2.9 | 8597 | 9144 |
| 0.008 | 0.1 | 8700 | 9144 |
| 0.005 | 0 | 8777 | 9144 |
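The "top k alternatives" columns of Table 2 can be read as follows: a pattern counts as misclassified only if its true class is not among the k classes with the highest membership. That reading is an assumption, and the Python sketch below is only an illustration of it. The function name `top_k_misclassification`, the membership values and the labels are hypothetical, and the rate is returned as a fraction.

```python
import numpy as np


def top_k_misclassification(class_membership, true_labels, k):
    """Fraction of patterns whose true class is not among the k best-scoring classes.

    class_membership: array of shape (patterns, classes) with one membership
    value per class; true_labels: array of class indices. Ties are broken in
    favour of the lower class index (an arbitrary choice of this sketch).
    """
    scores = np.asarray(class_membership, dtype=float)
    labels = np.asarray(true_labels)
    top_k = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()


# Three illustrative patterns over three classes.
scores = [[0.9, 0.5, 0.1],
          [0.2, 0.8, 0.6],
          [0.3, 0.4, 0.7]]
labels = [0, 2, 0]

rate_top1 = top_k_misclassification(scores, labels, k=1)
rate_top2 = top_k_misclassification(scores, labels, k=2)
```

Under this definition the rate can only stay the same or decrease as k grows, which matches the pattern across the columns of Table 2.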

5. Classification of the water network state based on patterns of variation of load measurements and confidence limits

In the case of the LS loop flows state estimator, the difference between the actual measurement and the value of the measured quantity as computed by the state estimator should be zero for a pressure measurement (Arsene, 2004; Arsene, 2011). This means that the variation of the load measurements (i.e. nodal consumptions) in the presence of a pressure measurement will contain strong information about the location of topological errors. In terms of the pattern recognition system, this ensures that the hyperboxes representing topological errors of different magnitudes will move away from the zero reference point representing the normal operating point for each operational period of the considered 24 h operational period. This can resolve the overlaps of the hyperboxes in a robust manner. To conclude, the presence of a topological error in the vicinity of pressure measurements will result in the modification of the nodal demands Δd located in the region of the water network containing the respective pressure measurements. If the pseudo-measurements (i.e. nodal consumptions/nodal demands) are in agreement with the real measurements, then the calculated variations of nodal demands Δd should be zero irrespective of the operating state. Also, by using patterns of variations of nodal demands it is possible to classify the operational state of the water network in a way similar to the classification of the state of the water network based on the residuals obtained with the LS nodal heads state estimator. The training data consisting of the variations of nodal demands with confidence limits are scaled and mapped onto the [0, 1] range. A single neural network of the type shown in Fig. 6 is used for the entire operational period of 24 h. The initial maximum size of a hyperbox was set to H = 0.1. The training was completed after one run through the entire training data of 9144 patterns of variations of nodal demands and confidence limits. There were no misclassifications. The testing procedure showed excellent recognition rates both for patterns of variations of nodal demands and for patterns of variations of nodal demands with confidence limits. The numbers of hyperboxes created during the training process are shown in Table 4. Since no information about the level of leakage in a pipe was included in the training set, increasing the size of the hyperboxes would eventually yield a single hyperbox representing all the levels of leakage in a pipe.

Table 4. The significance of the parameter H on the number of hyperboxes and misclassification rates for a test set consisting of 9144 patterns of variations of nodal demands and confidence limits.

| Training set | Parameter H | Number of hyperboxes | Patterns of data | Misclassification rates |
|---|---|---|---|---|
| Variation of nodal demands computed for accurate measurements without confidence limits | 0.2 | 53 | 9144 | 0 |
| | 0.25 | 47 | 9144 | 0 |
| | 0.5 | 39 | 9144 | 0 |
| Variation of nodal demands computed for accurate measurements including confidence limits | 0.2 | 62 | 9144 | 0 |
| | 0.25 | 52 | 9144 | 0 |
| | 0.5 | 39 | 9144 | 0 |
To understand the reasons for the excellent recognition rates of the classification system based on the variation of nodal demands, let us look at a couple of examples of the behavior of the variations of nodal demands for different levels of leakage in pipes. Fig. 7 shows the behavior of the variation of nodal demands for different levels of leakage in the pipe between the new labeled nodes 3 and 4 from Fig. 5. Fig. 7a shows an example of the variation of nodal demand at the new labeled node 3 determined in the course of the state estimation carried out for accurate measurements. Even for a small leakage of 2 (l/s) the variation of nodal demand at node 3 is distinct from the zero reference point (i.e. normal operating point). Fig. 7b shows the influence that random measurement errors can have on the variation of nodal demand at node 3. The monotonic trend caused by the leakage is not greatly distorted by the considered measurement noise. Fig. 7c shows the effective ranges within which the variation of nodal demand at node 3 can vary because of the measurement noise associated with the nodal consumption at that node. This is shown again for the different levels of leakage in the pipe between nodes 3 and 4. Fig. 7 illustrates the ability of the LS loop flows state estimator to make use of the pressure measurements, which produced patterns of variation of nodal demands that are distinct from the zero value corresponding to the normal operating point. Consequently, this resulted in the identification of the topological errors.

Fig. 7. Variation of nodal demand at the new labeled node 3 for different levels of leakage in the pipe between the new labeled nodes 3 and 4 (i.e. old labeled nodes 23 and 19 in Fig. 5): (a) variation of nodal demand at the new labeled node 3 for accurate measurements; (b) variation of nodal demand affected by typical measurement inaccuracies; (c) tight confidence limits marked with "*" for the variation of nodal demand at new labeled node 3, represented by a solid line and corresponding to 10 different levels of leakage: 2 (l/s), 5 (l/s), 8 (l/s), 11 (l/s), 14 (l/s), 17 (l/s), 20 (l/s), 23 (l/s), 26 (l/s), 29 (l/s).

The conclusions from above are reinforced in Fig. 8. The tight confidence limits on the variation of nodal demands representing normal operating states are pictured as dashed lines, while examples of the input patterns representing different levels of leakage in the pipe between the new labeled nodes 27 and 26 (i.e. nodes 4 and 3 with the old labels in Fig. 5) are marked by a star "*". The water network shown in this example contains a high number of pressure measurements, which makes the leakages easy to spot. For the purposes of comparison, the number and accuracy of the measurements were kept the same as in the original study performed in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000), which used patterns of state estimates obtained with the LS nodal heads state estimator.
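The detection idea illustrated in Fig. 8, where leak patterns fall outside the dashed normal-state confidence band, can be sketched in Python as a simple bounds check. This is a hedged illustration and not the GFMMNN classifier itself: the function name `flag_anomalous_nodes` and all numerical values are hypothetical.

```python
import numpy as np


def flag_anomalous_nodes(delta_d, lower, upper):
    """Return indices of nodes whose demand variation lies outside [lower, upper].

    delta_d: variation of nodal demand per node; lower/upper: confidence
    limits of the normal operating state for the same nodes.
    """
    delta_d = np.asarray(delta_d, dtype=float)
    outside = (delta_d < np.asarray(lower)) | (delta_d > np.asarray(upper))
    return np.flatnonzero(outside)


# Illustrative values in l/s for four nodes, not data from the paper.
delta_d = [0.0, 0.5, -0.1, 2.0]
lower = [-0.3, -0.3, -0.3, -0.3]
upper = [0.3, 0.3, 0.3, 0.3]

suspects = flag_anomalous_nodes(delta_d, lower, upper)
```

In this toy case nodes 1 and 3 are flagged. The tighter the normal-state limits, the smaller the leak that moves a node outside the band, which is consistent with the role the text assigns to the tight confidence limits.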

In Fig. 8 it is easy to identify the presence of the leakage by looking at the variations of the nodal demands that lie outside the upper and lower bounds shown with dashed lines. Leaks as small as 2 (l/s) were identified in the respective water network. The examples shown in Fig. 8 were obtained for the same operational time period, which explains the similarity of the confidence intervals for the normal operating state. The tight confidence limits for the normal operating state, together with the distinctive variation of the nodal demands located in the vicinity of the leakage, are the reason for the excellent performance of the detection system based on patterns of variation of load measurements. Training the GFMMNN with patterns of variation of nodal consumption, with or without confidence limits, produces the best (i.e. smallest) misclassification rates, the best size (i.e. single GFMMNN) of the neural network and the best (i.e. smallest) computational time necessary to train and to test the neural network. Fig. 9 shows further examples of variation of nodal demands for different levels of leakage between nodes 14 and 16, which are nodes 15 and 10 in the original notation of Fig. 5. Fig. 9a shows the variation of nodal demands for a leakage of 29 (l/s), Fig. 9b for a leakage of 17 (l/s) and Fig. 9c for a leakage of 2 (l/s).

Fig. 8. Variation of nodal demands for levels of leakage between the new labeled nodes 27 and 26 (i.e. old labeled nodes 3 and 4 in Fig. 5): (a) leakage of 29 (l/s); (b) leakage of 17 (l/s); (c) leakage of 2 (l/s).

Fig. 9. Examples of variation of nodal demands for different levels of leakage between nodes 14 and 16 (nodes 15 and 10 in the original notation of Fig. 5): (a) leakage of 29 (l/s); (b) leakage of 17 (l/s); (c) leakage of 2 (l/s).

6. Conclusions

The assumption that fault diagnosis can be based on pattern analysis, without a need to employ any heuristic or specialist knowledge, has again been highlighted in the context of patterns of LS loop flows state estimates and confidence limits and patterns of variation of nodal demands and confidence limits. First, it has been shown that the LS loop flows state estimates can be successfully used to train the neural recognition system. Slightly higher misclassification rates have been obtained when compared to the training of the recognition system with the state estimates obtained with the LS nodal heads state estimator. This is because of the smaller separation between patterns obtained with the LS loop flows state estimator that represent different classes (topological errors, operational time periods), and because of the much higher sensitivity of the nodal heads to the existing set of pressure measurements in the case of the LS loop flows state estimator. Second, the recognition system based on confidence limits for the LS loop flows state estimates performed better for the data for which it was trained. However, this came at the expense of a high number of hyperboxes necessary to cover the space of input patterns. The classification of the water network state based on patterns of variation of load measurements and confidence limits gave the best results.
The neural network used for fault detection and identification in this last case was essentially a simpler version of the recognition system used in the first two cases. It consisted of a single GFMMNN architecture of the type shown in Fig. 6, which was used for the entire 24 h period of operation of the water network. The overlapping was resolved in a robust way. This was because the presence of topological errors of different magnitudes was reflected in the variation of the nodal demands, which moved the hyperboxes away from the zero reference point representing the normal operating point for the entire 24 h operational period. The GFMMNN relied on the ability of the LS loop flows state estimator to make full use of the pressure/nodal heads measurements existing in the water network, which resulted in the respective variations of nodal demands. While the overlaps could be resolved in a robust manner, the number of hyperboxes representing different classes was also kept small. This is because identical topological errors occurring at different operational times resulted in similar variations of the nodal consumptions, which in turn were represented by the same hyperbox.

Acknowledgments

Dr. Corneliu T.C. Arsene thanks the School of Science and Technology, Nottingham Trent University, UK, which supported this work through a Ph.D. studentship.

References

Andersen, J. H., & Powell, R. S. (1999). Simulation of water networks containing controlling elements. Journal of Water Resources Planning and Management, 125(3), 33–41.
Arsene, C. T. C., & Bargiela, A. (2001). Decision support for forecasting and fault diagnosis in water distribution systems – robust loop flows state estimation. Water software systems: Theory and applications, In Coulbeck, B., & Ulanicki, B. (Series Eds.) & Ulanicky, B., Coulbeck, B., & Rance, J. P. (Eds.) (Vol. 1) (pp. 133–145).
Arsene, C. T. C. (2004). Operational Decision Support in the Presence of Uncertainties, Ph.D. Thesis, Nottingham Trent University, Nottingham, UK.
Arsene, C. T. C., Bargiela, A., & Al-Dabass, D. (2004a). Modelling and simulation of water systems based on loop equations. IJSSST, 5(1 & 2), 61–72.
Arsene, C. T. C., Bargiela, A., & Al-Dabass, D. (2004b). Simulation of network systems based on loop flows algorithms. In The Proceedings of the 7th Sim. Soc. Conf. – UKSim 2004, Oxford, UK.
Arsene, C. T. C. (2011). Operational Decision Support in the Presence of Uncertainties – Water Distribution Systems, CreateSpace, US, ISBN 978-1463535285.
Arsene, C. T. C., Al-Dabass, D., & Hartley, J. (2011). Confidence limit analysis of water distribution systems based on a least squares loop flows state estimation technique. EMS, 94–101.
Arsene, C. T. C., Al-Dabass, D., & Hartley, J. (2012). A study on modelling and simulation of water distribution systems based on loop corrective flows and containing controlling elements. Intelligent Systems, Modelling and Simulation, 423–430.
Bargiela, A., Arsene, C. T. C., & Tanaka, M. (2002). Knowledge-based neurocomputing for operational decision support. In International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Milan, Italy.
Belsito, S., Lombardi, P., Andreussi, P., & Banerjee, S. (1998). Leak detection in liquefied gas pipelines by artificial neural networks. AIChE Journal, 44(12), 2675–2688.
Caputo, A. C., & Pelagagge, P. M. (2003). Using neural networks to monitor piping systems. Process Safety Progress, 22(2), 119–127.
Carpentier, P., & Cohen, G. (1993). Applied mathematics in water supply network management. Automatica, 29(5), 1215–1250.
Epp, R., & Fowler, A. G. (1970). Efficient code for steady flows in networks. Journal of the Hydraulics Division, 96, 43–56.
Feng, J., & Zhang, H. (2006). Algorithm of pipeline leak detection based on discrete incremental clustering method. ICIC 2006, LNAI 4114, In Huang, D.-S., Li, K., Irwin, G. W. (Eds.), Springer-Verlag, pp. 602–607.
Gabrys, B. (1997). Neural network based decision support: Modeling and simulation of water distribution networks. Ph.D. Thesis, Nottingham Trent University.
Gabrys, B., & Bargiela, A. (1999). Neural networks based decision support in presence of uncertainties. Journal of Water Resources Planning and Management, ASCE, 125(5), 272–280.
Gabrys, B., & Bargiela, A. (2000). General fuzzy min–max neural network for clustering and classification. IEEE Transactions on Neural Networks, 11(3), 769–783.
Gabrys, B. (2002a). Agglomerative learning algorithms for general fuzzy min–max neural network, the special issue of the Journal of VLSI Signal Processing Systems entitled. Advances in Neural Networks for Signal Processing, 32(1/2), 67–82.
Gabrys, B. (2002b). Neuro-fuzzy approach to processing inputs with missing values in pattern recognition problems. International Journal of Approximate Reasoning, 30(3), 149–179.
Izquierdo, J., López, P. A., & Martínez, R. P. (2007). Fault detection in water supply systems using hybrid (theory and data-driven) modelling. Mathematical and Computer Modelling, 46, 341–350.
Jeppson, R. W., & Davis, A. L. (1976). Pressure reducing valves in pipe network analyses. Journal of the Hydraulics Division, 102(7), 987–1001.
Kumar, S. M., Narasimhan, S., & Bhadllamudi, S. M. (2008). State estimation in water distribution networks using graph–theoretic reduction strategy. Journal of Water Resources Planning and Management, 134, 395–403.
Mashford, J., De Silva, D., Marney, D., & Burn, S. (2009). An approach to leak detection in pipe networks using analysis of monitored pressure values by support vector machine. In Proceedings of the 3rd international conference on network and system security (NSS 2009). Australia: Gold Coast.
Rao, H. S., Markel, L. C., & Bree, D. (1977). Extended period simulation of water systems – Part A. Journal of the Hydraulics Division, 103, 97–108.
Rossman, L. A. (1994). EPANET – Users Manual, Risk Reduction Engineering Laboratory, Office of Research and Development, US Environmental Protection Agency, Cincinnati, Ohio 45268.
Rahal, H. (1995). A co-tree flows formulation for steady-state in water distribution networks. Advances in Engineering Software, 22, 169–178.
Shinozuka, M., Liang, J., & Feng, M. Q. (2005). Use of supervisory control and data acquisition for damage location of water delivery systems. Journal of Engineering Mechanics, 225–230.
