Decision support system for water distribution systems based on neural networks and graphs theory for leakage detection


Expert Systems with Applications 39 (2012) 13214–13224


Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Corneliu T.C. Arsene ⇑, Bogdan Gabrys 1, David Al-Dabass 2

School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom

Article info

Keywords: Decision support system; Operational control of water distribution systems; Loop corrective flows equations; Modeling and simulation; Neural network; Graph theory

Abstract

This paper presents an efficient and effective decision support system (DSS) for operational monitoring and control of water distribution systems, based on a three-layer General Fuzzy Min–Max Neural Network (GFMMNN) and graph theory. The operational monitoring and control task considered is the detection of pipe leakages. The training data for the GFMMNN are obtained by simulating leakages in a water network over a 24 h operational period. The training data generation scheme comprises a simulator algorithm based on the loop corrective flows equations, a Least Squares (LS) loop flows state estimator, and a Confidence Limit Analysis (CLA) algorithm for uncertainty quantification called the Error Maximization (EM) algorithm. All three numerical algorithms for modeling and simulation of water networks are based on the loop corrective flows equations and graph theory. It is shown that training and testing the GFMMNN with patterns of variation of nodal consumptions, with or without confidence limits, produces better leakage recognition rates than training with patterns of nodal heads and pipe flows state estimates, with or without confidence limits. It also produces recognition rates comparable to those of the original recognition system trained with patterns obtained with the LS nodal heads state estimator, while being computationally superior: it requires a single GFMMNN architecture and uses a small number of pattern recognition hyperbox fuzzy sets built by that same architecture. In this case the GFMMNN relies on the ability of the LS loop flows state estimator to make full use of the pressure/nodal heads measurements available in a water network. © 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Two broad categories of faults occurring in water distribution systems are considered in this work. The first comprises faults caused by malfunctioning transducers and telecommunication equipment, which are referred to as measurement errors. The second comprises faults due to leakages and wrong valve status, which invalidate the system model used in the state estimation and are referred to as topological errors (Arsene & Bargiela, 2001; Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011; Carpentier & Cohen, 1993). This paper addresses the topological errors introduced by a leakage in a pipe, while the recognition of the wrong status of a

⇑ Corresponding author. Address: School of Computing and Informatics, Nottingham Trent University, Nottingham NG11 8NS, UK. Tel.: +44 (0)1202 965298; fax: +44 (0) 1202 965314. E-mail addresses: [email protected] (C.T.C. Arsene), [email protected] (B. Gabrys), [email protected] (D. Al-Dabass).
1 Computational Intelligence Research Group, School of Design, Engineering & Computing, Bournemouth University, Poole House, Talbot Campus, Fern Barrow, Poole, BH12 5BB, United Kingdom. Tel.: +44 (0)1202 965298; fax: +44 (0) 1202 965314.
2 School of Computing and Informatics, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2012.05.080

valve can be dealt with in a similar way by using the concepts described herein. In the absence of accurate real measurements, topological errors not only pose a much greater danger to the safety of water network operation but are also more difficult to locate and eliminate, even when reliable and efficient state estimators are available. Depending on the topology of the distribution network and the state estimator used (Gabrys, 1997; Arsene, 2004; Arsene, Bargiela, & Al-Dabass, 2004a; Arsene, Bargiela, & Al-Dabass, 2004b), topological errors form characteristic patterns that can be used to classify the state of the water network. The classification of the state of the water network has been investigated (Gabrys, 1997; Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000) in the context of the Least Squares (LS) state estimator based on the nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). That approach to the diagnosis of leakages and other operational faults in water networks was based on the examination of patterns of state estimates (i.e. nodal pressures, pipe flows) or LS nodal heads residuals by a General Fuzzy Min–Max Neural Network (GFMMNN) (Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000). It was shown that both the LS nodal heads state estimates with their confidence limits and the LS nodal heads residuals with their confidence limits can


be successfully used to train the GFMMNN recognition system (Arsene & Bargiela, 2001; Arsene, 2004; Belsito, Lombardi, Andreussi, & Banerjee, 1998). This paper presents the application of the GFMMNN to the classification of the state of the water distribution system based on patterns of data obtained with an LS loop flows state estimator (Arsene and Bargiela, 2001; Arsene et al., 2004a; Arsene et al., 2004b) and Confidence Limits Analysis (CLA) implemented with the same state estimator (Arsene, 2004; Arsene, 2011; Arsene et al., 2011). The investigation has two aims. The first is to build an effective and efficient Decision Support System (DSS) (Bargiela et al., 2002) for fault detection and identification in water networks by using (a) the LS loop flows state estimator, (b) the CLA algorithm based on the same LS loop flows state estimator and (c) the GFMMNN system (Bargiela et al., 2002). The second is to compare the novel DSS with the initial system described in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000), which is based on the LS nodal heads equations. Recently, other works have tackled fault detection and identification in water networks by using machine learning techniques such as various other types of neural networks (Caputo and Pelagagge, 2003; Shinozuka et al., 2005; Belsito et al., 1998; Feng and Zhang, 2006; Izquierdo et al., 2007) or Support Vector Machines (SVM) (Mashford et al., 2009). However, the use of the same pattern recognition architecture, such as the GFMMNN, with different simulation, state estimation and CLA algorithms has rarely been reported.

2. Numerical algorithms

Three main numerical algorithms are used in this paper to generate the training and testing data for the GFMMNN recognition system: a simulator algorithm, a state estimator and a CLA algorithm. The GFMMNN pattern recognition system was initially developed in the context of the simulation and state estimation of water networks based on the nodal heads equations, the Newton–Raphson numerical technique (Jeppson & Davis, 1976) and the LS optimization criterion.

2.1. Simulator algorithm

Modeling and simulation of a water distribution system has two main ingredients: the set of independent equations that describe the water network, and the numerical optimization method used to calculate the nodal heads and the pipe flows. Fig. 1 shows a water network in which the edges are the pipes that distribute the water to the consumers, which are represented by the nodes (e.g. 1, 2, 3, etc.). A simulator algorithm is defined as a solution of the water network equations for a given set of nodal demands. The nodes drawn as squares in Fig. 1 are nodes with fixed head/pressure.

Fig. 1. Example of water network.

The simulator algorithm used here is based on the loop corrective flow algorithm defined for a water distribution system with n nodes, l loops and p pipes. The continuity equation must be satisfied: the flow entering a node equals the nodal consumption plus the flow leaving that node. Therefore, an initial pipe flows solution $Q_i$ that satisfies the continuity equation is calculated as:

$$A_{np} Q_i = d \tag{1}$$

where $d$ are the nodal demands and $A_{np}$ is the topological incidence matrix. The topological incidence matrix $A_{np}$ has a row for every node and a column for every pipe of the water network. The entries +1 and −1 in each row indicate that the flow in a pipe enters or leaves the node (Arsene et al., 2004a; Arsene et al., 2004b). The energy equation, which is solved with the Newton–Raphson method, states that the sum of the pipe head losses around each loop equals zero:

$$M_{lp} h = 0 \tag{2}$$

where $h$ represents the pipe head losses calculated by the Hazen–Williams equation (Epp & Fowler, 1970) and $M_{lp}$ is the loop incidence matrix. In the loop incidence matrix, the entries +1 and −1 correspond to the flow in a pipe being, for example, clockwise or anti-clockwise in a loop, while 0 means that the pipe does not belong to that loop (Arsene et al., 2004a; Arsene et al., 2004b). The loop corrective flows $\Delta Q_l$ at step $k + 1$ of the Newton–Raphson iteration that solves (2) are:

$$\Delta Q_l^{k+1} = \Delta Q_l^{k} - \left[ \frac{\partial \Delta H}{\partial \Delta Q_l^{k}} \right]^{-1} \Delta H \tag{3}$$

where $\Delta H$ are the residual loop head losses (i.e. $\Delta H = M_{lp} h$). The Jacobian matrix $\partial \Delta H / \partial \Delta Q_l$ in (3) can be expressed as:

$$J = M_{lp} A M_{pl} \tag{4}$$

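where $M_{pl}$ is the transpose of the loop incidence matrix and $A$ is the diagonal matrix:

$$A = \begin{pmatrix} u s_1 |Q_1|^{u-1} & 0 & \cdots & 0 \\ 0 & u s_2 |Q_2|^{u-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u s_p |Q_p|^{u-1} \end{pmatrix} \tag{5}$$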

where $s_1, s_2, \ldots, s_p$ are the pipe head loss coefficients and $u$ is the exponent in the Hazen–Williams equation (Arsene, 2011; Carpentier & Cohen, 1993; Arsene et al., 2004a; Arsene et al., 2004b; Kumar, Narasimhan, & Bhadllamudi, 2008; Jeppson & Davis, 1976). The final pipe flow solution $\tilde{Q}$ for each pipe is:

$$\tilde{Q} = Q_i + M_{lp}^{T} \Delta Q_l \tag{6}$$

where $\tilde{Q}$ are the final pipe flows calculated at the end of the Newton–Raphson method (Arsene et al., 2004a; Arsene et al., 2004b; Jeppson & Davis, 1976). The loop simulator requires the computation of the loop incidence matrix $M_{lp}$ and of the initial pipe flows $Q_i$. This computation is generally based on the decomposition of the water network into a spanning tree (e.g. Fig. 5), starting from a node with a fixed pressure value, which becomes the main root node. A spanning tree contains all the vertices of a connected, undirected graph and all of its edges except those that close the cycles (i.e. loops) of the graph (Arsene et al., 2004a; Arsene et al., 2004b). Different strategies can be employed to search the water network. The Depth First (DF) search from graph theory is one possible choice for finding the loops in a water network. The DF search has the property that a pipe that does not belong to the spanning tree, called a chord pipe or co-tree pipe, always connects a node with one of its predecessors in the tree. Based on the spanning tree, the topological incidence matrix $A_{np}$ can be split into a tree incidence matrix $T$ of size $(n \times n)$, which defines the incidence of the tree pipes (the pipes situated in the spanning tree), and a co-tree incidence matrix $C$ of size $(n \times l)$, which contains the co-tree pipes that are not in the spanning tree and form the loops (i.e. $A_{np} = [T \;\; C]$). In this case the loop corrective flows can be identified with the co-tree pipe flows (Andersen and Powell, 1999; Epp and Fowler, 1970; Rahal, 1995). The non-linear controlling hydraulic elements, such as pumps or Pressure Reducing Valves (PRV), are handled in the water network simulations shown in this paper with the algorithms described in Kumar et al. (2008), Andersen and Powell (1999) and Arsene, Al-Dabass, and Hartley (2012).

2.2. State estimation

An additional set of independent variables is introduced, the variation of nodal demands $\Delta d$, in which case the pipe flows are written with respect to the loop corrective flows $\Delta Q_l$ and the variation of nodal demands:

$$\tilde{Q} = Q_i - A^{*} \Delta d + M_{pl} \Delta Q_l \tag{7}$$

where $\tilde{Q}$ are the pipe flows in the tree and co-tree pipes, $Q_i = \begin{bmatrix} Q_{Ti} \\ 0_{ln} \end{bmatrix}$ are the initial pipe flows, with $Q_{Ti}$ the initial flows in the tree pipes and zero initial flows in the co-tree pipes (i.e. $0_{ln}$ denotes the zero array of size $(l \times n)$), and $A^{*}$ is the matrix with the property $A^{*} = \begin{bmatrix} T^{-1} \\ 0_{ln} \end{bmatrix}$.

Two sets of equations are used to describe the hydraulics of the water network. The first set states that the head losses around the loops are equal to zero:

$$\Delta H(\Delta Q_l, \Delta d) = 0 \tag{8}$$

where the loop head loss residuals $\Delta H$ are a function of the loop corrective flows $\Delta Q_l$ and the variation of nodal demands $\Delta d$. The second set of equations states that the total inflow/outflow of the water network through the fixed-head nodes (i.e. nodes with fixed pressure) equals the variation of nodal demands:

$$\Delta d = B_{nl} \Delta Q_l \tag{9}$$

The matrix $B_{nl}$ of size $(n \times l)$ in the previous equation has a non-zero element equal to 1 which corresponds to the main root node and 1 for each of the fixed-head nodes. Pseudo-loops are added between the main root node and each fixed-head node, and a loop corrective flow is considered for each such pseudo-loop. Eqs. (8) and (9) represent the hydraulic function that describes the water network. They can be written as a system of equations:

$$\begin{cases} \Delta H(\Delta Q_l, \Delta d) = 0 \\ B_{nl} \Delta Q_l - \Delta d = 0 \end{cases} \tag{10}$$

The system of equations can be augmented with the equations corresponding to real pressure and pipe flow measurements (Arsene, 2004; Arsene, 2011). It can be shown that for the system of equations (10) the Hessian matrix is positive semidefinite (Arsene, 2004; Arsene, 2011) and that it has a global minimum point. The system (10) can be solved by using the Newton–Raphson iterative method and minimizing the LS criterion. The Jacobian matrix of the system of equations is:

$$J = \begin{bmatrix} \dfrac{\partial \Delta H}{\partial \Delta d} & \dfrac{\partial \Delta H}{\partial \Delta Q_l} \\[2ex] -\dfrac{\partial (\Delta d)}{\partial \Delta d} & B_{nl} \dfrac{\partial (\Delta Q_l)}{\partial \Delta Q_l} \end{bmatrix} \tag{11}$$
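As a point of reference for the Newton–Raphson iteration used here, the following minimal sketch applies the plain loop corrective flow update of Eqs. (3), (4) and (6) from Section 2.1 to a hypothetical network with a single loop, so that the Jacobian $M_{lp} A M_{pl}$ reduces to a scalar. It does not include the demand variations $\Delta d$ or the LS criterion of the state estimator. The function name `loop_flow_simulator`, the two-pipe example and the coefficient values are illustrative assumptions, not taken from the paper; the exponent 1.852 is the usual Hazen–Williams value.

```python
def loop_flow_simulator(q_init, loop_signs, s, u=1.852, tol=1e-10, max_iter=50):
    """Newton-Raphson loop corrective flow iteration for ONE loop (illustrative).

    q_init     -- initial pipe flows that satisfy continuity (Eq. 1)
    loop_signs -- the single row of the loop incidence matrix (+1, -1 or 0 per pipe)
    s          -- pipe head loss coefficients
    u          -- head loss exponent (assumed Hazen-Williams value)
    """
    dq = 0.0  # loop corrective flow
    for _ in range(max_iter):
        # Eq. (6): pipe flows for the current loop corrective flow
        q = [qi + m * dq for qi, m in zip(q_init, loop_signs)]
        # Signed head losses h_j = s_j * Q_j * |Q_j|^(u-1)
        h = [sj * qj * abs(qj) ** (u - 1.0) for sj, qj in zip(s, q)]
        # Residual loop head loss dH = M_lp h (Eq. 2)
        dh = sum(m * hj for m, hj in zip(loop_signs, h))
        if abs(dh) < tol:
            break
        # Diagonal of A (Eq. 5) and scalar Jacobian J = M_lp A M_pl (Eq. 4)
        a = [u * sj * abs(qj) ** (u - 1.0) for sj, qj in zip(s, q)]
        jac = sum(m * m * aj for m, aj in zip(loop_signs, a))
        dq -= dh / jac  # Eq. (3)
    q = [qi + m * dq for qi, m in zip(q_init, loop_signs)]
    return q, dq


# Hypothetical example: two parallel pipes from a source to one demand node.
# The tree pipe initially carries the whole demand, the chord pipe carries none.
flows, correction = loop_flow_simulator([0.1, 0.0], [1, -1], [100.0, 400.0])
```

For a network with several loops the scalar division would be replaced by the solution of a linear system with the full Jacobian matrix.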

The inflows/outflows of the water network can be calculated at the end of the Newton–Raphson method by subtracting the variation of the nodal demands at the fixed-head nodes from the loop corrective flows of the corresponding pseudo-loops, in which case the matrix $B_{nl}$ can be taken out. The Jacobian matrix then becomes:

$$J = \begin{bmatrix} \dfrac{\partial \Delta H}{\partial \Delta d} & \dfrac{\partial \Delta H}{\partial \Delta Q_l} \\[2ex] I_{nn} & 0 \end{bmatrix} \tag{12}$$

where $I_{nn}$ is the $(n \times n)$ identity matrix. The Jacobian matrix $J$ resembles the one presented in Arsene and Bargiela (2001), Arsene et al. (2004a) and Arsene et al. (2004b).

2.3. Confidence Limit Analysis

It has been shown that, in water network state estimation, there is one optimal solution for a given set of input data and estimation criterion. However, because of the inaccuracies in the input data, many different combinations of input data are possible, and therefore many different state estimate vectors are feasible. As a result, uncertainty analysis becomes an essential part of the operation of water distribution systems, since for the safety of operational control it is very important to know how the inaccuracies can affect the estimated solution. Extensive work has been done on quantifying the influence of measurement and pseudo-measurement uncertainties in water distribution systems (Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene et al., 2011); it is based on the principle of unknown-but-bounded errors for the set of measurements:

$$z = g(x) + r, \qquad |r_i| \le |e_i|, \qquad i = 1, \ldots, m \tag{13}$$

where $e$ is the vector of maximum expected measurement errors, $z$ is the measurement vector, $g$ is the hydraulic network function, $x$ is the set of independent variables (in this case the loop corrective flows and the variation of nodal demands), $r$ is the vector of residuals that cannot be accounted for by the state estimator and the measurement data, and $m$ is the number of measurements. Knowledge of the statistical properties of the errors is not required; the only restriction imposed is that the errors fall within a range bounded by $e$. Several CLA algorithms have been proposed, but the most successful ones in terms of computational complexity were based on the linearized model of the water network, which was used to obtain a sensitivity matrix $S$. The sensitivity matrix was the pseudo-inverse of the Jacobian matrix calculated for the state estimates (the nodal heads, the pipe flows and the inflows/outflows) by using the LS nodal heads state estimator. A state estimate was produced on the assumption that the measurement vector $z_t$ is correct; the possible error $\Delta z$ of the measurement set was then used together with the sensitivity matrix $S$ to predict the resulting error in the state estimates. This approach was facilitated by the use of the nodal heads equations in the LS state estimator: the $(i,j)$th element $s_{ij}$ of the pseudo-inverse of the Jacobian matrix relates the sensitivity of the $i$th element, $x_i$, of the state vector $x_t$ to the $j$th element, $z_j$, of the measurement vector. In the context of the LS loop flows state estimator, the EM method obtains suitable confidence limits for the nodal heads and the pipe flows for a Maximum level (M) of the uncertainties/Errors (E) in the input measurement data of the water distribution system; hence the term EM.
This method is suitable for use only with the LS loop flows state estimator (Arsene, 2004; Arsene, 2011; Arsene et al., 2011). The EM method considers the maximum variability of water consumptions and the accuracy of real meters, which together form the state estimated measurement vector $\hat{z}$. This vector is calculated from the observed state vector $\hat{x}$ (i.e. nodal heads and pipe flows), which in turn was obtained from the observed measurement vector $z_o$ with the LS loop flows state estimator.

Fig. 2. CLA based on EM method.

The state estimated measurement vector $\hat{z}$ is used instead of the observed measurement vector $z_o$, which is the measurement vector initially given to the human operator. The upper or lower measurement limits $[\hat{z}_l \;\; \hat{z}_u]$ of the estimated measurement vector $\hat{z}$, modified with the measurement accuracy for water consumptions and real meters, are then used again in the LS loop flows state estimator. The result is a new state vector $x_1$, which is used to determine the confidence limits on the state variables (nodal heads, in/out flows, pipe flows) with the equation:

$$x_{cl_i} = \mathrm{abs}(x_1 - \hat{x}) \tag{14}$$

where $x_{cl_i}$ is the confidence limit on the i-th state variable, $\hat{x}$ is the state vector obtained for the observed measurement vector $z_o$, $x_1$ is the state vector obtained for the maximum level of errors (i.e. $\hat{z}_l$ or $\hat{z}_u$) in the estimated measurement vector $\hat{z}$, and abs denotes the absolute value. Fig. 2 shows the EM method. The calculation of confidence limits with the EM method can be implemented only with the LS loop flows state estimator; it does not work with state estimators that are not based on the loop corrective flows and the variation of nodal demands, such as the LS nodal heads state estimator. The LS loop flows state estimator modifies the inflows/outflows at the fixed-head nodes so as to match the sum of the estimated nodal demands $\hat{z}$ obtained with the LS loop flows state estimator. This means that if the estimated nodal water consumptions and real meters $\hat{z}$ are moved to their lower limit $\hat{z}_l$ or upper limit $\hat{z}_u$, the mass balance of the water network is still satisfied by the in/out flows at the fixed-head nodes, which are modified during the Newton–Raphson optimization. In this case the fixed-head nodes are part of the measurement data (i.e. nodal heads, pipe flows) and are used to form pseudo-loops together with the main source/root node. The CLA is realized with the EM method (Arsene, 2004; Arsene, 2011; Arsene et al., 2011), which is based on the LS loop flows state estimator. It provides realistic confidence bounds for the nodal heads and the pipe flows obtained with the LS loop flows state estimator. Confidence bounds are also obtained for the variation of nodal consumptions, in effect the state estimated nodal consumptions in vector $x_1$ (Arsene, 2004; Arsene, 2011; Arsene et al., 2011).
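The re-estimation idea behind Eq. (14) can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: `estimate` stands in for the LS loop flows state estimator, `predict` for the hydraulic function $g$, and the toy linear model at the end is purely hypothetical. The text speaks of using the upper or the lower limit; taking the larger deviation of the two is an assumption made here.

```python
def em_confidence_limits(z_obs, e, estimate, predict):
    """Confidence limits obtained by re-estimating at the measurement bounds.

    z_obs    -- observed measurement vector z_o
    e        -- maximum expected error for each measurement
    estimate -- callable: measurement vector -> state vector (stand-in estimator)
    predict  -- callable: state vector -> measurement vector (stand-in for g)
    """
    x_hat = estimate(z_obs)   # state for the observed measurements
    z_hat = predict(x_hat)    # state estimated measurement vector
    z_low = [z - ei for z, ei in zip(z_hat, e)]
    z_up = [z + ei for z, ei in zip(z_hat, e)]
    limits = [0.0] * len(x_hat)
    for z_bound in (z_low, z_up):
        x1 = estimate(z_bound)  # state at the maximum level of errors
        # Eq. (14), keeping the larger deviation of the two bounds (assumption)
        limits = [max(c, abs(a - b)) for c, a, b in zip(limits, x1, x_hat)]
    return x_hat, limits


# Toy stand-ins: one state variable x observed through z_j = a_j * x.
A = [1.0, 2.0]


def toy_predict(x):
    return [a * x[0] for a in A]


def toy_estimate(z):
    # Closed-form least squares solution for the toy linear model
    return [sum(a * zj for a, zj in zip(A, z)) / sum(a * a for a in A)]


x_hat, limits = em_confidence_limits([1.1, 1.9], [0.1, 0.2], toy_estimate, toy_predict)
```

The sketch costs three estimator runs per pattern, whereas a sensitivity matrix approach needs one run plus a pseudo-inverse.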

3. Generation of the training data for the GFMMNN

While normal operating state data can be found in abundance for well maintained water distribution systems, instances of abnormal events are not as readily available. To observe the effects of abnormal events in the physical system, it is possible to resort to the deliberate closing of valves or opening of hydrants in order to simulate leakages (Carpentier & Cohen, 1993). Although such experiments can be very useful for confirming the agreement between the behavior of the physical system and the mathematical model, it is not feasible to carry them out for all pipes and valves in the system during the whole day or days that might be required to obtain a representative set of labeled data. It is accepted practice that, for processes where physical interference is not recommended or is even dangerous, mathematical models and computer simulations are used to predict the consequences of emergencies so that a quick response can be prepared. In our case, computer simulations are used to generate data covering a 24 h period for the water network depicted in Fig. 3. Simulations that stretch over such longer periods of time are called extended time simulations (Rao, Markel, & Bree, 1977; Rossman, 1994). The water network from Fig. 3 was chosen because it makes it possible to compare the results of training the recognition system with patterns obtained with the LS loop flows state estimator and confidence limits against the results reported in Gabrys (1997), Gabrys and Bargiela (1999) and Gabrys and Bargiela (2000) for the same water network and operational testing conditions in the context of the LS nodal heads equations.

Fig. 3. 34-node water network for generation of the training data.

Fig. 4. Graphical representation of the training patterns generation scheme.

The process of generating the training data is shown as a block diagram in Fig. 4. It consists of three major blocks. The first


module is the co-tree flows simulator (i.e. loop simulator). The simulator is used as a substitute for the physical water distribution network. It is in this module that the leakages are simulated, by updating the topology information rather than by opening hydrants. In the second module, the LS loop flows state estimation is carried out for accurate measurements taken from the simulation module, but without knowledge of any anomalous event that might have happened, as would be the case in a real water network. A telemetry module can be introduced between the first two modules to simulate the measurement noise caused by the transducers in real water networks. In the third module, the confidence limits are found for the state estimates (nodal heads, pipe flows) and the variation of nodal demands calculated at the estimation stage. The CLA is realized with the EM technique, which provides the confidence bounds for the estimated nodal heads and pipe flows as well as for the variation of nodal consumptions. In addition to the state estimates with their confidence limits, the system's status, i.e. the label of the current pattern, is stored. In the physical system simulator, the leakage is modeled as an additional demand lying midway between the two end nodes of a pipe. The additional demand is not modeled as a pressure dependent variable and thus can be set to any desired value.

The spanning tree for the 34-node water network is shown in Fig. 5. The main root node is node 30 and a pseudo-loop is added between the fixed-head node 31 and the main source node. The inflows to the other fixed-head nodes 27, 28, 29, 32, 33 and 34 are kept constant. This means that the pumping stations represented by links 32–20, 27–29, 28–4, 33–29, 29–19 and 34–1 are assumed to produce a constant inflow and are not affected by leakage. Therefore the inflows at the reservoirs 30 and 31 are adjusted during the Newton–Raphson method so as to cover the additional demand resulting from the leakage. Since the lower part of the water network, which includes nodes 26, 29, 1, 33 and 34, has no bearing on the upper part, only the upper part is used for fault detection and identification (Gabrys, 1997; Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011).
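The leak model described above, an additional demand placed midway along a pipe, amounts to a small topology update: one new node and two half-pipes in place of the original pipe. The following sketch illustrates this on a dictionary-based network description. The data layout, the function name `add_leak` and the halving of the head loss coefficient (which assumes the coefficient is proportional to pipe length) are illustrative assumptions and are not taken from the paper.

```python
def add_leak(pipes, demands, pipe_id, leak, leak_node="leak"):
    """Return a copy of the network with a leak placed midway along one pipe.

    pipes   -- dict: pipe id -> (from_node, to_node, head_loss_coefficient)
    demands -- dict: node -> demand
    leak    -- additional demand that models the leak (not pressure dependent)
    """
    start, end, coeff = pipes[pipe_id]
    new_pipes = {k: v for k, v in pipes.items() if k != pipe_id}
    # Assumption: the coefficient scales with length, so each half-pipe gets half
    new_pipes[(pipe_id, "a")] = (start, leak_node, coeff / 2.0)
    new_pipes[(pipe_id, "b")] = (leak_node, end, coeff / 2.0)
    new_demands = dict(demands)
    new_demands[leak_node] = leak  # the leak is a distinct element of the demand vector
    return new_pipes, new_demands


# Hypothetical three-node example with a leak of 0.002 m3/s in pipe "p2"
pipes = {"p1": ("A", "B", 100.0), "p2": ("B", "C", 250.0)}
demands = {"B": 0.01, "C": 0.02}
leaky_pipes, leaky_demands = add_leak(pipes, demands, "p2", 0.002)
```

The updated network has one more node and one more pipe than the original, which is why one row and one column are added to the incidence matrices.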


By systematically working through the network, ten levels of leaks are introduced, one at a time, in every single pipe for every hour of the 24 h period. With 38 pipes, 10 leakage levels and the normal operating status, this gives 38 × 10 + 1 = 381 patterns of state estimates for each hour. For a full day this becomes a training set of 381 × 24 = 9144 labeled patterns of state estimates computed for accurate measurements and leakages ranging from 2 (l/s) to 29 (l/s). However, since an additional consumption is used to simulate the leak, the incidence matrices and the initial pipe flows for the loop algorithms have to be modified. Rebuilding the spanning tree for each of the 9144 patterns would therefore be a computational drawback of the training patterns generation scheme shown in Fig. 4, and a disadvantage when compared to the implementation based on the nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). The simulation of a leakage based on the loop corrective flows equations and graph theory (Arsene, 2004; Arsene et al., 2011; Arsene et al., 2004a; Arsene et al., 2004b) instead modifies the initial spanning tree built for the normal operating state of the water network so as to account for the additional water consumption that models the leak. New topological and loop incidence matrices and initial pipe flows are then determined as input information for the simulator algorithm. This procedure avoids the time-consuming process of rebuilding the spanning tree. Assuming that $M_{lp}$, $T$ and $Q_i$ are the loop incidence matrix, the tree incidence matrix and the initial pipe flows obtained from the spanning tree for the normal operating status, and $d$ are the initial nodal demands, the training pattern generation scheme from Fig. 4 is run once. The nodal heads and pipe flows state estimates with their confidence limits, the variation of nodal consumptions with confidence limits and the status of the water network are stored for subsequent use in the pattern classification module. In the original 34-node water network a leakage is then modeled as an additional water consumption, for example between nodes 17 and 18 (using the new labels in Fig. 5), and the incidence matrices and the initial pipe flows are recalculated. The procedure for avoiding the rebuilding of the spanning tree for each of the 9144 patterns works as follows: the labels of the nodes and pipes situated in the spanning tree below the leakage location are incremented by one so as to preserve the upper triangular form of the tree incidence matrix. The new vector of nodal demands $d'$ comprises the initial nodal demands $d$ plus the leakage, which is introduced as a distinct element in the vector of water consumptions. One column and one row are introduced in the incidence matrices (i.e. loop and topological) to take into account the incidence of the two half-pipes resulting from the additional demand. The new initial pipe flows and the loop and tree incidence matrices are then obtained through simple matrix operations (Arsene, 2004; Arsene, 2011; Arsene et al., 2004a; Arsene et al., 2004b), which take less computational time than reconstructing a spanning tree for the 36-node water network. Therefore, instead of carrying out the time-consuming process of rebuilding the spanning tree, the new set of initial conditions (i.e. initial pipe flows, incidence matrices) is determined:

$$Q'_i = T^{-1} d' \tag{15}$$

where $Q'_i$ are the initial tree pipe flows used in the next extended time simulation or leakage simulation.

Fig. 5. Spanning tree for the water network from Fig. 3; new labels for nodes and pipes are added, which produces an upper triangular incidence matrix T; the co-tree pipes which close loops are shown with dashed lines.

Table 1
Parameters used during generation of the training data set, as in Arsene and Bargiela (2001), Arsene (2004), Arsene et al. (2004a) and Arsene et al. (2004b).

Head measurements                              1, 2, 4, 8, 11, 15, 17, 19, 22, 29, 30, 31
Fixed-head inflow measurements                 27, 28, 29, 30, 31, 32, 33, 34
Water consumptions                             All nodes
Fixed-head measurements                        27, 28, 29, 30, 31, 32, 33, 34
Leak levels (m3/s)                             0.002, 0.005, 0.008, 0.011, 0.014, 0.017, 0.020, 0.023, 0.026, 0.029
Parameters used in CLA
  Accuracy of head measurements at load nodes  ±0.1 (m)
  Accuracy of inflow measurements              ±1%
  Variability of consumptions                  ±10%

The new loop and tree incidence matrices are obtained with respect to the new initial solution of the tree pipe flows. For the pipes in which the direction of the initial tree pipe flows $Q'_i$ changes due to the new set of nodal demands $d'$, the loop and tree incidence matrices are updated as follows:

$$M'_{lp}(:,k) = (-1)\, M_{lp}(:,k) \tag{16}$$

$$T'(:,k) = (-1)\, T(:,k) \tag{17}$$

where k is the pipe with the reversed flow, and T0 are the new loop and tree incidence matrixes used in the next extended time simulation or leakage simulation. By means of Eqs. (15)–(17), the block diagram shown at Fig. 4 was successfully run for a 24 h extended time simulation and leakage simulation. The Central Processing Unit (CPU) times were similar (i.e. 40 s) with the times obtained for the implementation based on the LS nodal heads equations (Gabrys, 1997; Gabrys & Bargiela, 1999). The 24 h profiles of consumptions and inflows that characterize the normal operating states throughout the day are being reported elsewhere (Gabrys, 1997; Arsene, 2004; Arsene, 2011; Arsene et al., 2004a). Furthermore, the computational time required to rebuild the spanning tree and assign new labels for each of the 9144 labeled patterns of data would have been 15 min. This would have been unfavorable when compared to less of 40 s obtained by using the graph and matrix operations described above (Arsene, 2004; Arsene et al., 2011; Arsene et al., 2004a; Arsene et al., 2004b). The computational time required to rebuild the spanning tree would increase steadily with the size of the water network, that is the larger is the size of the network in terms of pipe and nodes, more time is required to build the spanning tree and assign new labels. By contrast, the solution adopted here is based on a couple of basic matrix operations that are almost insensitive to the size of the network. Finally, the whole set of parameters used during the generation of the training set are shown at Table 1. 4. Classification of the state of the water network based on patterns of state estimates with confidence limits In order to design the recognition system based on state estimates, the set of 9144 training patterns representing 37 categories is used. The training data spanned across 24 h period of water network operation. 
The 37 categories stand for the normal operating state and leakages in 36 pipes of the upper part (Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011) of the network shown in Fig. 3. The class indexes $d_h$ were chosen the same as in the original algorithm (Arsene, 2004; Arsene et al., 2011): $d_h = 1$ for the normal operating state; $d_h = 2$ for a leakage in the pipe between nodes 3 and 4; $d_h = 3$ for a leakage in the pipe between nodes 4 and 20, etc. The training data is first scaled to lie in the range [0, 1], as required by the pattern recognition system. The range of values for the nodal heads state variables was chosen to be between 2 and 50 (m), and for inflows/outflows between −0.2 and 0.2 (m³/s).
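The scaling step can be illustrated with a minimal sketch. The function name `scale_to_unit`, the clipping of out-of-range values and the numeric range used in the example are assumptions made for illustration; the source only states that each variable is mapped onto [0, 1] using a chosen range.

```python
def scale_to_unit(value, lo, hi):
    """Map value from the range [lo, hi] onto [0, 1].

    Values outside [lo, hi] are clipped (an assumption, not stated in the source).
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


def scale_interval(lower, upper, lo, hi):
    """Scale a confidence-limit pair (lower, upper) with the same range."""
    return scale_to_unit(lower, lo, hi), scale_to_unit(upper, lo, hi)


# Hypothetical range of 0 to 50 m for a nodal head, for illustration only.
print(scale_to_unit(25.0, 0.0, 50.0))         # 0.5
print(scale_interval(10.0, 20.0, 0.0, 50.0))  # (0.2, 0.4)
```

Scaling the lower and upper confidence limits with the same range keeps the lower limit below the upper limit, so a scaled input is still a valid hyperbox.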


A three layer GFMMNN (Gabrys, 1997; Gabrys & Bargiela, 1999; Gabrys & Bargiela, 2000; Gabrys, 2002a; Gabrys, 2002b) is built using hyperbox fuzzy sets. A hyperbox defines a region of the n-dimensional pattern space, where the n dimensions are formed, for example, from the number of nodal heads plus the number of pipe flows plus the number of in/out flows. All patterns contained within a hyperbox have full cluster/class membership. A user specified value H is introduced to control the size of the hyperbox, which can be described as the difference between the max and min value for each dimension. The combination of the min–max points and the hyperbox membership function defines a fuzzy set that is a cluster, which is the case of fault detection. In the case of classification, that is a leakage in a pipe, hyperbox fuzzy sets are aggregated to form a single fuzzy set class.

Learning in the fuzzy min–max clustering and classification neural networks consists of creating and adjusting hyperboxes in pattern space as input patterns are received. It is an expansion/contraction process. The learning process begins by selecting an input pattern and finding the closest hyperbox to that pattern that can expand, if necessary, to include the pattern. If no hyperbox can be found that meets the expansion criteria, a new hyperbox is formed and added to the system. This growth process allows existing clusters/classes to be refined over time, and it allows new clusters/classes to be added without retraining. Hyperbox expansion can produce overlapping hyperboxes, which can cause ambiguity. A pattern may have the same partial membership in more than one cluster/class, but it cannot completely belong to more than one cluster/class.
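The expansion step described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it takes the first same-class hyperbox that passes the size check instead of the closest one, it omits the overlap test and contraction, and the names `can_expand`, `present_pattern` and `h_max` (standing for the user specified value H) are hypothetical.

```python
def can_expand(v, w, x_lo, x_hi, h_max):
    """True if hyperbox (v, w) can absorb the input [x_lo, x_hi]
    without any side becoming longer than h_max."""
    return all(
        max(wi, xu) - min(vi, xl) <= h_max
        for vi, wi, xl, xu in zip(v, w, x_lo, x_hi)
    )


def present_pattern(boxes, x_lo, x_hi, label, h_max):
    """Expand an existing hyperbox of the same class or create a new one.

    boxes is a list of dicts with keys "v" (min point), "w" (max point)
    and "label" (class index).
    """
    for box in boxes:
        if box["label"] == label and can_expand(box["v"], box["w"], x_lo, x_hi, h_max):
            box["v"] = [min(vi, xl) for vi, xl in zip(box["v"], x_lo)]
            box["w"] = [max(wi, xu) for wi, xu in zip(box["w"], x_hi)]
            return boxes
    # No expandable hyperbox was found: the input itself becomes a new hyperbox.
    boxes.append({"v": list(x_lo), "w": list(x_hi), "label": label})
    return boxes


boxes = []
present_pattern(boxes, [0.10, 0.10], [0.10, 0.10], label=1, h_max=0.2)
present_pattern(boxes, [0.20, 0.25], [0.20, 0.25], label=1, h_max=0.2)  # fits, box grows
present_pattern(boxes, [0.50, 0.50], [0.50, 0.50], label=1, h_max=0.2)  # too far, new box
print(len(boxes))  # 2
```

Because the input is passed as a lower and an upper vector, the same code handles both point patterns and patterns with confidence limits.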
For completeness we include the description of the GFMMNN algorithm originally developed by Gabrys and Bargiela (2000):

Initialization

The GFMMNN pattern recognition algorithm was intended to be used for the water distribution system state classification task. Therefore the information obtained from CLA, namely the confidence limits for each state variable, had to be accommodated by the classification procedure. This requirement was met by specifying the input to the classification/clustering algorithm as a pair of vectors, $X_h = [X_h^l \; X_h^u]$, the lower and upper limits for the state vector. In other words, instead of a point in n-dimensional space, the input to be classified is a hyperbox with the min point determined by the vector $X_h^l$ and the max point determined by the vector $X_h^u$. When the min and max points are equal, the hyperbox shrinks to a point. Consequently, the algorithm can classify or cluster inputs given as ordinary n-dimensional vectors without any changes, because a point in n-dimensional space is simply the special case of a hyperbox with equal min and max points.

Gabrys and Bargiela observed that, because of the size of a modern water distribution system, it is impossible to predict and cover all possible combinations of consumption-inflow patterns and anomalies that can occur in the network during day to day operations. Therefore, in order to allow both labelled (i.e. normal operating state etc.) and unlabelled inputs to be processed, an additional index, $d_h = 0$, meaning that the input pattern is not labelled, was introduced. The result is a hybrid neural network that is supervised for labelled inputs (classification) and unsupervised for unlabelled inputs (clustering).

Hyperbox membership function

The fuzzy hyperbox membership function plays a crucial role in the fuzzy min–max classification and clustering algorithms. The decisions whether the presented input pattern belongs to a particular class or cluster, and whether a particular hyperbox is to be expanded, depend mainly on the membership value describing the degree to which an input pattern fits within the hyperbox. The j-th hyperbox fuzzy set, $B_j$, can be defined by the ordered set:


$$B_j = \{X_h, V_j, W_j, b_j(X_h, V_j, W_j)\} \qquad (18)$$

for all h = 1, 2, ..., m, where $X_h = [X_h^l \; X_h^u]$ is the h-th input pattern, $V_j = (v_{j1}, v_{j2}, \ldots, v_{jn})$ is the min point for the j-th hyperbox, $W_j = (w_{j1}, w_{j2}, \ldots, w_{jn})$ is the max point for the j-th hyperbox, and the membership function for the j-th hyperbox satisfies $0 \le b_j(X_h, V_j, W_j) \le 1$. The min points are initialized with 1 and the max points with 0. An investigation was carried out in Gabrys and Bargiela (1999) in order to decide the most appropriate form for the membership function. The function was chosen so that the membership values of the patterns decrease steadily with increasing distance from the hyperbox. The reason for doing so is to eliminate the cases in which hyperboxes that represent different classes overlap. The chosen function is shown below; it is determined by the largest violation of the hyperbox min and max points over all dimensions:

$$b_j(X_h) = \min_{i=1..n}\Big(\min\big(\,[1 - f(x_{hi}^u - w_{ji}, \gamma_i)],\; [1 - f(v_{ji} - x_{hi}^l, \gamma_i)]\,\big)\Big) \qquad (19)$$

where $x_{hi}^l$ and $x_{hi}^u$ are the lower and the upper limits of the h-th input pattern specified for each dimension i, $w_{ji}$ and $v_{ji}$ are the max and min points of the j-th hyperbox, and $f(x, \gamma)$ is a two parameter ramp threshold function which can be written as:

$$f(x, \gamma) = \begin{cases} 1 & \text{if } x\gamma > 1 \\ x\gamma & \text{if } 0 \le x\gamma \le 1 \\ 0 & \text{if } x\gamma < 0 \end{cases} \qquad (20)$$

The membership function also contains the parameter $\gamma = [\gamma_1, \gamma_2, \ldots, \gamma_n]$, which controls how fast the membership values decrease and has to be specified for each dimension (i.e. nodal pressures, pipe flows, outflows/inflows).

Hyperbox expansion

This process can be described briefly as follows: identify the hyperbox closest to the input pattern that can be expanded, and expand it. If an expandable hyperbox cannot be found, add a new hyperbox. With respect to the user specified value H introduced to control the size of the hyperbox, it has been observed (Gabrys, 1997) that keeping the parameter H constant during the learning process can have undesired effects on performance or on the number of created hyperboxes. A large H can cause too many misclassifications, especially when there are complex, overlapping classes. On the other hand, when H is small, too many unnecessary hyperboxes can be created, especially for concentrated, isolated groups of data forming one class, even though a small H might be needed to resolve other overlapping classes. These problems were addressed by introducing an adaptive maximum size of the hyperbox. We shall take these observations into account when testing the recognition system with the loop-based state estimates and confidence limits.

Hyperbox overlap test

Determine whether the recent expansion caused any undesired overlap between hyperboxes.

Hyperbox contraction

If the overlap test identified overlapping hyperboxes, then contract the hyperboxes to eliminate the overlap.

More details about each of these steps can be found in Gabrys (1997) and Gabrys and Bargiela (2000). However, we mention here that the training process is completed when, after presentation of all training patterns, there have been no misclassifications for the training data or the minimum, user specified value of the parameter H has been reached. The topology of this neural network grows to meet the demands of the problem.
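As an illustration, the membership function of Eqs. (19) and (20) can be written in a few lines. This is a minimal sketch: the names `ramp`, `membership` and `gamma` (the per-dimension parameter that controls how fast the membership decreases) are chosen here for readability, and the numeric values in the example are hypothetical.

```python
def ramp(x, gamma):
    """Two parameter ramp threshold function of Eq. (20)."""
    r = x * gamma
    if r > 1:
        return 1.0
    if r < 0:
        return 0.0
    return r


def membership(x_lo, x_hi, v, w, gamma):
    """Hyperbox membership of Eq. (19) for the input [x_lo, x_hi].

    v and w are the min and max points of the hyperbox; gamma holds one
    sensitivity value per dimension.
    """
    return min(
        min(1.0 - ramp(xu - wi, g), 1.0 - ramp(vi - xl, g))
        for xl, xu, vi, wi, g in zip(x_lo, x_hi, v, w, gamma)
    )


v, w, gamma = [0.2, 0.2], [0.4, 0.4], [4.0, 4.0]
inside = membership([0.3, 0.3], [0.3, 0.3], v, w, gamma)   # fully contained
near = membership([0.5, 0.3], [0.5, 0.3], v, w, gamma)     # just outside the max point
far = membership([0.9, 0.3], [0.9, 0.3], v, w, gamma)      # far outside
print(inside, round(near, 3), far)
```

An input that lies entirely inside the hyperbox gets full membership, and the membership drops linearly to zero at a distance of 1/gamma from the hyperbox in any dimension.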
The input layer has 2n processing elements, two for each of the n dimensions of the input pattern $X_h = [X_h^l \; X_h^u]$. Each second layer node of this three-layer neural network represents a hyperbox fuzzy set, where the connections between the first and second layer are the min-max points and the transfer function is the hyperbox membership function. The min points are stored in the matrix V and the max points are stored in the matrix W. The way these connections are adjusted is described in Gabrys (1997) and Gabrys and Bargiela (2000). The connections between the second and third layer nodes are binary values. They are stored in a matrix U. The equation for assigning the values of U is:

$$u_{jk} = \begin{cases} 1 & \text{if } b_j \text{ is a hyperbox for class } c_k \\ 0 & \text{otherwise} \end{cases} \qquad (21)$$

where k is the index of the output node. Each of the third layer nodes represents a pattern class. The output node $c_0$ of the third layer represents all unlabelled hyperboxes from the second layer, while the other output nodes $c_1, c_2, \ldots, c_p$ represent the normal operating state, leakage in pipe 1, etc.

The classification and clustering algorithm briefly described here has been tested extensively on different sets of data (both data points and fuzzy labelled and labelled input patterns) and compared to other existing classification algorithms (Gabrys, 1997). We only note here that the GFMMNN algorithm dealt successfully with both labelled and unlabelled patterns and in most cases resolved all the overlaps between hyperboxes from different classes, which resulted in fewer misclassifications compared with several other neural, fuzzy and traditional classifiers (Gabrys & Bargiela, 2000; Gabrys, 1997).

In the original work based on patterns of state estimates calculated with the LS nodal heads state estimator, multiple classes with full membership were observed for a large number of testing patterns (i.e. patterns belonged to classes representing leakages in different pipes). A two level recognition system was proposed in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000) as a means of solving this problem (Fig. 6). The first level of the recognition system can

Fig. 6. Two level recognition system proposed in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000). First level consists of one Neural Network (NN) and its purpose is to select one of the n second level ‘‘experts’’. Input to the first level NN, XI, comprises all the variables not affected by occurrence of anomaly. Second level consists of n NNs. They are called ‘‘experts’’ since each of them is trained using only a part of training set and covers a distinctive part of 24 h operational period. Input to the second level NNs, XII, comprises all the variables sensitive to occurrence of anomaly. The output of the second level NNs is the classification of the water network state.

Table 2. Misclassification rates for a test set consisting of 9144 examples of LS loop flows state estimates computed for accurate measurements.

| Training set | Parameter H | Highest membership | Top 2 alternatives | Top 3 alternatives | Top 5 alternatives |
|---|---|---|---|---|---|
| Loop flows state estimates computed for accurate measurements without confidence limits | 0.2 | 33.41 | 21.63 | 15.63 | 8.69 |
| | 0.1 | 10.12 | 4.62 | 2.6 | 2.39 |
| Loop flows state estimates computed for accurate measurements including confidence limits | 0.2 | 7.83 | 6.51 | 5.39 | 3.71 |
| | 0.1 | 1.01 | 0.73 | 0.70 | 0.32 |
| | Variable* | 0.002 | 0.001 | 0 | 0 |

* Parameter H was determined separately for each dimension of each of the six subsets of the training set and was set to the value of the largest input hyperbox for each of these six subsets.

distinguish between different typical behaviors of the water network (e.g. night load), while the second level is responsible for detection of anomalies for some characteristic load patterns. The second level networks are viewed as ‘‘experts’’. In this way, the distinctive variations in the typical network behavior for different days of the week or seasons of the year can be accommodated without the need to retrain the existing networks. Instead, a new expert network is added to the second level and the size of the first level network is increased accordingly. Thus the GFMMNN is able to grow so as to meet the demands of the problem. It was noticed in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000) that if the input patterns obtained with the LS nodal heads state estimator are processed by the two level neural networks, the dimension of the training data can be reduced in comparison to the full training data. Also, the fact that only one of the ‘‘experts’’ is selected for further processing means that the other n-1 ‘‘experts’’ are not active. This gives a further reduction in dimensionality, since each of the second level networks covers only a part of the day rather than the whole 24 h period.

The two level recognition system is trained with the 9144 labeled patterns of data obtained with the LS loop flows state estimator and the training data generation scheme from Fig. 4. As in the original system, six characteristic inflow patterns (i.e. subsets) can be found for six periods during the 24 h water network operation (Gabrys & Bargiela, 1999; Arsene, 2004; Arsene, 2011; Arsene et al., 2004a; Gabrys & Bargiela, 2000): 1–5, 6–8, 9–12, 13–17, 18–20 and 21–24. The misclassification rates for the testing set consisting of the 9144 examples of LS loop flows state estimates with and without confidence limits computed for accurate measurements are shown in Table 2.

The first interesting result is the comparison of the performance of the recognition system trained on patterns of LS loop flows state estimates with the performance of the recognition system trained on patterns of LS nodal heads state estimates: the misclassification rates obtained here are slightly higher, by 2–4% on average, than the corresponding misclassification rates reported in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000). This is due to the high sensitivity of the state estimates calculated with the LS loop flows state estimator to the available pressure measurements. Hence the LS loop flows state estimates used for training the two level neural network define a space of overlapping patterns of data, which makes the classification task more difficult to solve.

The second observation is that, when training with the patterns obtained with the LS loop flows state estimates and confidence limits, the misclassification rates compare well with those reported in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000) for the training set consisting of LS nodal heads state estimates with confidence limits. However, an examination of the number of hyperboxes obtained during the training process showed that resolving all the overlaps required a number of hyperboxes almost equal to the number of patterns of LS loop flows state estimates and confidence limits (Table 3). The attempt to resolve all the overlaps for the training set resulted in an unacceptable number of hyperboxes representing identical classes of operation (i.e. leakages). Next, the GFMMNN training is performed for patterns of variation of nodal consumptions and confidence limits.

Table 3. Number of hyperboxes and misclassification rates for different values of parameter H.

| Training set | Parameter H | Misclassification rates (Top 5 alternatives) | Number of hyperboxes | Patterns of data |
|---|---|---|---|---|
| Loop flows state estimates computed for accurate measurements including confidence limits | 0.2 | 3.71 | 6411 | 9144 |
| | 0.1 | 3.2 | 7062 | 9144 |
| | 0.009 | 2.9 | 8597 | 9144 |
| | 0.008 | 0.1 | 8700 | 9144 |
| | 0.005 | 0 | 8777 | 9144 |
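The ‘‘top k alternatives’’ misclassification rates can be computed as in the sketch below. It assumes that a test pattern counts as correctly classified when its true class is among the k classes with the highest membership values; this reading, the function name `top_k_misclassification_rate` and the toy membership values are assumptions made for illustration and are not taken from the original study.

```python
def top_k_misclassification_rate(memberships, true_labels, k):
    """Percentage of patterns whose true class is not among the k classes
    with the highest membership.

    memberships is a list of dicts mapping class index to membership value.
    """
    misses = 0
    for scores, truth in zip(memberships, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        if truth not in ranked:
            misses += 1
    return 100.0 * misses / len(true_labels)


# Toy data: four test patterns and three classes.
memberships = [
    {1: 1.0, 2: 0.4, 3: 0.1},
    {1: 0.9, 2: 0.8, 3: 0.2},
    {1: 0.3, 2: 0.6, 3: 0.7},
    {1: 0.5, 2: 0.2, 3: 0.9},
]
true_labels = [1, 2, 1, 3]

for k in (1, 2, 3):
    print(k, top_k_misclassification_rate(memberships, true_labels, k))
```

Under this reading the rate can only stay the same or decrease as k grows, since a larger set of alternatives always contains the smaller one.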

5. Classification of the water network state based on patterns of variation of load measurements and confidence limits

In the case of the LS loop flows state estimator, the difference between the actual measurement and the value of the measured quantity as computed by the state estimator should be zero for a pressure measurement (Arsene, 2004; Arsene, 2011). This means that the variation of the load measurements (i.e. nodal consumptions) in the presence of a pressure measurement will contain strong information about the location of topological errors. In terms of the pattern recognition system, this ensures that the hyperboxes representing topological errors of different magnitudes will move away from the zero reference point representing the normal operating point for each operational period of the considered 24 h operational period. This can resolve the overlaps of the hyperboxes in a robust manner. To conclude, the presence of a topological error in the vicinity of pressure measurements will result in the modification of the nodal demands $\Delta d$ located in the region of the water network containing the respective pressure measurements. If the pseudo-measurements (i.e. nodal consumptions / nodal demands) are in agreement with the real measurements, then the calculated variations of nodal demands $\Delta d$ should be zero irrespective of the operating state. Also, by using patterns of variations of nodal demands it is possible to classify the operational state of the water network in a way similar to the classification of the state of the water network based on the residuals obtained with the LS nodal heads state estimator.

The training data consisting of the variations of nodal demands with confidence limits are scaled and mapped onto the [0, 1] range. A single neural network of the type shown in Fig. 6 is used for the entire operational period of 24 h. The initial maximum size of a hyperbox was set to the value H = 0.1. The training was completed after one run through the entire training data of 9144 patterns of variations of nodal demands and confidence limits. There were no misclassifications. The testing procedure showed excellent recognition rates both for patterns of variations of nodal demands and for patterns of variations of nodal demands with confidence limits. Table 4 shows the number of hyperboxes created during the training process. Since no information about the level of leakage in a pipe was included in the training set, increasing the size of the hyperboxes would eventually yield a single hyperbox that represents all the levels of leakage in a pipe.

Table 4. The significance of the parameter H on the number of hyperboxes and misclassification rates for a test set consisting of 9144 patterns of variations of nodal demands and confidence limits.

| Training set | Parameter H | Number of hyperboxes | Patterns of data | Misclassification rates |
|---|---|---|---|---|
| Variation of nodal demands computed for accurate measurements without confidence limits | 0.2 | 53 | 9144 | 0 |
| | 0.25 | 47 | 9144 | 0 |
| | 0.5 | 39 | 9144 | 0 |
| Variation of nodal demands computed for accurate measurements including confidence limits | 0.2 | 62 | 9144 | 0 |
| | 0.25 | 52 | 9144 | 0 |
| | 0.5 | 39 | 9144 | 0 |
To understand the reasons for the excellent recognition rates of the classification system based on the variation of nodal demands, let us show a couple of examples of the behavior of the variations of nodal demands for different levels of leakage in pipes. Fig. 7 shows the behavior of the variation of nodal demands for different levels of leakage in the pipe between the new labeled nodes 3 and 4 from Fig. 5. Fig. 7a shows an example of the variation of nodal demand at the new labeled node 3 determined in the course of the state estimation carried out for accurate measurements. Even for a small leakage of 2 (l/s), the variation of nodal demand at node 3 is distinct from the zero reference point (i.e. normal operating point). Fig. 7b shows the influence that given random measurement errors can have on the variation of nodal demand at node 3. The monotonic trend caused by the leakage is not much distorted by the considered measurement noise. Fig. 7c shows the effective ranges within which the variation of nodal demand at node 3 can vary because of the measurement noise associated with the nodal consumption at the respective node. This is shown again for the different levels of leakage in the pipe between nodes 3 and 4. Fig. 7 illustrates the ability of the LS loop flows state estimator to make use of the pressure measurements, which produced patterns of variation of nodal demands that are distinct from the zero value corresponding to the normal operating point. Consequently, this resulted in the identification of the topological errors.

Fig. 7. Variation of nodal demand at the new labeled node 3 for different levels of leakage in the pipe between the new labeled nodes 3 and 4 (i.e. old labeled nodes 23 and 19 in Fig. 5): (a) variation of nodal demand at the new labeled node 3 for accurate measurements; (b) variation of nodal demand affected by typical measurement inaccuracies; (c) tight confidence limits marked with ‘‘*’’ for variation of nodal demand at new labeled node 3 represented by solid line and corresponding to 10 different levels of leakage: 2 (l/s), 5 (l/s), 8 (l/s), 11 (l/s), 14 (l/s), 17 (l/s), 20 (l/s), 23 (l/s), 26 (l/s), 29 (l/s).

The conclusions above are reinforced in Fig. 8. The tight confidence limits on the variation of nodal demands representing normal operating states are pictured as dashed lines, while examples of the input patterns representing different levels of leakage in the pipe between the new labeled nodes 27 and 26 (i.e. nodes 4 and 3 with the old labels in Fig. 5) are marked by a star ‘*’. The water network shown in this example contains a high number of pressure measurements, which makes the leakages easy to spot. For the purposes of comparison, the number and accuracy of the measurements were kept the same as in the original study performed in Gabrys (1997), Gabrys and Bargiela (1999), Gabrys and Bargiela (2000), which used patterns of state estimates obtained with the LS nodal heads state estimator.
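One way to read Fig. 8 is as a simple band check: a node is suspicious when its variation of nodal demand falls outside the confidence limits of the normal operating state. The sketch below illustrates only this visual check, not the GFMMNN classifier itself; the function name `nodes_outside_limits` and all numeric values are hypothetical.

```python
def nodes_outside_limits(variation, lower, upper):
    """Return the indices of nodes whose variation of nodal demand lies
    outside the normal-state confidence limits [lower, upper]."""
    return [
        i
        for i, (d, lo, hi) in enumerate(zip(variation, lower, upper))
        if d < lo or d > hi
    ]


# Hypothetical values for three nodes (units are illustrative).
variation = [0.0, 0.004, -0.0005]
lower = [-0.001, -0.001, -0.001]
upper = [0.001, 0.001, 0.001]

print(nodes_outside_limits(variation, lower, upper))  # [1]
```

The tighter the confidence limits of the normal operating state, the smaller the leakage that pushes a node outside the band.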

In Fig. 8 it is easy to identify the presence of the leakage by looking at the variations of the nodal demands that lie outside the upper and lower bounds shown with dashed lines. Leaks as small as 2 (l/s) were identified in the respective water network. The examples shown in Fig. 8 were obtained for the same operational time period, which explains the similarity of the confidence intervals for the normal operating state. The tight confidence limits for the normal operating state, together with the distinctive variation of the nodal demands located in the vicinity of the leakage, are the reason for the excellent performance of the detection system based on patterns of variation of load measurements. Training the GFMMNN with patterns of variation of nodal consumption, with or without confidence limits, produces the best (i.e. smallest) misclassification rates, the best size (i.e. single GFMMNN) of the neural network and the best (i.e. smallest) computational time necessary to train and to test the neural network. Fig. 9 shows further examples of the variation of nodal demands for different levels of leakage between nodes 14 and 16, that is nodes 15 and 10 in the original notation of Fig. 5: Fig. 9a for a leakage of 29 (l/s), Fig. 9b for a leakage of 17 (l/s) and Fig. 9c for a leakage of 2 (l/s).

Fig. 8. Variation of nodal demands for levels of leakage between the new labeled nodes 27 and 26 (i.e. old labeled nodes 3 and 4 in Fig. 5): (a) leakage of 29 (l/s); (b) leakage of 17 (l/s); (c) leakage of 2 (l/s).

Fig. 9. Examples of variation of nodal demands for different levels of leakage between nodes 14 and 16 (nodes 15 and 10 if you refer to the original notations in Fig. 5): (a) leakage of 29 (l/s); (b) leakage of 17 (l/s); (c) leakage of 2 (l/s).

6. Conclusions

The assumption that fault diagnosis can be based on pattern analysis, without a need to employ any heuristic or specialist knowledge, has been highlighted again in the context of patterns of LS loop flows state estimates and confidence limits and patterns of variation of nodal demands and confidence limits. First, it has been shown that the LS loop flows state estimates can be successfully used to train the neural recognition system. Slightly higher misclassification rates have been obtained when compared to the training of the recognition system with the state estimates obtained with the LS nodal heads state estimator. This is because of the smaller separation between patterns obtained with the LS loop flows state estimator that represent different classes (topological errors, operational time periods) and because of a much higher ratio of the sensitivity of the nodal heads to the existing set of pressure measurements in the case of the LS loop flows state estimator. Second, the recognition system based on confidence limits for the LS loop flows state estimates performed better for the data for which it was trained. However, this came at the expense of a high number of hyperboxes necessary to cover the space of input patterns. The classification of the water network state based on patterns of variation of load measurements and confidence limits gave the best results.
The neural network used for fault detection and identification in this last case was essentially a simpler version of the recognition system used in the first two cases. It consisted of a single GFMMNN architecture of the type shown in Fig. 6, which was used for the entire 24 h period of operation of the water network. The overlapping was resolved in a robust way. This was because the presence of topological errors of different magnitudes was reflected in the variation of the nodal demands, which moved the hyperboxes away from the zero reference point representing the normal operating point for the entire 24 h operational period. The GFMMNN relied on the ability of the LS loop flows state estimator to make full use of the pressure/nodal heads measurements existing in the water network, which resulted in the respective variations of nodal demands. While the overlaps could be resolved in a robust manner, the number of hyperboxes representing different classes was also kept small. This is because identical topological errors occurring at different operational times resulted in similar variations of the nodal consumptions, which in turn were represented by the same hyperbox.

Acknowledgments

Dr. Corneliu T.C. Arsene thanks the School of Science and Technology, Nottingham Trent University, UK, which supported this work through a Ph.D. studentship.

References

Andersen, J. H., & Powell, R. S. (1999). Simulation of water networks containing controlling elements. Journal of Water Resources Planning and Management, 125(3), 33–41.

Arsene, C. T. C., & Bargiela, A. (2001). Decision support for forecasting and fault diagnosis in water distribution systems – robust loop flows state estimation. Water software systems: Theory and applications, In Coulbeck, B., & Ulanicki, B. (Series Eds.) & Ulanicki, B., Coulbeck, B., & Rance, J. P. (Eds.) (Vol. 1) (pp. 133–145).

Arsene, C. T. C. (2004). Operational Decision Support in the Presence of Uncertainties, Ph.D. Thesis, Nottingham Trent University, Nottingham, UK.

Arsene, C. T. C., Bargiela, A., & Al-Dabass, D. (2004a). Modelling and simulation of water systems based on loop equations. IJSSST, 5(1 & 2), 61–72.

Arsene, C. T. C., Bargiela, A., & Al-Dabass, D. (2004b). Simulation of network systems based on loop flows algorithms. In The Proceedings of the 7th Sim. Soc. Conf. – UKSim 2004, Oxford, UK.

Arsene, C. T. C. (2011). Operational Decision Support in the Presence of Uncertainties – Water Distribution Systems, CreateSpace, US, ISBN 978-1463535285.

Arsene, C. T. C., Al-Dabass, D., & Hartley, J. (2011). Confidence limit analysis of water distribution systems based on a least squares loop flows state estimation technique. EMS, 94–101.

Arsene, C. T. C., Al-Dabass, D., & Hartley, J. (2012). A study on modelling and simulation of water distribution systems based on loop corrective flows and containing controlling elements. Intelligent Systems, Modelling and Simulation, 423–430.

Bargiela, A., Arsene, C. T. C., & Tanaka, M. (2002). Knowledge-based neurocomputing for operational decision support. In International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Milan, Italy.

Belsito, S., Lombardi, P., Andreussi, P., & Banerjee, S. (1998). Leak detection in liquefied gas pipelines by artificial neural networks. AIChE Journal, 44(12), 2675–2688.

Caputo, A. C., & Pelagagge, P. M. (2003). Using neural networks to monitor piping systems. Process Safety Progress, 22(2), 119–127.

Carpentier, P., & Cohen, G. (1993). Applied mathematics in water supply network management. Automatica, 29(5), 1215–1250.

Epp, R., & Fowler, A. G. (1970). Efficient code for steady flows in networks. Journal of the Hydraulics Division, 96, 43–56.

Feng, J., & Zhang, H. (2006). Algorithm of pipeline leak detection based on discrete incremental clustering method. ICIC 2006, LNAI 4114, In Huang, D.-S., Li, K., Irwin, G. W. (Eds.), Springer-Verlag, pp. 602–607.

Gabrys, B. (1997). Neural network based decision support: Modeling and simulation of water distribution networks. Ph.D. Thesis, Nottingham Trent University.

Gabrys, B., & Bargiela, A. (1999). Neural networks based decision support in presence of uncertainties. Journal of Water Resources Planning and Management, ASCE, 125(5), 272–280.

Gabrys, B., & Bargiela, A. (2000). General fuzzy min–max neural network for clustering and classification. IEEE Transactions on Neural Networks, 11(3), 769–783.

Gabrys, B. (2002a). Agglomerative learning algorithms for general fuzzy min–max neural network, the special issue of the Journal of VLSI Signal Processing Systems entitled. Advances in Neural Networks for Signal Processing, 32(1/2), 67–82.

Gabrys, B. (2002b). Neuro-fuzzy approach to processing inputs with missing values in pattern recognition problems. International Journal of Approximate Reasoning, 30(3), 149–179.

Izquierdo, J., López, P. A., & Martínez, R. P. (2007). Fault detection in water supply systems using hybrid (theory and data-driven) modelling. Mathematical and Computer Modelling, 46, 341–350.

Jeppson, R. W., & Davis, A. L. (1976). Pressure reducing valves in pipe network analyses. Journal of the Hydraulics Division, 102(7), 987–1001.

Kumar, S. M., Narasimhan, S., & Bhallamudi, S. M. (2008). State estimation in water distribution networks using graph–theoretic reduction strategy. Journal of Water Resources Planning and Management, 134, 395–403.

Mashford, J., De Silva, D., Marney, D., & Burn, S. (2009). An approach to leak detection in pipe networks using analysis of monitored pressure values by support vector machine. In Proceedings of the 3rd international conference on network and system security (NSS 2009). Australia: Gold Coast.

Rao, H. S., Markel, L. C., & Bree, D. (1977). Extended period simulation of water systems – Part A. Journal of the Hydraulics Division, 103, 97–108.

Rossman, L. A. (1994). EPANET – Users Manual, Risk Reduction Engineering Laboratory, Office of Research and Development, US Environmental Protection Agency, Cincinnati, Ohio 45268.

Rahal, H. (1995). A co-tree flows formulation for steady-state in water distribution networks. Advances in Engineering Software, 22, 169–178.

Shinozuka, M., Liang, J., & Feng, M. Q. (2005). Use of supervisory control and data acquisition for damage location of water delivery systems. Journal of Engineering Mechanics, 225–230.