Environmental Modelling & Software 26 (2011) 969-972
Short communication
Topological clustering for water distribution systems analysis
Lina Perelman 1, Avi Ostfeld *
Faculty of Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
Article info
Article history: Received 2 December 2010; accepted 17 January 2011; available online 15 February 2011
Keywords: Clustering; Analysis; Water distribution systems; Graph theory; Simplification

Abstract
Municipal water distribution systems may consist of thousands to tens of thousands of hydraulic components such as pipelines, valves, tanks, hydrants, and pumping units. With the capabilities of today’s computers and database management software, “all pipe” hydraulic simulation models can be easily constructed. However, the uncertainty and complexity of the interrelationships within a water distribution system make it difficult to predict its performance under various conditions and tasks such as failure scenarios, detection of the sources of contamination intrusions, and sensor placement. A possible way to cope with these difficulties is to gain insight into the system behavior by simplifying its operation through topological/connectivity analysis. In this study a tool of this kind, based on graph theory, is developed and demonstrated. The algorithm divides the system into clusters according to the flow directions in pipes. The resulting clustering is generic and can be utilized for different purposes, such as water security enhancement by sensor placement at clusters or efficient isolation of a contaminant intrusion. The methodology is demonstrated on a benchmark water distribution system from the research literature. © 2011 Elsevier Ltd. All rights reserved.
1. Background
Water distribution systems can reach a substantial size of hundreds to thousands of nodes and links. Hydraulic and water quality models are available that can perform extended period simulations and provide detailed hydraulic and water quality analysis. For large networks, however, these fully detailed models produce a substantial amount of data, making it difficult to manage and monitor the system and to understand how its main structure works. This calls for simplification techniques for network visualization that support network monitoring, management, and understanding of the interactions among its components. There has been limited work to date on topological/connectivity analysis of water distribution systems, with most of the literature concentrating on reliability or aggregation/skeletonization related problems (e.g., Jacobs and Goulter, 1988; Hamberg and Shamir, 1988; Walters and Lohbeck, 1993; Davidson and Goulter, 1995; Savic and Walters, 1995; Ulanicki et al., 1996; Yang et al., 1996; Ostfeld, 2005; Davidson et al., 2005; Giustolisi et al., 2008; Perelman and Ostfeld, 2008; Grayman et al., 2009; Xu et al., 2010).
* Corresponding author. Tel.: +972 4 8292782; fax: +972 4 8228898. E-mail addresses: [email protected] (L. Perelman), [email protected] (A. Ostfeld).
1 Tel.: +972 4 8292630; fax: +972 4 8228898.
doi:10.1016/j.envsoft.2011.01.006
This study proposes a new clustering framework for topological/connectivity analysis. Cluster analysis is the process of partitioning a set of objects into subsets with similar properties: objects within a given cluster are “alike”, and “different” from objects in other clusters. Clustering is a form of unsupervised learning and statistical data analysis used in fields including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. A survey of graph clustering methods and applications can be found in Schaeffer (2007).
The topology of a water distribution system is naturally composed of several subnetworks which are hydraulically connected or disconnected. Connectivity properties vary in time as a result of changes in dynamic loading conditions. The methodology developed herein partitions a water distribution system, as a function of its structural and connectivity properties (i.e., topology and hydraulics), into groups which can be interpreted as cluster structures of the network. The clustering algorithm provides an improved understanding of the main structure of the system and of the connections between its components. The model capabilities are demonstrated on the battle of the water sensors network 1 (BWSN1) (Ostfeld et al., 2008) benchmark system.

2. Problem statement
A cluster herein is a set of nodes that has more and/or better connections among its inner nodes than to the remaining nodes of the system. In this study the metric for grouping nodes is based on different levels of connectivity between pairs of nodes. Clusters can be either strongly or weakly connected, depending on whether or not a directed path exists in both directions between every pair of their nodes. The objective is to partition the network into strongly and weakly connected clusters. The depth first search (DFS) (Tarjan, 1972) and breadth first search (BFS) (Pohl, 1969) graph algorithms are utilized to compute graph connectivity and clustering, as further outlined below.

[Fig. 1. Strongly and weakly connected clusters. Part A: strongly connected cluster; part B: weakly connected cluster; each is shown with a matrix over nodes A, B, and C. Legend: a = two directed paths from node A to node B.]

3. Methodology
This section introduces some notions of graph theory, followed by the proposed clustering algorithm.
3.1. Strongly connected component
A subgraph is strongly connected (Fig. 1, part A) if at least one directed path exists in each direction between every pair of nodes of that component, i.e., u → v and v → u. A strongly connected cluster (SCC) is a directed cyclic component (i.e., the flow direction in links belonging to that component may reverse).
Weakly connected component: A directed subgraph is weakly connected (Fig. 1, part B) if its underlying undirected graph is connected (i.e., it contains a path between every two nodes u and v). A weakly connected component can comprise acyclic looped subgraphs and/or tree components.
Depth first search: The depth first search (DFS) (Tarjan, 1972) algorithm is used to explore the connectivity of a graph. The algorithm starts at some node and traverses in the direction of the outgoing edges as far as possible before backtracking.
Breadth first search: The breadth first search (BFS) (Pohl, 1969) is used in this study to find all nodes within one weakly connected component. It begins at a root node and explores all of the adjoining nodes; then, for each of those nearest nodes, it explores all of their unexplored adjoining nodes, continuing until there are no more adjacent unvisited nodes.

3.2. The proposed clustering algorithm
The clustering scheme attempts to group network nodes based on their connectivity so as to attain a simplified representation of the distribution system. The procedure is based on the weak and strong definitions of nodal connectivity, utilizing the two graph search algorithms DFS and BFS. The proposed methodology comprises the following stages:
(1) Water distribution systems mapping: The distribution system is mapped into a graph in which the nodes represent the consumers, sources, and tanks, and the edges represent the connecting pipes, pumps, and valves;
(2) Strongly connected clusters (SCCs): Here, using the DFS algorithm, graph vertices are classified as weakly or strongly connected.
As previously outlined, nodes which are part of a cyclic subgraph are strongly connected. A strongly connected cluster (SCC) is defined as the maximal strongly connected subgraph.
(3) Weakly connected clusters (WCCs): In this stage the BFS algorithm is invoked, and for each root node a new weakly connected component is identified. A weakly connected cluster (WCC) is defined as the maximal weakly connected subgraph (i.e., all accessible nodes from the root node).
(4) Cluster structure formalization: On completion of the previous two steps, all nodes of the network are identified as strongly or weakly connected and grouped into independent strongly connected or weakly connected clusters. The clusters are connected by edges in which the flow direction is, by definition, known and constant (i.e., if the flow direction could reverse, then the link would have been part of a strongly connected cluster). As a result, a new network topology is formulated, describing the clusters and their connecting links. Furthermore, a connectivity matrix can be formed representing the new system topology and the connections between its clusters.

Fig. 2. Example application clusters formation (BWSN1, Ostfeld et al., 2008).

[Table 1. Clusters connectivity ranked matrix for the example application; rows and columns are the clusters SC1, SC2, SC3, WC1, WC2, WC3, WC4, and WC5. SC1 = Strongly connected cluster 1, WC1 = Weakly connected cluster 1.]
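The four stages of Section 3.2 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (`tarjan_scc`, `cluster_network`), the explicit `roots` argument that fixes the order of the BFS runs, the fallback of using any leftover node as a root, and the six-node test network are all assumptions made for illustration. The sketch treats only components with more than one node as strongly connected clusters, restricts each BFS to nodes not yet assigned to a cluster, and sets the diagonal of the cluster matrix to 1, following the description of the SC1 column in the example application.

```python
from collections import defaultdict, deque


def tarjan_scc(nodes, succ):
    """Strongly connected components by depth first search (Tarjan, 1972)."""
    index, low = {}, {}
    stack, on_stack, components = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:            # unvisited successor: recurse first
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:           # edge back into the current DFS stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return components


def cluster_network(nodes, edges, roots):
    """Return (node -> cluster label, cluster names, cluster connectivity matrix)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    order = {n: i for i, n in enumerate(nodes)}
    label, names = {}, []

    # Stage 2: cyclic components (more than one node) become SC1, SC2, ...
    sccs = [c for c in tarjan_scc(nodes, succ) if len(c) > 1]
    sccs.sort(key=lambda c: min(order[n] for n in c))
    for comp in sccs:
        names.append(f"SC{len(names) + 1}")
        for n in comp:
            label[n] = names[-1]

    # Stage 3: BFS along flow directions from each root, over unassigned nodes only.
    # Assumption: nodes never reached from the given roots become roots themselves.
    n_scc = len(names)
    for root in list(roots) + list(nodes):
        if root in label:
            continue
        names.append(f"WC{len(names) - n_scc + 1}")
        label[root] = names[-1]
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in label:
                    label[v] = names[-1]
                    queue.append(v)

    # Stage 4: one row/column per cluster; row = upstream, column = downstream.
    pos = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[int(i == j) for j in range(size)] for i in range(size)]
    for u, v in edges:
        matrix[pos[label[u]]][pos[label[v]]] = 1
    return label, names, matrix


# Hypothetical six-node network: b and c exchange flow, so they form a cycle.
nodes = ["src", "a", "b", "c", "d", "e"]
edges = [("src", "a"), ("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "e")]
label, names, matrix = cluster_network(nodes, edges, roots=["src", "d"])
print(names)   # ['SC1', 'WC1', 'WC2']
print(matrix)  # [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
```

The recursive DFS keeps the sketch short; for a network with many thousands of nodes an iterative variant would avoid Python's recursion limit. Changing the order of `roots` can change the weakly connected clusters that are found, which is the non-uniqueness the paper attributes to the BFS stage.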
4. Example application
The methodology is demonstrated on the battle of the water sensors network 1 (BWSN1) (Ostfeld et al., 2008). The system (Fig. 2) is subject to a varying demand pattern of 96 h and consists of 126 nodes, 168 pipes, one constant head source, two tanks, two pumps, and eight valves (http://centres.exeter.ac.uk/cws/benchmarks). The following steps were used for clustering:
(1) Water distribution systems mapping: The system was mapped into a directed graph using hydraulic simulation results for 12 h starting at midnight;
(2) Strongly connected clusters (SCCs): The DFS was executed once, identifying all strongly connected nodes and grouping them into unique strongly connected clusters. For this system and the selected clustering time duration, three strongly connected clusters, SC1 to SC3, were identified as shown in Fig. 2.
(3) Weakly connected clusters (WCCs): The remaining nodes are weakly connected and can be partitioned into weakly connected clusters. Starting the BFS from the source, all reachable nodes which were not previously assigned to a strongly connected cluster constitute the first weakly connected cluster, WC1 (Fig. 2). Repeating the BFS from Node 20 (i.e., a boundary node of SC1), WC2 is formed; from Node 31 (i.e., a boundary node of SC2), WC3 is established; and so on. Fig. 2 presents the results of this process: WC1 to WC5.
(4) Cluster structure formalization: Once all the clusters are defined, their interconnections are constructed based on the simulation mapping (i.e., stage 1 above). Each cluster is represented as a node and a clusters connectivity matrix is formed (Fig. 2), with the number of rows and columns equal to the number of clusters, a ‘1’ entry describing a direct connection between clusters (i.e., a “downstream” or “upstream” relationship), and a ‘0’ entry otherwise.
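Step (1) above, the mapping of simulation results into a directed graph, can be sketched as follows. The paper does not give the mapping rule in code, so the details here are assumptions: the function name `flows_to_directed_edges`, the sign convention (positive flow runs from a pipe's start node to its end node), the tolerance `tol` below which a flow is treated as zero, and the three-pipe data are all hypothetical. A pipe whose flow reverses within the clustering window contributes an edge in each direction, which is what lets the later DFS stage place its end nodes in a strongly connected cluster.

```python
def flows_to_directed_edges(pipe_ends, flows, tol=1e-6):
    """Map simulated pipe flows over a time window to directed graph edges.

    pipe_ends: pipe id -> (start node, end node) as defined in the model.
    flows:     pipe id -> flows at each time step of the clustering window;
               positive means start-to-end, negative means end-to-start.
    """
    edges = set()
    for pipe, (start, end) in pipe_ends.items():
        for q in flows[pipe]:
            if q > tol:
                edges.add((start, end))
            elif q < -tol:
                edges.add((end, start))   # flow reversed at this time step
            # flows within +/- tol are treated as zero and add no edge
    return sorted(edges)


# Hypothetical data: P2 reverses once, P3 carries no flow in the window.
pipe_ends = {"P1": ("R", "J1"), "P2": ("J1", "J2"), "P3": ("J2", "T")}
flows = {"P1": [5.0, 4.0, 3.0], "P2": [2.0, -1.0, 0.5], "P3": [0.0, 0.0, 0.0]}
print(flows_to_directed_edges(pipe_ends, flows))
# [('J1', 'J2'), ('J2', 'J1'), ('R', 'J1')]
```

Because the edge set depends on which time steps are included, a different start time or window length can yield a different directed graph and therefore different clusters.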
For example, the column for SC1 has a ‘1’ entry for SC1, SC2, and WC5, as SC2 and WC5 are directly “upstream” of SC1, and the row of SC1 has a ‘1’ entry at WC2, which is directly “downstream” of SC1. Using the clusters connectivity matrix, a clusters topology chart is drawn which graphically shows the clusters interconnections (Fig. 2). A further interpretation of the clusters connectivity matrix is presented in Table 1, which describes a clusters connectivity ranked matrix. Each row in Table 1 represents all the reachable (downstream) clusters from the corresponding cluster and their level of connectivity, while each column shows the source (upstream) clusters and their related connectivity level. For example, the row for SC2 represents flow from cluster SC2 first to clusters SC1, SC3, WC3, and WC5 (i.e., level 1) and further to clusters WC2 and WC4 (i.e., level 2). The column for SC2 shows that flow into cluster SC2 can originate only from cluster WC1 (level 1). This clusters connectivity ranked matrix can support, for example, response modeling in the case of a contamination intrusion.

5. Conclusions
This study utilized the notion of clustering for water distribution systems in the context of topological/connectivity analysis, with the objective of developing and demonstrating a graph theory connectivity based algorithm for water distribution systems analysis. The algorithm divides the system into strongly and weakly connected clusters, for a given simulation start time and duration, according to the flow directions in the pipes. The resulting clusters create a connectivity relationship which further aggregates and simplifies the system interconnections. It should be noted that the identification of a weakly connected cluster is not unique, as opposed to that of a strongly connected cluster. This is due to two main reasons: (1) the order in which the BFS is executed influences the identified set of nodes reachable from the root node, and (2) a weakly connected set of nodes can be formulated based on different criteria, for example, setting upper and lower bounds on the size of each set, minimizing the conductance of the sets of nodes, etc. Another point that affects the outcome of the clustering algorithm is the initial time at which the clustering simulation is invoked and the assumed clustering time duration. Both influence the resulting clustering topology and size, and both are problem dependent. The above issues, as well as possible extensions of the proposed algorithm to problems such as contaminant containment and response modeling, Bayesian networks for contaminant source detection, and sensor placement, need further research.
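The clusters connectivity ranked matrix described in Section 4 (Table 1) can be derived from the clusters connectivity matrix by running a breadth first search from each cluster over the cluster-level graph. The sketch below is an illustration under assumptions, not the authors' code: the function name `ranked_matrix`, the use of `None` for clusters that cannot be reached, and the four-cluster layout are hypothetical and are not the BWSN1 results. As in the description of Table 1, the diagonal is 0, directly downstream clusters are level 1, and clusters reached through them are level 2 and higher.

```python
from collections import deque


def ranked_matrix(adjacency):
    """Level of connectivity from each cluster (row) to each cluster (column).

    adjacency[i][j] is truthy when cluster i feeds cluster j directly.
    Entry [s][j] of the result is the smallest number of cluster-to-cluster
    links from s to j, or None when j is not downstream of s.
    """
    n = len(adjacency)
    ranked = [[None] * n for _ in range(n)]
    for s in range(n):
        ranked[s][s] = 0
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if i != j and adjacency[i][j] and ranked[s][j] is None:
                    ranked[s][j] = ranked[s][i] + 1   # one level further downstream
                    queue.append(j)
    return ranked


# Hypothetical cluster layout: WC1 -> SC1, SC1 -> WC2, SC1 -> WC3, WC2 -> WC3.
names = ["WC1", "SC1", "WC2", "WC3"]
adjacency = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
ranked = ranked_matrix(adjacency)
for name, row in zip(names, ranked):
    print(name, row)
# WC1 [0, 1, 2, 2]
# SC1 [None, 0, 1, 1]
# WC2 [None, None, 0, 1]
# WC3 [None, None, None, 0]
```

Reading a row gives the clusters a contaminant entering that cluster could reach and in how many cluster-to-cluster steps; reading a column gives the possible upstream origins.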
Acknowledgments
This research was supported by the Institute for Future Defense Technologies Research named for the Medvedi, Shwartzman and Gensler families, by the Technion Grand Water Research Institute (GWRI), and by NATO [Science for Peace (SfP) project no. CBD.MD.SFP 981456].

References
Davidson, J.W., Goulter, I.C., 1995. Evolution program for design of rectilinear branched networks. Journal of Computing in Civil Engineering, ASCE 9 (2), 112-121.
Davidson, J., Bouchart, F., Cavill, S., Jowitt, P., 2005. Real-time connectivity modeling of water distribution networks to predict contamination spread. Journal of Computing in Civil Engineering, ASCE 19 (4), 377-386.
Giustolisi, O., Kapelan, Z., Savic, D., 2008. Algorithm for automatic detection of topological changes in water distribution networks. Journal of Hydraulic Engineering, ASCE 134 (4), 435-446.
Grayman, W.M., Murray, R., Savic, D.A., 2009. Effects of redesign of water systems for security and water quality factors. In: Starrett, S. (Ed.), Proceedings of the World Environmental and Water Resources Congress. Kansas City, MO. doi:10.1061/41036(342)49.
Hamberg, D., Shamir, U., 1988. Schematic models for distribution systems design. I: combination concept. Journal of Water Resources Planning and Management Division, ASCE 114 (2), 129-140.
Jacobs, P., Goulter, I., 1988. Evaluation of methods for decomposition of water distribution networks for reliability analysis. Civil Engineering Systems 5 (2), 58-64.
Ostfeld, A., 2005. Water distribution systems connectivity analysis. Journal of Water Resources Planning and Management Division, ASCE 131 (1), 58-66.
Ostfeld, A., et al., 2008. The battle of the water sensor networks: a design challenge for engineers and algorithms. Journal of Water Resources Planning and Management Division, ASCE 134 (6), 556-568.
Perelman, L., Ostfeld, A., 2008. Water distribution system aggregation for water quality analysis. Journal of Water Resources Planning and Management Division, ASCE 134 (3), 303-309.
Pohl, I.S., 1969. Bi-directional and heuristic search in path problems. Doctoral Thesis. Stanford Linear Accelerator Center, Stanford University, Stanford, CA, USA, 169p.
Savic, D.A., Walters, G.A., 1995. An evolution program for optimal pressure regulation in water distribution networks. Engineering Optimization 24 (3), 197-219.
Schaeffer, S.E., 2007. Graph clustering. Computer Science Review 1, 27-64. doi:10.1016/j.cosrev.2007.05.01.
Tarjan, R., 1972. Depth-first search and linear graph algorithms. SIAM Journal on Computing 1 (2), 146-160.
Ulanicki, B., Zehnpfund, A., Martinez, F., 1996. Simplification of water distribution network models. In: Proceedings of the Second International Conference on Hydroinformatics, pp. 493-500. Zurich, Switzerland.
Walters, G.A., Lohbeck, T., 1993. Optimal layout of tree networks using genetic algorithms. Engineering Optimization 22 (1), 27-48.
Xu, J., Small, M., Fischbeck, P., VanBriesen, J., 2010. Integrating location models with Bayesian analysis to inform decision making. Journal of Water Resources Planning and Management Division, ASCE 136 (2), 209-216.
Yang, S.-L., Hsu, N.-S., Louie, P.W.F., Yeh, W.W.-G., 1996. Water distribution network reliability: connectivity analysis. Journal of Infrastructure Systems, ASCE 2 (2), 54-64.