Evaluating online data of water quality changes in a pilot drinking water distribution system with multivariate data exploration methods


ARTICLE IN PRESS

WATER RESEARCH 42 (2008) 2421–2430

Available at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/watres

Satu M. Mustonen (a), Soile Tissari (a), Laura Huikko (a), Mikko Kolehmainen (a), Markku J. Lehtola (b), Arja Hirvonen (a, *)

(a) Department of Environmental Science, University of Kuopio, P.O. Box 1627, FI-70211 Kuopio, Finland

(b) Laboratory of Environmental Microbiology, National Public Health Institute, Department of Environmental Health, P.O. Box 95, FI-70701 Kuopio, Finland

Article info

Article history:
Received 16 March 2007
Received in revised form 25 November 2007
Accepted 17 January 2008
Available online 25 January 2008

Keywords: Distribution system; Water quality; Pressure shock; Online monitoring; Self-Organising Map; Sammon’s mapping

Abstract

The distribution of drinking water generates soft deposits and biofilms in the pipelines of distribution systems. Disturbances in water distribution can detach these deposits and biofilms and thus deteriorate the water quality. We studied the effects of simulated pressure shocks on the water quality with online analysers. The study was conducted with copper and composite plastic pipelines in a pilot distribution system. The online data gathered during the study were evaluated with the Self-Organising Map (SOM) and Sammon’s mapping, which are useful methods for exploring large amounts of multivariate data. The objective was to test the usefulness of these methods in pinpointing abnormal water quality changes in the online data. The pressure shocks temporarily increased the number of particles, turbidity and electrical conductivity. SOM and Sammon’s mapping were able to separate these situations from the normal data and thus make them visible. These methods therefore make it possible to detect abrupt changes in water quality and thus to react rapidly to disturbances in the system. They are useful in developing alert systems and predictive applications connected to online monitoring.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Drinking water produced in waterworks almost invariably fulfils the water quality requirements set, for example, in the European Union drinking water directive (DWD) (Council Directive 98/83/EC, 1998). However, the DWD requires that water quality should also meet the requirements at the consumer’s tap. When drinking water is distributed through pipelines, biofilms grow on the inner surfaces of the pipes, and soft deposits consisting of organic and inorganic matter and several metals accumulate in the pipelines (Gauthier et al., 1999; Zacheus et al., 2001; Lehtola et al., 2004a). The soft deposits in pipelines originate mainly from the water itself, from corrosion products of the pipelines, and from particles and chemicals carried over from the water treatment processes (Gauthier et al., 1999). Rapid changes in water flow or pressure can detach biofilms and deposits and thus temporarily deteriorate the drinking water quality (Vreeburg and Schaap, 2004; Lehtola et al., 2004b, 2006a, b). This can be seen as elevated concentrations of bacteria and iron, and as increased turbidity, in the water. Several studies have shown enhanced survival of pathogenic microbes in biofilms (Buswell et al., 1998; Percival and Walker, 1999; Skraber et al., 2005). This means that sloughing of biofilms into the water may also pose a health risk to drinking water consumers.

Waterworks are required to check their water quality by analysing water samples taken from the waterworks and the distribution system. The minimum number of analyses for check monitoring is defined in the DWD. The sampling frequency depends on the size of the waterworks; e.g. for the largest waterworks, producing more than 100,000 m³/day of water, the DWD requires check monitoring virtually on a daily basis (Council Directive 98/83/EC, 1998). However, even this frequency is not sufficient for detecting abrupt, transient events that can nevertheless cause major disturbances in a drinking water distribution system. It has been found earlier that the number of bacteria in water correlates with turbidity and with the number of particles in water (Lehtola et al., 2006a, b). Nowadays, turbidity and the number of particles can be monitored with online sensors. Other parameters, such as water flow, pressure, pH and temperature, can also be monitored online; in fact, several waterworks have applied these parameters in their process monitoring. However, evaluating large amounts of data with different time dimensions and pinpointing abnormal changes in water quality is challenging. Such heterogeneous data need more sophisticated analysis tools than traditional methods can provide.

Computationally intelligent methods provide a novel approach for analysing multivariate data. Computational intelligence (CI) shares similarities with the function of the human brain in accomplishing learning, memory and pattern recognition (King, 1998). The CI approach uses measurement data and constructs abstract models with different algorithms, such as neural networks (Kolehmainen, 2004). There are many ways to define computational intelligence (Kolehmainen, 2004). One approach is to list commonly used methods such as neural networks, fuzzy logic, genetic algorithms and Bayesian networks. Another is to define CI through data mining. Data mining concentrates on finding methods and tools for extracting information from huge amounts of data (Hand et al., 2001).

The data-mining process consists of several steps. The first stage is called exploratory data analysis (EDA), which can then be followed by more complicated modelling and pattern recognition tasks. The goal of EDA is to explore data without expertise or even a clear idea of what to look for (Kolehmainen, 2004). Simple plots of one or two variables, histograms and box plots are the basic tools of EDA. These become unsuitable as the dimension of the data increases, and more advanced tools are needed for multivariate data exploration. Suitable methods in these cases include the Self-Organising Map (SOM), Sammon’s mapping, or a combination of the two.

In this study, we generated pressure shocks to create disturbances in a pilot-scale drinking water distribution system. The water quality in the distribution system was monitored with several online analysers, and the alterations resulting from the pressure shocks were studied. The entire online dataset gathered during the same experimental series was analysed with SOM and Sammon’s mapping. The objective was to test the usefulness of these methods in detecting, from large amounts of multivariate online data, the water quality changes caused by the simulated pressure shocks.

* Corresponding author. Tel.: +358 17 163208; fax: +358 17 162139. E-mail address: [email protected] (A. Hirvonen).

0043-1354/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.watres.2008.01.015

2. Materials and methods

2.1. Distribution system

Experiments were carried out in a pilot-scale drinking water distribution system. The system consisted of two 100 m pipes: a 10 mm (ID) copper (Cu) pipe and a 12 mm (ID) composite (polyethylene–aluminum–polyethylene) plastic (PE) pipe (Fig. 1). The drinking water was produced in a pilot-scale waterworks, where lake water was coagulated with ferric sulphate (Kemwater PIX-322, Kemira, Finland), treated by flotation and rapid sand filtration, and finally adjusted for hardness, alkalinity and pH with lime and carbon dioxide. The water was disinfected with UV irradiation (approx. 60 mWs/cm²) and chlorinated with NaOCl (0.6 mg Cl2/l). The water flow rate in the pipes was 1 l/min (corresponding flow velocity 0.22 m/s in the Cu pipe and 0.15 m/s in the PE pipe; Reynolds number 2118 in the Cu pipe and 1732 in the PE pipe), the pressure was 2.8 bar and the temperature was 10 °C.

Disturbances in the water distribution system were simulated by generating pressure shocks. Pressure shocks were done 7 times during the experiment. The first shock was done after 4 weeks of constant water flow (1 l/min). The subsequent shocks were done 2 weeks, 1 week, 4 days, 3 days, 2 days and 1 day after the previous shock. This experimental setup made it possible to study the effect of deposit accumulation time on the amount of matter resuspended into the drinking water after the pressure shocks. Pressure shocks were created with compressed air valves, which were situated at the end of the pipes. The valves halted the water flow very rapidly: the flow was stopped for 5 s, after which the valves were quickly opened.

Fig. 1 – Schematic representation of one pipeline in the pilot distribution system. There were two 100 m loops, one a 10 mm copper (Cu) pipeline and the other a 12 mm plastic (PE) pipeline. The pipelines were equipped with online analysers: flow (f), pressure (p), temperature (T), pH, electrical conductivity (ec), turbidity (turb) and a particle counter.
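The flow velocities and Reynolds numbers quoted above follow from the continuity relation v = Q/A, with A = πD²/4, and from Re = vD/ν. The Python sketch below redoes this arithmetic for 1 l/min in the 10 mm and 12 mm pipes. The kinematic viscosity is an assumption (1.0 × 10⁻⁶ m²/s, roughly that of water at 20 °C); the value behind the published figures is not stated.

```python
import math

FLOW_M3_PER_S = 1.0e-3 / 60.0   # 1 l/min expressed in m^3/s
NU_M2_PER_S = 1.0e-6            # ASSUMED kinematic viscosity of water (about 20 °C)


def mean_velocity(flow_m3_s: float, diameter_m: float) -> float:
    """Mean flow velocity v = Q / A for a full circular pipe."""
    area = math.pi * diameter_m ** 2 / 4.0
    return flow_m3_s / area


def reynolds_number(velocity_m_s: float, diameter_m: float, nu: float = NU_M2_PER_S) -> float:
    """Pipe Reynolds number Re = v * D / nu."""
    return velocity_m_s * diameter_m / nu


if __name__ == "__main__":
    for name, diameter in (("Cu", 0.010), ("PE", 0.012)):
        v = mean_velocity(FLOW_M3_PER_S, diameter)
        print(f"{name}: v = {v:.3f} m/s, Re = {reynolds_number(v, diameter):.0f}")
```

Under these assumptions the sketch gives v ≈ 0.212 m/s and Re ≈ 2122 for the Cu pipe and v ≈ 0.147 m/s and Re ≈ 1768 for the PE pipe, close to but not identical with the reported values; the exact internal diameters and viscosity used by the authors are not given. At the stated 10 °C the kinematic viscosity of water is about 1.3 × 10⁻⁶ m²/s, which would lower both Reynolds numbers by roughly 23%.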

2.2. Online water analyses

Pipelines were equipped with online analysers. Water flow (transmitter 8055, Bürkert, Germany), pressure (transmitter 8320, Bürkert), temperature (ST21, Bürkert), pH (transmitter 8205, Bürkert), electrical conductivity (transmitter 8225, Bürkert) and turbidity (1720E low-range turbidimeter, Hach Company, USA) were measured every 2 s. Each pipeline had its own analysers for these parameters. The numbers of particles in the water were determined with a Pamas water viewer particle counter (Pamas GmbH, Germany). A single particle counter served both pipelines, measuring the particles in each pipe at 40 min intervals; each result was a 5 min average. The particle counter divided the number of particles per millilitre into 8 size classes: 1–1.5, 1.5–2, 2–4, 4–8, 8–15, 15–25 and 25–50 µm, and particles over 50 µm. The locations of the online analysers in the distribution system are shown in Fig. 1.
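To make the eight size classes concrete, the sketch below assigns particle diameters (µm) to the classes listed above and expresses the counts per millilitre. It is illustrative only: the counter itself reports class counts, and two details are assumptions, namely that a diameter lying exactly on a class boundary belongs to the upper class and that particles below 1 µm are not counted.

```python
import numpy as np

# Class boundaries (µm) of the eight size classes; the last class (> 50 µm) is open-ended.
EDGES_UM = np.array([1.0, 1.5, 2.0, 4.0, 8.0, 15.0, 25.0, 50.0])
LABELS = ["1-1.5", "1.5-2", "2-4", "4-8", "8-15", "15-25", "25-50", ">50"]


def size_class_counts(diameters_um, volume_ml):
    """Return particles per ml in each size class.

    A diameter d falls in class i when EDGES_UM[i] <= d < EDGES_UM[i + 1]
    (assumed boundary convention); diameters below 1 µm are not counted.
    """
    d = np.asarray(diameters_um, dtype=float)
    idx = np.digitize(d, EDGES_UM)   # 0 = below 1 µm, 1..8 = size classes
    idx = idx[idx > 0] - 1           # drop sub-1 µm particles, shift to 0..7
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, counts / volume_ml))
```

For example, `size_class_counts([0.8, 1.2, 1.5, 3.0, 49.9, 60.0], volume_ml=1.0)` places one particle each in the 1–1.5, 1.5–2, 2–4, 25–50 and >50 µm classes and ignores the 0.8 µm particle.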

2.3. Other water analyses

The water quality of the pilot waterworks was monitored routinely by analysing several parameters in water samples. pH was measured with a WTW pH320 meter (Germany). Water hardness and the levels of chloride, sulphate and total iron (dissolved and suspended) were analysed with a HACH DR/2010 spectrophotometer (Loveland, Colorado, USA) according to the manufacturer’s instructions. Alkalinity was analysed by potentiometric titration according to Finnish Standard SFS 3005 (1981). Chemical oxygen demand (CODMn) was analysed according to SFS 3036 (1981).

2.4. Multivariate data exploration

The online measurement data were collected into a database. One-minute averages were calculated for all parameters except the particle counts, and the data were separated into two datasets, one for the Cu pipe and one for the PE pipe. These 1 min average datasets were combined with the particle measurement data, which had been collected at 40 min intervals, so the final datasets contained measurements at 40 min intervals. Missing and erroneous values (e.g. resulting from interruptions of the online analysers) had to be imputed because the methods used require complete data; imputing these values, rather than discarding the affected rows, avoided data reduction. Imputation was done with linear interpolation and SOM-based methods using the Missing Data Toolbox (MDT) for Matlab 7.0 (MathWorks). The MDT was developed by the Environmental Informatics research group at the University of Kuopio (Junninen et al., 2004).
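The preprocessing steps above (1 min averaging, combining with the particle data, and imputation) can be sketched with pandas as follows. The column names, the rule that each particle timestamp takes the most recent 1 min average at or before it, and the use of time-weighted linear interpolation for interior gaps are assumptions; the SOM-based imputation of the MDT software is not reproduced here.

```python
import pandas as pd


def one_minute_means(online: pd.DataFrame) -> pd.DataFrame:
    """Average the 2 s online readings (DatetimeIndex) into 1 min means."""
    return online.resample("1min").mean()


def combine_with_particles(minute_means: pd.DataFrame, particles: pd.DataFrame) -> pd.DataFrame:
    """Attach to each particle-count timestamp the latest 1 min mean at or before it."""
    left = particles.sort_index().rename_axis("time").reset_index()
    right = minute_means.sort_index().rename_axis("time").reset_index()
    merged = pd.merge_asof(left, right, on="time", direction="backward")
    return merged.set_index("time")


def impute_linear(df: pd.DataFrame) -> pd.DataFrame:
    """Fill only interior gaps by linear interpolation in time."""
    return df.interpolate(method="time", limit_area="inside")


def build_dataset(online: pd.DataFrame, particles: pd.DataFrame) -> pd.DataFrame:
    """One dataset per pipe: particle rows plus the matching 1 min online means."""
    return impute_linear(combine_with_particles(one_minute_means(online), particles))
```

In the study, `particles` would hold one row every 40 min with the eight size-class columns, and `online` the 2 s readings of flow, pressure, temperature, pH, conductivity and turbidity for the same pipe.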


SOM is one of the best-known unsupervised learning algorithms of neural networks (Kohonen, 1997). Unsupervised learning is based on learning a meaningful internal statistical representation of the data without an external teacher. The aim of the SOM is to find weight vectors that represent the input data in a compressed format and, at the same time, to achieve a continuous mapping from the input space to a lattice consisting of cells (Kolehmainen et al., 2000). Because there are far fewer weight vectors than original measurement vectors, the weight vectors are easier to use in subsequent analyses, e.g. in Sammon’s mapping. In the SOM algorithm, each row of the data (i.e. a set of simultaneous measurements of different parameters) is assigned to a cell, and each cell, represented by a weight vector, contains rows that are similar with respect to the selected training variables. Cells close to each other are very similar in relation to the data they represent, whereas cells in opposite corners of the map are very different. A different set of training variables gives a different classification. Selected variables can also be presented as bars on the surface of the SOM, which increases its information value. Changes in the values of the measurement parameters and the interactions between them can easily be seen in the SOM, which helps in defining the best training pattern and in manual clustering. Sammon’s mapping is a non-linear mapping algorithm that aims to represent points of the original p-dimensional space in two dimensions so that the original structure of the measurement vectors in p-dimensional space is preserved as well as possible (Sammon, 1969). Sammon’s mapping is very useful in determining the shape and density of clusters and the relative differences between them (Kolehmainen et al., 2003), which cannot be detected with SOM. In Sammon’s mapping, the cells are located so that the distance between the cells represents their dissimilarity.
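The SOM training principle described above (find the best-matching cell for each data row, then pull that cell and its lattice neighbours towards the row) can be sketched in NumPy as follows. This is a minimal online SOM with a Gaussian neighbourhood and linearly decaying learning rate and radius; the map size, schedules and initialisation are assumptions and are not the settings of the Visual Data software used in the study.

```python
import numpy as np


def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM; return (weights, grid) with one row per cell."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each cell's weight vector with a randomly chosen data row.
    weights = data[rng.integers(0, n, size=rows * cols)].copy()
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                        # learning rate decays towards 0
            sigma = sigma0 + (sigma_end - sigma0) * frac   # neighbourhood radius shrinks
            x = data[i]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            lattice_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-lattice_d2 / (2.0 * sigma ** 2))       # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def best_matching_units(data, weights):
    """Index of the closest weight vector (cell) for every data row."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

For the nine training variables used later in this study (eight particle size classes and electrical conductivity), `data` would be an (N, 9) array of variance-scaled measurements; `grid` gives the lattice position of each cell, and `best_matching_units` tells which cell each measurement falls into.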
Sammon’s mapping algorithm is, however, considerably more time-consuming than the SOM algorithm (Kolehmainen et al., 2003). In this implementation, computation time was reduced by using the weight vectors of the SOM as the starting point of Sammon’s mapping algorithm, so that the strengths of the two methods were combined. The data were visualised with Visual Data software, developed by Visipoint Oy (Kuopio, Finland) and based on the Neural Data Analysis (NDA) software package. Before the SOM and Sammon’s mapping algorithms were applied, the data were preprocessed in Visual Data by variance scaling, i.e. each variable was scaled linearly so that its mean was zero and its variance was one. Variance scaling is useful when the data consist of variables with very different scales. Because the scaling is linear, outliers (values that differ greatly from the normal variation of values) keep their relative position in the scaled data rather than being suppressed, although they do influence the mean and variance used for the scaling.
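The variance scaling and the Sammon projection described above can be sketched as follows. Sammon’s stress between the original distances D and the mapped distances d is E = (1/Σ_{i<j} D_ij) Σ_{i<j} (D_ij − d_ij)²/D_ij. The sketch minimises E by plain gradient descent with a simple step-size rule, not by Sammon’s original pseudo-Newton update. Using SOM weight vectors as the input X and a starting configuration Y0 such as the SOM lattice coordinates is an assumption about how the weight vectors served as the starting point.

```python
import numpy as np


def variance_scale(x):
    """Scale each column linearly to zero mean and unit variance (constant columns are only centred)."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std


def pairwise_distances(z):
    """Euclidean distance matrix between the rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(D, d):
    """Sammon's stress between original distances D and mapped distances d."""
    iu = np.triu_indices_from(D, k=1)
    return float(((D[iu] - d[iu]) ** 2 / D[iu]).sum() / D[iu].sum())


def sammon(X, Y0, iters=300, step=1.0, eps=1e-9):
    """Map the rows of X (e.g. SOM weight vectors) to 2-D, starting from Y0.

    A gradient step is accepted only if it lowers the stress (the step then
    grows); otherwise the step is halved and the configuration is kept.
    """
    D = np.maximum(pairwise_distances(np.asarray(X, dtype=float)), eps)
    c = D[np.triu_indices_from(D, k=1)].sum()
    off_diag = ~np.eye(len(D), dtype=bool)
    Y = np.asarray(Y0, dtype=float).copy()
    d = np.maximum(pairwise_distances(Y), eps)
    stress = sammon_stress(D, d)
    for _ in range(iters):
        coef = np.where(off_diag, (D - d) / (D * d), 0.0)
        grad = (-2.0 / c) * (coef[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y_new = Y - step * grad
        d_new = np.maximum(pairwise_distances(Y_new), eps)
        new_stress = sammon_stress(D, d_new)
        if new_stress < stress:
            Y, d, stress = Y_new, d_new, new_stress
            step *= 1.2
        else:
            step *= 0.5
    return Y, stress
```

The returned 2-D coordinates are what a Sammon’s mapping plot draws as circles; with one row of X per SOM cell, the problem stays small even when the underlying measurement set is large, which is the computational saving described above.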

2.5. Statistical methods

The statistical significance of differences was analysed with paired-samples and independent-samples t-tests in SPSS 14.0 for Windows. Pearson correlations were calculated with the same program.
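For readers reproducing the analysis outside SPSS, the tests named above have counterparts in SciPy (`scipy.stats.ttest_rel`, `scipy.stats.ttest_ind`, whose default is the pooled-variance Student test, and `scipy.stats.pearsonr`). The sketch below wraps them and adds a helper that recomputes the two-sided p-value of a Pearson correlation from r and n through t = r√(n−2)/√(1−r²) with n − 2 degrees of freedom. The wrapper names are hypothetical.

```python
import math

from scipy import stats


def paired_t_test(before, after):
    """Paired-samples t-test, e.g. the same shocks measured before and after."""
    res = stats.ttest_rel(after, before)
    return float(res.statistic), float(res.pvalue)


def independent_t_test(group_a, group_b):
    """Independent-samples (pooled-variance) t-test, e.g. Cu pipe versus PE pipe."""
    res = stats.ttest_ind(group_a, group_b)
    return float(res.statistic), float(res.pvalue)


def pearson(x, y):
    """Pearson correlation coefficient r and its two-sided p-value."""
    r, p = stats.pearsonr(x, y)
    return float(r), float(p)


def pearson_p_from_r(r, n):
    """Two-sided p-value of a Pearson r from n pairs, via t = r*sqrt(n-2)/sqrt(1-r^2)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return float(2.0 * stats.t.sf(abs(t), df=n - 2))
```

Applied to the turbidity–particle correlations reported in Section 3.1 (n = 7), `pearson_p_from_r(0.884, 7)` gives about 0.008 and `pearson_p_from_r(0.978, 7)` a value below 0.001, in line with the published p-values.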

3. Results

3.1. Water quality

The water produced in the pilot waterworks during the study was of uniform quality (Table 1). There were no significant changes in water pressure or temperature in the distribution system. The water flow was also rather constant, except during the pressure shocks, when the flow was stopped for 5 s. The pressure shocks detached biofilms and soft deposits from the surfaces of the pipes, which was reflected as increased numbers of bacteria and increased concentrations of iron and copper (in the Cu pipe); these results were reported in our previous publication on the same experimental setup (Lehtola et al., 2006b). The detachment was also detected as elevated values in the online analysers.

Particles of different sizes were detached from both the Cu and PE pipes as a result of the pressure shocks (Fig. 2). The particle results are 5 min averages, 35 min before and 5 min after the pressure shocks. The effect of the pressure shocks was most evident on the smallest particles (1–8 µm): up to several thousand of these particles per millilitre were released into the water, a many-fold increase in their numbers. The larger the particle size, the less its number increased in response to the pressure shocks. The increase in the largest particles (over 50 µm) was only a few particles per millilitre. On the other hand, the concentrations of the three largest particle fractions (>15 µm) were also normally very small, only a few dozen per millilitre. There was no statistical difference between the pipe materials in the amount of particles released.

Deposit accumulation time had some effect on the detachment of particles under 15 µm. In the PE pipe, the release of these particles was greatest after the first three pressure shocks. In the Cu pipe, the release of 1–2 µm particles was greatest in response to the first two pressure shocks, whereas the release of 2–15 µm particles became smaller after every subsequent pressure shock. Larger particles (>15 µm) were not affected by the accumulation time at all. At 45 min after the pressure shocks, when the particles were measured again, the numbers of all particle sizes had returned to the level before the disturbance.

Table 1 – Water quality characteristics in the pilot waterworks

Parameter              Average ± stdev (n)
pH                     7.9 ± 0.2 (10)
Alkalinity (mmol/l)    0.84 ± 0.09 (17)
Hardness (mmol/l)      0.83 ± 0.03 (17)
CODMn (mg/l)           2.4 ± 0.2 (17)
Chloride (mg/l)        3.5 ± 0.4 (17)
Sulphate (mg/l)        46 ± 0.7 (17)
Iron (mg/l)            0.15 ± 0.01 (17)

Symbols: CODMn, chemical oxygen demand.

The pressure shocks increased turbidity in both pipes (Fig. 3). The results were calculated as 1 min averages, 35 min before and 5 min after the pressure shocks, so that the turbidity results correspond to the same moments at which the particle data were collected. Turbidity in the PE pipe increased on average by 185% (0.32 NTU, p < 0.001) and in the Cu pipe by 216% (0.35 NTU, p = 0.003). The deposit accumulation time did not affect the increase in turbidity. Turbidity peaked about 5 min after the pressure shocks and returned to its normal level within about 15 min.

Electrical conductivity increased in both pipes as a consequence of the pressure shocks (Fig. 4). The results were calculated in the same way as for turbidity, as 1 min averages 35 min before and 5 min after the pressure shocks. The average increase in conductivity was 0.5% (0.8 µS/cm, p < 0.001), and there was no statistical difference between the pipes (p = 0.628). The increase in conductivity was not affected by the accumulation time. Conductivity peaked about 5 min after the pressure shocks and declined to the level before the disturbance within about 10 min. During the second and third pressure shocks, conductivity was about 6% lower than at other times, which is attributable to a lower carbon dioxide feed. Conductivity in the PE pipe was about 3 µS/cm higher than in the Cu pipe. The pH of the water was not affected by the pressure shocks.

Middle-sized particles correlated with turbidity in the data collected 5 min after the pressure shocks (n = 7). In the Cu pipe, turbidity correlated with particle sizes 4–8 µm (r = 0.884, p = 0.008), 8–15 µm (r = 0.956, p = 0.001) and 15–25 µm (r = 0.951, p = 0.001). In the PE pipe, turbidity correlated with particle sizes 4–8 µm (r = 0.932, p = 0.002) and 8–15 µm (r = 0.978, p < 0.001).
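Because both the relative and the absolute increases are reported above, the implied level before the shocks can be back-calculated as before = absolute increase / (relative increase / 100). The short sketch below does this arithmetic. Since the published figures are rounded, the results are approximate and are not values reported by the authors.

```python
def implied_levels(abs_increase: float, rel_increase_pct: float) -> tuple[float, float]:
    """Back-calculate (before, after) levels from an absolute and a relative (%) increase."""
    if rel_increase_pct == 0:
        raise ValueError("relative increase must be non-zero")
    before = abs_increase / (rel_increase_pct / 100.0)
    return before, before + abs_increase


if __name__ == "__main__":
    reported = {
        "PE turbidity (NTU)": (0.32, 185.0),
        "Cu turbidity (NTU)": (0.35, 216.0),
        "Conductivity (uS/cm)": (0.8, 0.5),
    }
    for name, (increase, pct) in reported.items():
        before, after = implied_levels(increase, pct)
        print(f"{name}: before ~ {before:.3g}, after ~ {after:.3g}")
```

The arithmetic implies turbidity of roughly 0.17 NTU (PE) and 0.16 NTU (Cu) before the shocks, rising to about 0.49 and 0.51 NTU, and a conductivity level of about 160 µS/cm.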

3.2. Multivariate data exploration

SOM and Sammon’s mapping algorithms were applied to visualise the online data (at 40 min intervals) gathered during the study and to identify abnormal changes in water quality during this period. SOMs and Sammon’s mappings were constructed separately for the data from the Cu and PE pipes. Different training patterns were tested with the SOM algorithm to identify the set of variables giving the best classification, i.e. the set that made the abnormal situations most visible. Pressure and temperature did not affect the division of the data because they were constant during the study, and for this reason they were not included in the final training pattern. Water flow was not used as a training variable, in order to avoid a circular argument, because the disturbances in the distribution system were created by stopping the flow. The best training pattern for detecting water quality changes in this study consisted of the particle measurement data (eight particle size classes) and electrical conductivity. Sammon’s mapping was then constructed using the weight vectors of the SOM. The SOM and Sammon’s mapping are presented for the Cu pipe in Fig. 5a and for the PE pipe in Fig. 5b. pH values are presented as the grey background tone in the SOM and as the cell diameter in Sammon’s mapping. The background tone in Sammon’s mapping represents the number of measurements in the cells. The bars on the surface of the SOM represent average values of turbidity within the cells.

Fig. 2 – The numbers of particles (particles/ml) in eight size classes (1–1.5, 1.5–2, 2–4, 4–8, 8–15, 15–25, 25–50 and over 50 µm) 35 min (40–35 min) before and 5 min (0–5 min) after the pressure shocks in the PE and Cu pipes. Results are 5 min averages (n = 1). Number of days describes the time between the pressure shocks (28, 14, 7, 4, 3, 2 and 1 days).

Water quality changes caused by the pressure shocks can easily be detected in Sammon’s mapping as obvious outliers (cells that are clearly separated from the others). In Fig. 5, these cells are marked with number 1. These outliers represent the data collected 5 min after the pressure shocks. In the SOM, these cells are surrounded by empty (black-coloured) cells, i.e. cells that contain no data. The empty cells thus indicate that the data collected after the pressure shocks differ clearly from the other data. The cells marked with number 1 are separated from the others because the values of the training variables, i.e. particles and electrical conductivity, are higher in these cells (Figs. 2 and 4). The bars of these cells in the SOM are clearly higher than in other cells, which reflects the increase in turbidity caused by the pressure shocks. Fig. 5 also shows that the pressure shocks did not change the pH values, because the background tone in the SOM and the cell diameter in Sammon’s mapping do not differ in these cells from the others. The tone of these cells in Sammon’s mapping is rather dark, which indicates that they represent only one or a few measurements. This is a consequence of the short duration of the disturbances and the sparse measurement interval. The cluster that these cells form is scattered, which means that the data they represent are not uniform. This is because the water quality changes caused by different pressure shocks differed in the values of the applied training variables (Figs. 2 and 4).

In Fig. 5, the cells marked with number 2 represent data collected during a problem with the lime feed at the pilot waterworks. Excess lime, both dissolved and insoluble, entered the water distribution system during this 7 h episode. This disturbance was not intentional, but it was detected by means of SOM and Sammon’s mapping. The data collected during this period are concentrated in a few cells, all but one of which are separated from the other cells. In Sammon’s mapping these cells are outliers, and in the SOM they are surrounded by empty cells. These cells are separated because of the increased values of the training variables. The numbers of the smallest particles (1–8 µm) increased tenfold during the lime feed disruption, but the numbers of larger particles did not increase. Electrical conductivity increased by 8% (11 µS/cm) during this time. The pH increased by 18% (1.4 units), which is visible as the substantially lighter background colour of these cells in the SOM. In the SOM analysis for the PE pipe, the light background colour was useful in detecting the third cell representing data collected during the lime feed disruption, even though that cell is not separated as an outlier. Turbidity increased in the Cu pipe by 64% (0.09 NTU) and in the PE pipe by 74% (0.12 NTU), so the bars on the surface of the SOM are higher in these cells than in most other cells. The cells marked with number 2 form a scattered cluster because they represent measurements from the beginning to the end of the lime feed disruption, and the data are therefore not uniform with respect to the applied training variables.

Cells that are unmarked, representing the normal state, form a very tight cluster in Sammon’s mapping. This means that there were no other significant changes in water quality during the study with respect to particles and electrical conductivity. The background tone and the bars in the SOM show that pH and turbidity were also quite stable during the study.

Fig. 3 – Turbidity (NTU) 35 min before and 5 min after the pressure shocks in the PE and Cu pipes. Results are 1 min averages (n = 30). Number of days describes the time between the pressure shocks.

Fig. 4 – Electrical conductivity (µS/cm) 35 min before and 5 min after the pressure shocks in the PE and Cu pipes. Results are 1 min averages (n = 30). Number of days describes the time between the pressure shocks.

4. Discussion

In this study, pressure shocks were generated to cause disturbances in the pilot water distribution system. Pressure shocks were done after different deposit accumulation times (from 1 to 28 days). The water quality in the distribution system was monitored with several online analysers. Since the times of the pressure shocks were known, it was possible to pick out from the large amounts of collected online data exactly the data in which the water quality changes were detectable, and the effects of the pressure shocks on water quality were studied on the basis of these selected data. The pressure shocks detached biofilms and soft deposits from both copper and plastic pipes, and this was observed as increased electrical conductivity, turbidity and number of particles in the drinking water.

The increase in electrical conductivity after the pressure shocks was probably a consequence of the release of dissolved forms of iron, copper and calcium from the distribution system. Concentrations of total iron (dissolved and suspended) and dissolved copper (free and complexed) have been observed to increase in response to the pressure shocks (Lehtola et al., 2006b). The iron probably originated from accumulated coagulation chemicals. A significant proportion of the iron might have been in dissolved form as complexes with organic matter, such as humic substances, as demonstrated by Theis and Singer (1974). The copper most likely originated from uniform corrosion of the copper pipe material (Schock et al., 1995), or it might have resulted from microbially influenced copper solvation (Critchley et al., 2001). Calcium was not analysed in this study, but it is possible that sedimented calcium, originating from the pH adjustment chemicals, was also released by the pressure shocks. This proposal is supported by the significant increase in electrical conductivity caused by the lime feed disruption in the pilot waterworks during this study. However, the pressure shocks increased conductivity only very marginally, which indicates that this parameter is not very sensitive to abrupt water quality changes resulting from this type of event. There was no difference between the pipe materials in the increase in conductivity after the pressure shocks. However, conductivity was constantly somewhat higher in the PE pipe than in the Cu pipe, probably because of the calibration level of the online analysers.

Fig. 5 – SOM analysis (left) and Sammon’s mapping (right) of eight particle size class measurements and electrical conductivities for the Cu (a) and PE (b) pipes. pH measurements are given as the grey background tone in the SOM, where a lighter colour means a higher pH value and vice versa. The bars on the surface of the SOM represent average values of turbidity within the cells (squares). Each circle (cell) in Sammon’s mapping corresponds to a cell in the SOM. In Sammon’s mapping, the diameter of a cell corresponds to the pH value and the grey level to the number of measurements in that cell, where a lighter colour means more measurements and vice versa. Distances between the cells in Sammon’s mapping visualise the relative distances in the original nine-dimensional measurement space. The cells marked with number 1 represent the data measured after the pressure shocks. The cells marked with number 2 represent the data collected during a disruption of the lime feed at the pilot waterworks.

In this study, turbidity and the numbers of particles (subdivided into eight size classes) were found to increase as a result of the pressure shocks. Turbidity is monitored routinely in many waterworks, but it is a sum parameter measuring the factors that affect the cloudiness of water. A particle counter, on the other hand, offers a more advanced measuring technique because it provides quantitative results for particles of different sizes. However, turbidity and particle analysers detect, at least to some extent, the same thing, i.e. particulate water quality, which was also observed in this study as the correlation between middle-sized particles and turbidity. There are several possible origins for the particulate matter in the distribution system: incomplete removal of suspended solids from the raw water at the pilot waterworks, precipitation of iron oxides and calcium carbonates, biological growth, and corrosion products from the fittings and the Cu pipe. Some of these particles had sedimented and formed soft deposits in the distribution system, which were then resuspended into the water by the pressure shocks. Soft deposits on the inner surfaces of distribution pipelines have been shown to consist of inorganic and organic matter, metals and microbes (Gauthier et al., 1999; Zacheus et al., 2001). In the current study, the pressure shocks probably detached not only soft deposits but also fragments of biofilm, which were detected as larger particles. Some of the smallest particles (1–4 µm) released were most likely detached bacteria; this particle size fraction has previously been found to correlate with microbial numbers in water (Lehtola et al., 2006a, b). Substantially more small particles (1–8 µm) than larger particles were released by the pressure shocks. This indicates either that the largest particles are bound more tightly and require more energy to become detached, or that the deposits on the pipes simply contain more small particles. Irrespective of the reason, small particles are more sensitive indicators of this type of episode than their larger counterparts. The deposit accumulation time had some influence on particles under 15 µm.
The greatest detachment of these particles occurred in response to the first two or three pressure shocks, which indicates that the recovery of these particle sizes in the deposits requires 1–2 weeks. In the Cu pipe, recovery was somewhat slower than in the PE pipe, and the recovery of particles of 2–15 μm could take more than 4 weeks. Particles over 15 μm were not affected by the deposit accumulation time at all. This indicates either that they had recovered in less than 1 day or, more likely, that the detachment of these size classes did not depend only on the pressure shocks. Although turbidity correlated with the middle-sized particles, its increase as a consequence of the pressure shocks was not affected by the accumulation time.

In the current study, it was possible to explore the water quality changes resulting from the pressure shocks by traditional methods using selected online data, because the times of the pressure shocks were known. In real life, however, huge amounts of collected online data would first have to be analysed to expose the abnormal changes in water quality before their causes and consequences could be investigated. For this reason, the second objective of this study was to test the usefulness of multivariate data exploration methods in pinpointing abnormal water quality changes in the collected online data. In this study, online monitoring generated hundreds of thousands of measurement results for several parameters. This amount of data would have been very difficult to preprocess and analyse by traditional methods.

SOM, along with Sammon's mapping, has been successfully applied to evaluate large multi-dimensional datasets in several fields, including air quality (Kolehmainen et al., 2000) and bioprocesses (Kolehmainen et al., 2003). The same concept was applied to the water quality data in this study. All eight particle size classes and electrical conductivity were used as training variables for the SOM algorithm, and the weight vectors of the SOM were used as the starting point of the Sammon's mapping algorithm. The most powerful response to the pressure shocks was related to the particles. Consequently, the particle numbers were the best training variables for separating the data related to the pressure shocks from the other data. Using electrical conductivity as an additional training variable helped to separate the lime feed disruption episode both from the other data and from the pressure shocks. Increases in these two parameters appeared in SOM and Sammon's mapping as a clear separation of the cells representing the data recorded during the disturbances. Using the pH measurements as the background tone of the cells in SOM made it easier to detect the data related to the lime feed disruption. In addition to electrical conductivity, the pH value was also very high during that time. The turbidity results were represented as bars on the surface of the SOM, and these bars visualised the difference in water quality between the episodes and the normal situation. The rarity of these two disturbance types could be seen from the background tone of the cells in Sammon's mapping. By observing the shapes of the clusters in Sammon's mapping, it was possible to deduce how uniform the data they represented were with respect to particles and electrical conductivity. By applying SOM and Sammon's mapping, the collected multi-dimensional online data could be presented in a highly visual, compressed two-dimensional format.
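The training set-up described above, with particle size classes plus electrical conductivity as SOM inputs and the SOM weight vectors then passed to Sammon's mapping, can be sketched in plain Python. The sketch is minimal and rests on stated assumptions; it is not the authors' implementation. The 4 × 4 map size, the learning-rate and neighbourhood schedules, the use of the SOM grid positions as the initial 2-D layout, the helper names and the synthetic data are all hypothetical. The stress is minimised by simple gradient descent with step halving, which is swapped in for robustness in place of Sammon's original pseudo-Newton update.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Grid position (row, col) of the prototype closest to sample x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows=4, cols=4, epochs=100, lr0=0.5, seed=0):
    """Online SOM: Gaussian neighbourhood, linearly shrinking rate and radius."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Each prototype (weight vector) starts as a randomly drawn training sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = t / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for k, xk in enumerate(x):
                    w[k] += lr * h * (xk - w[k])
            t += 1
    return weights


def sammon_stress(high, low):
    """Sammon's stress between input-space points and their 2-D images."""
    pairs = [(i, j) for i in range(len(high)) for j in range(i + 1, len(high))]
    dstar = [math.dist(high[i], high[j]) for i, j in pairs]
    return sum((ds - math.dist(low[i], low[j])) ** 2 / ds
               for (i, j), ds in zip(pairs, dstar) if ds > 0) / sum(dstar)


def sammon(high, init, iters=300):
    """Minimise Sammon's stress by gradient descent with step halving
    (not Sammon's original pseudo-Newton update)."""
    n = len(high)
    ds = [[math.dist(high[i], high[j]) for j in range(n)] for i in range(n)]
    c = sum(ds[i][j] for i in range(n) for j in range(i + 1, n))
    y, step = [list(p) for p in init], 1.0
    e = sammon_stress(high, y)
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j != i and ds[i][j] > 0:  # skip self-pairs and duplicates
                    d = max(math.dist(y[i], y[j]), 1e-12)
                    f = -2.0 / c * (ds[i][j] - d) / (ds[i][j] * d)
                    grad[i][0] += f * (y[i][0] - y[j][0])
                    grad[i][1] += f * (y[i][1] - y[j][1])
        while step > 1e-12:  # backtrack until the stress actually decreases
            cand = [[p[0] - step * g[0], p[1] - step * g[1]]
                    for p, g in zip(y, grad)]
            e_new = sammon_stress(high, cand)
            if e_new < e:
                y, e, step = cand, e_new, step * 1.2
                break
            step /= 2
    return y, e


# Synthetic, standardised toy data (NOT the study's measurements): columns are
# 8 particle size classes + electrical conductivity.
rng = random.Random(1)
normal = [[rng.gauss(0.0, 0.1) for _ in range(9)] for _ in range(50)]
shock = [[rng.gauss(3.0, 0.1) for _ in range(8)] + [rng.gauss(0.5, 0.1)]
         for _ in range(10)]
som = train_som(normal + shock)
nodes = sorted(som)
prototypes = [som[node] for node in nodes]
grid_init = [[float(col), float(row)] for row, col in nodes]  # map grid as start
layout, final_stress = sammon(prototypes, grid_init)
print(best_matching_unit(som, normal[0]), best_matching_unit(som, shock[0]))
print(sammon_stress(prototypes, grid_init), final_stress)
```

On this toy data, one would expect a "shock" sample and a normal sample to map to different units. The accepted Sammon layouts can only lower the stress relative to the starting grid. With real measurements, the inputs would first need scaling (for example to zero mean and unit variance), because particle counts and conductivity lie on very different scales.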
With SOM and Sammon's mapping, it was possible to pinpoint every simulated pressure shock in the online data. The lime feed disruption episode, which occurred without our knowledge, could also be pinpointed. With these maps it was possible to detect simultaneous changes in several water quality parameters. The outliers in Sammon's mapping made it easier to identify situations in which the number of particles or electrical conductivity had changed. In SOM, the background tone helped to detect changes in pH values, and the bars on the surface clarified the level of turbidity in different situations. The cells that represented the data related to the pressure shocks and the lime feed disruption displayed most of these properties. It is important to evaluate the data as a whole, i.e. the values of all parameters. A change in only one water quality parameter does not necessarily mean that there has been a disturbance in either water treatment or the distribution system. Evaluating several parameters can also help to determine the cause of a possible disturbance. For example, small particles and turbidity increased during both the lime feed disruption and the pressure shocks, whereas the pH value increased only during the lime feed disruption. The water quality changes could be easily detected by evaluating the online data with SOM and Sammon's mapping, even though only part of the dataset could be used. Only one particle counter was available to serve both pipelines; therefore, particle results were collected at
relatively long intervals (40 min). Because multivariate data exploration methods require consistent data, a large part of the data gathered by the other online analysers had to be left unused this time. In general, this problem could be avoided by collecting all of the data at the same time intervals. This study showed that the water quality changes caused by the pressure shocks were transient, lasting only a few tens of minutes. Such transient events are practically impossible to detect with conventional offline sampling methods. Nevertheless, it is important to monitor and control transient alterations in water quality, since they can draw attention to possible weak points in water treatment or distribution. Waterworks have used some online monitoring of their distribution systems, but differentiating abnormal changes in water quality from normal variation remains a challenging task. One way to detect these quality alterations is an alert system that monitors the Euclidean distance of the selected parameters from their normal level. An alert is raised when this distance reaches a selected value. SOM and Sammon's mapping are also useful in developing alert systems connected to online monitoring, even though in the current study these methods were used only to analyse the data. An alert system based on SOM and Sammon's mapping offers clear advantages over the Euclidean distance approach. The measured vector is compared not with just one selected operating point, as with the Euclidean distance, but with the whole history of collected data that the SOM holds. Such a system can identify the problem connected to the changed values if the SOM holds data related to that particular disturbance.
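The contrast drawn above between a Euclidean-distance alert and a SOM-based one can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than a system tested in this study. The three standardised variables, the labelled prototype vectors (standing in for the weight vectors of a trained SOM), the function names `euclidean_alert` and `classify_state`, and the threshold of 1.0 are all hypothetical.

```python
import math

# Hypothetical standardised variables: [small particles, conductivity, pH].
NORMAL_LEVEL = [0.0, 0.0, 0.0]
# Illustrative labelled prototypes standing in for the weight vectors of a
# SOM trained on the monitoring history (values are invented for the example).
PROTOTYPES = [
    ("normal", [0.0, 0.0, 0.0]),
    ("pressure shock", [3.0, 0.5, 0.0]),
    ("lime feed disruption", [1.5, 3.0, 3.0]),
]


def euclidean_alert(x, operating_point, threshold):
    """Alert when x lies farther than `threshold` from one fixed normal level."""
    return math.dist(x, operating_point) > threshold


def classify_state(x, prototypes, threshold):
    """Label of the nearest stored prototype, or 'unknown' if none is close."""
    label, proto = min(prototypes, key=lambda lp: math.dist(x, lp[1]))
    return label if math.dist(x, proto) <= threshold else "unknown"


for x in ([0.1, -0.1, 0.0], [2.8, 0.4, 0.1], [1.4, 2.9, 3.2], [0.0, 0.0, -3.0]):
    print(x, euclidean_alert(x, NORMAL_LEVEL, 1.0), classify_state(x, PROTOTYPES, 1.0))
```

With these values, the first vector stays within the threshold and matches the "normal" prototype. The second and third vectors both trigger the Euclidean alert, but only the prototype comparison tells them apart as a pressure shock and a lime feed disruption. The last vector triggers an alert yet matches no stored prototype and is reported as "unknown". This mirrors the condition that the map must hold data on a disturbance before it can identify it.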
By applying SOM and Sammon's mapping, an alert situation can also be presented in a visual format, which helps in devising a better strategy for fixing the problem in question. Räsänen et al. (2006), for instance, have tested a self-refreshing modification of the SOM for process state monitoring of a circulating fluidised bed energy plant. This version of the SOM could also be applicable to online monitoring of water quality and to detecting abnormal situations. SOM and Sammon's mapping could also be useful in modelling, to build predictive applications for controlling water quality in, e.g., water distribution systems. However, the use of multivariate data exploration methods in drinking water research requires further study.

5. Conclusions

The simulated pressure shocks detached biofilms and soft deposits from the pipe surfaces, which increased the number of particles, turbidity and electrical conductivity in the drinking water. The effect of the pressure shocks was most evident on the particles, especially the smaller sizes. The effect on turbidity was also marked, whereas electrical conductivity increased only slightly. The water quality changes resulting from the pressure shocks were very transient and could therefore in practice be detected only with online analysers.


SOM and Sammon's mapping proved to be valuable tools for detecting abnormal changes in water quality in large amounts of multivariate online data. They provided a visual, two-dimensional representation of the collected data and separated the abnormal situations from the normal data. SOM and Sammon's mapping therefore make it possible to detect abrupt changes in water quality and to react rapidly to any disturbances in the system. In future, alert systems or even predictive applications for controlling water quality in drinking water distribution systems could be developed on the basis of these methods. However, further studies are needed in these application areas.

Acknowledgements

This study was supported by the National Technology Agency of Finland (project 40407/04). We acknowledge the help from the personnel of Savonia Polytechnic and National Public Health Institute in Kuopio. We also acknowledge Outokumpu Pori Tube Ltd., Uponor Finland Ltd., Kemira Ltd., Kuopio Water, Finnish Water and Waste Water Works Association and Ministry of Social Affairs and Health for supporting this study.

References

Buswell, C.M., Herlihy, Y.M., Lawrence, L.M., McGuiggan, J.T.M., Marsh, P.D., Keevil, C.W., Leach, S.A., 1998. Extended survival and persistence of Campylobacter spp. in water and aquatic biofilms and their detection by immunofluorescent-antibody and -rRNA staining. Appl. Environ. Microbiol. 64, 733–741.

Council Directive 98/83/EC of 3 November 1998 on the quality of water intended for human consumption. Off. J. Eur. Commun. 330, pp. 32–54.

Critchley, M.M., Cromar, N.J., McClure, N., Fallowfield, H.J., 2001. Biofilms and microbially influenced cuprosolvency in domestic copper plumbing systems. J. Appl. Microbiol. 91 (4), 646–651.

Gauthier, V., Gérard, B., Portal, J.-M., Block, J.-C., Gatel, D., 1999. Organic matter as loose deposits in a drinking water distribution system. Water Res. 33 (4), 1014–1026.

Hand, D.J., Smyth, P., Mannila, H., 2001. Principles of Data Mining. MIT Press, Cambridge, MA, USA.

Junninen, H., Niska, H., Tuppurainen, K., Ruuskanen, J., Kolehmainen, M., 2004. Methods for imputation of missing values in air quality data sets. Atmos. Environ. 38 (18), 2895–2907.

King, R.L., 1998. Artificial neural networks and computational intelligence. Comput. Appl. Power 11, 14–25.

Kohonen, T., 1997. Self-Organizing Maps, second ed. Springer, Heidelberg, Germany.

Kolehmainen, M., Martikainen, H., Hiltunen, T., Ruuskanen, J., 2000. Forecasting air quality parameters using hybrid neural network modeling. Environ. Monit. Assess. 65, 277–286.

Kolehmainen, M., Rönkkö, P., Raatikainen, O., 2003. Monitoring of yeast fermentation by ion mobility distribution measurement and data visualisation with self-organizing maps. Anal. Chim. Acta 484, 93–100.

Kolehmainen, M.T., 2004. Data exploration with self-organizing maps in environmental informatics and bioinformatics. Doctoral Dissertation, Kuopio University Publications C. Natural and Environmental Sciences 167, Kuopio, Finland.

Lehtola, M.J., Juhna, T., Miettinen, I.T., Vartiainen, T., Martikainen, P.J., 2004a. Formation of biofilms in drinking water distribution networks, a case study in two cities in Finland and Latvia. J. Ind. Microbiol. Biotechnol. 31 (11), 489–494.

Lehtola, M.J., Nissinen, T.K., Miettinen, I.T., Martikainen, P.J., Vartiainen, T., 2004b. Removal of soft deposits from the distribution system improves the drinking water quality. Water Res. 38 (3), 601–610.

Lehtola, M.J., Laxander, M., Miettinen, I.T., Hirvonen, A., Vartiainen, T., Martikainen, P.J., 2006a. The effects of changing water flow velocity on the formation of biofilms and water quality in pilot distribution system consisting of copper or polyethylene pipes. Water Res. 40 (11), 2151–2160.

Lehtola, M.J., Miettinen, I.T., Hirvonen, A., Vartiainen, T., Martikainen, P.J., 2006b. Resuspension of biofilms and sediments to water from pipelines as a result of pressure shocks in drinking water distribution system. In: International Conference (IWA) Biofilm Systems VI, Amsterdam, September 24–27, 2006, CD-Rom.

Percival, S.L., Walker, J.T., 1999. Potable water and biofilms: a review of the public health implications. Biofouling 42, 99–115.

Räsänen, T., Kettunen, A., Niemitalo, E., Hiltunen, Y., 2006. Self-refreshing SOM for dynamic process state monitoring in a circulating fluidized bed energy plant. In: IS06 2006 3rd International IEEE Conference on Intelligent Systems, University of Westminster, September 4–6, 2006. IEEE IM/SMC and Harrow School of Computer Science, pp. 344–349.

Sammon Jr., J.W., 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. C-18 (5), 401–409.

Schock, M.R., Lytle, D.A., Clement, J.A., 1995. Effect of pH, DIC, orthophosphate and sulfate on drinking water cuprosolvency. EPA Report EPA/600/R-95/085.

SFS 3005, 1981. Determination of alkalinity and acidity in water. Potentiometric titration. Finnish Standards Association SFS, Helsinki, Finland.

SFS 3036, 1981. Determination of chemical oxygen demand (CODMn or KMnO4 number) in water. Oxidation with permanganate. Finnish Standards Association SFS, Helsinki, Finland.

Skraber, S., Schijven, J., Gantzer, C., de Roda Husman, A.M., 2005. Pathogenic viruses in drinking-water biofilms: a public health risk? Biofilms 2 (2), 105–117.

Theis, T.L., Singer, P.C., 1974. Complexation of iron(II) by organic matter and its effect on iron(II) oxygenation. Environ. Sci. Technol. 8 (6), 569–573.

Vreeburg, J.H.G., Schaap, P.G., 2004. Measuring discoloration risk: resuspention potential method. In: Second IWA Leading-Edge Conference on Water and Wastewater Treatment Technologies, London, UK.

Zacheus, O.M., Lehtola, M.J., Korhonen, L.K., Martikainen, P.J., 2001. Soft deposits, the key site for microbial growth in drinking water distribution networks. Water Res. 35 (7), 1757–1765.