Analytica Chimica Acta 705 (2011) 243–252
Analytica Chimica Acta journal homepage: www.elsevier.com/locate/aca
Modelling spatial and temporal variations in the water quality of an artificial water reservoir in the semiarid Midwest of Argentina
Fabricio D. Cid a,b,c,∗, Rosa I. Antón d, Rafael Pardo e, Marisol Vega e, Enrique Caviedes-Vidal a,b,c
a Laboratory of Biology “Prof. E. Caviedes Codelia”, Facultad de Ciencias Humanas, Universidad Nacional de San Luis, San Luis, Argentina
b Laboratory of Integrative Biology, Institute for Multidisciplinary Research in Biology (IMIBIO-SL), Consejo Nacional de Investigaciones Científicas y Técnicas, San Luis, Argentina
c Department of Biochemistry and Biological Sciences, Facultad de Química, Bioquímica y Farmacia, Universidad Nacional de San Luis, San Luis, Argentina
d Department of Analytical Chemistry, Facultad de Química, Bioquímica y Farmacia, Universidad Nacional de San Luis, San Luis, Argentina
e Department of Analytical Chemistry, Facultad de Ciencias, Universidad de Valladolid, Valladolid, Spain
Article info
Article history: Received 29 November 2010; received in revised form 23 May 2011; accepted 8 June 2011; available online 15 June 2011
Keywords: Water quality; Water reservoir; Modelling; N-way principal component analysis; Parallel Factor Analysis; Tucker3
Abstract
Temporal and spatial patterns of water quality of an important artificial water reservoir located in the semiarid Midwest of Argentina were investigated using chemometric techniques. Surface water samples were collected at 38 points of the water reservoir during eleven sampling campaigns between October 1998 and June 2000, covering the warm wet season and the cold dry season, and analyzed for dissolved oxygen (DO), conductivity, pH, ammonium, nitrate, nitrite, total dissolved solids (TDS), alkalinity, hardness, bicarbonate, chloride, sulfate, calcium, magnesium, fluoride, sodium, potassium, iron, aluminum, silica, phosphate, sulfide, arsenic, chromium, lead, cadmium, chemical oxygen demand (COD), biochemical oxygen demand (BOD), viable aerobic bacteria (VAB) and total coliform bacteria (TC). Concentrations of lead, ammonium, nitrite and coliforms were higher than the maximum allowable limits for drinking water in a large proportion of the water samples. To obtain a general representation of the spatial and temporal trends of the water quality parameters at the reservoir, the three-dimensional dataset (sampling sites × parameters × sampling campaigns) has been analyzed by matrix augmentation principal component analysis (MA-PCA) and N-way principal component analysis (N-PCA) using Tucker3 and PARAFAC (Parallel Factor Analysis) models. MA-PCA produced a component accounting for the general behavior of parameters associated with organic pollution. The Tucker3 models were not appropriate for modelling the water quality dataset. The two-factor PARAFAC model provided the best picture to understand the spatial and temporal variation of the water quality parameters of the reservoir. The first PARAFAC factor contains useful information regarding the relation of organic pollution with seasonality, whereas the second factor also encloses information concerning lead pollution.
The most polluted areas in the reservoir and the polluting sources were identified by plotting PARAFAC loadings as a function of the UTM (Universal Transverse Mercator) coordinates. © 2011 Elsevier B.V. All rights reserved.
∗ Corresponding author at: Chacabuco 917, Post Code: 5700, San Luis, Argentina. Tel.: +54 02652 423789x129. E-mail address: [email protected] (F.D. Cid).
0003-2670/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2011.06.013
1. Introduction
The pollution of terrestrial water bodies is one of the most serious threats to freshwater supplies in the world [1]. Toxic chemicals can be introduced to the environment through a great variety of human activities, the major sources of water pollution being mining, manufacturing, farming, power production and runoff from urban and suburban sprawl [2]. Many of these polluting chemicals pose a serious risk to human health, damage the environment or delay the recovery of the ecological systems [3,4]. There has been an increasing concern during the last decades about the entry of potential contaminants into the food chain [5–7]. It is therefore mandatory to understand the general fate and effects of chemicals to assess the health of ecosystems and to provide early warnings of changes in the environment that might indicate adverse effects [8]. Currently, to avoid the health risk, many countries perform regular monitoring of the water quality of their more important water systems [9], producing large amounts of multivariate datasets (e.g. many variables measured on many objects) whose handling and interpretation can be difficult by conventional statistical techniques [9]. Additionally, the complex nature of the environmental topics usually requires finding and proposing simple models to identify and study those variables having a greater environmental impact. Chemometric multivariate techniques are powerful tools
Fig. 1. Geographic location of the “Embalse La Florida” water reservoir with the water sampling points superimposed. The dotted lines and grey lines in the Embalse La Florida map represent the two concrete dams and the two spillways, respectively.
that can help to find statistically important factors explaining the data variability and also to formulate general conclusions about the model [9–11]. In this sense, the dimensionality of a multivariate dataset can be reduced by principal component analysis (PCA), providing an easy visualization of the relationships amongst variables and objects [12]. Classical 2-PCA is directly applicable to datasets arranged in a two-dimensional matrix (e.g. objects and variables); however, when the dataset has a more complicated multidimensional structure (e.g. objects, variables and time), a multi-way analysis such as N-PCA becomes essential to fully explore and extract the hidden structure of the data and their relationships [13]. The potential of some of these multivariate techniques (e.g. Tucker3 and PARAFAC) to understand and model the temporal and spatial variations of polluting substances in the environment has been fully demonstrated [9,10,12–16], yielding useful conclusions that are not apparent at first glance.
Midwestern Argentina is a semiarid region with a seasonal rainfall regime, so many dams have been built to maintain a continuous water supply throughout the year. In the Sierras Pampeanas (Pampas Mountains) region of San Luis and Córdoba provinces there are more than twenty-four water reservoirs, whose main function is to supply water for human consumption, irrigation and livestock [17,18]. The Embalse La Florida is one of the most important water reservoirs, providing drinking water for around 70% of the human population (cities of San Luis, Juana Koslay, La Florida, El Trapiche, El Volcán, Villa Mercedes and Justo Daract) as well as irrigation water for
around 20% of the cultivable land of San Luis province. The shorelines of the reservoir have a wide variety of habitats, such as forests, shrubs and recreational zones (campsites), and also include an area of high environmental value, the nature reserve Reserva Natural La Florida [19]. At present, there is growing concern about the water quality of this reservoir, following the detection of organochlorines and toxic metals in the water and in organisms living in this ecosystem [19–21]. Additionally, as the Embalse La Florida belongs to the Río Quinto river basin, its water quality has a downstream impact, affecting areas of the provinces of San Luis, Córdoba, Santa Fé, La Pampa and Buenos Aires.
The main goal of this study is to analyze the spatial and temporal variations of the water quality of Embalse La Florida and to identify contamination patterns that could help to model the potential environmental risks of reservoir zones having different degrees of human disturbance. With this purpose, a variety of chemometric multivariate tools, including matrix augmentation-PCA (MA-PCA) and N-PCA (Tucker3 and PARAFAC), were applied to visualize and interpret the information contained in the dataset resulting from the determination of 30 physicochemical and biological variables in 418 water samples, collected at 38 sampling points in the Embalse La Florida water reservoir (San Luis, Argentina) during 11 sampling campaigns carried out in 1998–2000. Since these data constitute the first systematic study of this reservoir, they provide a valuable baseline for comparison in future studies and management.
Table 1
Monthly precipitations and mean water temperature at the reservoir during the studied period.

Month-year       Precipitation (mm)a   Water temp. (°C)
October-1998     104.1                 13.9
December-1998    63.5                  15.1
February-1999    25.9                  13.7
May-1999         0.0                   6.3
June-1999        0.3                   5.7
August-1999      0.0                   5.3
November-1999    126.5                 12.0
January-2000     153.9                 14.0
March-2000       145.8                 13.7
April-2000       38.1                  6.0
June-2000        16.5                  5.4

a The precipitation data correspond to the weather station of San Luis airport (33°16′23″S, 66°21′23″W), which is the nearest station to Embalse La Florida.
2. Materials and methods
2.1. Study area
The study was performed in the Embalse La Florida (33°07′S, 66°02′W; 1030 m a.s.l.), located 46 km northeast of San Luis city in a sierras system in the geographical center of San Luis province, Coronel Pringles Department, Argentina (Fig. 1). The reservoir was built more than 50 years ago on the Río Quinto basin and has two tributary rivers (Río Grande and Río Trapiche). It has two large concrete dams (310 m long × 66 m high and 553 m long × 48 m high, respectively), three small barriers and two spillways (194 m and 93 m long) with a discharge capacity of 2000 m³ s⁻¹ (Fig. 1). The surface area of the reservoir is about 6.52 km², extending some 5 km above the dam and averaging 1.8 km in width, with a perimeter of about 36 km and a water capacity of 100.97 hm³. Depth along the major axis averages 15 m, reaching up to 45 m in the middle of the reservoir. The climate of the studied zone is semiarid, with annual rainfall of 500–600 mm concentrated mostly in the warm season (October–April), and the mean temperature varies from 23 °C in January to 10 °C in July. Thus, the reservoir is refilled by summer rains that enter mainly through the tributary rivers, Río Grande and Río Trapiche (Fig. 1). Monthly precipitations in the San Luis area and reservoir surface water temperatures during the studied period are shown in Table 1. Rainfall seasonality strongly affects the tributary rivers and surface runoff, increasing the water inputs by several orders of magnitude and, concomitantly, the concentration of materials, including pollutants, discharged into the reservoir. The Embalse La Florida possesses a variety of environments on its shorelines, represented by several plant communities, as well as zones with a high degree of human interaction. At present, there are five camping areas located on the south shore of the water body, hosting a considerable number of tourists in summer (January and February). Also, during summer the tourist village of El Trapiche (located on the Trapiche river) trebles its population. In addition, near the south shoreline of the reservoir there is a sewage system used to transport wastewater from El Trapiche to the La Florida wastewater treatment plant.
2.2. Sampling
The reservoir was divided into eight sub-zones for sampling purposes, each one having 4 or 5 sampling sites, for a total of 38. The division was made on the basis of previous studies of the shore shape and surface water dynamics of the reservoir. The Universal Transverse Mercator (UTM) coordinates of each sampling site were accurately determined with a GARMIN 12XL GPS system. The study was conducted between October 1998 and June 2000 with a bimonthly sampling frequency, making a total of 11 sampling campaigns covering the warm/rainy (October–April) and cold/dry (May–September) seasons. Three surface water samples (0.1–0.3 m) were collected at each sampling point in sterile bottles of different materials, depending on the chemical analysis to be performed. Samples for metal analysis were taken in sterile amber glass bottles (100 ml) and immediately acidified with nitric acid to a pH below 2.0. Sterile amber glass bottles of 1000 ml capacity were used for the samples intended for bacteriological studies and for COD and BOD analysis. Finally, the samples for the other analytical determinations were collected in polyethylene bottles (1000 ml) previously rinsed twice with water from the sampling point. Samples were transported to the laboratory within 2 h of collection and stored at 4 °C until determination. All instruments and equipment used in the sample collection were acid-washed and rinsed several times with de-ionized water.
2.3. Analytical procedures
The following 30 water quality parameters were analyzed in each of the 418 surface water samples: dissolved oxygen (DO), electrical conductivity (EC), pH, ammonium, nitrate, nitrite, total dissolved solids (TDS), alkalinity, hardness, bicarbonate, chloride, sulfate, calcium, magnesium, fluoride, sodium, potassium, iron, aluminum, silica, phosphate, sulfide, arsenic, chromium, lead, cadmium, chemical oxygen demand (COD), biochemical oxygen demand (BOD), viable aerobic bacteria (VAB) and total coliform bacteria (TC). DO, conductivity, pH and temperature were determined in situ using portable sensors (OAKTON RS 232 and pHMeter ORION-290A). The analytical determinations were performed in accordance with the American Public Health Association standard methods [22] and validated with the appropriate standards. Quality was ensured through careful standardization, procedural blank measurements, and the use of spiked and duplicate samples. All chemical determinations were also carried out in duplicate.
2.4. Multivariate analysis
The resulting dataset of spatial and temporal water quality assessment of the Embalse La Florida has three modes or dimensions: 30 parameters or variables (nvar) measured at 38 sampling sites (nsamp) over 11 sampling campaigns (ntime). The dataset can be arranged into a three-dimensional array (parallelepiped matrix) X of dimensions (nsamp × nvar × ntime), whose complexity requires multivariate data analysis to provide a more comprehensive interpretation and to formulate general conclusions about the underlying model. Multivariate analysis was performed through matrix augmentation principal component analysis (MA-PCA) and N-way principal component analysis (PARAFAC and Tucker3).
2.4.1. MA-PCA
The main application of classic PCA (or 2-PCA) is to detect the relationship between objects and variables of a two-dimensional dataset.
However, three-dimensional datasets can still be analyzed by 2-PCA if they are first reordered into a two-dimensional array [23]. This reordering, usually known as matrix augmentation, consists of the j-unfolding of X, a parallelepiped of size (nsamp × nvar × ntime), into a two-dimensional matrix Xaug of size ((nsamp × ntime) × nvar), having nsamp × ntime = 38 × 11 = 418 rows and nvar = 30 columns, which allows the application of classic 2-PCA. Thus, MA-PCA decomposes the Xaug matrix according to:

$$X^{aug}_{ij} = \sum_{f=1}^{NF} u^{aug}_{if}\, v_{jf} + e^{aug}_{ij}$$
where $u^{aug}_{if}$ and $v_{jf}$ are the elements of the scores and loadings matrices Uaug and V, of dimensions ((nsamp × ntime) × NF) and (nvar × NF) respectively, and $e^{aug}_{ij}$ is the error term of the element $X^{aug}_{ij}$ of the Xaug matrix. NF is the number of factors, kept as low as possible while still accounting for as much of the variance as possible. Whereas the loadings $v_{jf}$ provide useful insights into the relationships amongst the nvar variables, the information of the other two dimensions or modes becomes mixed in the scores $u^{aug}_{if}$, making their interpretation difficult and limiting the usefulness of MA-PCA.
2.4.2. N-way principal component analysis (N-PCA)
Unlike MA-PCA, N-PCA models take into account the three dimensions of X, so the information present in each mode can be completely separated. Several methods have been proposed, but Tucker3 and PARAFAC are the most common models for carrying out N-PCA. The Tucker3 model is based upon the following decomposition of X [24]:

$$X_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} + e_{ijk}$$
Here, $a_{ip}$, $b_{jq}$ and $c_{kr}$ are the elements of the loading matrices A, B and C, with dimensions (nsamp × P), (nvar × Q) and (ntime × R) respectively, accounting for the information contained in the three modes of the original dataset; $g_{pqr}$ is the element (p, q, r) of the core array G (P × Q × R), accounting for the interactions amongst A, B and C; and $e_{ijk}$ is the error term of the element $X_{ijk}$ of the X dataset. The squared element $g_{pqr}^2$ reflects the strength of the interactions amongst the three modes, and P, Q and R are also kept as low as possible while still accounting for a significant amount of variance. On the other hand, the PARAFAC model usually requires fewer degrees of freedom or dimensions than classic PCA or Tucker3 [25]. In this case, the decomposition is carried out as follows [25]:

$$X_{ijk} = \sum_{f=1}^{NF} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$
As in the Tucker3 model, there are three loading matrices, A, B and C, of dimensions (nsamp × NF), (nvar × NF) and (ntime × NF), with elements $a_{if}$, $b_{jf}$ and $c_{kf}$ respectively, accounting for the information of the three modes of X, while $e_{ijk}$ and $X_{ijk}$ have the same meanings as above. The PARAFAC model can be seen as a constrained Tucker3 model having P = Q = R = NF (kept, as above, as low as possible) and a core array G whose superdiagonal elements are all equal to 1 and whose remaining elements are zero. Whereas PARAFAC loadings are interpreted in a similar way to those of MA-PCA, the interpretation of Tucker3 is more difficult and must take into account the magnitude and sign of the elements of the core array G.
2.4.3. Calculations
MINITAB 13.0 and MATLAB 6 software were used for statistical calculations. PARAFAC and Tucker3 analyses were carried out using the N-way toolbox for MATLAB [26] (http://www.models.kvl.dk/courses/). Contour plots were drawn with SURFER 8.00 using the Kriging interpolation procedure.
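The three decompositions of Section 2.4 can be illustrated with a short NumPy sketch. It is not the MATLAB/N-way toolbox code used in the study: the array is filled with random placeholder values of the same shape as the dataset (38 × 30 × 11), and all function and variable names are assumptions chosen for illustration. The sketch shows one way to build the augmented matrix, a PCA of the autoscaled augmented matrix via the singular value decomposition (equivalent to an eigenvalue decomposition of the correlation matrix), and the Tucker3 and PARAFAC reconstruction formulas, including a check that PARAFAC coincides with a Tucker3 model whose core is superdiagonal with ones.

```python
import numpy as np

rng = np.random.default_rng(0)
nsamp, nvar, ntime = 38, 30, 11
# Placeholder data only; these are NOT the measured water quality values.
X = rng.normal(size=(nsamp, nvar, ntime))


def unfold_keep_variables(X):
    """Stack the campaigns row-wise: (nsamp, nvar, ntime) -> (nsamp*ntime, nvar)."""
    nsamp, nvar, ntime = X.shape
    return X.transpose(2, 0, 1).reshape(ntime * nsamp, nvar)


def pca_scores_loadings(Xaug, n_factors):
    """PCA of the column-autoscaled matrix, i.e. PCA on the correlation matrix."""
    Z = (Xaug - Xaug.mean(axis=0)) / Xaug.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]          # u_if
    loadings = Vt[:n_factors].T                        # v_jf
    explained = s[:n_factors] ** 2 / np.sum(s ** 2)    # variance fraction per factor
    return scores, loadings, explained


def tucker3_reconstruct(A, B, C, G):
    """X_ijk = sum over p, q, r of a_ip * b_jq * c_kr * g_pqr."""
    return np.einsum("ip,jq,kr,pqr->ijk", A, B, C, G)


def parafac_reconstruct(A, B, C):
    """X_ijk = sum over f of a_if * b_jf * c_kf."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


# MA-PCA: unfold, then two-factor PCA.
Xaug = unfold_keep_variables(X)
scores, loadings, explained = pca_scores_loadings(Xaug, n_factors=2)

# PARAFAC as a constrained Tucker3 model (random loadings, for the identity only).
NF = 2
A = rng.normal(size=(nsamp, NF))
B = rng.normal(size=(nvar, NF))
C = rng.normal(size=(ntime, NF))
G = np.zeros((NF, NF, NF))
G[np.arange(NF), np.arange(NF), np.arange(NF)] = 1.0   # superdiagonal of ones
same = np.allclose(parafac_reconstruct(A, B, C), tucker3_reconstruct(A, B, C, G))
```

In this sketch each row of `Xaug` corresponds to one site in one campaign, which is why the site and campaign information ends up mixed in `scores`, whereas the three-way models keep a separate loading matrix per mode.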
3. Results and discussion
3.1. Descriptive statistics
Table 2 shows a summary of the descriptive statistics of the 30 water quality variables analyzed in the 418 surface water samples collected at the Embalse La Florida. Some parameters, such as nitrite, lead, cadmium, COD, VAB and TC, show very large variation ranges, leading to differences between maximum and minimum values of two or more orders of magnitude. Such large variations may be associated with the sampling site or with seasonal changes (temperature, precipitation, etc.), but these sources cannot be distinguished by considering the parameters individually, as can be seen in Fig. 2, which shows the variations of nitrite, lead, COD and TC as a function of both sampling point and sampling campaign. Such graphs are necessarily complex and difficult to interpret and, in our case, only the seasonal variability and the existence of sampling points with extreme values of the water quality parameters can be ascertained with confidence. The bulk of the information, corresponding to the behavior of the individual sampling points or to the relationships amongst the different water quality variables, remains hidden and can only be revealed by using multivariate techniques, as described below. Some parameters have values above the recommended limits for drinking water [27,28]. For example, the Argentinean Food Code [27] specifies that total coliform values higher than 3 MPN per 100 ml pose a health risk for human consumption. In our case, 92% of the analyzed water samples exceeded that critical value of coliforms. Similarly, 51% of the samples had ammonium concentrations above the Argentinean drinking water limit of 0.2 mg L−1 [27]. Also, the nitrite concentrations found in 61% of the water samples were above the guideline values for human water consumption [27].
All water samples contained measurable concentrations of lead, 27% having lead concentrations above 50 µg L−1, the Argentinean limit for drinking water [27]. However, if the more restrictive EPA (15 µg L−1) or WHO (10 µg L−1) lead guideline values are considered instead, then 78% or 89% of the water samples, respectively, exceeded the corresponding limits [28,29]. Cadmium was detected in almost all water samples, but only three samples (1%) exceeded 5 µg L−1, which is the maximum cadmium level accepted in drinking water by both the Argentinean Food Code and EPA [27,28]. These results indicate that, in general, water coming from the Embalse La Florida was not suitable for direct human consumption without previous treatment.
3.2. MA-PCA
The three-dimensional data array of water quality parameters, X, with dimensions (38 × 30 × 11), was initially j-unfolded to give a two-way data array, Xaug, with dimensions (418 × 30), which was then analyzed with classic 2-PCA. The analysis was based on the eigenvalue decomposition of the correlation matrix in order to compensate for the differences in magnitude and measurement scales of the experimental parameters. As mentioned above, the number of factors should be kept as low as possible, and in order to best visualize the relationships amongst samples (418) and variables (30), only the first two components, accounting for 32% of the variability in the augmented dataset, were taken into consideration. Fig. 3a shows the loadings plot for the first and second components (PCs), explaining respectively 20% and 12% of the total variance in the dataset. As can be seen, PC1 has strong positive loadings, greater than 0.5, for ammonium, nitrite, nitrate, sulfide, alkalinity, bicarbonate, phosphate, BOD, TC, COD and VAB. On the contrary, sodium, pH, chloride and DO have moderate negative
Table 2
Mean values, standard deviation (SD), minimum and maximum values of different water-quality parameters at Embalse La Florida.

Variable                                  Mean      SD        Min      Max
Dissolved oxygen, DO (mg L−1)             9.27      1.85      4.00     12.80
Conductivity, EC (µS cm−1)                178.08    18.63     117.70   229.70
pH                                        7.46      0.39      6.07     8.30
Ammonium (mg L−1)                         0.34      0.30      0.02     1.80
Nitrate (mg L−1)                          1.28      1.05      0.05     7.12
Nitrite (mg L−1)                          0.17      0.12      0.01     0.87
Total dissolved solids, TDS (mg L−1)      157.60    17.04     120.50   204.00
Alkalinity (mg L−1 CaCO3)                 74.12     6.25      55.00    90.00
Hardness (mg L−1 CaCO3)                   106.25    6.99      62.55    131.40
Bicarbonate (mg L−1)                      90.36     7.64      67.10    109.98
Chloride (mg L−1)                         10.23     2.61      6.97     19.40
Sulfate (mg L−1)                          29.31     10.68     10.75    66.67
Calcium (mg L−1)                          26.85     3.87      15.03    41.30
Magnesium (mg L−1)                        9.49      2.33      3.10     15.60
Fluoride (mg L−1)                         0.33      0.08      0.15     0.56
Sodium (mg L−1)                           14.61     4.58      7.10     32.10
Potassium (mg L−1)                        2.86      0.70      1.25     6.90
Iron (mg L−1)                             0.21      0.08      0.05     0.74
Aluminum (mg L−1)                         0.26      0.08      0.10     0.54
Silica (mg L−1)                           9.72      1.17      7.25     15.10
Phosphate (µg L−1)                        1.69      1.10      0.16     6.24
Sulfide (mg L−1)                          0.22      0.09      0.05     0.75
Arsenic (µg L−1)                          12.36     3.39      4.15     27.18
Chromium (µg L−1)                         20.60     6.29      4.30     45.70
Lead (µg L−1)                             44.60     51.32     0.25     386.40
Cadmium (µg L−1)                          0.44      0.57      NDa      6.25
COD (mg O2 L−1)                           13.79     5.67      6.00     45.13
BOD (mg O2 L−1)                           9.44      3.94      3.70     32.15
Viable aerobic bacteria, VAB (MPNb)       1617.00   1859.00   75.00    12,700.00
Total coliform, TC (MPNb)                 108.88    141.22    ND       900.00
a b
Fig. 2. Seasonal variation of raw data of total coliform (TC), chemical oxygen demand (COD), nitrite and lead in water samples of the Embalse La Florida.
contributions to this component, between −0.25 and −0.5; the rest of the parameters have negligible contributions, ranging from −0.25 to 0.25. Because of its composition, PC1 is mainly related to organic/anthropogenic pollution, a conclusion also supported by the simultaneous negative loadings of DO and pH, two chemical parameters that always diminish when organic/anthropogenic water pollution increases. On the other hand, PC2 (12% of explained variance) is mainly characterized by strong negative loadings, below −0.5, for lead and cadmium. Thus, this component could be interpreted in terms of heavy metal pollution, but this assignment is not as clear as that found for PC1, because of the simultaneous presence in PC2 of strong or moderate positive loadings for ammonium, TDS, conductivity, hardness and magnesium. Fig. 3b shows a score plot (PC2 vs. PC1) for the water samples, with the points labeled according to the weather season of the sampling campaign: a clear division along the PC1 axis can be observed, with the water samples collected during the warm/rainy season (October–April, austral spring–summer) appearing mainly at positive PC1 values, whereas the samples collected during the cold/dry season (May–September, austral autumn–winter) are located at negative PC1 values. PC2 does not seem to contribute to sample differentiation. Superimposing Fig. 3a and b as a biplot shows that the samples collected in the warm/rainy season have, in general, higher levels of the chemical parameters associated with the organic/anthropogenic pollution described above, whereas the samples collected during the cold/dry season are associated with variables such as DO and pH. This strong seasonal variability is the most striking feature of the dataset. Fig. 3c is also a PC2 vs. PC1 score plot, with the samples labeled according to the sampling sub-zones. In this case, no pattern can be appreciated in the distribution of the points. MA-PCA provides no further insight into the behavior of either sampling zones or individual points, beyond showing that some points of sub-zones 6 and 7 systematically appear towards the regions of the components associated with higher environmental risk, i.e. the positive side of PC1 and the negative side of PC2. No conclusions about the behavior of sampling sites and campaigns can be drawn from MA-PCA, because the corresponding information is mixed up in the scores of Xaug. The re-folding and averaging of those scores [13,30], an intermediate method used to separate the information of the confounded modes, did not provide satisfactory results on this occasion.
Fig. 3. MA-PCA applied to the augmented water dataset: (a) loading plot, (b) score plot grouped by weather season: () warm/rainy, (●) cold/dry, and (c) sampling sites.
3.3. N-way PCA
Unlike the MA-PCA refolding method cited above, N-PCA models (Tucker3 and PARAFAC) take into account the true three-dimensional structure of X, thus allowing the information of each mode to be separated.
3.3.1. Tucker3
All possible Tucker3 models having different numbers of factors in each mode (P, Q, R = 1, …, 11) were evaluated. The (3,3,2) Tucker3 model produced a percentage of core variation (i.e. the strengths of the interaction) of 34%. However, to fully separate the information present in the three ways in Tucker3 modelling, the structure of the core array G must also be considered [31]. To allow a more straightforward interpretation of the model, G should ideally be a superdiagonal array with non-zero body-diagonal elements and zero values for the rest [32,33], but in our case the (3,3,2) model gave the following core array:
G(:,:,1)
    42.1696    −6.3006     4.0742
     0.9357    −6.6637     3.9132
     4.8683     0.6491   −20.2618
G(:,:,2)
    −9.7393    10.6587    17.3782
   −13.5112   −28.0379     4.0206
   −11.0005     0.5683   −13.3411
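How far this core array departs from the ideal superdiagonal structure can be gauged with a small NumPy sketch. The measure used here, the share of the core sum of squares carried by the elements g_fff, is an illustrative choice and not necessarily the criterion applied in the study; the function names are assumptions as well. The core values are copied from the array printed above, reading each printed line as one row of a slice. Both the superdiagonal share and the identity of the two largest elements are unaffected if the lines are instead read as columns.

```python
import numpy as np

# Core array of the (3,3,2) Tucker3 model, copied from the printed slices.
G = np.zeros((3, 3, 2))
G[:, :, 0] = [[42.1696, -6.3006, 4.0742],
              [0.9357, -6.6637, 3.9132],
              [4.8683, 0.6491, -20.2618]]
G[:, :, 1] = [[-9.7393, 10.6587, 17.3782],
              [-13.5112, -28.0379, 4.0206],
              [-11.0005, 0.5683, -13.3411]]


def superdiagonal_fraction(G):
    """Share of the core sum of squares carried by the elements g_fff."""
    idx = np.arange(min(G.shape))
    return float(np.sum(G[idx, idx, idx] ** 2) / np.sum(G ** 2))


def largest_core_elements(G, top=5):
    """Return (p, q, r, g_pqr squared) for the largest squared elements, 1-based indices."""
    order = np.argsort(G ** 2, axis=None)[::-1][:top]
    p, q, r = np.unravel_index(order, G.shape)
    return [(int(a) + 1, int(b) + 1, int(c) + 1, float(G[a, b, c] ** 2))
            for a, b, c in zip(p, q, r)]


frac = superdiagonal_fraction(G)
top_elements = largest_core_elements(G, top=5)
```

A value of `frac` close to 1 would indicate a nearly superdiagonal core; `top_elements` lists which factor combinations carry the strongest interactions, so large off-superdiagonal entries in that list point to interactions between different factors of the three modes.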
This G array does not satisfy the superdiagonality condition [32], so we can conclude that the Tucker3 method is not appropriate for modelling the experimental data.
3.3.2. PARAFAC
As mentioned above, PARAFAC decomposes X into the product of three matrices A, B and C with dimensions (nsamp × NF), (nvar × NF) and (ntime × NF), containing respectively the information present in sampling sites, variables and sampling campaigns. The experimental dataset was first j-autoscaled, so that all variables had zero mean and unit variance, thus avoiding problems due to different
Fig. 4. Plot of loadings obtained from PARAFAC model analysis of the water dataset.
data magnitudes and measurement scales. NF was chosen by means of the core consistency parameter [34] implemented in the N-way toolbox for MATLAB [26,35]. The optimal complexity was found for a two-factor model (core consistency = 100%) explaining 25.3% of the data variance, a similar variance to that explained by the first two components of MA-PCA. Explained variance increased to 33.0% for a three-factor model, but the core consistency decreased to 33.9%. According to Bro [36], the core consistency of a PARAFAC model should be near 100%, and models with values below 50% should be considered inappropriate, so in this case the two-factor PARAFAC model seems to be the best option. The stability of the model was confirmed by performing a split-half analysis [37]. The mean values found for the split-half stability coefficients were 0.089 and 0.093 for the A mode (sampling points), 0.002 and 0.012 for the B mode (variables), and 0.008 and 0.005 for the C mode (sampling campaigns). All values were below 0.1, thus showing the stability of the model [38]. Therefore, considering both the core consistency parameter and the results of the split-half analysis, the optimal complexity was obtained for a two-factor PARAFAC model. Although the explained data variance for this model is low, 25.3%, this is a common situation with environmental data and has been reported in several environmental n-way studies [31,39], whose authors ascribe it to the great variability of weather conditions. Fig. 4 shows a summary of the two components of the PARAFAC model, i.e. the loadings of the three matrices: sampling points (A), variables (B) and sampling campaigns (C). The components of each mode are first analyzed separately, and then the interactions between the different modes are discussed. The loadings of the physicochemical variables on the two components of matrix B are examined first, because their interpretation is similar to that of the MA-PCA loadings. The first component,
B1, has high positive loadings for ammonium, nitrate, nitrite, alkalinity, bicarbonate, magnesium, phosphate, sulfide, COD, BOD, VAB and TC, and high negative loadings for dissolved oxygen, pH, chloride and sodium (Fig. 4b1). On the other hand, lead and cadmium are strongly and positively loaded on the second component, B2 (Fig. 4b2), although the lead concentrations were about 100-fold higher than the cadmium levels (Table 2). The groupings and distributions of the physicochemical variables in the first two PARAFAC components are very similar to those found for the corresponding MA-PCA components and will be discussed more extensively below. The spatial information of the dataset can be explained in terms of the loadings of each sampling point on the two components of matrix A. The first component, A1 (Fig. 4a1), has positive loadings at all sites, with sites 6D and 7D having the highest values. A very different behavior of the sampling points can be seen in the A2 loadings (Fig. 4a2). Again, sites 6D and 7D behave differently, having the highest positive loadings. Additionally, the sites of sub-zones 6 and 7, located near the recreational areas (campsites) on the south shore of the reservoir (see Fig. 1), have positive A2 loadings, whereas the remaining sampling sites have negative ones. Regarding seasonal variations of water quality, i.e. the third mode, Fig. 4c shows the loadings of matrix C, which explain the variation of the data over time. The first seasonal component, C1, shows strong fluctuations, with positive loadings for the warm/rainy season (October–April) and negative ones for the cold/dry period (May–September) (Fig. 4c1). On the other hand, the loadings for C2 are all positive (Fig. 4c2), but the seasonal oscillation persists, although in the opposite direction to that observed for C1, with the loadings for the cold/dry period now being greater than those found in the warm/rainy season.
This oscillating behavior is confirmed by the highly significant correlation of the third-mode
F.D. Cid et al. / Analytica Chimica Acta 705 (2011) 243–252
Fig. 5. Contour plots of the PARAFAC A1 loadings onto the Embalse La Florida showing the zones with greater organic contamination.
loadings, positive for C1 and negative for C2, with both precipitation and water temperature during the study period, as shown in Table 3. Prior to analyzing the interactions between the different modes (space, variables and time) of the PARAFAC model, the meaning of the two PARAFAC factors or components will be briefly discussed. As indicated above, the B mode (variables) of the first component, B1, is mainly made up of high positive contributions from ammonium, nitrate, nitrite, phosphate, sulfide, COD, BOD, VAB and TC, and high negative ones from DO and pH. These grouped variables mainly explain the relationships between organic nutrients, bacteria and parameters related to oxygen consumption. Natural organic detritus and organic waste act as a food source for water-borne bacteria, so an increase in organic nutrients, such as ammonium, nitrate, nitrite, phosphate or sulfide, will cause an increase in the number of bacteria (VAB and TC) [40], which in turn will increase BOD and COD. Simultaneously, pH and DO will decrease because bacteria decompose organic matter, consuming dissolved oxygen [41,42]. There are five other variables, with positive or negative loadings, in this first factor, whose presence and behavior can be explained in terms of alkalinity/salinity. Thus, the grouping of alkalinity, bicarbonate and magnesium, all positively loaded, seems logical because water alkalinity is mainly due to ions such as hydrogen carbonate/carbonate and magnesium. On the other hand, sodium and chloride, both negatively loaded, seem to point to the presence of sodium chloride. However, the most striking feature of
Table 3 Correlation analysis between precipitation, water temperature and the factor loadings of PARAFAC matrix C. Values are Pearson's correlation coefficients, with p values in parentheses.
PARAFAC loadings    Precipitation      Water temp.
C1                  0.702 (0.016)      0.985 (0.000)
C2                  −0.895 (0.000)     −0.850 (0.001)
the B1 PARAFAC component is the presence of up to eleven physicochemical parameters related to organic/anthropogenic pollution. On the other hand, the predominant positive loadings in B2 belong to lead and cadmium. Because the lead levels in the water samples were two orders of magnitude higher than those of cadmium (Table 2), this second PARAFAC component can be interpreted in terms of lead pollution, possibly originating from lead fishing sinkers and/or the leaded fuel formerly used in recreational boats, so this second factor can be assigned to leisure activities. The interactions among the different modes (space, time and variables) can be studied by interpreting the loadings of Fig. 4 in terms of the respective signs of the A, B and C loadings for each PARAFAC factor. Factor 1, organic/anthropogenic pollution, has positive A1 loadings for all sampling points, so the resulting cross-product of A, B and C will make a large contribution to the model if the B1 and C1 loadings have the same sign. Thus, during the warm/rainy season (positive C1) there will be an increase in the organic pollution parameters (ammonium, nitrate, nitrite, phosphate, sulfide, VAB, TC, COD and BOD) with positive B1 loadings. Conversely, in the cold/dry season (negative C1) DO and pH will contribute to the cross product because of their negative B1 loadings. The global explanation of this seasonal variability lies in a combination of interrelated factors. First, during the warm/rainy season (October–April) there is an increase in the organic matter concentration in the reservoir; this, in combination with high temperatures, causes bacterial proliferation and a subsequent increase in oxygen consumption, which decreases the DO concentration and pH [42]. Additionally, the seasonal variation of DO is also related to seasonal temperatures, because gas solubility diminishes at high temperatures [16,43]. 
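The sign argument used above follows from the trilinear form of the PARAFAC model, in which each factor contributes the product of one A, one B and one C loading to the modelled value for a given site, variable and campaign. The following minimal sketch illustrates only how the signs combine. All numbers, the two-variable subset and the function name `factor_contribution` are hypothetical and are not taken from Fig. 4.

```python
def factor_contribution(a_i, b_j, c_k):
    """Contribution of one PARAFAC factor to the modelled value for
    one site (a_i), one variable (b_j) and one campaign (c_k)."""
    return a_i * b_j * c_k


# Hypothetical loadings, chosen only to mimic the signs described in the text.
a1_site = 0.30                                 # A1 is positive for every site
b1 = {"BOD": 0.25, "DO": -0.20}                # B1: positive for BOD, negative for DO
c1 = {"warm/rainy": 0.35, "cold/dry": -0.30}   # C1 changes sign with the season

# Product of the three loadings for every variable/season combination.
contributions = {
    (var, season): factor_contribution(a1_site, b, c)
    for var, b in b1.items()
    for season, c in c1.items()
}

for (var, season), value in contributions.items():
    print(f"{var:>3} {season:<10} {value:+.4f}")
```

With these illustrative values the product is positive for BOD in the warm/rainy season and for DO in the cold/dry season, and negative for the two remaining combinations, which is the pattern described in the text for factor 1.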
The seasonal variations of nitrate concentration at the reservoir were opposite to the patterns generally reported in other water bodies, where nitrate levels are higher in winter than in summer, due to a decreased biological activity (bacterial denitrification and algal assimilation) in winter [41]. In this context, one possible explanation for the increase of organic parameters during the warm/rainy period lies in the rain runoff, washing the
organic compounds from the basin to the reservoir. However, the most probable explanation is human activity in and near the reservoir, which increases strongly during the spring–summer, i.e. the warm/rainy season. The great impact of human activity is made clear by the large A1 loadings of sampling points 6D and 7D, located near the campsites, but a clearer picture of the global behavior of the reservoir regarding the PARAFAC B1 factor, i.e. the organic/anthropogenic pollution, can be found in Fig. 5, in which the A1 loadings of the sampling points are plotted as a function of their UTM coordinates. The shading intensity is directly related to the organic/anthropogenic pollution, so the darkest zones correspond to areas of the reservoir where the surface water layer was highly polluted with ammonium, nitrate, nitrite, phosphate, sulfide, VAB, TC, COD and BOD. This map also illustrates the spatial distribution of organic contamination in the reservoir. The least organically polluted areas were located in the central (sampling points 8C–8F) and Río Grande (sampling points 3) sub-zones of the reservoir (Fig. 1). In contrast, the highest levels of physicochemical parameters indicating organic pollution and coliform bacteria were found at the 6D and 7D sampling points, located in the bays of the south shoreline, where human activity is very high during spring–summer (warm/rainy season) due to the presence of recreational areas and where, in addition, the water flow is very low because of the orographic configuration of the coast. As a consequence, the levels of ammonium, nitrate, nitrite, phosphate, sulfide, VAB, TC, COD and BOD found at the 6D and 7D sampling sites are much higher than those measured at the other sampling sites, indicating point sources of contamination. 
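A map such as Fig. 5 requires estimating the loading between sampling points before contouring. The interpolation method behind Fig. 5 is not stated here, so the sketch below uses inverse-distance weighting purely as an assumed stand-in. The site coordinates, the loading values and the function name `idw` are hypothetical.

```python
import math


def idw(x, y, sites, power=2.0):
    """Inverse-distance-weighted estimate of a loading at point (x, y).

    sites: list of (easting, northing, loading) tuples.
    """
    num = 0.0
    den = 0.0
    for sx, sy, loading in sites:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return loading  # the point coincides with a sampling site
        w = 1.0 / d ** power
        num += w * loading
        den += w
    return num / den


# Hypothetical sites: (easting in m, northing in m, A1 loading),
# with coordinates given relative to an arbitrary origin.
sites = [
    (0.0, 0.0, 0.10),
    (1000.0, 0.0, 0.12),
    (0.0, 1000.0, 0.45),   # stands in for a high-loading site such as 6D or 7D
    (1000.0, 1000.0, 0.15),
]

# Evaluate on a coarse 3 x 3 grid; a plotting library could contour these values.
grid = [[idw(gx, gy, sites) for gx in (0.0, 500.0, 1000.0)]
        for gy in (0.0, 500.0, 1000.0)]
```

Inverse-distance weighting keeps every estimate between the smallest and largest observed loading, so isolated high values such as those at 6D and 7D appear as local dark spots rather than being extrapolated beyond the data.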
All these considerations seem to point towards anthropogenic causes, such as faulty septic-well construction or poor maintenance of the sewage disposal systems of the recreational areas, or even of the sewage system that transports wastewater from El Trapiche to the La Florida treatment plant and runs near the shoreline on the south coast of the reservoir. It is also possible to observe in Fig. 5 a large grey area extending from one of these bays (sub-zone 7) towards the areas where the reservoir spills are located (sub-zones 1 and 2) and partially coinciding with the general water flow, thus indirectly confirming the validity of the model. In the case of PARAFAC factor 2, the temporal mode has positive C2 loadings for all sampling campaigns and since, as mentioned above, lead and cadmium have the greatest positive contributions to B2, the only significant contribution of this second factor to the cross product will be due to sampling points with high and positive A2 loadings. As can be seen in Fig. 4b1, only the sampling sites of sub-zones 6 and 7 have the required positive A2 loadings, with points 6D and 7D, the nearest to the campsites, standing out over the rest. Again, anthropogenic influences seem to be the most probable cause: lead fishing sinkers used by local anglers for roughly 55 years, or boats navigating the reservoir and propelled by leaded-fuel engines for around 45 years. These recreational activities were most frequent near the south shore, because the piers are located near the campsites and recreational areas. It is important to note that navigation should no longer be a source of lead pollution, because Argentinean fuels have been unleaded for approximately a decade [44].
4. Conclusions
This study presents the application of chemometric techniques including MA-PCA, PARAFAC and Tucker3 to model the spatial and temporal variation of the water quality of the Embalse La Florida, Argentina. To this end, 30 physico-chemical parameters were investigated in 418 surface water samples, collected during eleven sampling campaigns carried out in the rainy-summer and dry-winter seasons of 1998–2000.
The water from some areas of the reservoir can be considered polluted because some of the quality parameters, such as lead, ammonium and TC, have values higher than the maximum allowable limits for drinking water. MA-PCA was useful for finding the general behavior patterns of the physico-chemical parameters, but no information concerning individual sampling sites and/or seasons could be found. The re-folding method did not allow the spatial and temporal information to be separated either. Therefore, the N-way PCA models (Tucker3 and PARAFAC) were applied, because they allow the information of the three ways to be visualized and jointly interpreted. However, the Tucker3 model was not appropriate for understanding the dataset, because the G matrix did not fulfil the superdiagonality condition. A two-factor PARAFAC model proved more suitable for interpreting the dataset. The first factor can be used to understand the behavior and relation of organic compounds and bacteria with seasonality, whereas the second factor also contains useful information in relation to lead pollution. Additionally, the PARAFAC model was combined with UTM coordinates to visualize the reservoir areas showing a more pronounced environmental risk. As a result, it was possible to identify that organic contamination and/or coliform bacteria are apparently due to point sources of pollution, such as faulty construction or maintenance of sewage systems. The PARAFAC method was also useful to ascertain that the reservoir zones most affected by lead contamination were located on the southern shore. Thus, these multivariate statistical techniques provided a powerful and useful tool for understanding the spatial and temporal variations in water quality in this reservoir and their causes. Finally, these findings may help in establishing guidelines and regulatory actions for the reservoir. 
The implementation of actions to prevent degradation, and to maintain and improve the reservoir water quality, is imperative, because since 2010 the Provincial Government has been actively promoting new urban settlements on the shores of the reservoir through the financing of tourist and recreational real estate projects [45–47]. Thus, bearing in mind a likely exponential human population growth on the reservoir shores, and considering the importance of this reservoir, it is essential to eliminate the pollution sources of lead and organic compounds, and to monitor human activities in order to minimize the negative impacts on the reservoir. The hydrochemical data used to generate the statistical models are the first dataset available for this reservoir, thus providing an outstanding opportunity to establish the baseline water composition that will be used to assess present and future changes in water quality at the reservoir, which is a fundamental tool for water management.
Acknowledgements
FONCYT 7-7488 and UNSL-CyT 22Q751 grants to EC-V funded the study. During the research Fabricio Damián Cid (FDC) held a postdoctoral fellowship from EADIC-ERASMUS MUNDUS.
References
[1] A. Cundy, J.S. Neil, B.B. Paul, International Encyclopedia of the Social & Behavioral Sciences, Pergamon, Oxford, 2001, p. 16377. [2] H.I. Zeliger, Human Toxicology of Chemical Mixtures, William Andrew Publishing, Norwich, NY, 2008, p. 79. [3] C.H. Walker, S.P. Hopkin, R.M. Sibly, D.B. Peakall, Principles of Ecotoxicology, Taylor & Francis, Glasgow, 2001. [4] D.A. Wright, P. Welbourn, Environmental Toxicology, Cambridge University Press, New York, 2002. [5] J. Burger, M. Gochfeld, Arch. Environ. Contam. Toxicol. 32 (1997) 217. [6] S.G. Donaldson, J. Van Oostdam, C. Tikhonov, M. Feeley, B. Armstrong, P. Ayotte, O. Boucher, W. Bowers, L. Chan, F. Dallaire, R. Dallaire, É. Dewailly, J. Edwards, G.M. Egeland, J. Fontaine, C. Furgal, T. Leech, E. Loring, G. Muckle, T. Nancarrow, D. Pereg, P. 
Plusquellec, M. Potyrala, O. Receveur, R.G. Shearer, Sci. Total Environ. 408 (2010) 5165.
[7] E. van Wyk, F.H. van der Bank, G.H. Verdoorn, D. Hofmann, S. Afr. J. Anim. Sci. 31 (2001) 57. [8] J. Burger, Environ. Res. 90 (2002) 33. [9] A. Astel, M. Biziuk, A. Przyjazny, J. Namiesnik, Water Res. 40 (2006) 1706. [10] T. Kowalkowski, R. Zbytniewski, J. Szpejna, B. Buszewski, Water Res. 40 (2006) 744. [11] S. Tsakovski, B. Kudlak, V. Simeonov, L. Wolska, G. Garcia, M. Dassenakis, J. Namiesnik, Talanta 80 (2009) 935. [12] K.P. Singh, A. Malik, N. Basant, V.K. Singh, A. Basant, Chemometr. Intell. Lab. Syst. 87 (2007) 185. [13] R. Pardo, M. Vega, L. Deban, C. Cazurro, C. Carretero, Anal. Chim. Acta 606 (2008) 26. [14] K.P. Singh, A. Malik, D. Mohan, S. Sinha, Water Res. 38 (2004) 3980. [15] P. Barbieri, G. Adami, S. Piselli, F. Gemiti, E. Reisenhofer, Chemometr. Intell. Lab. Syst. 62 (2002) 89. [16] P. Barbieri, C.A. Andersson, D.L. Massart, S. Predonzani, G. Adami, E. Reisenhofer, Anal. Chim. Acta 398 (1999) 227. [17] R. Quirós, Classification and State of the Environment of the Argentinean Lakes, Workshop on Sustainable Management of the Lakes of Argentina, International Lake Environment Committee Foundation (ILEC), San Martin de los Andes, Argentina, 1998, p. 29. [18] R. Quirós, E. Drago, Lakes Reserv.: Res. Manage. 4 (1999) 55. [19] F.D. Cid, R.I. Antón, E. Caviedes-Vidal, Sci. Total Environ. 385 (2007) 86. [20] F.D. Cid, C. Gatica-Sosa, R.I. Anton, E. Caviedes-Vidal, J. Environ. Monit. 11 (2009) 2044. [21] M. Jofré, R. Antón, E. Caviedes-Vidal, Arch. Environ. Contam. Toxicol. 55 (2008) 471. [22] APHA, Standard Methods for the Examination of Water and Wastewater, American Public Health Association (APHA), American Water Works Association (AWWA) and Water Environment Federation (WEF), Washington, DC, 1998. [23] R. Pardo, B.A. Helena, C. Cazurro, C. Guerra, L. Debán, C.M. Guerra, M. Vega, Anal. Chim. Acta 523 (2004) 125. [24] R. Henrion, Chemometr. Intell. Lab. Syst. 25 (1994) 1. [25] R. Bro, Chemometr. Intell. Lab. Syst. 38 (1997) 149. [26] C.A. 
Andersson, R. Bro, Chemometr. Intell. Lab. Syst. 52 (2000) 1.
[27] CAA, Código Alimentario Argentino, Capítulo XII: Bebidas Hídricas, Agua y Agua Gasificada, Artículo 983 (Res. Conj. SPRyRS y SAGPyA N° 68/2007 y N° 196/2007), 2007. [28] USEPA, National Primary Drinking Water Regulations, United States Environmental Protection Agency, 2009. [29] WHO, Guidelines for Drinking-water Quality Incorporating 1st and 2nd Addenda, Recommendations, vol. 1, 3rd ed., World Health Organization, Geneva, 2008. [30] R. Tauler, D. Barcelo, E.M. Thurman, Environ. Sci. Technol. 34 (2000) 3307. [31] R. Leardi, C. Armanino, S. Lanteri, L. Alberotanza, J. Chemometr. 14 (2000) 187. [32] O. Abollino, M. Malandrino, A. Giacomino, E. Mentasti, Anal. Chim. Acta 688 (2011) 104. [33] P.M. Kroonenberg, Three-Mode Principal Component Analysis, DSWO Press, Leiden, The Netherlands, 1983 (reprint 1989). [34] I. Stanimirova, K. Zehl, D. Massart, Y. Vander Heyden, J. Einax, Anal. Bioanal. Chem. 385 (2006) 771. [35] V. Pravdova, B. Walczak, D.L. Massart, H. Robberecht, R. Van Cauwenbergh, P. Hendrix, H. Deelstra, J. Food Compos. Anal. 14 (2001) 207. [36] R. Bro, H.A.L. Kiers, J. Chemometr. 17 (2003) 274. [37] M. Timmerman, H. Kiers, Psychometrika 68 (2003) 105. [38] K.P. Singh, A. Malik, S. Sinha, D. Mohan, V.K. Singh, Anal. Chim. Acta 596 (2007) 171. [39] K.P. Singh, A. Malik, V.K. Singh, S. Sinha, Chemometr. Intell. Lab. Syst. 83 (2006) 1. [40] M. Boualam, L. Mathieu, S. Fass, J. Cavard, D. Gatel, Water Res. 36 (2002) 2618. [41] D.D. Ratnayaka, M.J. Brandt, K.M. Johnson, Water Supply, 6th ed., Butterworth-Heinemann, Boston, 2009, p. 195. [42] P.A. Araoye, Int. J. Phys. Sci. 4 (2009) 271. [43] S.E. Manahan, Environmental Chemistry, Lewis Publishers (CRC Press), Boca Raton, FL, 1994. [44] SE, Subsecretaría de Combustibles, República de la Argentina, 1998. 
[45] Ley N° VIII-0664-09, Fomento a las inversiones para el desarrollo económico provincial, Gobierno de la Provincia de San Luis, 2009. [46] Resolución N° 231-SLASE, 2010. [47] Decreto regulatorio N° 946-MHP/10, Gobierno de la Provincia de San Luis, 2010.