Ecological Modelling 360 (2017) 80–93
Contents lists available at ScienceDirect
Ecological Modelling journal homepage: www.elsevier.com/locate/ecolmodel
Multi-model approach to predict phytoplankton biomass and composition dynamics in a eutrophic shallow lake governed by extreme meteorological events Carolina Crisci a,∗ , Rafael Terra b , Juan Pablo Pacheco c , Badih Ghattas d , Mario Bidegain e , Guillermo Goyenola c , Juan José Lagomarsino f , Gustavo Méndez f , Néstor Mazzeo c a Polo de Desarrollo Universitario Modelización y Análisis de Recursos Naturales, Centro Universitario Regional del Este, Ruta nacional n◦ 9 intersección con Ruta n◦ 15, CP 27000, Rocha, Uruguay b Instituto de Mecánica de los Fluidos e Ingeniería Ambiental, Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, CP 11300, Montevideo, Uruguay c Departamento de Ecología Teórica y Aplicada, Centro Universitario Regional del Este, Universidad de la República, Tacuarembó entre Av. Artigas y Aparicio Saravia, CP 20000, Maldonado, Uruguay d Institut de Mathématiques de Marseille, UMR 7373, CNRS, 163 Avenue de Luminy, CP 13288, Marseille, France e Instituto Uruguayo de Meteorología, Javier Barrios Amorín 1488, CP 11200, Montevideo, Uruguay f Obras Sanitarias del Estado, Unidad de Gestión Desconcentrada, Ruta 12 km 6, CP 20003, Maldonado, Uruguay
a r t i c l e
i n f o
Article history: Received 9 November 2016 Received in revised form 16 June 2017 Accepted 17 June 2017 Keywords: Extreme meteorological events Inorganic turbidity dynamics Chlorophyll-a dynamics Phytoplankton morphology-based functional groups Water quality Predictions
a b s t r a c t A multi-model approach to predict phytoplankton biomass and composition was performed in a eutrophic Uruguayan shallow lake which is the second drinking water source of the country. We combined statistical (spectral analysis and Machine learning techniques) and physically based models to generate, for the first time in this system, a predictive tool of phytoplankton biomass (chlorophyll-a) and composition (morphology-based functional groups). The results, based on a 11-year time series, revealed two alternating phases in the temporal dynamics of phytoplankton biomass. One phase is characterized by high inorganic turbidity and low phytoplankton biomass, and the other by low inorganic turbidity and variable (low and high) phytoplankton biomass. A threshold of turbidity (29 TNU), above which phytoplankton remains with low biomass (<15–20 ug/l) was established. The periods of high turbidity, which in total cover 30% of the time series, start abruptly and are related to external forcing. Meteorological conditions associated with the beginning of these periods were modeled through a regression tree analysis. These conditions consist of moderate to high wind intensities from the SW direction, in some cases combined with high antecedent precipitation or low water level. The results from the physically-based modeling indicated that the long decaying time-scale of turbidity and intermediate resuspension events could explain the prolonged length of the high turbidity periods (∼1.5 years). Random Forests models for the prediction of phytoplankton biomass and composition in periods of low turbidity resulted in a proportion of explained variance and a classification error over a test sample of 0.46 and 0.34 respectively. Turbidity, conductivity, temperature and water level were within the most important model predictors. The development and improvement of this type of modeling is needed to provide management tools to water managers in the current water supply situation. © 2017 Elsevier B.V. All rights reserved.
1. Introduction
∗ Corresponding author. E-mail addresses:
[email protected] (C. Crisci), rterra@fing.edu.uy (R. Terra),
[email protected] (J.P. Pacheco),
[email protected] (B. Ghattas),
[email protected] (M. Bidegain),
[email protected] (G. Goyenola),
[email protected] (J.J. Lagomarsino),
[email protected] (G. Méndez),
[email protected] (N. Mazzeo). http://dx.doi.org/10.1016/j.ecolmodel.2017.06.017 0304-3800/© 2017 Elsevier B.V. All rights reserved.
Predicting phytoplankton biomass and composition is a central issue for freshwater ecosystems used as drinking water reservoirs, mainly due to harmful cyanobacterial blooms and the potential presence of cyanotoxins that can cause serious and occasionally fatal human diseases (Carmichael, 2001; Paerl et al., 2001; Falconer and Humpage, 2006). In addition, the occurrence of cyanobacterial blooms leads to ecosystems simplification and loss of multiple ecosystem services, affecting trophic dynamics and biodiversity,
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
and consequently hindering the reversion of this non desirable configuration (Scheffer et al., 1993). The relation between anthropogenically induced nutrient over-enrichment (eutrophication) and the occurrence of cyanobacterial blooms in lakes is well documented (Heisler et al., 2008; Paerl and Huisman, 2009; Carpenter et al., 2011; O’Neil et al., 2012; Martinuzzi et al., 2014). However, despite the importance of nutrient availability, several additional drivers control the spatial and temporal variability of phytoplankton dynamics, such as light availability, temperature, residence time, grazing pressure and climatic variability (Cymbola et al., 2008; Moss et al., 2011, Kosten et al., 2012). In lakes with frequent and intense sediment resuspension events, light attenuation due to high concentrations of suspended solids in the water column can significantly affect the planktonic primary productivity (Beaver et al., 2013). However, light attenuation is not the only effect of sediment resuspension. During periods of high winds, sediment resuspension can also generate inoculation of phytoplankton from cysts or akinetes and nutrient release into the water column, leading to enhanced phytoplankton biomass in the water column (Søndergaard et al., 1992). The importance of wind induced sediment resuspension in shallow lakes points out to the fact that lake morphometry and the wind regime, including extreme events, can strongly modulate the lake dynamics by affecting turbidity, nutrients and primary productivity (Evans, 1994; Scheffer and van Nes, 2007). Phytoplankton species differ broadly in their responses to environmental change, including resources acquisition (light and nutrients), and mortality avoidance (sinking, washout and grazing) (Margalef, 1978; Reynolds, 1984a; Naselli-Flores et al., 2007). These features can be combined to describe the species habitat template (sensu Southwood, 1977). This concept considers the habitat as a template on which evolution forges characteristic traits of the species (Southwood, 1988), and can be used to predict community organization (Keddy, 1992). Morphological traits are relatively easy to measure and have clear relationships with the functional properties of phytoplankton (Lewis, 1976; Reynolds 1984b; Naselli-Flores et al., 2007; Kruk et al., 2010; Pacheco et al., 2010). The morphology-based functional groups (hereafter MBFGs) approach, classifies organisms in seven groups in terms of morphological traits (e.g. volume, presence of flagella) independently from the organism’s taxonomy (Kruk et al., 2010, Kruk and Segura, 2012). We implemented a multi-model approach, combining statistical and physically-based models, to predict and better understand the dynamics of phytoplankton biomass (through chlorophyll-a concentration, hereafter chl-a) and composition (through MBFGs) in Laguna del Sauce, a shallow eutrophic lake (the second drinking water source of Uruguay) strongly influenced by wind regime (Inda and Steffen, 2010). Given the eutrophication process observed in the last decades, the intensification of cyanobacterial blooms (higher frequency of occurrence, higher biomasses, wider distribution, increased persistence and toxicity) and associated drinking water supply crisis, the prediction of phytoplankton dynamics becomes crucial for water management authorities. In this sense, we confer great importance to the setting up of predictive models of phytoplankton biomass and composition using records provided by the drinking water supply company and the meteorological station at El Sauce lake airport.
2. Materials and methods 2.1. Study site Laguna del Sauce (34◦ 43 S, 55◦ 13 W, Maldonado, Uruguay, Fig. 1a) conforms a system of three connected shallow lakes: El Sauce (4045 ha), De los Cisnes (205 ha) and El Potrero (411 ha)
81
(Fig. 1b). The mean depth is 2.5 m while the maximum depth reaches 5 m. Laguna del Sauce has two main freshwater inputs (Arroyo Pan de Azúcar and Arroyo del Sauce) and one main output (Arroyo del Potrero), which is a natural connection to the Río de la Plata (Inda and Steffen, 2010) (Fig. 1b). Laguna del Sauce is a eutrophic system. Total phosphorus has reached values above 80 ug/L in several periods (and even higher than 100 ug/L during low water periods) and total nitrogen shows considerable temporal oscillations, with values between 200 and 1000 ug/L (Inda and Steffen, 2010). Cyanobacterial blooms, which have been observed since the 1960s, have become more intense in recent years (Rodríguez et al., 2010a). The phytoplankton biomasses are shown in the Results section. The increase of cyanobacterial blooms has been related to enhanced nutrient inputs from the basin during the last decades as a consequence of agricultural and cattle production intensification, and the significant growth of urbanization and tourism (Mazzeo et al., 2010; Rodríguez et al., 2010b). While Laguna de los Cisnes and Laguna del Potrero exhibit a large fraction covered by submerged vegetation, Laguna del Sauce does not present submerged macrophytes (except in the littoral zone), mainly because of wave action and higher inorganic turbidity (Inda and Steffen, 2010). 2.2. Data availability Next, we present all the data sets considered, which are available to OSE-UGD managers. 2.2.1. Water quality data Water quality data for the period 2002–2012 was provided by OSE-UGD. Daily data is available for the following water attributes: conductivity (Orion DuraProbeTM , 4-Electrode Conductivity Cells, model 013010), chl-a, temperature (LDO101 HACH) and turbidity (HACH CatN◦ 47000-88, model 2100N). Daily (except weekends) measurements and analyses are performed at the OSE-UGD lab on samples taken at 6 am from the water intake point, located 100 m offshore from OSE-UGD water treatment unit (Fig. 1b). Temperature is measured in situ. Chl-a is considered as proxy of total phytoplankton biomass and is estimated by extraction from GF/F Whatman filter using hot ethanol (90%), according to Nusch (1980) and spectrophotometry (Macherey-Nagel UV/VIS). Starting in 2004, samples are taken from the lake every one or two weeks to estimate phytoplankton species abundance (org/ml); the sampling becomes daily during periods when cyanobacteria blooms occur. Phytoplankton samples are counted from fixed Lugol samples, by using the settling technique and random fields (Utermöhl, 1958) considering at least 100 individuals of the most frequent species (Lund et al., 1959). We did not include the picoplankton fraction (less than 2 m in greatest dimension) or tychoplanktonic groups (associated with periphyton). The abundance data was transformed into biovolume considering the volume of organisms of each species. Biovolume estimations were performed based on Hillebrand et al. (1999). The sum of the biovolumes of all species included in the same morphologically-based functional group (MBFG, see below) was performed to obtain the biovolume of functional groups. The response of the phytoplankton composition to environmental variables was analyzed following the morphologically-based functional approach (MBFGs) proposed by Kruk et al. (2010). This classification scheme is based on morphological traits, which reflect functional abilities for resources uptake, growth, sedimentation, grazing and flushing avoidance. It includes seven groups: Group I: small organisms with high surface/volume (S/V) ratio (e.g. Chroococcales, Chlorococcales), Group II: small flagellated organisms with siliceous exoskeletal structures (i.e.,
82
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Fig. 1. a) Location of Laguna del Sauce in Maldonado Department (southeast Uruguay), and b) Laguna del Sauce and its three connected lakes (P: El Potrero, C: De los Cisnes, S: El Sauce). The black and white dots indicate the location of the water treatment unit (OSE-UGD) and the meteorological station of the Uruguayan Institute of Meteorology, respectively. Main water inputs (A◦ del Sauce and A◦ Pan de Azúcar) and output (A◦ del Potrero) are also indicated.
Chrysophyceae); Group III: large filaments with aerotopes (e.g., Planktothrix, Anabaena, Cylindrospermopsis); Group IV: organisms of medium size, lacking specialized traits (e.g., Closterium, Monoraphidium, Pediastrum); group V: unicellular, medium to large-sized flagellates (e.g., Cryptophyceae, Euglenophyceae, Dinophyceae); Group VI: nonflagellated organisms with siliceous skeletons (i.e., Bacillariophyceae); Group VII: large mucilaginous colonies (e.g., Botryococcus, Aphanocapsa, Microcystis). As nutrient data was scarce and unevenly distributed in time during the study period, we decided not to consider this information in the analysis.
2.2.2. Water level data Daily water level information was provided by the Uruguayan National Water Direction (DINAGUA). The water level is measured daily at a dam located at the mouth of El Potrero stream (Fig. 1b). Positive values indicate water levels above the dam crest associated with discharge from Laguna del Sauce to El Potrero stream. Negative values imply no outflow from the reservoir.
2.2.3. Meteorological data Meteorological data was recorded at the Laguna del Sauce conventional meteorological station (WMO N◦ 86586) belonging to the Uruguayan Institute of Meteorology (INUMET) (Fig. 1b). Data was available for the 2002–2012 period (same as the water quality data). Daily precipitation was available at 10 UTC. Daily wind speed and temperature were estimated by averaging tri-hourly synoptic observations. Wind speed and direction are measured at 10 m above the surface, while temperature was recorded in the meteorological shelter located at 1.5 m.
2.2.4. Bathymetric data Bathymetric data of the lake and its surrounding area was obtained through the digitalization of a bathymetric map provided by the National Army. Data was digitalized in AutoCad and ArcGis and interpolated to a 10 m × 10 m grid.
2.3. Multi-model approach Inda and Steffen (2010), as well as exploratory analyses on turbidity and chl-a time series in Laguna del Sauce, describe periods of high inorganic turbidity (established after sudden peaks of turbidity as cause of sediment resuspension) and low chl-a concentrations, and subsequent periods of low turbidity with significant chl-a fluctuations (see Fig. 3 in Section 3.1.). This led us to consider a multi-model approach to address: 1) The dynamics of the relation between turbidity and chl-a (proxy of algal biomass). 2) The dynamics of inorganic turbidity during periods of high turbidity and the identification of the meteorological events that prompt the sudden changes in turbidity. 3) The relevance of environmental drivers (water quality, meteorological and water level variables) in controlling algal biomass (chl-a) and composition (dominance of MBFGs) in periods with low turbidity. A schematic representation of the multi-model approach is shown in Fig. 2. 2.4. Time series analyses of turbidity and chlorophyll-a First, we analyzed the turbidity and chl-a temporal variability. In order to detect characteristic periodicities and the relation between both time series, we performed a power spectral density (PSD) analysis using Welch’s overlapped segment averaging estimator (Welch, 1967). The series were divided into eight sections with 50% overlap. Each section was windowed with a Hamming window and then the modified periodograms were averaged to obtain the PSD estimate. Considering the noise due to imperfect and finite data, Welch’s noise reduction method is often preferred even though it affects frequency resolution. The analyses were done with the pwelch function of Matlab 7.0.4 version. To assess the statistical significance of the periodogram peaks, we compared them to the spectrum of a reference red noise with it
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
83
Fig. 2. Schematic representation of the four step multi-model approach proposed in this study. 1) Time series analysis of turbidity and chl-a dynamics. 2) Machine learning models to characterize the inorganic turbidity peaks. 3) Physically-based models to analyze the turbidity decay. 4) Machine learning models to predict phytoplankton biomass and composition in periods of low inorganic turbidity.
Fig. 3. Turbidity and chl-a time series during the study period (2002–2012). Series are presented considering weekly averages over daily data to avoid gaps associated with missing data during weekends. The starting of each PHT is indicated.
95% confidence interval assuming a 2 distribution with 2 degree of freedom. Peaks that exceeded the upper limit of the confidence interval were considered significant. Next, we estimated the cross power spectral density of turbidity and chl-a. This analysis determines the relationship between two time series as a function of frequency. It allows to evaluate if statistically significant peaks of the two series are correlated and, if so, which is the phase shift (or lag) between them (which peak is occurring before) enabling to postulate causality. To perform cross-spectral analyses we used the cpsd function in Matlab 7.0.4 version, which estimates the cross power spectral density of two discrete-time signals using Welch’s averaged, modified periodogram method of spectral estimation.
Prior to performing the analyses, we took weekly averages over the daily time series because of missing data during weekends. The weekly time series were standardized and detrended.
2.5. Modelling turbidity: machine learning models and physically-based equations 2.5.1. Machine learning models The observed temporal evolution of inorganic turbidity (abrupt peaks in very short periods of time) cannot be explained by the internal lake dynamics, hence external forcing factors, such as extreme meteorological events (i.e. wind and/or precipitation) are proposed as potential triggers for turbidity peaks. To model the response of turbidity to meteorological variables we performed
84
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Classification and Regression Trees models (CART, Breiman et al., 1984). The basic idea behind CART is to split the space of explicative variables into multidimensional rectangles, and to use a very simple local predictor (for instance, a constant) on each rectangle. This method has some interesting advantages in the context of our problem: 1) the results are very easy to understand, 2) CART is robust to outliers in the output, and 3) CART can deal with data sets with complex structures and is more powerful compared to alternative methods as the complexity increases (Crisci et al., 2012) (for a more detailed description of CART we refer to the Supplementary Methods file of the supplementary material). Since the goal is to learn which meteorological events are associated with extreme –and scarce- turbidity values; we do not expect high explained variance in the global model. However, we do expect to accurately predict extreme turbidity values, as CART has the ability to locate outliers in separate nodes and hence to give information regarding the conditions that favor such events. We considered the following predictors on a daily basis: • • • • • • •
maximum southwest wind intensity (Max SW wind) precipitation of the previous day (Precip) cumulative precipitation of the 2 previous days (Cum precip 2) cumulative precipitation of the 5 previous days (Cum precip 5) cumulative precipitation of the 15 previous days (Cum precip 15) cumulative precipitation of the 30 previous days (Cum precip 30) water level (Water levxel)
The water level, which is related to antecedent precipitation but also to the water intake by the water treatment unit for human supply and to evaporation, was incorporated with the following rationale: low water level periods may foster sediment resuspension even in moderate wind conditions. As the response variable is numerical (turbidity daily values) we performed a regression tree (if it had been a categorical variable we would have considered a classification tree, see the Supplementary Methods). To perform the models, we used the rpart function of the rpart package (Therneau et al., 2015) in R software (version 3.2.1, R Development core team, 2008). We retained the tree associated with a complexity parameter of 0.01 (default rpart value) and we set the minsplit value to 5. 2.5.2. Physically-based equations After the sudden increase of turbidity, the lake remained with high inorganic turbidity levels for long periods (∼ 1.5 years) before returning to pre-peaks values, preventing the occurrence of high levels of chl-a (see Fig. 3 in Section 3.1.). We modeled such periods after the abrupt peaks explicitly representing the decay and resuspension processes. We considered the resuspension estimate proposed by Clifton and Dingler, 1984, where um2 can be interpreted as a measure of how much will the bottom be affected by wind stress. um2 =
∗ g ∗ H2 2 ∗ L ∗ cosh( 2∗∗h ) L
(1)
Where: • • • •
um is the speed at the lake bottom, g is the gravitational constant, h is the depth of the water column, H is the wave height,
H=
0.0016 ∗
fetch ∗ wind speed √ g
• L is the wave length,
(2)
L = 1.56 ∗
0.77 ∗ wind speed ∗ tanh
0.077 ∗
g ∗ fetch wind speed2
0.25 2 (3)
• fetch is the distance following the wind from the shore to any given point in the lake, and therefore depends on wind direction. Resuspension is thus computed in each 10 × 10 m grid box (a total of approximately 405,000 values) using the bathymetry analysis (see Section 2.2.4). We then considered two approaches to determine a unique value to use in Eq. (4): (i) the spatial average, which was the option selected and (ii) the value in the grid point closest to the OSE-UGD water intake, which gave very similar results (not shown). We modeled the time derivative of turbidity with a decay term, proportional to turbidity itself with an associated timescale ; and a term proportional to resuspension, with proportionality ␣, dTurb Turb (t) + ␣ ∗ Resusp (t) =− dt
(4)
We limited the estimation of and ␣ to the high turbidity periods, performing the analyses separately in 500-day-long periods that follow each abrupt peak. Since we are interested in the dynamics of the slow turbidity decay, not the high frequency day-to-day variability, we worked with weekly averages. Note that the time derivative of turbidity involves two –weakly- time-steps; in the second high turbidity period (PHT2) this generates too many gaps in data so it was dropped, keeping the other three. First, the effect of resuspension was ignored (imposing ␣ = 0) and solving for . This gives an upper bound for , since in the full model a faster decay (lower ) is partially cancelled by the resuspension. Next, we simultaneously compute and ␣, for which we previously divided Eq. (4) by turbidity, so as to obtain an equation of the form y(t) = a+b.x(t) and solve for the coefficients in a least square sense. With the pair of coefficients thus obtained, Eq. (4) was numerically integrated, starting from the observed value of turbidity in each case so that the simulations can be compared to the observed evolution. 2.6. Machine learning models to predict phytoplankton biomass and composition during low turbidity periods We implemented Machine learning techniques to predict phytoplankton biomass (chl-a concentration) and composition (predominant MBFG) during periods of low turbidity (turbidity <29 TNU, see section 3.1.). The chl-a variable was transformed into log(chl-a+1) because such transformation gave better results (lower model errors). Instead of using total biovolume, we chose log(chl-a+1) because the former resulted in very poor model performances (even log-transformed), probably because of its large variance. The MBFG response variable is a categorical variable and was estimated as the dominant functional group (i.e. the group with highest biovolume). For this variable, we discarded the categories corresponding to the groups I, II and V since they were extremely rare in the data set causing problems in the models’ fit. Therefore, the MBGF response variable has four categories (III, IV, VI and VII). For the predictor variables, we considered two nested sets of environmental variables since we had two different objectives in performing Machine learning models: 1) to determine the predictive power of the models using environmental or meteorological variables that are continuously recorded at both the OSE-UGD water treatment unit and the meteorological station, and 2) to better understand the processes which led to different phytoplankton biomass or the dominance of different MBFGs. Section 2.2. details data availability. The predictor variables used for the first objective were:
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
• • • • • •
daily mean air temperature (Air T) daily water temperature (Water T) cumulative precipitation in the previous week (Cum precip) turbidity (Turb) conductivity (Cond) water level (Water level)
For the second objective, we considered the same variables used in the first one plus the following:
• functional diversity (n◦ of functional groups) in the previous 1–10 days (Div func last days) predominant functional group in the last 1–10 days (MG last days) • second predominant functional group in the last 1–10 days (2nd MG last days) • total biovolume in the last 1–10 days (Biovol last days)
We considered 1–10 previous days to calculate these variables to overcome missing data, since these variables were unevenly registered (daily, weekly, biweekly). In general, each window presented only one day of available data. If more than one day was available, we chose the observation that fell nearest to one week prior. One week is the approximate temporal scale in which variations of the phytoplankton community are observed (Mazzeo, pers. com). Functional diversity was considered as an indicator of the ability of the ecosystem to allow the coexistence of different MBFGs. Contrasting environmental characteristics could promote, or in contrast restrict, the presence of different groups in the lake. At the same time, the presence of one group could limit the presence of others (Litchman and Klausmeier, 2008). In addition, phytoplankton composition in a particular time it also related to the preceding community biocenosis (Reynolds, 2006). In this sense, prior diversity and functional composition provide the framework of previous biocenosis that influences the phytoplanktonic composition in the ecosystem. We choose CART and Random Forests (Breiman 2001) among Machine learning tools to model the log(chl-a+1) and MBFG variables As CART’s principal weakness is its instability with regard to changes in the learning sample (Breiman et al., 1984), we also considered Random Forests models (Breiman 2001), an ensemble learning technique that avoids this problem and has shown to outperform CART with respect to prediction error (Breiman, 2001). Furthermore, this technique presents lower error rates compared to other statistical tools in ecological studies (e.g.: Cutler et al., 2007; Cirsci et al., 2012). We kept CART because, as mentioned above, it has an easily interpretable graphical output. Although Random Forest is a black box procedure it constitutes an interesting tool to estimate the variables importance and, as was previously indicated, has comparative lower error rates than CART (Breiman, 2001) (for a more detailed description of CART and Random Forests we refer to the Supplementary Methods). The models were performed using R software version 3.2.1 with the functions rpart (rpart package) and randomForest (randomForest package, Liaw and Wiener, 2002). When running CART we considered an initial complexity parameter value of 0.005 and a minsplit value of 5. The maximal tree obtained with these settings was pruned considering the 1-SE rule (recommended by Breiman et al., 1984) to obtain the optimal tree. To perform Random Forests, we computed 5000 trees (ntree = 5000) and we considered the squared root of the total number of predictors at each split of every tree. To assess the models performance we computed the prediction error as follows. For the
85
log(chl-a+1) which is a numerical response variable, we calculated the mean squared error (MSE): n 2 (predi − obsi )
n
(5)
i=1
where pred is the vector of predicted values and obs is the vector of the observed values of the response variable. For the MBFG prediction (categorical response variable, we computed the misclassification error which corresponds to the proportion of misclassified cases. When modeling the log(chl-a+1) we also computed the proportion of variance explained by the model, which is easier to interpret than the MSE: 1 − MSE (obs)
(6)
where (obs) corresponds to the variance of the observed values of the response variable. As Machine learning tools may tend to overfit the data, we evaluated the prediction ability of the models by means of the test error (James et al., 2013). To do this, we randomly split the data in a training (2/3 of the data) and a test sample (1/3 of the data), then we trained the model with the training sample and assessed its performance using the test sample. We performed this procedure 100 times and averaged the learning and the test errors/proportion of explained variance to have an unbiased measure of the performance of the models. When modelling the MBFG response variable, as the different classes (functional groups) are imbalanced, samples were stratified so as to maintain approximately the same distribution of categories in the learning and the test sample with respect to the distribution of the original variable. 3. Results 3.1. Time series analyses of turbidity and chlorophyll-a Visual analysis of the turbidity and chl-a time series shows four periods with high turbidity values (hereafter PHT) that appeared after sudden peaks (changes of about 70 − 90TNU). These abrupt changes in turbidity occurred during different seasons of the year (i.e. summer, winter, summer and autumn for PHT 1–4 respectively). The fourth period presents a more gradual peak of turbidity compared to the other three periods (Fig. 3). After the establishment of a PHT, it takes ∼1.2 year on average before turbidity returns to pre-events values (Fig. 3). During PHTs, chl-a present low values, while outside PHTs, chl-a show high variability (Fig. 3). The relation between daily turbidity and chl-a is shown in Fig. 4a. A turbidity threshold of 29 TNU was identified, above which chl-a remains with low values (<∼20 ug/l, but most values are below ∼15 ug/l). Below 29 TNU, chl-a shows significant variability, ranging from 0 to 145 ug/l (Fig. 4a). The distribution of turbidity values indicates that near 30% of the time, turbidity presents values above the identified threshold (≥29 TNU, Fig. 4b). There are a few observations with high chl-a (>20 ug/l) and high turbidity (≥29 TNU, Fig. 4a) possibly meaning that, in those cases, high phytoplankton biomass provoked the high observed turbidity (i.e. turbidity can be attributed to the inertia of phytoplankton biomass). PSD analysis indicates a significant periodicity of 2.8 years in the turbidity series (see Fig. S2a). No periodicities were found for shorter periods. Considering the chl-a series, a characteristic peak (not statistically significant) was found for 2.5 years. As for turbidity, no periodicities were found for shorter periods (Fig. S2b). In summary, both series present similar periodicity, although only turbidity was statistically significant (confidence level 95%).
86
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Fig. 4. (a) Scatterplot of chl-a vs turbidity. A threshold of turbidity of 29 TNU was identified (vertical black line, see Fig. S1 of the supplementary material), that separates a region with both low and high values of chl-a from one in which chl-a remains with low values (∼<20 ug/l –horizontal black line-, but most values are below ∼15 ug/l −horizontal dashed line-). (b) Distribution of turbidity values, dark grey bars represent samples with turbidity values above the identified threshold while light grey values correspond to turbidity values below the 29 TNU threshold. The data point with very high turbidity (i.e. 125 TNU) does not appear in the scatterplot since the chl-a value was missing in that sample.
Regarding the cross power spectral density results, we observe that both series are correlated, presenting a period of common oscillation of 2.5 years (Fig. S3). The phase spectrum results show that the peak is occurring earlier for the turbidity series, presenting a phase difference with chl-a of 61 weeks (∼1.2 years, Fig. S3), similar to the length of the PHTs.
3.2. Modeling turbidity: machine learning models and physically-based equations 3.2.1. Machine learning models The regression tree analysis was able to predict the beginning of three of the four visually identified PHTs, separating the day when PHTs started (or a few days prior, see below) in separate nodes (Fig. 5). High intensities of southwest winds (>59 km/h) predicts the onset of PHT1 and PHT2; events of such characteristics occur the day before the peak of turbidity (Fig. 5). The observation corresponding to PHT2 s first day and that of two days prior, were segregated in a separate node, characterized (among other
attributes) by low (<21.9 mm) cumulative precipitation in the previous 30 days. The starting date of PHT3 and the previous week were well separated in a node which was characterized by relative high water levels (≥0.45 m) (Fig. 5). To better interpret CART output (which indicates that cumulative precipitation in previous 30 days was <91.2 mm, Fig. 5), we explored the data set. It shows relative high 30 days cumulative precipitation (∼80 mm) for the first PHT3 day and the preceding week. This result is in accordance with the relative high water level associated to this PHT. We also found moderate to high southwest wind intensities (46 km/h) the day prior to the establishment of the PHT3. Finally, the starting date of PHT4 (or earlier days) were not clearly segregated in a separate node as in the case of other periods, but several days preceding this period were grouped in a node characterized with very low water levels (<−0.52 m), indeed, 40 of the 46 observations that conform this node occurred within the two months before of the onset of PHT4 (Fig. 5). Low water level in combination with moderate SW winds (30–40 km/h) in the three previous days (information obtained from inspection of the time series) were probably the main drivers that triggered PTH 4. As expected, predicted turbidity
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
87
Fig. 5. CART output for the prediction of abrupt peaks of turbidity that lead to PHTs. Dashed rectangles indicate well predicted extreme turbidity events associated to the onset of PHTs. The solid line rectangles describe the observations that fall in the terminal nodes with extreme turbidity values.
in the nodes related to the setting of PHTs was always very high (from 42.58 to 76.27 TNU) (Fig. 5). Fig. 6a shows the standardized anomalies of weekly averaged turbidity and water level time series while Fig. 6b presents the standardized anomalies of weekly averaged turbidity and precipitation time series. In addition to the results based on CART, Fig. 6a shows the large negative water level anomaly before the PHT4 onset. Besides, we can see that PHT1 presents positive anomalies of precipitation both before its onset and at the beginning of the period (Fig. 6b). Inspection of the database shows an extreme precipitation event (68 mm in one day) two days before the beginning of the period. 3.2.2. Physically-based equations According to Eq. (4), and assuming no resuspension (␣ = 0), the decay time () of turbidity is 298, 259 and 677 days for PHT1, PHT3 and PHT4, respectively, which constitutes an upper bound for this parameter. When the full model is used, the solutions for become 209, 102 and 177 days and for ␣ = 0.69, 20.1 and 32.0, respectively. The numerical integration of turbidity with these parameters for each period is presented and compared to observed values in Fig. S4. The slow decay of turbidity is well captured in all three PHTs, showing that the decay timescale is indeed of several months. The higher frequency variability of turbidity, presumably associated to resuspension, is more poorly simulated. Consistently, the best fit of the multiplicative parameter ␣ shows values of the same order in PHT3 and PHT4 but much smaller in the case of PHT1. 3.3. Machine learning models to predict phytoplankton biomass and composition in low turbidity periods 3.3.1. Machine learning models predictive accuracy for phytoplankton biomass Table 1 shows CART and Random Forests proportion of explained variance for the transformed chl-a variable (log (chl-a
Table 1 Proportion of explained variance for CART and Random Forests (RF) models considering the log (chl-a + 1) response variable. Average results (±0.01 SD) over 100 random splits of the data are presented both for the training and the test samples. Results for data set 1 and data set 2 are shown. Data set 1
Training Test
Data set 2
CART
RF
CART
RF
0.56 ± 0.12 0.26 ± 0.18
0.89 ± 0.01 0.46 ± 0.08
0.56 ± 0.01 0.28 ± 0.16
0.89 ± 0.01 0.46 ± 0.07
+1)) over the training and test sample, considering both data set 1 and 2. In all cases, CART shows lower prediction accuracy than Random Forests (RF). For data set 1 (environmental + biological variables) RF shows very good results considering the training sample (0.89 ± 0.01), and still very acceptable results considering the prediction accuracy over the test sample (0.46 ± 0.08). Practically identical results are found when comparing the prediction accuracy of data set 1 and 2 (using only environmental variables continuously measured by OSE-UGD water managers). 3.3.2. Machine learning models predictive accuracy for phytoplankton composition Descriptive results of morphology-based functional groups (MBFG) composition during high and low turbidity periods are shown in Fig. 7. During PHTs, Group VI was dominant (Fig. 7). This group is exclusively represented by diatoms, which can prosper in environments with high resuspension, strong mixing and low light availability. The high sinking rate associated to the siliceous frustule, makes its persistence in the photic zone possible only under strong mixing conditions in the water column (Reynolds et al., 2002; Becker et al., 2009; Kruk and Segura 2012). In comparison, Group VII gains in relative dominance (Fig. 7) during low turbidity periods, probably because it is associated with high water column stability and relatively higher light availability (Kruk and Segura,
88
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Fig. 6. (a) Standardized anomalies of turbidity and water level time series. (b) Standardized anomalies of turbidity and precipitation time series. Standardized anomalies were considered to visualize variables in the same scale. Series are presented considering weekly averages of daily data. The onset of each PHT is indicated.
Fig. 7. Relative importance of morphology-based functional groups (MBFG) in low (a) and high turbidity (b) periods. Each bar represents the percentage of observations where a certain group dominates over the others.
2012). Although in a much smaller extent, Group III is also relatively more abundant during low turbidity periods (Fig. 7). Despite this group’s association with limiting light conditions, other factors, such as nutrient availability (Kruk and Segura, 2012), could be explaining its relative dominance during high light availability periods. Group III and VII are groups that include toxic cyanobacteria such as Dolichospermum spp. (Group III) and Microcystis (Group VII). RF models correctly classified all observations of the training sample (classification error = 0) in both data sets, yielding better results than CART models. The RF test error (prediction error over an independent data set) was still acceptable for both data set 1
Table 2 Classification error for CART and Random Forests (RF) models for the MBFG response variable. Average results (±0.01 SD) over 100 stratified random splits of the data are presented both for the training and the test samples. Results for data set 1 and 2 are presented. Data set 1
Training Test
Data set 2
CART
RF
CART
RF
0.28 ± 0.1 0.43 (± 0.07)
0 0.32 ± 0.06
0.32 ± 0.10 0.45 ± 0.05
0 0.34 ± 0.05
(0.32 ± 0.06) and data set 2 (0.34 ± 0.05) where 32–34% of the cases were misclassified (Table 2).
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
89
Fig. 8. Random Forests variable importance plots. Variables importance for data set 1 (a) and data set 2 (b) considering the log(chl-a + 1) response variable; variables importance for data set 1 (c) and data set 2 (d) considering the MBFG variable. The measure of variable importance (Increase in node purity for regression and Mean Decrease Gini for classification) corresponds to the total decrease in node impurities from splitting on a particular variable, averaged over all the trees of the forest. For classification, the node impurity is measured by the Gini index. For regression, it is measured by the residual sum of squares.
3.3.3. Relation between predictors and response variables Random Forests variable importance plots are presented in Fig. 8 for each data sets and response variable considered, namely log(chl-a + 1) and MBFG. Regarding phytoplankton biomass (log(chl-a + 1)) results, when considering data set 1, biological variables were the less relevant ones with very low importance index (increase in node purity near zero, see legend of Fig. 8) (Fig. 8a). Turbidity was the most important predictor in both data sets, being the other environmental predictors of decreasing importance: conductivity, water temperature, water level, air temperature and cumulative precipitation of the preceding week (Fig. 8a, b). RF models show greater relevance of biological variables when predicting MBFG, as compared to log(chl-a + 1) predictions; still, some environmental variables maintained the highest importance index. For instance, turbidity, conductivity and water level were the most important variables, followed by two biological variables: the dominant MBFG in the previous days (1–10 days) and the preceding phytoplankton biovolume (Fig. 8c). When considering only the environmental variables (data set 2), turbidity was again the most
important variable, followed by conductivity, water level, air temperature, cumulative precipitation of the preceding week and water temperature (Fig. 8d). Although CART models did not give very accurate results, especially regarding the test errors, we considered CART output to better capture the relationship between predictors and response variables. Concerning the log (chl-a +1) response variable, we show results only for data set 1, since biological predictors showed very low importance indices and test errors were very similar for both data sets (Fig. 9). Highest chl-a values (greater than 10 ug/l, which corresponds to a value of 2.4 of the transformed variable, see Fig. 9) were related to relative low turbidity values (<24.4 UNT) and high water temperature (>21.5 ◦ C), to low turbidity (<14.95 TNU), or to relative low turbidity (<24.4 UNT) and low temperatures (<10.7 ◦ C) (Fig. 9). Inspection of the data that fell in the terminal nodes indicate that, in all these cases, group VI was dominant, but in the first and second cases, groups III and VII, and group VII respectively, were also important (Fig. 9).
90
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Fig. 9. CART output for the prediction of the log (chl-a +1) variable considering the predictor variables of data set 1. For each terminal node, the frequency of dominance of each MBGF is indicated.
The CART output for the MBFG response variable in data set 1 (environmental and biological predictors) indicates the dominance of Group III when there was dominance of groups III or VII in the preceding 1–10 days and in high conductivity conditions (≥172.4 uS/cm) (Fig. 10a). Group VII is also predicted when there was dominance of groups III or VII in the previous days and in intermediate turbidity conditions (>26.6 but not higher than 29 TNU) (Fig. 10a). Considering data set 2 (environmental predictors only), group III is dominant under conductivity values ≥139 uS/cm and in low turbidity conditions (<11.1 TNU), among other variables attributes (Fig. 10b). Group VII is dominant under conditions of intermediate turbidity (≥25.6 but not higher than 29 TNU), or lower turbidity (<25.6 TNU) but high air temperatures (≥23 ◦ C or ≥25 ◦ C) (Fig. 10b). 4. Discussion Our contribution integrates a variety of approaches to predict phytoplankton biomass and composition dynamics in a eutrophic shallow lake where a high frequency record of limnological attributes and meteorological conditions is available. The results confirm that the phytoplankton dynamics in Laguna del Sauce are
strongly influenced by extreme meteorological events (associated mainly to wind and precipitation). Cross spectral analysis of turbidity and chl-a indicates a common periodicity of ∼2.5 years and with a phase shift of ∼1.2 years (average length of the PHTs approximately), turbidity leading chla. The statistical results support the hypothesis turbidity affecting phytoplankton biomass. Indeed, by combining these results and the exploratory analyses, we identified a turbidity threshold of 29 TNU below which chl-a values remains below ∼15–20 ug/l. The evidence demonstrates that about 30% of the time (total approximate length of PHTs), the phytoplankton biomass was limited by light conditions due to the inorganic turbidity. In summary, once a PHT is triggered, phytoplankton biomass remains within low values and when turbidity drops below 29 TNU, chl-a is allowed to vary freely. Interestingly, the results also suggest that inter-annual variability dominates the turbidity and chl-a dynamics in Laguna del Sauce, being the intra-annual (i.e. seasonal) dynamics of secondary relevance. The periods of high turbidity, which constrain the phytoplankton biomass, start with a sudden increase which cannot be attributed to internal lake dynamics. Field evidence confirms the role of the extreme meteorological events, acting as key drivers for the PHTs occurrence. The observed discontinuities caused by turbidity peaks, that reset the systems in terms of chl-a and hence in phytoplankton biomass, were explained by different drivers through a regression tree analysis. We used statistical approaches (as those used to model the length of PHTs) to predict extreme turbidity oscillations at the start of PHTs, because physically-based models were not capable of capturing turbidity peaks (results not shown). Based on the results from CART, we can speculate why the physically-based resuspension model fails to account for the turbidity peaks. Precipitation events associated with some sudden changes could be related to suspended material inputs associated to soil erosion and runoff, processes which were not accounted for in the model. On the other hand, other turbidity peaks are related to extreme wind events that include non-linear effects, which are not considered in the resuspension model either. Onset of PHTs were captured by the regression trees analysis. They show that the initial day of PHTs (or some day prior) have very particular characteristics that CART identify in separate nodes (isolating very few observations among ∼4000). Due
Fig. 10. CART output for the prediction of MBFG variable considering the predictor variables of data set 1 (a) and 2 (b). The numbers in terminal nodes indicate the frequency of the different MBFG inside each node in the following order: III/IV/VI/VII.
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
to the data-splitting nature of these models, they can neutralize outliers, putting them in separate nodes (Härdle and Simar, 2012). We took advantage of this property to learn about the meteorological/hydrological attributes associated to extreme turbidity events (i.e. outliers). The characteristics driving the onset of PHTs are varied. The first PHT (PHT1) was triggered by extreme wind events from the SW direction (>59 km/h). We also found a heavy precipitation event (68 mm in one day) two days before the beginning of the period. The PHT2 was also caused by extreme wind intensities, more precisely, the day before the onset of PHT2, wind speed of 92 km/h from the SW was registered. Furthermore, CART results point out that the presence of relative low cumulative precipitation during the previous 30 days was also a characteristic of the period leading to this event. It is worth noting that this PHT was consequence of one of the most severe extratropical cyclones registered in the last decades in Uruguay. The onset of PHT3 was associated with high water levels and relative high cumulative precipitation during the precedent 30 days. Furthermore, relative high (46 km/h) SW wind intensities were registered just before its onset. Finally, PHT4 was related to very low water levels in the ∼40 days leading to the abrupt turbidity peak which were promoted by moderate SW wind intensities (30–40 km/h). The key wind events were associated exclusively to the SW direction. The protection against winds from the eastern sector (predominant in Uruguay) by the highland Sierra de La Ballena, determine a very particular condition of Laguna del Sauce in comparison with other shallow lakes studied in La Pampa region. There is a strong interaction between sediments and the water column in shallow lakes, which can be altered by an external forcing, such as strong wind events. These events can drastically change those interactions affecting nutrient recycling, light availability and, in turn, phytoplankton biomass and composition (Beaver et al., 2013; Havens et al., 2016). Besides, extreme drought or rainfall will probably affect lake dynamics in terms of phytoplankton biomass and composition. Drought events cause an increase in retention time, conductivity and raise the nutrient concentrations and the water temperature, while heavy rainfall events have contrasting effects, which will depend mainly on the effect of flooding on water nutrient concentrations and transparency (Bakker and Hilt, 2016). Finally, all these effects will be modulated by local conditions such as the maximum fetch in the direction of predominant or more intense winds, lake area, sediment granulometry, among others. The response of the phytoplankton community to extreme meteorological events in shallow lakes is not always easy to predict. It depends on the immediate effect of the meteorological events, particularly on light availability and the time it takes to return to pre-event conditions. The results from the physically-based model during PHTs consistently show that the decaying time-scale of turbidity is well over 100 days. The granulometric features of the sediment of Laguna del Sauce, in particular the predominance of silt and clay-sized particles (Inda and Steffen, 2010) associated to the occurrence of intermediate (and less severe) resuspension events, determine the long-term turbidity decay of PHTs. To both understand and predict phytoplankton biomass (chla) and composition (predominant MBFG) after the system returns to pre-events transparency (or turbidity), we performed CART and Random Forests analyses. While CART did not perform very well, Random Forests results were reasonable, presenting a proportion of explained variance over the test sample of 0.46 for the log(chl-a + 1) variable(for both subsets of variables considered) and a classification error over the test sample of 0.32 and 0.34 for subset 1 and 2 respectively, for the MBFG variable. These are very encouraging results considering that predictions were made over a set of easily measured variables available to water managers.
91
Accordingly to CART output and Random Forests variable importance plots, the phytoplankton total biomass and functional group composition were explained mainly by turbidity, conductivity, temperature and water level. Highest total biomass (chl-a) occurred in periods with relative low turbidity (<24.4 UNT) combined with high water temperature (≥21.5 ◦ C) or just low turbidity (<14.95 TNU). We identified the environmental conditions related to the dominance of different functional groups, especially those associated to harmful cyanobacteria. Group III (related to species such as Dolichospermum spp.) was associated with low turbidity (<11.1 TNU) and high conductivity values (≥139.1 uS/cm). Group VII (related to genera such as Microcystis), was associated to intermediate turbidity (≥25.6 but not higher than 29 TNU) or to low turbidity (<25.6 TNU) combined with high air temperatures (≥23 ◦ C or ≥25 ◦ C). Overall, these conditions are in agreement with those reported in the literature for the MBFG (e.g.: Reynolds et al., 2002; Kruk and Segura, 2012) or associated genera (O’Farrell et al., 2012). Water level, which is probably negatively correlated with conductivity, was identified as an important variable in Random Forests models. In summary, good light conditions, high conductivity (probably associated with low water level, high residence time and hence high nutrients availability), and high air or water temperatures (related to the predominance of these groups in warmer seasons) explain an important part of the variability of total phytoplankton biomass and MBFG dynamics in Laguna del Sauce. The secondary role of the system memory, as measured by previous phytoplankton functional assemblage, demonstrates that the main temporal trajectories are controlled by external meteorological conditions and its influence on turbidity, water residence, conductivity and temperature. The succession process and the internal interactions among phytoplankton functional groups can explain the temporal variability in short temporal scales (weeks). In this sense, models considering community attributes as explanatory variables –in addition to environmental drivers- did not improve (or did so in a small degree) the predictive accuracy with respect to models that did not According to the results, the most desirable conditions from a water quality standpoint are those related to the PHTs, since during these periods turbidity controls chl-a concentrations. But as was mentioned above, these periods could have delayed effects in the system when turbidity drops below certain thresholds. If PHTs occur alternated by clear periods where cyanobacteria dominate, there will be a memory in the system that could generate more intense blooms from one clear period to another because of inoculation of akientes or cists (Søndergaard et al., 1992). Our data set did not include nutrients concentrations nor information about the grazing pressure, which are crucial to understand the phytoplankton dynamics. This information would probably improve the predictive skill of models significantly. Understanding how extreme events and climate variability affect properties of shallow lakes allows us to predict how those ecosystems might be affected by cyanobacteria blooms (Havens et al., 2016). In addition, knowing extreme events dynamics and trends in a specific region could clarify future scenarios and hence contribute to the assessment of risk associated to water supply. In this context, previous studies indicate a trend to wetter conditions as well as to more frequent extreme precipitation events in Uruguay (Haylock et al., 2006). As exposed above, these events could be responsible for the onset of PHTs. To our knowledge, there are currently no studies regarding extreme wind events, and in particular, extra tropical cyclone trends in Uruguay. Previous studies considered a combination of different Machine learning models to explain and predict freshwater phytoplankton biomass and composition (Welk et al., 2005; Recknagel et al., 2006).
92
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93
Further studies used Machine learning tools for modelling phytoplankton time series (Jeong et al., 2001; Welk et al., 2005; Jeong et al., 2006; Recknagel et al., 2006; Ye and Cai, 2009). In most cases supervised and non-supervised Artificial Neural Networks (ANNs) were considered. While these techniques present very interesting attributes in the context of ecological modelling (Lek and Guégan, 1999), they have the disadvantage of being unstable (small changes in the training data used to build the model may result in very different models, Breiman, 1996). In contrast, studies of phytoplankton biomass and composition prediction considering Random Forests are scarce (e.g., Kruk and Segura, 2012; Segura et al., 2017). In our opinion, considering that this technique is reported as one of the most powerful Machine learning tools (James et al., 2013), it application in fresh water ecosystems analyses, and in particular to predict phytoplankton biomass and composition deserves further attention. We combined powerful statistical tools and physically-based models, to analyze key processes occurring in Laguna del Sauce. The combination of models of different nature is, in our opinion, an interesting approach which can better capture the dynamics that determine water quality, especially when working with shallow lakes exposed to strong wind events, where physical processes (i.e. resuspension) are of central importance (Scheffer, 2004). The results, based on the consideration of a 11-year time series of meteorological, hydrological, physicochemical and biological variables, continuously measured by water managers, advances the understanding of relevant components and patterns of temporal variability of phytoplankton biomass and composition in Laguna del Sauce. 5. Conclusions In the current water quality situation, predictive models are crucial to anticipate non- desirable water quality conditions in lakes that serve as drinking water sources. Through the combination of different modeling strategies that consider meteorological, water quality and hydrological variables, we were able to explain a significant part of the variability of phytoplankton biomass and composition dynamics in Laguna del Sauce (Southeast Uruguay). Since such variables are continuously measured by water managers, the results could serve as an input to water supply managers both to anticipate non desirable water quality conditions and to improve monitoring strategies so as to increase accuracy of predictive models in the future. 6. Formatting and funding sources The authors gratefully acknowledge the support given by the Uruguayan National Agency of Research and Innovation (ANII) (CC Postdoctoral fellowship) and to the ECOS-sud program (ECOS n◦ U14E02). Acknowledgments The authors are very grateful to the Uruguayan water supply company (OSE-UGD), the Uruguayan Institute of Meteorology (INUMET) and the Uruguayan National Water Directorate (DINAGUA) for providing long-term data on physicochemical, biological, hydrological and meteorological variables. We also thank to Dra. Carla Kruk for her comments on phytoplankton morphologybased functional groups dynamics, to MSc. Magdalena Crisci for her assistance in the digitalization of bathymetric data, to Dra.Valeria ˜ for her advice on the analysis of extreme wind events in Duranona Uruguay, and to Paula Santos for language revision.
Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ecolmodel.2017. 06.017. References Bakker, E.S., Hilt, S., 2016. Impact of water-level fluctuations on cyanobacterial blooms: options for management. Aquat. Ecol. 50 (3), 485–498. Beaver, J.R., Casamatta, D.A., East, T.L., Havens, K.E., Rodusky, A.J., James, R.T., Tausz, C.E., Buccier, K.M., 2013. Extreme weather events influence the phytoplankton community structure in a large lowland subtropical lake (Lake Okeechobee, Florida, USA). Hydrobiologia 709 (1), 213–226. Becker, V., Huszar, V.L.M., Crossetti, L.O., 2009. Responses of phytoplankton functional groups to the mixing regime in a deep subtropical reservoir. Hydrobiologia 628 (1), 137–151. Breiman, L., Friedman, J.H., Olshen, R., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA. Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123–140. Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. Carmichael, W.W., 2001. Health effects of toxin-producing cyanobacteria: the CyanoHABs. Human Ecol. Risk Assess.: Int. J. 7 (5), 1393–1407. Carpenter, S.R., Stanley, E.H., Vander Zanden, M.J., 2011. State of the world’s freshwater ecosystems: physical, chemical, and biological changes. Annu. Rev. Environ. Resour. 36, 75–99. Clifton, H.E., Dingler, J.R., 1984. Wave-formed structures and paleoenvironmental reconstruction. Mar. Geol. 60 (1–4), 165–198. Crisci, C., Ghattas, B., Perera, G., 2012. A review of Supervised Machine Learning algorthims and their applications to ecological data. Ecol. Modell. 240, 113–122. Cutler, D.R., Edwards, T.C.J., Beard, K., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., 2007. Random forest for classification in ecology. Ecology 88 (11), 2783–2792. Cymbola, J., Ogdahl, M., Steinman, A.D., 2008. Phytoplankton response to light and internal phosphorus loading from sediment release. Freshwater Biol. 53 (12), 2530–2542. Evans, R.D., 1994. Empirical evidence of the importance of sediment resuspension in lakes. Hydrobiologia 284 (1), 5–12. Falconer, I.R., Humpage, A.R., 2006. Cyanobacterial (blue-green algal) toxins in water supplies: cylindrospermopsins. Environ. Toxicol. 21 (4), 299–304. Härdle, W.K., Simar, L., 2012. Applied Multivariate Statistical Analysis. Springer Science & Business Media, 516 pp. Havens, K., Paerl, H., Phlips, E., Zhu, M., Beaver, J., Srifa, A., 2016. Extreme weather events and climate variability provide a lens to how shallow lakes may respond to climate change. Water 8, 229. Haylock, M.R., Peterson, T.C., Alves, L.M., Ambrizzi, T., Anunciac¸ão, Y.M.T., Baez, J., Barros, V.R., Berlato, M.A., Bidegain, M., Coronel, G., Corradi, V., García, V.J., Grimm, A.M., Karloy, D., Marengo, J.A., Marino, M.B., Moncunill, D.F., Nechet, D., Quintana, J., Rebello, E., Rusticucci, M., Santos, J.L., Trebejo, I., Vincent, L.A., 2006. Trends in total and extreme South American rainfall in 1960–2000 and links with sea surface temperature. J. Clim. 19 (8), 1490–1512. Heisler, J., Glibert, P.M., Burkholder, J.M., Anderson, D.M., Cochlan, W., Dennison Dortch, Q., Gobler, C.J., Heil, C.A., Humphries, E., Lewitus, A., Magnien, R., Marshall, H.G., Sellner, K., Stockwell, D.A., Stockwell, D.K., Suddleson, M., 2008. Eutrophication and harmful algal blooms: a scientific consensus. Harmful Algae 8 (1), 3–13. Hillebrand, H., Dürselen, C.D., Kirschtel, D., Pollingher, U., Zohary, T., 1999. Biovolume calculation for pelagic and benthic microalgae. J. Phycol. 35 (2), 403–424. Inda, H., Steffen, M. Eds., 2010. Bases técnicas para el manejo integrado de Laguna del Sauce y cuenca asociada. Editorial Montevideo. James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning, Vol. 6. springer, New York. Jeong, K.S., Joo, G.J., Kim, H.W., Ha, K., Recknagel, F., 2001. Prediction and elucidation of phytoplankton dynamics in the Nakdong River (Korea) by means of a recurrent artificial neural network. Ecol. Modell. 146 (1), 115–129. Jeong, K.S., Kim, D.K., Joo, G.J., 2006. River phytoplankton prediction model by Artificial Neural Network: model performance and selection of input variables to predict time-series phytoplankton proliferations in a regulated river system. Ecol. Inf. 1 (3), 235–245. Keddy, P.A., 1992. A pragmatic approach to functional ecology. Funct. Ecol. 6 (6), 621–626. Kosten, S., Huszar, V.L.M., Bécares, E., Costa, L.S., Van Donk, E., Hansson, L.A., Jeppesen, E., Kruk, C., Lacerot, G., Mazzeo, N., De Meester, L., Moss, B., Lurling, M., Nõges, T., Romo, S., Scheffer, M., 2012. Warmer climates boost cyanobacterial dominance in shallow lakes. Global Change Biol. 18 (1), 118–126. Kruk, C., Segura, A.M., 2012. The habitat template of phytoplankton morphology-based functional groups. Hydrobiologia 698 (1), 191–202. Kruk, C., Huszar, V.L., Peeters, E.T., Bonilla, S., Costa, L., Lürling, M., Reynolds, C.S., Scheffer, M., 2010. A morphological classification capturing functional variation in phytoplankton. Freshwater Biol. 55 (3), 614–627. Lek, S., Guégan, J.F., 1999. Artificial neural networks as a tool in ecological modelling, an introduction. Ecol. Modell. 120 (2), 65–73.
C. Crisci et al. / Ecological Modelling 360 (2017) 80–93 Lewis, W.M., 1976. Surface/volume ratio: implications for phytoplankton morphology. Science 192 (4242), 885–887. Liaw, A., Wiener, M., 2002. Classification and Regression by randomForest. R News 2 (3), 18–22. Litchman, E., Klausmeier, C.A., 2008. Trait-based community ecology of phytoplankton. Annu. Rev. Ecol. Evol. Syst. 39-, 615–639. Lund, J.W.G., Kipling, C., Le Cren, E.D., 1959. The inverted microscope method of estimating algal numbers and the statistical basis of estimations by counting. Hydrobiologia 11, 143–170. Margalef, R., 1978. Life-forms of phytoplankton as survival alternatives in an unstable environment. Oceanologica acta 1 (4), 493–509. Martinuzzi, S., Januchowski-Hartley, S.R., Pracheil, B.M., McIntyre, P.B., Plantinga, A.J., Lewis, D.J., Radeloff, V.C., 2014. Threats and opportunities for freshwater conservation under future land use change scenarios in the United States. Global Change Biol. 20 (1), 113–124. Mazzeo, N., García-Rodríguez, F., Rodríguez, A., Méndez, G., Iglesias, C., Inda, H., Goyenola, G., García, S., Fosalba, C., Marroni, S., Crisci, C., Del Puerto, L., Clemente, J., Pacheco, P., Carballo, C., Kroeger, A., Vianna, M., Meerhoff, M., Steffen, M., Lagomarsino, J.J., Masdeu, M., Vidal, N., Texeira de Mello, F., González, I., Larrea, D., 2010. stado trófico de Laguna del Sauce y respuestas asociadas. In: Inda, H., Steffen, M. (Eds.), Bases técnicas para el manejo integrado de Laguna del Sauce y cuenca asociada. Editorial Montevideo, pp. 31–49. Moss, B., Kosten, S., Meerhof, M., Battarbee, R., Jeppesen, E., Mazzeo, N., Havens, K., Lacerot, G., Liu, Z., De Meester, L., Paerl, H., Scheffer, M., 2011. Allied attack: climate change and eutrophication. Inland Waters 1 (2), 101–105. Naselli-Flores, L., Padisák, J., Albay, M., 2007. Shape and size in phytoplankton ecology: do they matter? Hydrobiologia 578 (1), 157–161. Nusch, E.A., 1980. Comparison of methods for chlorophyll and phaeopigment determination. Arch. Hydrobiol. Beih. Ergebn. Limnol. 14, 14–36. O’Farrell, I., Bordet, F., Chaparro, G., 2012. Bloom forming cyanobacterial complexes co-occurring in a subtropical large reservoir: validation of dominant eco-strategies. Hydrobiologia 698 (1), 175–190. O’Neil, J.M., Davis, T.W., Burford, M.A., Gobler, C.J., 2012. The rise of harmful cyanobacteria blooms: the potential roles of eutrophication and climate change. Harmful Algae 14, 313–334. Pacheco, J.P., Iglesias, C., Meerhoff, M., Fosalba, C., Goyenola, G., Teixeira de Mello García, S., Gelós, M., García-Rodríguez, F., 2010. Phytoplankton community structure in five subtropical shallow lakes with different trophic status (Uruguay): a morphology-based approach. Hydrobiologia 646 (1), 187–197. Paerl, H.W., Huisman, J., 2009. Climate change: a catalyst for global expansion of harmful cyanobacterial blooms. Environ. Microbiol. Rep. 1 (1), 27–37. Paerl, H.W., Fulton, R.S., Moisander, P.H., Dyble, J., 2001. Harmful freshwater algal blooms, with an emphasis on cyanobacteria. Sci. World J. 1, 76–113. Recknagel, F., Cao, H., Kim, B., Takamura, N., Welk, A., 2006. Unravelling and forecasting algal population dynamics in two lakes different in morphometry and eutrophication by neural and evolutionary computation. Ecol. Inf. 1 (2), 133–151. Reynolds, C.S., Huszar, V., Kruk, C., Naselli-Flores, L., Melo, S., 2002. Towards a functional classification of the freshwater phytoplankton. J. Plankton Res. 24 (5), 417–428.
93
Reynolds, C.S., 1984a. Phytoplankton periodicity: the interaction of form, function and environmental variability. Freshwater Biol. 14, 111–142. Reynolds, C.S., 1984b. The Ecology of Freshwater Phytoplankton. Cambridge University Press. Reynolds, C.S., 2006. The Ecology of Phytoplankton. Cambridge University Press. Rodríguez, A., Méndez, G., Inda, H., Lagomarsino, J.J., Steffen, M., 2010a. Características y problemática de la Laguna del Sauce. In: Inda, H., Steffen, M. (Eds.), Bases técnicas para el manejo integrado de Laguna del Sauce y cuenca asociada. Editorial Montevideo, pp. 15–17. Rodríguez, A., Méndez, G., Kausas, S., Clemente, J., Kroger, A., Mazzeo, N., 2010b. Importancia de la carga externa e interna de nutrientes en el estado trófico de la Laguna del Sauce. In: Inda, H., Steffen, M. (Eds.), Bases técnicas para el manejo integrado de Laguna del Sauce y cuenca asociada. Editorial Montevideo, pp. 53–61. Søndergaard, M., Kristensen, P., Jeppesen, E., 1992. Phosphorus release from resuspended sediment in the shallow and wind-exposed Lake Arresø, Denmark. Hydrobiologia 228 (1), 91–99. Scheffer, M., van Nes, E.H., 2007. Shallow lakes theory revisited: various alternative regimes driven by climate, nutrients, depth and lake size. Hydrobiologia 584 (1), 455–466. Scheffer, M., Hosper, S.H., Meijer, M.L., Moss, B., Jeppesen, E., 1993. Alternative equilibria in shallow lakes. Trends Ecol. Evol. 8 (8), 275–279. Scheffer, M., 2004. Ecology of Shallow Lakes. Springer Science & Business Media. Segura, A.M., Piccini, C., Nogueira, L., Alcántara, I., Calliari, D., Kruk, C., 2017. Increased sampled volume improves Microcystis aeruginosa complex (MAC) colonies detection and prediction using Random Forests. Ecol. Indic. 79, 347–354. Southwood, T.R.E., 1977. Habitat, the templet for ecological strategies? The Journal of Animal Ecology, 337–365. Southwood, T.R.E., 1988. Tactics, strategies and templets. Oikos, 3–18. Therneau, T., Atkinson, B., Ripley, B., 2015. Rpart: Recursive Partitioning and Regression Trees. R Package Version 4. 1–10, http://CRAN.R-project.org/ package=rpart. Utermöhl, H., 1958. Zur Vervollkommnong der quantitativen Phytoplankton metodik. Mitt.int. Ver. Limnol. 9, 1–38. Welch, P.D., 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 15 (2), 70–73. Welk, A., Recknagel, F., Burch, M., 2005. Ordination, clustering and forecasting of phytoplankton dynamics in the Myponga drinking water reservoir by means of supervised and non-supervised artificial neural networks. In: Proceeding of the International Congress on Modelling and Simulation MODSIM 2005, December, Melbourne, Victoria. Ye, L., Cai, Q., 2009. Forecasting daily chlorophyll a concentration during the spring phytoplankton bloom period in Xiangxi Bay of the Three-Gorges Reservoir by means of a recurrent artificial neural network. J. Freshwater Ecol. 24 (4), 609–617.