Accepted Manuscript
Modelling the effects of multiple stressors on respiration and microbial biomass in the hyporheic zone using decision trees
Nataša Mori, Barbara Debeljak, Mateja Škerjanec, Tatjana Simčič, Tjaša Kanduč, Anton Brancelj
PII: S0043-1354(18)30910-2
DOI: https://doi.org/10.1016/j.watres.2018.10.093
Reference: WR 14204
To appear in: Water Research
Received Date: 29 May 2018
Revised Date: 26 October 2018
Accepted Date: 30 October 2018
Please cite this article as: Mori, N., Debeljak, B., Škerjanec, M., Simčič, T., Kanduč, T., Brancelj, A., Modelling the effects of multiple stressors on respiration and microbial biomass in the hyporheic zone using decision trees, Water Research, https://doi.org/10.1016/j.watres.2018.10.093. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Modelling the effects of multiple stressors on respiration and microbial biomass in the hyporheic zone using decision trees
Nataša Mori(1)*, Barbara Debeljak(1), Mateja Škerjanec(2), Tatjana Simčič(1), Tjaša Kanduč(3), and Anton Brancelj(1,4)
(1) National Institute of Biology, Department of Organisms and Ecosystems Research, Večna pot 111, 1000 Ljubljana, Slovenia
(2) University of Ljubljana, Faculty of Civil and Geodetic Engineering, Jamova 2, Ljubljana, Slovenia
(3) Jožef Stefan Institute, Department of Environmental Sciences, Jamova 39, 1000 Ljubljana, Slovenia
(4) University of Nova Gorica, School for Environmental Sciences, Vipavska 13, 5000 Nova Gorica, Slovenia
*corresponding author:
[email protected]
Abstract
The integrity of freshwater surface- and groundwater ecosystems and their ecological and qualitative status depend greatly on ecological processes taking place in streambed sediments overgrown by biofilm, in the hyporheic zone (HZ). Little is known about the interactions and effects of multiple stressors on biologically driven processes in the HZ. In this study, machine learning (ML) tools were used to provide evidence-based information on how stressors and ecologically important environmental factors interact and drive ecological processes and microbial biomass. The ML technique of decision trees using the J48 algorithm was applied to build models from a data set of 342 samples collected over three seasons at 24 sites within the catchments of five gravel-bed rivers in north-central Slovenia. Catchment-scale land use data and reach-scale environmental features indicating the HZ morphology and the physical and chemical characteristics of water were used as predictive variables. Respiration (R) and microbial respiratory electron transport system activity (ETSA) were used as response variables indicating ecological processes, and total protein content (TPC) as a response variable indicating microbial biomass. Separate models were built for two HZ depths: 5-15 cm and 20-40 cm. The models with R as a response variable had the highest predictive performance (67-89%), showing that R is a good indicator of complex environmental gradients. The ETSA and TPC models were less accurate (42-67%) but still provide valuable ecological information. The best model shows that temperature, when combined with selected water quality elements, is an important predictor of R at a depth of 5-15 cm. The ETSA and TPC models show the combined effects of temperature, catchment land use and selected water quality elements on both response variables. Overall, this study provides new knowledge on how ecological processes occurring in the HZ respond to catchment- and reach-scale variables, and provides evidence-based information about complex interactions between temperature, catchment land use and water quality. These interactions are highly dependent on the selection of the response variable, i.e., each response variable is influenced by a specific combination of predictive environmental variables.
Key words: machine learning, ecosystem processes, water quality, stressors, freshwater biofilm, hyporheic zone
1. Introduction
Freshwaters are affected by a diversified array of anthropogenic stressors that often interact and are, at the same time, directly or indirectly linked to biological responses. Interacting stressors can have combined effects that are the same as (additive), greater than (synergistic) or smaller than (antagonistic) the sum of their individual effects (Piggott et al., 2015). Understanding the effects of single stressors, multiple stressors and their interactions is a precondition for effective catchment management and river rehabilitation or restoration (Palmer et al., 2005; Pavlin et al., 2011). Moreover, it is critical to assess the response of ecosystem processes to stressors if their effects on the ecosystem services that provide benefits to humans are to be properly understood (von Schiller et al., 2017). Consequently, there is an increasing number of experimental studies examining the relationships between stressors and their effects on aquatic ecosystem functioning (Matthaei et al., 2010; Ferreira and Chauvet, 2011; Ponsatí et al., 2016). Moreover, several conceptual models and analytical frameworks looking at the impact of multiple stressors on surface and groundwater ecosystems have been developed (Jackson et al., 2016; Feld et al., 2016; Kaandorp et al., 2018).
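The three interaction types defined above amount to comparing a combined effect with the sum of the individual effects. The following Python sketch is illustrative only: the function name, the tolerance parameter and the assumption that effects are non-negative magnitudes on a common scale are not taken from Piggott et al. (2015) or from this study.

```python
def classify_interaction(effect_a, effect_b, combined_effect, tol=1e-9):
    """Compare a combined stressor effect with the sum of individual effects.

    Assumption: all effects are non-negative magnitudes measured on the
    same scale (hypothetical inputs, e.g. change in a response variable).
    """
    additive_expectation = effect_a + effect_b
    if abs(combined_effect - additive_expectation) <= tol:
        return "additive"
    if combined_effect > additive_expectation:
        return "synergistic"
    return "antagonistic"
```

With individual effects of 1 and 2 units, a combined effect of 4 units would be labelled synergistic and one of 2.5 units antagonistic. Effects of opposing sign need a more careful, directional classification than this sketch provides.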
Stressors affect both the structure and function of aquatic ecosystems (Vinebrooke et al., 2004). Ecosystem structure is characterized by physical features such as river channel morphology, water quality, and biomass or the composition of biological communities, whereas ecosystem functioning describes those processes that regulate energy and matter flux in an ecosystem (Sandin and Solimini, 2009). Biologically driven ecosystem processes include organic matter decomposition, nutrient cycling, metabolism, and pollutant and community dynamics (von Schiller et al., 2017). Although the EU Water Framework Directive (EU, 2000) defines ecological status as “an expression of the quality of the structure and functioning of aquatic ecosystems associated with surface waters”, there remains a greater scientific understanding of the structural characteristics of freshwater ecosystems. There has been much less progress made towards developing and standardizing methods that measure ecosystem functioning or that incorporate them into a river ecological status assessment (Palmer and Febria, 2012). One of the advantages of functional measures or indicators is that they can be translated directly into the concept of ecosystem services (von Schiller et al., 2017).
Many important ecosystem processes in running waters take place in the hyporheic zone (HZ), i.e., the transition zone between surface and groundwater (Orghidan, 1959; Krause et al., 2017). Here, active regions of carbon flux occur together with the mixing of ground- and surface water (Boulton et al., 1998; Battin et al., 2009). This hydrologically highly dynamic and biologically sensitive system is a suitable environment for implementing a functional indicator framework to identify impacts, new stressors in the catchments, and a reduction in ecosystem services, such as reduced self-purification processes. Sediment respiration (oxygen consumption, R) is frequently used to measure the response of the HZ ecosystem function to natural conditions and anthropogenic pressures (Hill et al., 2000; Hadwen et al., 2010; Doering et al., 2011). Other functional parameters include microbial metabolic activity assays, such as fluorescein diacetate hydrolysis (FDA), potential denitrification enzyme activity (DEA), substrate-induced respiration (SIR), extracellular phosphatase activity (EPA), and respiratory electron transport system activity (ETSA) (Simčič and Mori, 2007; Aristi et al., 2015; Debeljak et al., 2016; Ponsatí et al., 2016). Various studies have shown the sensitivity of these parameters to temperature (Hill et al., 2000; Doering et al., 2011), nutrients (Hill et al., 2000), pollution gradient (Aristi et al., 2015), hydro-morphological features (Simčič and Mori, 2007; Nogaro et al., 2013), land use (Debeljak et al., 2017) and newly emerging pollutants (Ponsatí et al., 2016). Since multiple stressors and naturally fluctuating environmental factors simultaneously influence most ecosystems, their consequences are often unpredictable, especially when predictions are based on a single stressor (Matthaei et al., 2010). To date, only a few studies have investigated the response of functional variables to the combined effects of selected stressors, such as temperature and nutrients (Ferreira and Chauvet, 2011; Rosa et al., 2013), or nutrient enrichment, the amount of fine sediment, and water abstraction (Matthaei et al., 2010). In a recent review, Nõges et al. (2017) concluded that despite an extensive basic knowledge of aquatic ecology, only a few studies actually provide measurable evidence on multi-stress effects, and most models either represent a single water body or are based on a single survey.
Nowadays, new tools and approaches to data processing exist that can help to explain some of the phenomena discussed above. For example, machine-learning (ML) methods have the ability to discover complex patterns in data sets and allow in-depth analyses (Gal et al., 2013). Decision tree induction (Quinlan, 1986) is an ML approach that allows the user to apply recursive data-partitioning techniques to automatically construct a model (decision tree) for predicting variables with nominal values. The advantage of decision tree methods, compared to statistical methods, is that they are nonparametric, i.e., they make no assumption about the distribution of the dependent variable, and have a higher interpretative power than the majority of statistical methods (Gal et al., 2013). Decision trees have been successfully applied in predicting chemical parameters of river water quality from bioindicator data (Džeroski et al., 2000), predicting stream invertebrates and algal blooms (Dakou et al., 2007; Volf et al., 2011), modelling lake zooplankton dynamics (Gal et al., 2013), and analysing the impacts of exotic species on ecosystems (Boets et al., 2013).
In this study, the interactions between catchment- and reach-scale environmental factors, some of which exceed their natural ranges and hence act as stressors, and their impact on selected ecological indicators in the HZ were investigated by induction of decision trees. Three types of ecological indicators were used as response variables: respiration (oxygen consumption measured in situ; R), respiratory potential measured as respiratory electron transport system activity at 15°C (ETSA), and total protein content (TPC) as a proxy for microbial biomass. These indicators are relatively simple to measure, have been well studied in aquatic ecosystems, and are sensitive indicators of ecosystem stress (e.g., Hill et al., 2000; Franken et al., 2001; Simčič et al., 2015; Debeljak et al., 2015). To encompass spatial and temporal variability, data were collected over three seasons (spring, summer, winter), in five catchments and at two HZ depths. The objectives of this paper are to: a) evaluate the sensitivity of ecological indicators to multiple stressors occurring together with naturally fluctuating environmental factors, and b) identify the interactions between multiple stressors and environmental factors to find the best predictors of the measured ecological indicators, both by using ML tools.
2. Methods
2.1 Study area
This study was carried out in five pre-Alpine catchments (Gradaščica, Kamniška Bistrica, Kokra, Selška Sora, and Tržiška Bistrica) located in north-central Slovenia in SE Europe (Fig. 1). The catchment areas range from 146 km2 to 539 km2, with rivers from 27 to 34 km in length. The elevation of the study sites ranges from 262 to 490 m a.s.l. in an area with predominantly carbonate and/or silicate geology (e.g., upper Triassic limestone and dolomite, tufa, sandstones, conglomerate, clay, and marls) (Komac, 2005). Mean annual precipitation ranges from 1100 to 2200 mm and mean annual discharges range from 2.9 to 8.8 m3 s-1 (Slovenian Environmental Agency).
The dominant land use in all five catchments is natural, mixed coniferous and deciduous forest. A moderate increase in agricultural (including arable land and grasslands) and urban land use (including towns, residential areas, industrial zones) is observed longitudinally. The rivers are partially channelized, especially in urban areas. Instream weirs (i.e., low head dams) and embankments are also present, which are used to moderate flow and prevent flooding. In most agricultural areas, the riparian zone is dominated by willow, alder and other species typical of the region, while in urban areas embankments prevent the development of a riparian strip.
2.2. Data collection
Land use data were extracted from the polygon database of the Slovenian Ministry of Agriculture, Forestry and Food. The proportion of land use types (forest, agricultural, urban) was determined for the contributing part of the catchment upstream of each sampling site and for the 250 m impact zone adjacent to the studied river segment. The samples were collected at 24 (autumn and winter 2013) and 9 (spring 2014) locations within five experimental catchments to encompass the variability in adjacent land use pressures, reach-scale environmental features, and ecological responses. Samples were taken at riffle mesohabitats, where the HZ is rarely studied but which are of great ecological importance for biota and extend over large parts of the streambed (Storey et al., 2003). At each sampling site, three spatial replicates were selected within the river channel in order to take into account the reach-scale variability in the data. The data were gathered at two depths, 5-15 and 20-40 cm, in order to observe differences in ecological processes and biomass along a vertical gradient, which is an important controlling factor of ecological processes in the HZ (Storey et al., 2003). During the spring, the number of sampling sites was lower because high water levels prevented sampling at specific locations. Respiration at a depth of 20-40 cm was measured only in spring, resulting in 27 data records.
At each site and at each depth, the water temperature, conductivity, pH and oxygen levels were measured in triplicate using field probes (WTW Multi 3430 set), while water samples were collected for laboratory analysis. Once in the laboratory, alkalinity was measured using Gran titrations, ion chromatography (Metrohm, 761 Compact IC) was applied to analyse the cations and anions, total phosphate (Ptot) and total nitrogen (Ntot) were determined spectrophotometrically (Perkin Elmer, Lambda 25), and dissolved organic carbon (DOC) was determined using the non-purgeable organic carbon (NPOC) method (Analytik Jena Multi C/N 3100).
Sediment samples for in situ R, ETSA and TPC measurements at an HZ depth of 5-15 cm were obtained using a PVC sampling tube (30 cm wide, 60 cm high). Sediment from the bottom of the sampling tube was collected after removing the surface layer and sieved through a 5 mm mesh sieve. Part of the sample was used for in situ R, and part was used for ETSA and TPC measurements. Sediment samples from a depth of 20 to 40 cm were obtained using the Bou–Rouch method (Bou and Rouch, 1967), where a perforated pipe (5 mm apertures) was inserted into the sediments and samples were extracted using a piston pump. Particulate organic matter (POM) was determined as loss on ignition at 550°C for 3 h. The fine suspended sediment (FS), as an indicator of river bed clogging, and the fine organic matter (FOM) were determined by incubating water samples (1 L) for 24 samples obtained by either stirring river bed sediments or by pumping, and weighing the residue after drying (24 h, 60°C) and ignition (3 h, 550°C). The sediment composition was estimated by fractionating the dry sediment into five grain-size classes (<0.063 mm, 0.063-0.2 mm, 0.2-2 mm, 2-4 mm, 4-5 mm) using a series of stainless steel sieves.
In situ R was measured using the closed bottle system (Uehlinger et al., 2002). Plexiglas tubes were half-filled with sieved sediment (< 5 mm), filled to the top with water, sealed and incubated in situ for 2 hours. An optical dissolved oxygen sensor (WTW, FDO® 925) was used to measure temperature and oxygen concentration before and after incubation. Respiration was expressed as O2 consumption per gram of dry weight of sediment per hour (µL O2 g DW−1 h−1). The ETSA was measured by applying a modified assay adapted from Packard (1971). The frozen sediment samples were thawed and homogenized in an ice-cold homogenization buffer. Samples were then centrifuged, and an aliquot of supernatant was incubated with the substrate solution (0.1 M sodium phosphate buffer pH = 8.4; 1.7 mM NADH; 0.25 mM NADPH; 0.2 % (v/v) Triton-X-100) and reagent solution (2.5 mM 2-(p-iodophenyl)-3-(p-nitrophenyl)-5-phenyl tetrazolium chloride) for 40 min at 15°C. Formazan production was determined spectrophotometrically and the ETSA was measured as the rate of tetrazolium dye reduction, which was converted to oxygen used per dry mass in a given time interval (µL O2 g DW−1 h−1). Total protein content was estimated colorimetrically according to the method of Lowry et al. (1951) using a Sigma Protein Assay Kit (P 5656 Sigma Diagnostics, St Louis, MO, USA). All field measurements and laboratory analyses are described in detail in Mori et al. (2017) and Debeljak et al. (2017).
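The closed-bottle respiration rate described above can be illustrated with a short calculation. This Python sketch is an assumption-laden example, not the authors' procedure: the function and parameter names are hypothetical, and the conversion of about 700 µL O2 per mg O2 (molar volume of 22.4 L/mol at 0°C and 1 atm divided by 32 g/mol) is a standard approximation that the text does not specify.

```python
# Assumption: 1 mg O2 occupies about 0.7 mL (700 µL) at 0 °C and 1 atm.
# The source does not state which conversion factor was used.
UL_O2_PER_MG = 700.0


def respiration_rate(o2_start_mg_l, o2_end_mg_l, water_volume_l,
                     sediment_dry_weight_g, incubation_h=2.0):
    """Return O2 consumption in µL O2 per g sediment dry weight per hour."""
    if sediment_dry_weight_g <= 0 or incubation_h <= 0:
        raise ValueError("dry weight and incubation time must be positive")
    # Oxygen consumed in the sealed tube over the incubation, in mg
    consumed_mg = (o2_start_mg_l - o2_end_mg_l) * water_volume_l
    return consumed_mg * UL_O2_PER_MG / (sediment_dry_weight_g * incubation_h)
```

For example, a drop from 10.0 to 9.0 mg O2 L-1 in 0.5 L of water over 350 g of dry sediment during a 2 h incubation gives 0.5 µL O2 g DW−1 h−1 under these assumptions.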
2.3 Database and data pre-processing
The data used to build the models were composed of predictors, i.e., environmental variables, and response variables indicating ecological processes and microbial biomass. The majority of ecologically relevant environmental parameters were included in the study as predictors. Some of these parameters either exceeded or fell below typical values for this region and were regarded as stressors. Typical ranges for this region were obtained by including measurements only from pristine (i.e., forested) locations where anthropogenic pressures, such as agricultural or urban land use, pollution, and geomorphological pressures, were not present.
Prior to analyses, manual data discretization of the measured response variables was performed. For this purpose, new discrete valued attributes (i.e., “low”, “med” and “high”) were used to replace the measured numeric response attributes. The discretization was performed differently for each dataset (separately for each sampling depth) and for each target/response variable (R, ETSA, and TPC). This was done to ensure equal representation of the three classes in the dataset. In addition to data discretization, automatic attribute selection techniques included in WEKA (Witten et al., 2011) were employed in order to improve the modelling accuracy. These techniques discard irrelevant or redundant attributes from a given dataset. The first technique applied was Information Gain Attribute Ranking (Hall and Holmes, 2003), which evaluates the worth of an attribute by measuring the information gain with respect to the class. However, this method does not take into account attribute interaction. Another technique, which evaluates subsets of attributes rather than individual attributes, is Correlation-based Feature Selection (CFS; Hall, 1999). The CFS algorithm takes into account the usefulness of individual attributes for predicting the class and the level of inter-correlation among them. The method values subsets whose attributes correlate highly with the class value and have low correlation with each other.
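The two pre-processing ideas above, discretizing a numeric response into equally represented classes and ranking an attribute by information gain, can be sketched in plain Python. This is a minimal illustration and not the WEKA implementation: the discretization in the study was manual, so equal-frequency binning is an assumption here, and all function names are hypothetical. Numeric predictors would need to be discretized before `information_gain` is applied to them.

```python
import math
from collections import Counter


def equal_frequency_labels(values, labels=("low", "med", "high")):
    """Label values so that each class receives about the same number of samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = labels[rank * len(labels) // len(values)]
    return result


def entropy(classes):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(classes)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(classes).values())


def information_gain(attribute_values, classes):
    """Class entropy minus the entropy left after splitting on a nominal attribute."""
    total = len(classes)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [c for a, c in zip(attribute_values, classes) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(classes) - remainder
```

Note that tied values can fall into different classes with this rank-based binning, which is one reason a manual choice of class boundaries may be preferable in practice.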
2.4 Decision trees
Decision trees are hierarchical structures composed of three types of nodes (a root, internal nodes and leaves) connected by branches. The root is the starting node situated at the top of the decision tree and, together with the internal nodes, contains tests on the input attributes. The leaves (terminal nodes) contain the predictions of the target (class) values. Decision trees are interpreted in terms of IF-THEN rules (Gal et al., 2013). In this study, decision trees were built using the J48 algorithm, a Java re-implementation of the C4.5 algorithm (Quinlan, 1993) incorporated into the machine-learning package WEKA (Witten et al., 2011). The J48 algorithm repeatedly partitions the original dataset into subsets that are as homogeneous as possible (in terms of number of examples) with respect to the target variable. Its most important tasks involve finding the optimal splitting values of the measured attributes and the most accurate prediction of the target. Pruning was applied to cope with decision tree complexity and avoid overfitting. Pruning improves the transparency of the induced trees by reducing their size, and enhances the classification accuracy by eliminating errors resulting from noisy data (Bratko, 1989). During tree construction, forward pruning by implementing the “minimum number of instances” criterion was applied. According to this criterion, every leaf should contain a minimum number of examples; otherwise, no branching is allowed.
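The search for a splitting value under a minimum-instances constraint can be sketched as follows. This is a simplified illustration and not J48 itself: it scores candidate thresholds on a single numeric attribute by plain information gain (C4.5 uses the gain ratio by default), and the function names and the `min_instances` parameter are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(classes):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(classes)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(classes).values())


def best_threshold(values, classes, min_instances=2):
    """Find the split 'value <= threshold' with the highest information gain.

    Splits that would leave fewer than `min_instances` examples on either
    side are skipped, a simplified stand-in for forward pruning.
    Returns (threshold, gain), or (None, 0.0) if no split is allowed.
    """
    pairs = sorted(zip(values, classes))
    base = entropy(classes)
    best = (None, 0.0)
    for i in range(min_instances, len(pairs) - min_instances + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical attribute values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (pairs[i - 1][0], gain)
    return best
```

Applying such a search recursively to each resulting subset, and stopping when no admissible split remains, yields a tree whose root-to-leaf paths read as IF-THEN rules.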
Decision trees are learned from a training data set. The quality of the constructed model, i.e., the accuracy of prediction or predictive performance, is expressed as the percentage of correctly classified instances (% CCI). For the purpose of generalization and model re-usability (e.g., application in other similar catchments), different validation procedures were applied. First, “automatic” cross-validation (CV) was used, where the original dataset was randomly partitioned into a chosen number of folds (N=10). During each turn, one fold was used for testing, while the remaining nine folds were used for training. The final error was given as the average error from all the generated models. Next, “manual” validation was applied by splitting the original dataset into five subsets, based on the experimental catchments. The aim was to investigate whether the selection of particular catchments for training the decision tree models improves their predictive performance. In turn, each data subset (samples collected within one catchment) was used for testing, while the remaining data set (samples collected within the remaining four catchments) was used for training the model.
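The “manual” validation scheme and the % CCI measure can be sketched as below. This is an illustrative outline under stated assumptions: records are represented as dictionaries with a hypothetical `"catchment"` key, and a majority-class predictor stands in for the decision tree learner, which is not reproduced here.

```python
from collections import Counter


def percent_cci(predicted, observed):
    """Percentage of correctly classified instances (% CCI)."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def leave_one_catchment_out(records):
    """Yield (held_out, train, test), holding out one catchment per turn."""
    for held_out in sorted({r["catchment"] for r in records}):
        train = [r for r in records if r["catchment"] != held_out]
        test = [r for r in records if r["catchment"] == held_out]
        yield held_out, train, test


def majority_class(train, target="R"):
    """Placeholder learner: the most frequent class in the training data."""
    return Counter(r[target] for r in train).most_common(1)[0][0]
```

In use, each `(train, test)` pair would be passed to the actual learner and the resulting % CCI values compared across held-out catchments, alongside the 10-fold CV estimate.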
3. Results
3.1 Hyporheic zone environmental conditions and microbial respiration and biomass
Spatial analysis revealed that within the study area, the proportion of agricultural and urban land use in the buffer zone was up to 0.84 and 0.93, respectively, while at certain sites native forest was completely absent (Table 1). With the exception of Ntot, water chemistry parameters were outside of their natural ranges, while temperature and pH were within typical ranges for this area (based on areas of native forest with no anthropogenic influence). Similarly, FS and FOM, both of which indicate clogging, and individual sediment fractions exceeded reference values, while POM was below the normal range at some sites.
Respiration rates (R) ranged from values close to zero to 1.2 and from 0.4 to 3 µL O2 g DWsed-1 h-1, at 5-15 cm and 20-40 cm, respectively (Table 2). ETSA ranged from 0.0 to 2.8 µL O2 g DWsed-1 h-1 at 5-15 cm and from 0.0 to 3.3 µL O2 g DWsed-1 h-1 at 20-40 cm. TPC ranged from 20.9 to 468.9 and from 87.2 to 1,693.2 µg protein g DWsed-1 for the two depths, respectively. Simple regression plots show a significant increase in R, ETSA and TPC with temperature at both measured depth ranges. Respiration measured at 5-15 cm showed the strongest dependence on temperature (R2 = 0.62) (Figure 2). A significant but weak relationship was observed between catchment urban land use and R at 5-15 cm, ETSA, and TPC, but not with R at 20-40 cm. The strongest dependence on the proportion of urban land use was observed for ETSA at 5-15 cm, and for ETSA and TPC at 20-40 cm. A significant but very weak (R2<0.2) relationship was observed between ammonium and all three response variables (R, ETSA, TPC) at both depths.
3.2 Predictive performances of decision tree models
When a stratified 10-fold cross-validation (CV) was applied to the whole data set, the models were relatively highly predictive (CCI above 50%, Table 3). The explanatory power of the decision trees was the highest for the model using R as the response variable and data from the 5-15 cm depth. The model based on R and data from the 20-40 cm HZ layer performed much worse, most likely due to a lack of data. The predictive performances of the models using ETSA and TPC were satisfactory at both depths (CCI>50%). A modest variation in the models’ predictive performance was observed across the array of models built by dividing the data set into training (four catchments) and testing (one catchment) subsets (Table 3).
3.3 Response of ecological indicators and environmental factors-multiple stressors interactions
Decision tree models demonstrated that at 5-15 cm, temperature (at a threshold of 9.3°C) is the most important factor affecting the intensity of riverbed respiration (Fig. 3a). The model also shows that below 6.2°C there is no interaction with other stressors or environmental variables. At temperatures between 6.2 and 9.3°C, the presence of sulphate (>6.5 mg L-1) leads to low rates of R. At temperatures above 9.3°C, dissolved nitrite (NO2-), potassium (K+), calcium (Ca2+) and sulphate (SO42-) are important variables determining moderate to high R rates. The decision tree for R in the HZ layer at 20-40 cm was less accurate (38% CCI), but still provides valuable information regarding the importance of sediment composition and hydraulic conductivity for respiration (Fig. 3b).
Models using ETSA as the response variable reveal the importance of temperature in interaction with land use and water chemistry (Fig. 4a). At depths of 5-15 cm, the combination of temperature (>9.3ºC) and either Ca2+ content (>62.5 mg L-1), or the interaction of Ca2+ (≤62.5 mg L-1) and Ntot (>0.93 mg L-1), resulted in a high ETSA (≥0.6 µL O2 g DWsed-1 h-1). Alternatively, the interaction of temperature (≤9.3ºC) and low urban land use in the catchment (≤0.02) resulted in low ETSA (≤0.3 µL O2 g DWsed-1 h-1). At 20-40 cm, the best model using ETSA shows the importance of forest within the catchment area (Fig. 4b). When native forests covered >0.79 of the catchment, ETSA was low (<0.2 µL O2 g DWsed-1 h-1). However, ETSA was also low when forest covered ≤0.79. This was observed when temperatures were extremely low (≤5.6ºC), higher than 5.6ºC and in interaction with land use, or higher than 12ºC and in interaction with land use and sulphate concentrations.
The model for 5-15 cm depth using TPC as the response variable (Fig. 5a) shows that increased ammonium (NH4+) concentrations (>0.1 mg L-1) resulted in a high microbial biomass (TPC). On the other hand, an NH4+ concentration <0.1 mg L-1, with DOC <4.8 mg L-1 and a low urban land use (<0.04) in the catchment, resulted in a low TPC. The model exhibited much more complex interactions between predictors when using data from 20-40 cm depth (Fig. 5b). For instance, when urban land use in the catchment was >0.03 and the proportion of forest in the 250 m buffer zone was ≤0.03, the level of microbial biomass was the highest. During winter, the interaction of catchment urban land use (<0.03), buffer zone urban land use (<0.60) and NH4+ concentrations (>0.06 mg L-1) was linked with a low level of biomass. Interestingly, at lower levels of NH4+ in the presence of agricultural land in the catchment (>0.15), biomass was still low. During summer, low biomass was shaped by the interaction of catchment urban land use (<0.03), buffer zone urban land use (<0.60) and high Ca2+ concentrations (>60.5 mg L-1).
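The branches of the 5-15 cm ETSA model described in this section can be read as IF-THEN rules. The Python sketch below encodes only the branches stated in the text; the function and parameter names are hypothetical, and combinations not described in the text return `None` rather than a guessed class.

```python
def etsa_class_5_15cm(temperature_c, ca_mg_l, ntot_mg_l, urban_fraction):
    """Return "high", "low", or None for branches not described in the text."""
    if temperature_c > 9.3:
        if ca_mg_l > 62.5:
            return "high"   # warm water with high Ca2+
        if ntot_mg_l > 0.93:
            return "high"   # warm water, Ca2+ <= 62.5 mg/L, elevated Ntot
        return None         # outcome of this branch is not given in the text
    if urban_fraction <= 0.02:
        return "low"        # cool water with little urban land use
    return None             # outcome of this branch is not given in the text
```

Writing a tree out in this form makes explicit which predictor combinations the reported results actually cover and which remain unspecified without consulting Fig. 4a.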
4. Discussion
This study provides new information on the ranges of hyporheic respiration rates and productivity measured as microbial biomass across several pre-Alpine catchments under a gradient of anthropogenic pressures. The decision tree models improved understanding of the causal relationship between multiple stressors and environmental factors on one side and hyporheic microbial metabolism and biomass on the other side. They also provided the threshold values of specific environmental factors below/above which we can expect an increase or decrease of respiration, potential respiration, and microbial biomass within the HZ. When developing models, we considered measured variables of different types (land use, temperature, water chemistry and sediment structure). By applying attribute selection techniques, all irrelevant or redundant variables having no or very little impact on the selected response variable were automatically removed. Temperature, land use and water chemistry, including elevated concentrations of sulphate (SO42-), nitrite (NO2-), ammonium (NH4+), potassium (K+), calcium (Ca2+), and/or dissolved organic carbon (DOC), were recognized as the most important factors for HZ microbial respiration (R, ETSA) and biomass (TPC).
4.1 Decision tree model development and validation
A comparison of the models built by a) dividing the data set into training and testing subsets based on catchment units (“train-test” method) and b) a stratified 10-fold cross-validation (“CV”) shows that the generated models performed well. Only a modest variation in the predictive performances of the models was observed when using data from different catchments as the testing data sets. In aquatic ecology, the most widely used approach to evaluate decision tree models is 10-fold cross-validation (Dakou et al., 2007; Gal et al., 2013). However, when working with larger data sets from several catchments, it is important to test whether the catchment, as a random variable, affects the modelling results. This study demonstrates the important effect of catchment selection on model performance and shows that predictive performance can be improved by combining data from different catchments to form larger data sets.
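The two validation schemes compared above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names leave_one_group_out and stratified_folds are hypothetical, the class labels are invented toy values, and fold assignment is deterministic (no shuffling), whereas standard tools usually shuffle the data first.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), holding out one group at a time."""
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def stratified_folds(labels, k=10):
    """Assign each sample to one of k folds so that class proportions stay similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    next_fold = 0
    # Deal the samples of each class out to the folds in turn.
    for y in sorted(by_class):
        for i in by_class[y]:
            fold_of[i] = next_fold % k
            next_fold += 1
    return fold_of


# Toy data: catchment names are from the study, class labels are invented.
catchments = ["Gradaščica", "Kokra", "Kokra", "Sora", "Gradaščica", "Sora"]
classes = ["low", "high", "medium", "low", "high", "medium"]

splits = list(leave_one_group_out(catchments))  # "train-test" by catchment
folds = stratified_folds(classes, k=2)          # stratified k-fold CV
```

In the catchment-based scheme every test set comes from a catchment the model has never seen, so it probes transferability between catchments; stratified CV mixes catchments within each fold and only keeps the class proportions balanced.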
When applying CV to the whole data set, the models that predict ETSA and TPC using data from 20-40 cm depth performed slightly better than models using data from the 5-15 cm depth. This suggests that additional environmental factors or stressors, such as surface flow velocity, shear stress, permeability and hydraulic conductivity, influence the measured indicators at 5-15 cm depth. Hydrological parameters are important drivers of HZ processes (Boulton et al., 1998) and should be included in the models in the future to obtain better predictions. Despite the high complexity and heterogeneity of the data, the models proved accurate for all three response variables and depths. The only exception was when predicting R at a depth of 20-40 cm, where the data set was smaller due to lack of data. In general, complex ecological data sets from spatially and temporally dynamic environments with hierarchical organization and the catchment as a major unit are difficult to analyse using standard statistical methods where assumptions, such as homoscedasticity, independent and normally distributed residuals, no multicollinearity, etc., have to be fulfilled (Downes et al., 2002). On the other hand, ML tools allow for working with noisy data sets from complex and dynamic domains (Gal et al., 2013).
The best models were built using R as a response variable. Respiration is a good functional indicator of natural dynamics (Doering et al., 2011) and anthropogenic stress (e.g., eutrophication) (Hill et al., 2000; Janssens et al., 2001). Since R directly depends on temperature and nutrient availability, it represents an immediate response of the microbial community to actual, short-term environmental conditions (Simčič et al., 2015). The models with the least predictive power are those that use TPC as a response variable. TPC indicates the biomass of microorganisms and the proteins of extracellular polymeric substances in the samples and is therefore a structural indicator of long-term ecosystem condition (Franken et al., 2001). The models that use ETSA were slightly better. ETSA measures the overall enzymatic activity (maximum reaction rate) of the respiratory electron transport system at standard temperature and without substrate limitation, and reflects environmental conditions over the longer term. Equilibrium of ETSA is attained after a few days in the altered environment (Simčič et al., 2015). This study finds that for the HZ in gravel bed rivers, more accurate models can be built using response variables that reflect ecosystem function (R, ETSA) rather than structure (TPC). The high potential of functional indicators for detecting anthropogenic impacts has been emphasised many times (Sandin and Solimini, 2009; Palmer and Febria, 2012; von Schiller et al., 2017). These models can be used for modelling respiration, potential respiration, and protein content within the HZ of other subalpine catchments, as long as they share similar characteristics to those of the five experimental catchments (comparable proportion of land use and similar ranges of environmental factors). Based on the measured environmental parameters in any such catchment, the magnitude of respiration and microbial biomass can be estimated. The models can also be used to predict changes in the magnitude of respiration and biomass if an environmental factor changes, as long as its value stays within the initial range of values used for building the models in this study.
4.2 Between-stressor and stressor-ecological indicators linkages
This study finds that water temperature is an important predictor of functional indicators (R, ETSA), but is irrelevant for predicting the response of the structural indicator (TPC). This was also shown with simple regression plots, where relationships between temperature and the response variables were significant, but the strength of correlation was weaker for TPC, indicating that temperature influences microbial biomass in complex interactions with other environmental factors. In general, an increasing temperature accelerates chemical reactions and enhances biological processes, such as metabolic rate, microbial growth and activity (Davidson and Janssens, 2006; Mora-Gómez et al., 2016). Based on previous studies (Simčič and Mori, 2007; Mori et al., 2017; Debeljak et al., 2017), it was expected that temperature would be an important predictor of R but not of ETSA, which reflects environmental conditions over the longer term, and that the effects of temperature would decrease with depth (Hester et al., 2009). According to Hill et al. (2000), there is a significant relationship between sediment R and temperature and several chemical variables. They emphasized the different extent to which temperature influences R in water bodies. They concluded that the temperature-R relationship is not one of simple causality, and that more complex models and further research are necessary for making solid predictions. Some of the reasons for this are that temperature effects on microbial activity are resource dependent and that microbial communities are functionally adapted or acclimated to in situ temperature (Hall et al., 2010). An important finding is that the causal relationship between temperature and R depends on the temperature range. When water temperatures are extremely low, a simple temperature-R relationship is observed, while at higher temperatures, complex interactions between temperature and dissolved ions influence the intensity of R. This suggests that when environmental temperatures are extremely low, they act as the only limiting factor. However, when the water temperature is considered optimal (i.e., 10-30°C) (e.g., Mora-Gómez et al., 2016), microbial community respiration rates also respond to changes in the water chemistry, such as nitrite, potassium, calcium and sulphate concentrations.
In the case of sulphate, it appears to be connected with decreased respiration at two thresholds (i.e., 6.5 and 17.3 mg L-1), depending on temperature. At low temperatures, medium R occurs, but when the sulphate concentration exceeds the threshold, R is low. Similarly, at optimal water temperatures, high R occurs, but when concentrations exceed the threshold, R is of medium values. This indicates the importance of temperature when looking at the influence of sulphate on hyporheic respiration.
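The temperature-dependent sulphate effect described above can be written as a small rule set. The following Python sketch is illustrative only and is not the fitted tree from Figure 3: the function name respiration_class is hypothetical, the 10 °C split between "low" and "optimal" temperatures is an assumption borrowed from the 10-30°C range quoted earlier, and the pairing of 6.5 mg L-1 with the low-temperature branch and 17.3 mg L-1 with the optimal branch is likewise assumed, since the text gives both thresholds without assigning them. Other predictors in the model (e.g., nitrite, potassium, calcium) are ignored.

```python
def respiration_class(temp_c, sulphate_mg_l,
                      temp_split=10.0,          # assumed split, not from the text
                      so4_threshold_cold=6.5,   # assumed pairing with low temperatures
                      so4_threshold_warm=17.3): # assumed pairing with optimal temperatures
    """Return a respiration class ("low", "medium", "high") from two predictors."""
    if temp_c < temp_split:
        # Low temperatures: medium R, dropping to low R above the sulphate threshold.
        return "low" if sulphate_mg_l > so4_threshold_cold else "medium"
    # Optimal temperatures: high R, dropping to medium R above the sulphate threshold.
    return "medium" if sulphate_mg_l > so4_threshold_warm else "high"


cold_clean = respiration_class(5.0, 3.0)
cold_sulphate = respiration_class(5.0, 8.0)
warm_clean = respiration_class(15.0, 10.0)
warm_sulphate = respiration_class(15.0, 20.0)
```

Written this way, the interaction is explicit: the same sulphate concentration leads to a different respiration class depending on which temperature branch is taken.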
In contrast to previous studies (e.g., Mori et al., 2017; Debeljak et al., 2017), this study showed the importance of temperature also for ETSA. The model for 5-15 cm depth shows that temperature was the most important predictor for ETSA. The large temperature ranges and structurally similar habitats used in this study are most probably the reason for this. Similarly to R, interactions with water chemistry parameters were important for the model, but catchment urban land use was an additional important predictor for ETSA at 5-15 cm. Land use modifies in-stream factors controlling river metabolism through increased nutrient, sediment, and pollutant runoff from agricultural and urban sources (Ponsatí et al., 2016). These impacts are confounded with natural catchment characteristics (climate, geology, soil, vegetation type) that also alter in-stream abiotic properties (Allan, 2004). A single-river study indicated a significant relationship between ETSA and nutrients (NO3-), but no influence of temperature, while the impact of land use was not studied (Simčič and Mori, 2007).
The presence of forest in the catchment was the most important predictor for the model based on data from the depth of 20-40 cm, which was more complex than the one for the depth of 5-15 cm. Together with less forest in the catchment, temperature and sulphate were important for predicting ETSA rates. Wherever natural forest is removed from the riparian zone and the whole catchment, streams and rivers are usually warmer during summer, and primary production usually increases due to the lack of shade (Allan, 2004). This consequently leads to increased HZ nutrient input and increased microbial activity. Here, similarly to the model with R at 5-15 cm, sulphate in combination with land use suppressed or induced ETSA rates, depending on water temperature. These results show the importance of anthropogenic sulphate inputs for river metabolism, which can act either as an inhibitor or as a stimulator of metabolism.
For models predicting microbial biomass, estimated using TPC, a combination of NH4+, DOC and catchment urban land use was important for the depth of 5-15 cm. Nutrients such as NH4+ or DOC can act as stimulators of TPC when they exceed a certain threshold, or as inhibitors when they are below a certain level, especially in interaction with a low proportion of urban land use in the catchment. Similar to ETSA at 20-40 cm, land use, combined with season, NH4+ and Ca2+, was the main predictor of TPC at 20-40 cm depth. The study of Hendricks (1996) partly reflects the patterns from this study: it demonstrated a significant impact of season, depth and zone (upwelling, downwelling) on microbial biomass in the hyporheic zone, and found an inconsistent pattern in the microbial response to increased DOC that was linked with the season.
5. Conclusions
This study contributes new knowledge about catchment-scale patterns and drivers influencing respiration and microbial biomass in the hyporheic zone. Decision trees based on data from five gravel bed rivers confirmed the important role that temperature and nutrient inputs from anthropogenic activities have on hyporheic processes and structure, and provided new information on the importance of land use and of the interactions between stressors and environmental factors for microbial activity and biomass. As demonstrated, the selection of measure (either functional or structural) and sampling depth is important for explaining the causal relationship between environment and biological responses. In general, temperature, alone or in interaction with other stressors indicating point or diffuse pollution, is one of the most important predictors of functional measures (respiration and respiratory electron transport system activity). When looking at the structural measure (i.e., microbial biomass), catchment land use and nutrients are critical.
A highly relevant finding is that for the study area, an individual stressor, such as sulphate or nitrite, can act as a stimulator or inhibitor of biological processes when exceeding certain threshold values. Moreover, when combined with other environmental variables and stressors, it can have the same impact even if it falls below the defined threshold value. However, larger data sets over larger geographical ranges are needed to confirm these patterns. These findings demonstrate that it is important to consider interactions between stressors when developing management plans for freshwater ecosystems, and that climate change will have a more pronounced effect on ecosystem functioning than on structure by accelerating biological processes due to increased temperatures.
6. Acknowledgements
The study was funded by the Slovenian Research Agency (ARRS) (project L2-6778; program P1-0255 and programme for young researchers) and partly by the European Communities 7th Framework Program Funding under Grant agreement no. 603629-ENV-2013-6.2.1-Globaqua. We thank Bor Kranjc, Žiga Ogorevc, Maja Opalički Slabe, Andrej Peternel, Tomaž Jagar and Allen Wei Liu for help during field work, Andreja Jerebic and Maryline Pflieger for chemical analyses, Rok Ciglič for land use analyses and David Kocman for graphical support.
7. References
Allan, J.D., 2004. Landscapes and riverscapes: the influence of land use on stream ecosystems. Annu. Rev. Ecol. Evol. Syst. 35, 257-284. https://doi.org/10.1146/annurev.ecolsys.35.120202.110122.
Aristi, I., von Schiller, D., Arroita, M., Barceló, D., Ponsatí, L., García-Galán, M.J., Sabater, S., Elosegi, A., Acuña, V., 2015. Mixed effects of effluents from a wastewater treatment plant on river ecosystem metabolism: subsidy or stress? Freshwat. Biol. 60, 1398-1410. https://doi.org/10.1111/fwb.12576.
Battin, T.J., Luyssaert, S., Kaplan, L.A., Aufdenkampe, A.K., Richter, A., Tranvik, L.J., 2009. The boundless carbon cycle. Nat. Geosci. 2, 598-600. https://doi.org/10.1038/ngeo618.
Boets, P., Lock, K., Goethals, P.L.M., 2013. Modelling habitat preference, abundance and species richness of alien macrocrustaceans in surface waters in Flanders (Belgium) using decision trees. Ecol. Inform. 17, 78-81. https://doi.org/10.1016/j.ecoinf.2012.06.001.
Bou, C., Rouch, R., 1967. Un nouveau champ de recherches sur la faune aquatique souterraine. C. R. Acad. Sci. Paris 265, 369-370.
Boulton, A.J., Findlay, S., Marmonier, P., Stanley, E.H., Valett, H.M., 1998. The functional significance of the hyporheic zone in streams and rivers. Annu. Rev. Ecol. Syst. 29, 59-81. https://doi.org/10.1146/annurev.ecolsys.29.1.59.
Bratko, I., 1989. Machine learning. In: Gilhooly, K.J. (Ed.), Human and machine problem solving. Plenum Press, New York and London, pp. 265-287. https://doi.org/10.1007/978-1-4684-8015-3.
Dakou, E., D'heygere, T., Dedecker, A., Goethals, P., Lazaridou-Dimitriadou, M., De Pauw, N., 2007. Decision tree models for prediction of macroinvertebrate taxa in the River Axios (Northern Greece). Aquat. Ecol. 41, 399-411. https://doi.org/10.1007/s10452-006-9058-y.
Davidson, E.A., Janssens, I.A., 2006. Temperature sensitivity of soil carbon decomposition and feedbacks to climate change. Nature 440, 165-173. https://doi.org/10.1038/nature04514.
Debeljak, B., Simčič, T., Ciglič, R., Pflieger, M., Mori, N., 2017. Spatio-temporal variation in microbial respiration in the shallow hyporheic zone of pre-Alpine rivers related to catchment land use. Fundam. Appl. Limnol. 190, 265-277. https://doi.org/10.1127/fal/2017/0962.
Doering, M., Uehlinger, U., Ackermann, T., Woodtli, M., Tockner, K., 2011. Spatiotemporal heterogeneity of soil and sediment respiration in a river-floodplain mosaic (Tagliamento, NE Italy). Freshwat. Biol. 56, 1297-1311. https://doi.org/10.1111/j.1365-2427.2011.02569.x.
Downes, B.J., Barmuta, L.A., Fairweather, P.G., Faith, D.P., Keough, M.J., Lake, P.S., Mapstone, B.D., Quinn, G.P., 2002. Monitoring Ecological Impacts: Concepts and Practice in Flowing Waters. Cambridge University Press, New York, 434 pp.
Džeroski, S., Grbovic, J., Demsar, D., 2000. Predicting chemical parameters of river water quality from bioindicator data. Appl. Intell. 13, 7-17. https://doi.org/10.1023/A:1008323212047.
European Union, 2000. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Official Journal of the European Communities L 327/1, 22.12.2000.
Feld, C.K., Segurado, P., Gutierrez-Canovas, C., 2016. Analysing the impact of multiple stressors in aquatic biomonitoring data: A cookbook with applications in R. Sci. Total Environ. 573, 1320-1339. https://doi.org/10.1016/j.scitotenv.2016.06.243.
Ferreira, V., Chauvet, E., 2011. Synergistic effects of water temperature and dissolved nutrients on litter decomposition and associated fungi. Glob. Chang. Biol. 17, 551-564. https://doi.org/10.1111/j.1365-2486.2010.02185.x.
Franken, R.J.M., Storey, R.G., Williams, D.D., 2001. Biological, chemical and physical characteristics of downwelling and upwelling zones in the hyporheic zone of a north-temperate stream. Hydrobiologia 444, 183-195. https://doi.org/10.1023/A:1017598005228.
Gal, G., Škerjanec, M., Atanasova, N., 2013. Fluctuations in water level and the dynamics of zooplankton: a data-driven modelling approach. Freshwat. Biol. 58, 800-816. https://doi.org/10.1111/fwb.12087.
Hall, M.A., 1999. Correlation based feature subset selection for machine learning. PhD Thesis, University of Waikato, Hamilton, New Zealand, 198 p.
Hall, M.A., Holmes, G., 2003. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans. Knowl. Data Eng. 15, 1437-1447. http://doi.ieeecomputersociety.org/10.1109/TKDE.2003.1245283.
Hall, E.K., Singer, G.A., Kainz, M.J., Lennon, J.T., 2010. Evidence for a temperature acclimation mechanism in bacteria: an empirical test of a membrane-mediated trade-off. Funct. Ecol. 24, 898-908. https://doi.org/10.1111/j.1365-2435.2010.01707.x.
Hadwen, W.L., Fellows, C.S., Westhorpe, D.P., Rees, G.N., Mitrovic, S.M., Taylor, B., Baldwin, D.S., Silvester, E., Croome, R., 2010. Longitudinal trends in river functioning: Patterns of nutrient and carbon processing in three Australian rivers. River Res. Appl. 26, 1129-1152. https://doi.org/10.1002/rra.1321.
Hill, B.H., Hall, R.K., Husby, P., Herlihy, A.T., Dunne, M., 2000. Interregional comparisons of sediment microbial respiration in streams. Freshwat. Biol. 44, 213-222. https://doi.org/10.1046/j.1365-2427.2000.00555.x.
Hendricks, S.P., 1996. Bacterial biomass, activity, and production within the hyporheic zone of a north-temperate stream. Arch. Hydrobiol. 136, 467-487.
Hester, E.T., Doyle, M.W., Poole, G.C., 2009. The influence of in-stream structures on summer water temperatures via induced hyporheic exchange. Limnol. Oceanogr. 54, 355-367. https://doi.org/10.4319/lo.2009.54.1.0355.
Jackson, M.C., Loewen, C.J.G., Vinebrooke, R.D., Chimimba, C.T., 2016. Net effects of multiple stressors in freshwater ecosystems: a meta-analysis. Glob. Chang. Biol. 22, 180-189. https://doi.org/10.1111/gcb.13028.
Janssens, I.A., Lankreijer, H., Matteucci, G., Kowalski, A.S., Buchmann, N., Epron, D., Pilegaard, K., Kutsch, W., Longdoz, B., Grünwald, T., Montagnani, L., Dore, S., Rebmann, C., Moors, E.J., Grelle, A., Rannik, Ü., Morgenstern, K., Oltchev, S., Clement, R., Guðmundsson, J., Minerbi, S., Berbigier, P., Ibrom, A., Moncrieff, J., Aubinet, M., Bernhofer, C., Jensen, N.O., Vesala, T., Granier, A., Schulze, E.D., Lindroth, A., Dolman, A.J., Jarvis, P.G., Ceulemans, R., Valentini, R., 2001. Productivity overshadows temperature in determining soil and ecosystem respiration across European forests. Glob. Chang. Biol. 7, 269-278. https://doi.org/10.1046/j.1365-2486.2001.00412.x.
Kaandorp, V.P., Molina-Navarro, E., Andersen, H.E., Bloomfield, J.P., Kuijper, M.J.M., de Louw, P.G.B., 2018. A conceptual model for the analysis of multi-stressors in linked groundwater-surface water systems. Sci. Total Environ. 627, 880-895. https://doi.org/10.1016/j.scitotenv.2018.01.259.
Komac, M., 2005. Statistics of the Geological map of Slovenia at scale 1:250.000.
Krause, S., Lewandowski, J., Grimm, N.B., Hannah, D.M., Pinay, G., McDonald, K., Martí, E., Argerich, A., Pfister, L., Klaus, J., Battin, T., Larned, S.T., Schelker, J., Fleckenstein, J., Schmidt, C., Rivett, M.O., Watts, G., Sabater, F., Sorolla, A., Turk, V., 2017. Ecohydrological interfaces as hot spots of ecosystem processes. Water Resour. Res. 53, 6359-6376. https://doi.org/10.1002/2016WR019516.
Lowry, O.H., Rosebrough, N.J., Farr, A.L., Randall, R.J., 1951. Protein measurement with the Folin phenol reagent. J. Biol. Chem. 193, 265-275.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1, 81-106. https://doi.org/10.1007/BF00116251.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA.
Matthaei, C.D., Piggott, J.J., Townsend, C.R., 2010. Multiple stressors in agricultural streams: interactions among sediment addition, nutrient enrichment and water abstraction. J. Appl. Ecol. 47, 639-649. https://doi.org/10.1111/j.1365-2664.2010.01809.x.
Mora-Gómez, J., Freixa, A., Perujo, N., Barral-Fraga, L., 2016. Limits of the Biofilm Concept and Types of Aquatic Biofilms. In: Romaní, A.M., Guasch, H., Balaguer, M.D. (Eds.), Aquatic Biofilms: Ecology, Water Quality and Wastewater Treatment. Caister Academic Press, Norfolk, UK, pp. 3-28.
Mori, N., Simčič, T., Brancelj, A., Robinson, C.T., Doering, M., 2017. Spatio-temporal heterogeneity of actual and potential respiration in two contrasting floodplains. Hydrol. Process. 31, 2622-2636. https://doi.org/10.1002/hyp.11211.
Nogaro, G., Datry, T., Mermillod-Blondin, F., Foulquier, A., Montuelle, B., 2013. Influence of hyporheic zone characteristics on the structure and activity of microbial assemblages. Freshwat. Biol. 58, 2567-2583. https://doi.org/10.1111/fwb.12233.
Nõges, P., Argillier, C., Borja, Á., Garmendia, J.M., Hanganu, J., Kodeš, V., Pletterbauer, F., Sagouis, A., Birk, S., 2017. Quantified biotic and abiotic responses to multiple stress in freshwater, marine and ground waters. Sci. Total Environ. 540, 43-52. https://doi.org/10.1016/j.scitotenv.2015.06.045.
Orghidan, T., 1959. Ein neuer Lebensraum des Unterirdischen Wassers der hyporheischen Biotope. Arch. Hydrobiol. 55, 392-414.
Packard, T.T., 1971. The measurement of respiratory electron transport activity in marine phytoplankton. J. Mar. Res. 29, 235-244.
Palmer, M.A., Febria, C.M., 2012. The heartbeat of ecosystems. Science 336, 1393-1394. https://doi.org/10.1126/science.1223250.
Palmer, M.A., Bernhardt, E.S., Allan, J.D., Lake, P.S., Alexander, G., Brooks, S., Carr, J., Clayton, S., Dahm, C.N., Follstad Shah, J., 2005. Standards for ecologically successful river restoration. J. Appl. Ecol. 42, 208-217. https://doi.org/10.1111/j.1365-2664.2005.01004.x.
Pavlin, M., Birk, S., Hering, D., Urbanič, G., 2011. The role of land use, nutrients, and other stressors in shaping benthic invertebrate assemblages in Slovenian rivers. Hydrobiologia 678, 137-153. https://doi.org/10.1007/s10750-011-0836-8.
Piggott, J.J., Townsend, C.R., Matthaei, C.D., 2015. Reconceptualizing synergism and antagonism among multiple stressors. Ecol. Evol. 5, 1538-1547. https://doi.org/10.1002/ece3.1465.
Ponsatí, L., Corcoll, N., Petrović, M., Picó, Y., Ginebreda, A., Tornés, E., Guasch, H., Barceló, D., Sabater, S., 2016. Multiple-stressor effects on river biofilms under different hydrological conditions. Freshwat. Biol. 61, 2102-2115. https://doi.org/10.1111/fwb.12764.
Rosa, J., Ferreira, V., Canhoto, C., Graça, M.A.S., 2013. Combined effects of water temperature and nutrients concentration on periphyton respiration – implications of global change. Int. Rev. Hydrobiol. 98, 14-23. https://doi.org/10.1002/iroh.20120151.
Sandin, L., Solimini, A.G., 2009. Freshwater ecosystem structure–function relationships: from theory to application. Freshwat. Biol. 54, 2017-2024. https://doi.org/10.1111/j.1365-2427.2009.02313.x.
Simčič, T., Mori, N., 2007. Intensity of mineralization in the hyporheic zone of the prealpine river Bača (West Slovenia). Hydrobiologia 586, 221-234. https://doi.org/10.1007/s10750-007-0621-x.
Simčič, T., Mori, N., Hossli, C., Robinson, C.T., Doering, M., 2015. The response in floodplain respiration of an Alpine river to experimental inundation under different temperature regimes. Hydrol. Process. 29, 5438-5450. https://doi.org/10.1002/hyp.10584.
Storey, R.G., Howard, K.W.F., Williams, D.D., 2003. Factors controlling riffle-scale hyporheic exchange flows and their seasonal changes in a gaining stream: A three-dimensional groundwater flow model. Water Resour. Res. 39, 1084-2000. https://doi.org/10.1029/2002WR001367.
Uehlinger, U., Naegeli, M., Fisher, S.G., 2002. A heterotrophic desert stream? The role of sediment stability. West. N. Am. Nat. 62, 466-473.
Vinebrooke, R.D., Cottingham, K.L., Norberg, J., Scheffer, M., Dodson, S.I., Maberly, S.C., Sommer, U., 2004. Impacts of multiple stressors on biodiversity and ecosystem functioning: the role of species co-tolerance. Oikos 104, 451-457. https://doi.org/10.1111/j.0030-1299.2004.13255.x.
Volf, G., Atanasova, N., Kompare, B., Precali, R., Ožanić, N., 2011. Descriptive and prediction models of phytoplankton in the northern Adriatic. Ecol. Model. 222, 2502-2511. https://doi.org/10.1016/j.ecolmodel.2011.02.013.
von Schiller, D., Acuña, V., Aristi, I., Arroita, M., Basaguren, A., Bellin, A., Boyero, L., Butturini, A., Ginebreda, A., Kalogianni, E., Larrañaga, A., Majone, B., Martínez, A., Monroy, S., Muñoz, I., Paunović, M., Pereda, O., Petrovic, M., Pozo, J., Rodríguez-Mozaz, S., Rivas, D., Sabater, S., Sabater, F., Skoulikidis, N., Solagaistua, L., Vardakas, L., Elosegi, A., 2017. River ecosystem processes: A synthesis of approaches, criteria of use and sensitivity to environmental stressors. Sci. Total Environ. 596, 465-480. https://doi.org/10.1016/j.scitotenv.2017.04.081.
Witten, I.H., Frank, E., Hall, M.A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, Burlington, MA, USA. https://doi.org/10.1016/B978-0-12-374856-0.00018-3.
FIGURES
Figure 1. Map of the study area indicating sampling sites and land use in the five studied catchments (Gradaščica, Kamniška Bistrica, Kokra, Tržiška Bistrica, Selška Sora).
Figure 2. Relationships between temperature, proportion of urban land use and ammonium concentration and response variables (R, ETSA, TPC) at depth of 5-15 cm (left) and depth of 20-40 cm (right).
Figure 3. Decision trees with respiration (R) as response variable for a) hyporheic zone at depth of 5-15 cm, and b) hyporheic zone at depth of 20-40 cm. The values in the leaves indicate correctly/incorrectly classified instances.
Figure 4. Decision trees with respiratory potential (ETSA) as response variable for a) hyporheic zone at depth of 5-15 cm, and b) hyporheic zone at depth of 20-40 cm. The values in the leaves indicate correctly/incorrectly classified instances.
Figure 5. Decision trees with total protein content (TPC) as response variable for a) hyporheic zone at depth of 5-15 cm, and b) hyporheic zone at depth of 20-40 cm. The values in the leaves indicate correctly/incorrectly classified instances.
Table 1. List of predictive variables (attributes) with abbreviations, units and ranges included in modelling. Stressors, i.e., variables exceeding or being below natural ranges are marked with +.
PREDICTIVE VARIABLES | Abbreviations | Units | Ranges
Season (summer, winter, spring) | Seas | |
Catchment scale variables
Forest land use (catchment scale) | F_cat | proportion | 0.54 - 0.82
Forest land use (250 m buffer zone) | F_buf | proportion | 0.00 - 0.90
Agricultural land use (catchment scale) | A_cat | proportion | 0.08 - 0.32
Agricultural land use (250 m buffer zone) | A_buf | proportion | 0.00 - 0.84
Urban land use (catchment scale) | U_cat | proportion | 0.02 - 0.08
Urban land use (250 m buffer zone) | U_buf | proportion | 0.00 - 0.93
Other land use (catchment scale) | O_cat | proportion | 0.02 - 0.16
Other land use (250 m buffer zone) | O_buf | proportion | 0.04 - 0.34
Reach scale variables
Water temperature | Temp | °C | 3.5 - 22.3
Water conductivity | Cond | µS cm-1 | 186 - 1050
pH | pH | | 5.15 - 8.6
Alkalinity | Alk | mEq L-1 | 2157 - 5922
Oxygen concentrations | Oxy | mg L-1 | 0.6 - 13.2
Total nitrogen in water | Ntot | mg L-1 | 0.4 - 6.6
Total phosphorus in water | Ptot | mg L-1 | 0.0 - 0.25
Dissolved organic carbon | DOC | mg L-1 | 0.2 - 51.1
Nitrite | NO2- | mg L-1 | 0.0 - 0.3
Nitrate | NO3- | mg L-1 | 0.1 - 11.5
Sulphate | SO42- | mg L-1 | 0.1 - 41.2
Ammonium | NH4+ | mg L-1 | 0.0 - 1.7
Sodium | Na+ | mg L-1 | 0.0 - 12.0
Chloride | Cl- | mg L-1 | 0.6 - 12.8
Potassium | K+ | mg L-1 | 0.0 - 2.1
Calcium | Ca2+ | mg L-1 | 0.0 - 177.5
Magnesium | Mg2+ | mg L-1 | 0.0 - 66.0
Fine suspended sediment | FS | g DW L-1 | 0.1 - 53.0
Fine organic matter | FOM | g AFDM L-1 | 0.0 - 3.2
Particulate organic matter | POM | g AFDM kg DWsed-1 | 4.6 - 74.6
Pumping time (only for 20-40 cm depth) | PumpT | s (10 L)-1 | 1.39 - 158
Sediments of grain size 4-5 mm | GS4-5 | g DW kg DWsed-1 | 0.0 - 478.5
Sediments of grain size 2-4 mm | GS2-4 | g DW kg DWsed-1 | 0.0 - 620.8
Sediments of grain size 0.2-2 mm | GS0.2-2 | g DW kg DWsed-1 | 24.3 - 969.6
Sediments of grain size 0.063-0.2 mm | GS0.06-0.2 | g DW kg DWsed-1 | 0.2 - 227.2
Sediments of grain size <0.063 mm | GS<0.06 | g DW kg DWsed-1 | 0.1 - 34.0
Table 2. List of response variables (attributes) with units and ranges included in modelling. Variables were measured over three seasons (summer, winter, spring) and at two HZ depths. R – respiration at in situ temperatures; ETSA – respiratory potential at standard temperature (15°C); TPC – total protein content.
RESPONSE VARIABLES | Units | low values (min - max) | medium values (min - max) | high values (min - max)
R (5-15 cm) | µL O2 g DWsed-1 h-1 | 0.0 - 0.1 | 0.1 - 0.6 | 0.6 - 1.2
R (20-40 cm) | µL O2 g DWsed-1 h-1 | 0.4 - 1.6 | 1.6 - 2.2 | 2.2 - 3.0
ETSA (5-15 cm) | µL O2 g DWsed-1 h-1 | 0.0 - 0.3 | 0.3 - 0.6 | 0.6 - 2.8
ETSA (20-40 cm) | µL O2 g DWsed-1 h-1 | 0.0 - 0.2 | 0.2 - 0.3 | 0.3 - 3.3
TPC (5-15 cm) | µg protein g DWsed-1 | 20.9 - 123.7 | 123.7 - 167.1 | 167.1 - 468.9
TPC (20-40 cm) | µg protein g DWsed-1 | 87.2 - 206.0 | 206.0 - 285.8 | 285.8 - 1693.2
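The class boundaries in Table 2 imply a simple discretisation of each continuous response variable into low, medium and high classes before a classification tree is built. A minimal Python sketch of that step follows. The helper name to_class is hypothetical, and because Table 2 lists the same value as the maximum of one class and the minimum of the next, the treatment of values lying exactly on a boundary (assigned here to the higher class) is an assumption.

```python
# Class boundaries (upper limit of "low", upper limit of "medium") read from Table 2.
R_5_15_BOUNDS = (0.1, 0.6)        # R at 5-15 cm, in µL O2 g DWsed-1 h-1
TPC_5_15_BOUNDS = (123.7, 167.1)  # TPC at 5-15 cm, in µg protein g DWsed-1


def to_class(value, bounds):
    """Map a measured value to "low", "medium" or "high".

    Assumption: a value exactly on a boundary goes to the higher class.
    """
    low_max, medium_max = bounds
    if value < low_max:
        return "low"
    if value < medium_max:
        return "medium"
    return "high"


r_classes = [to_class(v, R_5_15_BOUNDS) for v in (0.05, 0.3, 0.6, 1.2)]
tpc_class = to_class(150.0, TPC_5_15_BOUNDS)
```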
Table 3. Results of model validation by applying two different techniques, “division to training and testing data” and cross validation (CV) for three response variables measured at two depths. Numbers indicate % of correctly classified instances (CCI). Catchment names in first column indicate which part of the data set was used for testing the constructed decision tree models. The models for R at depth 20-40 were validated only on the whole data set, due to limited number of instances (N=18).
Columns: 5-15 cm depth (train CCI, test/CV CCI, Cohen’s kappa); 20-40 cm depth (train CCI, test/CV CCI, Cohen’s kappa)
Rows: Respiration - R (Gradaščica, Kamniška Bistrica, Kokra, Sora, Tržiška Bistrica, All data); Respiratory potential - ETSA (Gradaščica, Kamniška Bistrica, Kokra, Sora, Tržiška Bistrica, All data); Total protein content - TPC (Gradaščica, Kamniška Bistrica, Kokra, Sora, Tržiška Bistrica, All data)
59 67 68 62 65 65
67 76 83 69 89 82
0.44 0.63 0.73 0.52 0.80 0.74
49 42 67 49 56 57
82 79 80 87 78 86
56 60 59 57 57 59
53 56 50 44 61 50
69
38
0.07
0.24 0.17 0.52 0.27 0.23 0.33
76 66 74 63 65 68
47 43 61 44 44 60
0.17 0.15 0.40 0.18 0.17 0.39
0.27 0.32 0.28 0.18 0.33 0.25
65 66 61 66 70 70
34 60 22 32 29 55
0 0.25 -0.18 -0.21 0 0.33
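The two performance measures reported in Table 3 can both be derived from a confusion matrix. The sketch below assumes the standard definitions: CCI is the percentage of instances on the diagonal, and Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. The function name cci_and_kappa and the 3 x 3 matrix are invented for illustration and are not taken from the study.

```python
def cci_and_kappa(confusion):
    """Return (CCI in %, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of instances of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected if predictions were independent of the true classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return 100 * observed, kappa


# Invented example with classes low, medium, high (rows: true, columns: predicted).
toy_confusion = [
    [8, 2, 0],
    [2, 6, 2],
    [0, 2, 8],
]
cci, kappa = cci_and_kappa(toy_confusion)
```

A kappa of 0 corresponds to agreement no better than chance and negative values to agreement worse than chance, which is how the zero and negative entries for TPC at 20-40 cm in Table 3 can be read.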
Highlights
• Multiple stressors effects on hyporheic zone were studied using machine learning.
• Biological response in hyporheic zone was well predicted by decision tree models.
• Models with respiration as response variable had the highest predictive performance.
• Temperature, land use and water quality jointly defined hyporheic zone response.
• Models provided new knowledge on interactions among stressors.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: