Modelling the effects of multiple stressors on respiration and microbial biomass in the hyporheic zone using decision trees


Accepted Manuscript
Nataša Mori, Barbara Debeljak, Mateja Škerjanec, Tatjana Simčič, Tjaša Kanduč, Anton Brancelj

PII: S0043-1354(18)30910-2
DOI: https://doi.org/10.1016/j.watres.2018.10.093
Reference: WR 14204
To appear in: Water Research
Received Date: 29 May 2018
Revised Date: 26 October 2018
Accepted Date: 30 October 2018

Please cite this article as: Mori, N., Debeljak, B., Škerjanec, M., Simčič, T., Kanduč, T., Brancelj, A., Modelling the effects of multiple stressors on respiration and microbial biomass in the hyporheic zone using decision trees, Water Research, https://doi.org/10.1016/j.watres.2018.10.093. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Nataša Mori(1)*, Barbara Debeljak(1), Mateja Škerjanec(2), Tatjana Simčič(1), Tjaša Kanduč(3), and Anton Brancelj(1,4)

(1) National Institute of Biology, Department of Organisms and Ecosystems Research, Večna pot 111, 1000 Ljubljana, Slovenia
(2) University of Ljubljana, Faculty of Civil and Geodetic Engineering, Jamova 2, Ljubljana, Slovenia
(3) Jožef Stefan Institute, Department of Environmental Sciences, Jamova 39, 1000 Ljubljana, Slovenia
(4) University of Nova Gorica, School for Environmental Sciences, Vipavska 13, 5000 Nova Gorica, Slovenia

*corresponding author: [email protected]

Abstract

The integrity of freshwater surface water and groundwater ecosystems, and their ecological and qualitative status, depend greatly on ecological processes taking place in the biofilm-covered streambed sediments of the hyporheic zone (HZ). Little is known about the interactions and effects of multiple stressors on biologically driven processes in the HZ. In this study, machine learning (ML) tools were used to provide evidence-based information on how stressors and ecologically important environmental factors interact and drive ecological processes and microbial biomass. The ML technique of decision trees, using the J48 algorithm, was applied to build models from a data set of 342 samples collected over three seasons at 24 sites within the catchments of five gravel-bed rivers in north-central Slovenia. Catchment-scale land use data and reach-scale environmental features describing HZ morphology and the physical and chemical characteristics of water were used as predictive variables. Respiration (R) and microbial respiratory electron transport system activity (ETSA), indicating ecological processes, and total protein content (TPC), indicating microbial biomass, were used as response variables. Separate models were built for two HZ depths: 5-15 cm and 20-40 cm. The models with R as the response variable had the highest predictive performance (67-89%), showing that R is a good indicator of complex environmental gradients. The ETSA and TPC models were less accurate (42-67%) but still provide valuable ecological information. The best model shows that temperature, when combined with selected water quality elements, is an important predictor of R at a depth of 5-15 cm. The ETSA and TPC models show the combined effects of temperature, catchment land use and selected water quality elements on both response variables. Overall, this study provides new knowledge on how ecological processes occurring in the HZ respond to catchment- and reach-scale variables, and provides evidence-based information about complex interactions between temperature, catchment land use and water quality. These interactions are highly dependent on the choice of response variable, i.e., each response variable is influenced by a specific combination of predictive environmental variables.

Key words: machine learning, ecosystem processes, water quality, stressors, freshwater biofilm, hyporheic zone

1. Introduction

Freshwaters are affected by a diverse array of anthropogenic stressors that often interact and are, at the same time, directly or indirectly linked to biological responses. Interacting stressors can have combined effects that are equal to (additive), greater than (synergistic) or smaller than (antagonistic) the sum of their individual effects (Piggott et al., 2015). Understanding the effects of single stressors, multiple stressors and their interactions is a precondition for effective catchment management and river rehabilitation or restoration (Palmer et al., 2005; Pavlin et al., 2011). Moreover, it is critical to assess the response of ecosystem processes to stressors if their effects on ecosystem services, which provide benefits to humans, are to be properly understood (von Schiller et al., 2017). Consequently, there is an increasing number of experimental studies examining the relationships between stressors and aquatic ecosystem functioning (Matthaei et al., 2010; Ferreira and Chauvet, 2011; Ponsatí et al., 2016). In addition, several conceptual models and analytical frameworks addressing the impact of multiple stressors on surface and groundwater ecosystems have been developed (Jackson et al., 2016; Feld et al., 2016; Kaandorp et al., 2018).

Stressors affect both the structure and function of aquatic ecosystems (Vinebrooke et al., 2004). Ecosystem structure is characterized by physical features such as river channel morphology, water quality, and the biomass or composition of biological communities, whereas ecosystem functioning describes the processes that regulate energy and matter fluxes in an ecosystem (Sandin and Solimini, 2009). Biologically driven ecosystem processes include organic matter decomposition, nutrient cycling, metabolism, and pollutant and community dynamics (von Schiller et al., 2017). Although the EU Water Framework Directive (EU, 2000) defines ecological status as “an expression of the quality of the structure and functioning of aquatic ecosystems associated with surface waters”, scientific understanding of the structural characteristics of freshwater ecosystems remains far more advanced than that of their functioning. Much less progress has been made towards developing and standardizing methods that measure ecosystem functioning or towards incorporating them into river ecological status assessment (Palmer and Febria, 2012). One advantage of functional measures or indicators is that they can be translated directly into the concept of ecosystem services (von Schiller et al., 2017).

Many important ecosystem processes in running waters take place in the hyporheic zone (HZ), i.e., the transition zone between surface water and groundwater (Orghidan, 1959; Krause et al., 2017). Here, active regions of carbon flux occur together with the mixing of groundwater and surface water (Boulton et al., 1998; Battin et al., 2009). This hydrologically highly dynamic and biologically sensitive system is a suitable environment for implementing a functional indicator framework to identify impacts, new stressors in the catchments, and reductions in ecosystem services, such as reduced self-purification. Sediment respiration (oxygen consumption, R) is frequently used to measure the response of HZ ecosystem function to natural conditions and anthropogenic pressures (Hill et al., 2000; Hadwen et al., 2010; Doering et al., 2011). Other functional parameters include microbial metabolic activity assays, such as fluorescein diacetate hydrolysis (FDA), potential denitrification enzyme activity (DEA), substrate-induced respiration (SIR), extracellular phosphatase activity (EPA), and respiratory electron transport system activity (ETSA) (Simčič and Mori, 2007; Aristi et al., 2015; Debeljak et al., 2016; Ponsatí et al., 2016).

Various studies have shown the sensitivity of these parameters to temperature (Hill et al., 2000; Doering et al., 2011), nutrients (Hill et al., 2000), pollution gradients (Aristi et al., 2015), hydro-morphological features (Simčič and Mori, 2007; Nogaro et al., 2013), land use (Debeljak et al., 2017) and newly emerging pollutants (Ponsatí et al., 2016). Since multiple stressors and naturally fluctuating environmental factors simultaneously influence most ecosystems, their consequences are often unpredictable, especially when predictions are based on single stressors (Matthaei et al., 2010). To date, only a few studies have investigated the response of functional variables to the combined effects of selected stressors, such as temperature and nutrients (Ferreira and Chauvet, 2011; Rosa et al., 2013), or nutrient enrichment, the amount of fine sediment and water abstraction (Matthaei et al., 2010). In a recent review, Nõges et al. (2017) concluded that despite extensive basic knowledge of aquatic ecology, only a few studies actually provide measurable evidence of multi-stress effects, and most models either represent a single water body or are based on a single survey.

Nowadays, new data-processing tools and approaches exist that can help to explain some of the phenomena discussed above. For example, machine-learning (ML) methods can discover complex patterns in data sets and allow in-depth analyses (Gal et al., 2013). Decision tree induction (Quinlan, 1986) is an ML approach that applies recursive data-partitioning techniques to automatically construct a model (a decision tree) for predicting variables with nominal values. Compared with statistical methods, decision tree methods have the advantage of being nonparametric, i.e., they make no assumptions about the distribution of the dependent variable, and they have higher interpretative power than the majority of statistical methods (Gal et al., 2013). Decision trees have been successfully applied to predicting chemical parameters of river water quality from bioindicator data (Džeroski et al., 2000), predicting stream invertebrates and algal blooms (Dakou et al., 2007; Volf et al., 2011), modelling lake zooplankton dynamics (Gal et al., 2013), and analysing the impacts of exotic species on ecosystems (Boets et al., 2013).

In this study, the interactions between catchment- and reach-scale environmental factors, some of which exceed their natural ranges and hence act as stressors, and their impact on selected ecological indicators in the HZ were investigated by inducing decision trees. Three types of ecological indicators were used as response variables: respiration (oxygen consumption measured in situ; R), respiratory potential measured as respiratory electron transport system activity at 15°C (ETSA), and total protein content (TPC) as a proxy for microbial biomass. These indicators are relatively simple to measure, have been well studied in aquatic ecosystems, and are sensitive indicators of ecosystem stress (e.g., Hill et al., 2000; Franken et al., 2001; Simčič et al., 2015; Debeljak et al., 2015). To encompass spatial and temporal variability, data were collected over three seasons (spring, summer, winter), in five catchments and at two HZ depths. The objectives of this paper are to: a) evaluate the sensitivity of ecological indicators to multiple stressors occurring together with naturally fluctuating environmental factors, and b) identify the interactions between multiple stressors and environmental factors to find the best predictors of the measured ecological indicators, in both cases using ML tools.

2. Methods

2.1 Study area

This study was carried out in five pre-Alpine catchments (Gradaščica, Kamniška Bistrica, Kokra, Selška Sora, and Tržiška Bistrica) located in north-central Slovenia in SE Europe (Fig. 1). The catchment areas range from 146 km² to 539 km², with rivers from 27 to 34 km in length. The elevation of the study sites ranges from 262 to 490 m a.s.l., in an area of predominantly carbonate and/or silicate geology (e.g., Upper Triassic limestone and dolomite, tufa, sandstones, conglomerate, clay, and marls) (Komac, 2005). Mean annual precipitation ranges from 1100 to 2200 mm, and mean annual discharges range from 2.9 to 8.8 m³ s⁻¹ (Slovenian Environmental Agency).

The dominant land use in all five catchments is natural, mixed coniferous and deciduous forest. A moderate longitudinal increase in agricultural land use (including arable land and grasslands) and urban land use (including towns, residential areas, industrial zones) is observed along the rivers. The rivers are partially channelized, especially in urban areas. Instream weirs (i.e., low-head dams) and embankments are also present and are used to moderate flow and prevent flooding. In most agricultural areas, the riparian zone is dominated by willow, alder and other species typical of the region, while in urban areas embankments prevent the development of a riparian strip.

2.2 Data collection

Land use data were extracted from the polygon database of the Slovenian Ministry of Agriculture, Forestry and Food. The proportions of land use types (forest, agricultural, urban) were determined for the contributing part of the catchment upstream of each sampling site and for the 250 m impact zone adjacent to the studied river segment. Samples were collected at 24 locations (autumn and winter 2013) and 9 locations (spring 2014) within the five experimental catchments to encompass the variability in adjacent land use pressures, reach-scale environmental features, and ecological responses. Samples were taken in riffle mesohabitats, where the HZ is rarely studied, although riffles are of great ecological importance for biota and extend over large parts of the streambed (Storey et al., 2003). At each sampling site, three spatial replicates were selected within the river channel to take into account reach-scale variability in the data. The data were gathered at two depths, 5-15 and 20-40 cm, to observe differences in ecological processes and biomass along the vertical gradient, which is an important controlling factor of ecological processes in the HZ (Storey et al., 2003). In spring, the number of sampling sites was lower because high water levels prevented sampling at specific locations. Respiration at a depth of 20-40 cm was measured only in spring, resulting in 27 data records.
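The sample counts above can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated explicitly in the text) that the three spatial replicates were taken at every site visit and at both depths; under that assumption the design yields the 342 samples quoted in the Abstract and the 27 respiration records at 20-40 cm.

```python
# Hypothetical reconstruction of the sample counts; the replicate/depth
# structure is an assumption, not a statement from the methods.
SITES_PER_2013_CAMPAIGN = 24  # two sampling campaigns in 2013
CAMPAIGNS_2013 = 2
SITES_SPRING_2014 = 9
REPLICATES = 3                # spatial replicates per site visit
DEPTHS = 2                    # 5-15 cm and 20-40 cm

site_visits = SITES_PER_2013_CAMPAIGN * CAMPAIGNS_2013 + SITES_SPRING_2014
records_per_depth = site_visits * REPLICATES
total_samples = records_per_depth * DEPTHS
# Respiration at 20-40 cm was measured in spring only.
deep_r_records = SITES_SPRING_2014 * REPLICATES

print(site_visits, records_per_depth, total_samples, deep_r_records)  # 57 171 342 27
```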

At each site and at each depth, water temperature, conductivity, pH and oxygen levels were measured in triplicate using field probes (WTW Multi 3430 set), and water samples were collected for laboratory analysis. In the laboratory, alkalinity was measured using Gran titrations, cations and anions were analysed by ion chromatography (Metrohm, 761 Compact IC), total phosphate (Ptot) and total nitrogen (Ntot) were determined spectrophotometrically (Perkin Elmer, Lambda 25), and dissolved organic carbon (DOC) was determined using the non-purgeable organic carbon (NPOC) method (Analytic Jena Multi C/N 3100).

Sediment samples for in situ R, ETSA and TPC measurements at an HZ depth of 5-15 cm were obtained using a PVC sampling tube (30 cm wide, 60 cm high). Sediment from the bottom of the sampling tube was collected after removing the surface layer and was sieved through a 5 mm mesh sieve. Part of the sample was used for in situ R, and part for ETSA and TPC measurements. Sediment samples from a depth of 20 to 40 cm were obtained using the Bou–Rouch method (Bou and Rouch, 1967), in which a perforated pipe (5 mm apertures) is inserted into the sediments and samples are extracted using a piston pump. Particulate organic matter (POM) was determined as loss on ignition at 550°C for 3 h. Fine suspended sediment (FS), an indicator of riverbed clogging, and fine organic matter (FOM) were determined by incubating 1 L water samples, obtained either by stirring the riverbed sediments or by pumping, for 24 h, and weighing the residue after drying (24 h, 60°C) and ignition (3 h, 550°C). Sediment composition was estimated by fractionating the dry sediment into five grain-size classes (<0.063 mm, 0.063-0.2 mm, 0.2-2 mm, 2-4 mm, 4-5 mm) using a series of stainless steel sieves.
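The gravimetric quantities above reduce to simple mass ratios. The following Python sketch shows one way to compute them; the function names, the example masses, and the choice to report FS as the dried residue and FOM as the mass lost on ignition per litre are illustrative assumptions rather than the authors' exact procedure.

```python
def loss_on_ignition_percent(dry_mass_g, ignited_mass_g):
    """Mass lost at 550 °C as a percentage of dry mass (POM as LOI)."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return 100.0 * (dry_mass_g - ignited_mass_g) / dry_mass_g


def fs_and_fom(dry_residue_g, ignited_residue_g, volume_l=1.0):
    """Assumed definitions: FS = dried residue, FOM = mass lost on ignition (g per L)."""
    fs = dry_residue_g / volume_l
    fom = (dry_residue_g - ignited_residue_g) / volume_l
    return fs, fom


def grain_size_percentages(mass_by_class_g):
    """Share of each sieve fraction in the total dry sediment mass (%)."""
    total = sum(mass_by_class_g.values())
    return {cls: 100.0 * m / total for cls, m in mass_by_class_g.items()}


# Hypothetical masses, for illustration only.
print(loss_on_ignition_percent(10.0, 9.6))
print(fs_and_fom(0.5, 0.4))
print(grain_size_percentages({"<0.063": 1.0, "0.063-0.2": 1.0, "0.2-2": 2.0,
                              "2-4": 2.0, "4-5": 4.0}))
```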

In situ R was measured using the closed bottle system (Uehlinger et al., 2002). Plexiglas tubes were half-filled with sieved sediment (<5 mm), filled to the top with water, sealed and incubated in situ for 2 hours. An optical dissolved oxygen sensor (WTW, FDO® 925) was used to measure temperature and oxygen concentration before and after incubation. Respiration was expressed as O₂ consumption per gram of sediment dry weight per hour (µL O₂ g DW⁻¹ h⁻¹). ETSA was measured using a modified assay adapted from Packard (1971). The frozen sediment samples were thawed and homogenized in ice-cold homogenization buffer. Samples were then centrifuged, and an aliquot of the supernatant was incubated for 40 min at 15°C with the substrate solution (0.1 M sodium phosphate buffer, pH 8.4; 1.7 mM NADH; 0.25 mM NADPH; 0.2% (v/v) Triton X-100) and the reagent solution (2.5 mM 2-(p-iodophenyl)-3-(p-nitrophenyl)-5-phenyl tetrazolium chloride). Formazan production was determined spectrophotometrically, and ETSA was calculated from the rate of tetrazolium dye reduction, which was converted to the oxygen consumed per unit dry mass per unit time (µL O₂ g DW⁻¹ h⁻¹). Total protein content was estimated colorimetrically according to the method of Lowry et al. (1951) using a Sigma Protein Assay Kit (P 5656, Sigma Diagnostics, St Louis, MO, USA). All field measurements and laboratory analyses are described in detail in Mori et al. (2017) and Debeljak et al. (2017).
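As an illustration of how a closed-bottle oxygen decline can be expressed in µL O₂ g DW⁻¹ h⁻¹, the sketch below converts the consumed oxygen mass to a gas volume using the ideal-gas molar volume at 0 °C and 1 atm. The water volume, sediment dry weight, the STP convention and the absence of a water-only blank correction are all assumptions made for the example; the source does not specify them.

```python
MOLAR_VOLUME_STP_L = 22.414  # L per mol, ideal gas at 0 °C and 1 atm (assumption)
O2_MOLAR_MASS_G = 31.998     # g per mol


def respiration_rate(o2_start_mg_l, o2_end_mg_l, water_volume_l, sediment_dw_g, hours):
    """Closed-bottle R in µL O2 per g dry weight per hour (sketch, no blank correction)."""
    consumed_mg = (o2_start_mg_l - o2_end_mg_l) * water_volume_l
    # mg / (g/mol) gives mmol; mmol * (L/mol) gives mL; * 1000 gives µL.
    consumed_ul = consumed_mg / O2_MOLAR_MASS_G * MOLAR_VOLUME_STP_L * 1000.0
    return consumed_ul / (sediment_dw_g * hours)


# Hypothetical incubation: O2 drops from 10.0 to 9.0 mg/L in 0.05 L of water
# over 50 g of dry sediment during the 2 h incubation.
rate = respiration_rate(10.0, 9.0, water_volume_l=0.05, sediment_dw_g=50.0, hours=2.0)
print(f"R = {rate:.3f} µL O2 g DW-1 h-1")  # R = 0.350 µL O2 g DW-1 h-1
```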

2.3 Database and data pre-processing

The data used to build the models consisted of predictors (i.e., environmental variables) and response variables indicating ecological processes and microbial biomass. Most ecologically relevant environmental parameters were included in the study as predictors. Parameters that exceeded or fell below the typical values for this region were designated as stressors. Typical ranges for this region were obtained using only measurements from pristine (i.e., forested) locations where anthropogenic pressures, such as agricultural or urban land use, pollution, and geomorphological pressures, were absent.
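A minimal sketch of the stressor definition, assuming that the "typical range" of a parameter is the minimum-maximum span observed at the pristine forested sites (the text does not say whether ranges or percentiles were used). The function names, parameter names and values are hypothetical.

```python
def reference_ranges(reference_sites):
    """Min-max range of each parameter across pristine (forested) sites."""
    params = reference_sites[0].keys()
    return {p: (min(s[p] for s in reference_sites),
                max(s[p] for s in reference_sites)) for p in params}


def flag_stressors(site, ranges):
    """Label each parameter as 'below', 'typical' or 'above' its reference range."""
    flags = {}
    for param, value in site.items():
        low, high = ranges[param]
        if value < low:
            flags[param] = "below"
        elif value > high:
            flags[param] = "above"
        else:
            flags[param] = "typical"
    return flags


# Hypothetical values (sulphate in mg/L, POM in %).
forested = [{"so4": 4.0, "pom": 1.2}, {"so4": 6.0, "pom": 2.0}, {"so4": 5.0, "pom": 1.5}]
ranges = reference_ranges(forested)
print(flag_stressors({"so4": 9.0, "pom": 0.8}, ranges))  # {'so4': 'above', 'pom': 'below'}
```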

Prior to analysis, the measured response variables were discretized manually: new discrete-valued attributes (“low”, “med” and “high”) replaced the measured numeric response attributes. Discretization was performed separately for each dataset (i.e., for each sampling depth) and for each target/response variable (R, ETSA, and TPC) to ensure equal representation of the three classes in the dataset. In addition to data discretization, automatic attribute selection techniques included in WEKA (Witten et al., 2011) were employed to improve modelling accuracy. These techniques discard irrelevant or redundant attributes from a given dataset. The first technique applied was Information Gain Attribute Ranking (Hall and Holmes, 2003), which evaluates the worth of an attribute by measuring the information gain with respect to the class. However, this method does not take attribute interactions into account. A second technique, Correlation-based Feature Selection (CFS; Hall, 1999), evaluates subsets of attributes rather than individual attributes. The CFS algorithm considers both the usefulness of individual attributes for predicting the class and the level of inter-correlation among them, favouring subsets whose attributes correlate highly with the class and weakly with each other.
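The pre-processing steps can be sketched in plain Python. `tertile_labels` is an assumed automatic equivalent of the manual discretization (equal-frequency classes), `information_gain` computes the ranking criterion for a nominal attribute (numeric predictors would first need to be discretized or thresholded), and `cfs_merit` is the subset merit of Hall's (1999) CFS, computed from the mean attribute-class and mean attribute-attribute correlations. This is not WEKA's implementation, and the example data are invented.

```python
import math
from collections import Counter


def tertile_labels(values):
    """Equal-frequency 'low'/'med'/'high' classes (ties split by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        labels[i] = ("low", "med", "high")[rank * 3 // n]
    return labels


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Class entropy minus the weighted entropy after splitting on a nominal attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def cfs_merit(k, mean_attr_class_corr, mean_attr_attr_corr):
    """Hall's CFS merit of a subset of k attributes."""
    return k * mean_attr_class_corr / math.sqrt(k + k * (k - 1) * mean_attr_attr_corr)


respiration = [0.1, 0.9, 0.4, 1.1, 0.2, 0.6]  # hypothetical R values
classes = tertile_labels(respiration)
season = ["winter", "summer", "winter", "summer", "winter", "summer"]
print(classes)                                      # ['low', 'high', 'med', 'high', 'low', 'med']
print(round(information_gain(season, classes), 3))  # 0.667
print(cfs_merit(2, 0.5, 0.0), cfs_merit(2, 0.5, 1.0))
```

The two `cfs_merit` calls illustrate the CFS trade-off: two attributes with the same correlation to the class score higher when they are uncorrelated with each other than when they are fully redundant.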

2.4 Decision trees

Decision trees are hierarchical structures composed of three types of nodes (a root, internal nodes and leaves) connected by branches. The root is the starting node at the top of the decision tree; together with the internal nodes, it contains tests on the input attributes. The leaves (terminal nodes) contain the predictions of the target (class) values. Decision trees are interpreted in terms of IF-THEN rules (Gal et al., 2013). In this study, decision trees were built using the J48 algorithm, a Java re-implementation of the C4.5 algorithm (Quinlan, 1993) included in the machine-learning package WEKA (Witten et al., 2011). The J48 algorithm repeatedly partitions the original dataset into subsets that are as homogeneous as possible with respect to the target variable (in terms of the number of examples of each class). Its most important tasks are finding the optimal splitting values of the measured attributes and the most accurate prediction of the target. Pruning was applied to control decision tree complexity and avoid overfitting. Pruning improves the transparency of the induced trees by reducing their size and enhances classification accuracy by eliminating errors resulting from noisy data (Bratko, 1989). During tree construction, forward pruning was applied by implementing the “minimum number of instances” criterion, according to which every leaf must contain at least a minimum number of examples; otherwise, no branching is allowed.
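To make the recursive partitioning and the "minimum number of instances" criterion concrete, the following is a simplified, C4.5-style learner for numeric attributes. It is a sketch, not WEKA's J48: it uses plain information gain rather than C4.5's gain ratio, tests midpoints between adjacent values, and omits J48's error-based post-pruning. The toy records and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Return (gain, attribute, threshold) of the best binary split, or None."""
    base = entropy(labels)
    best = None
    for attr in rows[0]:
        values = sorted({r[attr] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, labels) if r[attr] <= thr]
            right = [y for r, y in zip(rows, labels) if r[attr] > thr]
            # Forward pruning: both branches must hold at least min_leaf examples.
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, attr, thr)
    return best


def grow(rows, labels, min_leaf=2):
    """Recursively partition the data; a leaf holds the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, min_leaf)
    if split is None or split[0] <= 0:
        return majority
    _, attr, thr = split
    le = [i for i, r in enumerate(rows) if r[attr] <= thr]
    gt = [i for i, r in enumerate(rows) if r[attr] > thr]
    return (attr, thr,
            grow([rows[i] for i in le], [labels[i] for i in le], min_leaf),
            grow([rows[i] for i in gt], [labels[i] for i in gt], min_leaf))


def predict(tree, row):
    """Follow the tests from the root down to a leaf."""
    while isinstance(tree, tuple):
        attr, thr, le, gt = tree
        tree = le if row[attr] <= thr else gt
    return tree


def rules(tree, path=()):
    """Flatten the tree into IF-THEN rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [f"IF {' AND '.join(path) or 'TRUE'} THEN {tree}"]
    attr, thr, le, gt = tree
    return (rules(le, path + (f"{attr} <= {thr:g}",)) +
            rules(gt, path + (f"{attr} > {thr:g}",)))


# Toy data (hypothetical): temperature in °C, sulphate in mg/L.
rows = [{"temp": 5.0, "so4": 8.0}, {"temp": 6.0, "so4": 5.0}, {"temp": 8.0, "so4": 7.0},
        {"temp": 10.0, "so4": 4.0}, {"temp": 12.0, "so4": 6.0}, {"temp": 14.0, "so4": 5.0}]
labels = ["low", "low", "low", "high", "high", "high"]
tree = grow(rows, labels, min_leaf=2)
print("\n".join(rules(tree)))  # IF temp <= 9 THEN low / IF temp > 9 THEN high
```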

Decision trees are learned from a training data set. The quality of the constructed model, i.e., its accuracy of prediction or predictive performance, is expressed as the percentage of correctly classified instances (% CCI). To assess generalization and model re-usability (e.g., application in other similar catchments), different validation procedures were applied. First, “automatic” cross-validation (CV) was used, in which the original dataset was randomly partitioned into a chosen number of folds (N = 10). In each turn, one fold was used for testing, while the remaining nine folds were used for training. The final error was given as the average error over all the generated models. Next, “manual” validation was applied by splitting the original dataset into five subsets based on the experimental catchments. The aim was to investigate whether the selection of particular catchments for training the decision tree models improves their predictive performance. In turn, each subset (samples collected within one catchment) was used for testing, while the remaining data (samples collected within the other four catchments) were used for training the model.
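Both validation schemes can be expressed with one routine that holds out one group at a time, where the groups are either stratified folds or catchments. The sketch below is illustrative only: the fold-assignment scheme, the function names and the majority-class learner (a placeholder for any classifier, such as a decision tree) are assumptions, and the score is averaged over groups as described in the text, which can differ slightly from pooling all held-out predictions.

```python
import random
from collections import Counter, defaultdict


def percent_cci(y_true, y_pred):
    """Percentage of correctly classified instances."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def stratified_folds(labels, k=10, seed=1):
    """Assign each example to one of k folds, spreading every class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            fold_of[i] = position % k
            position += 1
    return fold_of


def evaluate(rows, labels, groups, fit, predict):
    """Mean %CCI when each group (fold or catchment) is held out in turn."""
    scores = []
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        test = [i for i, gi in enumerate(groups) if gi == g]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        preds = [predict(model, rows[i]) for i in test]
        scores.append(percent_cci([labels[i] for i in test], preds))
    return sum(scores) / len(scores)


def fit_majority(rows, labels):
    """Placeholder learner: remember the most frequent training class."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = [{} for _ in range(6)]  # predictors are ignored by the placeholder learner
labels = ["low", "low", "low", "low", "high", "high"]
catchments = ["A", "A", "B", "B", "C", "C"]  # hypothetical catchment IDs
print(evaluate(rows, labels, catchments, fit_majority, predict_majority))  # about 66.7
print(evaluate(rows, labels, stratified_folds(labels, k=3), fit_majority, predict_majority))
```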

3. Results

3.1 Hyporheic zone environmental conditions and microbial respiration and biomass

Spatial analysis revealed that, within the study area, the proportions of agricultural and urban land use in the buffer zone reached 0.84 and 0.93, respectively, while at certain sites native forest was completely absent (Table 1). With the exception of Ntot, water chemistry parameters were outside their natural ranges, while temperature and pH were within the typical ranges for this area (i.e., the ranges at native forested sites with no anthropogenic influence). Similarly, FS and FOM, both indicators of clogging, and individual sediment fractions exceeded reference values, while POM was below the normal range at some sites.

Respiration rates (R) ranged from close to zero to 1.2 µL O₂ g DWsed⁻¹ h⁻¹ at 5-15 cm and from 0.4 to 3 µL O₂ g DWsed⁻¹ h⁻¹ at 20-40 cm (Table 2). ETSA ranged from 0.0 to 2.8 µL O₂ g DWsed⁻¹ h⁻¹ at 5-15 cm and from 0.0 to 3.3 µL O₂ g DWsed⁻¹ h⁻¹ at 20-40 cm. TPC ranged from 20.9 to 468.9 and from 87.2 to 1,693.2 µg protein g DWsed⁻¹ at the two depths, respectively. Simple regression plots show a significant increase in R, ETSA and TPC with temperature at both depth ranges. Respiration measured at 5-15 cm showed the strongest dependence on temperature (R² = 0.62) (Figure 2). A significant but weak relationship was observed between catchment urban land use and R at 5-15 cm, ETSA, and TPC, but not R at 20-40 cm. The dependence on the proportion of urban land use was strongest for ETSA at 5-15 cm and for ETSA and TPC at 20-40 cm. A significant but very weak (R² < 0.2) relationship was found between ammonium and all three response variables (R, ETSA, TPC) at both depths.
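For a simple linear regression, an R² value such as the 0.62 reported for R versus temperature at 5-15 cm equals the squared Pearson correlation between the two variables. A minimal sketch, assuming an ordinary least-squares straight-line fit; the example data are invented and significance testing is not shown.

```python
def r_squared(x, y):
    """R² of the least-squares line y = a + b*x (equals Pearson r squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical temperature (°C) and respiration values, for illustration only.
temperature = [4.0, 6.5, 9.0, 12.0, 15.5]
respiration = [0.1, 0.2, 0.5, 0.6, 1.1]
print(round(r_squared(temperature, respiration), 2))
```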

3.2 Predictive performances of decision tree models

When stratified 10-fold cross-validation (CV) was applied to the whole data set, the models were relatively highly predictive (CCI above 50%, Table 3). The explanatory power of the decision trees was highest for the model using R as the response variable and data from 5-15 cm depth. The model based on R and data from the 20-40 cm HZ layer performed much worse, most likely because of the small amount of data (27 records). The predictive performance of the ETSA and TPC models was satisfactory at both depths (CCI > 50%). Only modest variation in predictive performance was observed among the models built by dividing the data set into training (four catchments) and testing (one catchment) subsets (Table 3).

3.3 Response of ecological indicators and environmental factor-multiple stressor interactions

Decision tree models demonstrated that at 5-15 cm, temperature (at a threshold of 9.3°C) is the most important factor affecting the intensity of riverbed respiration (Fig. 3a). The model also shows that below 6.2°C there is no interaction with other stressors or environmental variables. At temperatures between 6.2 and 9.3°C, sulphate concentrations above 6.5 mg L⁻¹ lead to low rates of R. At temperatures above 9.3°C, dissolved nitrite (NO₂⁻), potassium (K⁺), calcium (Ca²⁺) and sulphate (SO₄²⁻) are important variables determining moderate to high R rates. The decision tree for R in the 20-40 cm HZ layer was less accurate (38% CCI) but still provides valuable information regarding the importance of sediment composition and hydraulic conductivity for respiration (Fig. 3b).

Models using ETSA as the response variable reveal the importance of temperature in interaction with land use and water chemistry (Fig. 4a). At a depth of 5-15 cm, temperature above 9.3°C combined either with a Ca²⁺ content above 62.5 mg L⁻¹ or with Ca²⁺ ≤ 62.5 mg L⁻¹ and Ntot > 0.93 mg L⁻¹ resulted in high ETSA (≥0.6 µL O₂ g DWsed⁻¹ h⁻¹). Conversely, temperature ≤ 9.3°C combined with low urban land use in the catchment (≤0.02) resulted in low ETSA (≤0.3 µL O₂ g DWsed⁻¹ h⁻¹). At 20-40 cm, the best ETSA model shows the importance of forest within the catchment area (Fig. 4b). When native forest covered more than 0.79 of the catchment, ETSA was low (<0.2 µL O₂ g DWsed⁻¹ h⁻¹). However, ETSA was also low when forest covered ≤0.79, namely at very low temperatures (≤5.6°C), at temperatures above 5.6°C in interaction with land use, or at temperatures above 12°C in interaction with land use and sulphate concentrations.

The model for 5-15 cm depth using TPC as the response variable (Fig. 5a) shows that increased ammonium (NH₄⁺) concentrations (>0.1 mg L⁻¹) resulted in high microbial biomass (TPC). In contrast, NH₄⁺ concentrations <0.1 mg L⁻¹ combined with DOC <4.8 mg L⁻¹ and low urban land use (<0.04) in the catchment resulted in low TPC. The model exhibited much more complex interactions between predictors for data from 20-40 cm depth (Fig. 5b). For instance, microbial biomass was highest when urban land use in the catchment was >0.03 and the proportion of forest in the 250 m buffer zone was ≤0.03. In winter, the interaction of catchment urban land use (<0.03), buffer zone urban land use (<0.60) and NH₄⁺ concentrations (>0.06 mg L⁻¹) was linked with low biomass. Interestingly, at lower NH₄⁺ levels, biomass was still low in the presence of agricultural land in the catchment (>0.15). In summer, low biomass was associated with the interaction of catchment urban land use (<0.03), buffer zone urban land use (<0.60) and high Ca²⁺ concentrations (>60.5 mg L⁻¹).
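Decision trees are read as IF-THEN rules, so the paths described above for the 5-15 cm ETSA and TPC models can be written down directly. The sketch encodes only the branches spelled out in this section, with the thresholds and inequality signs as stated; any path the text does not describe returns `None`. The function names are hypothetical, and this is not a reproduction of the full trees in Figs. 4a and 5a.

```python
from typing import Optional


def etsa_5_15(temp_c: float, ca_mg_l: float, ntot_mg_l: float,
              urban_catchment: float) -> Optional[str]:
    """ETSA class at 5-15 cm for the paths described in the text (else None)."""
    if temp_c > 9.3:
        if ca_mg_l > 62.5:
            return "high"
        if ntot_mg_l > 0.93:  # reached only when Ca2+ <= 62.5 mg/L
            return "high"
        return None           # branch not described in the text
    if urban_catchment <= 0.02:
        return "low"
    return None


def tpc_5_15(nh4_mg_l: float, doc_mg_l: float,
             urban_catchment: float) -> Optional[str]:
    """TPC class at 5-15 cm for the paths described in the text (else None)."""
    if nh4_mg_l > 0.1:
        return "high"
    if nh4_mg_l < 0.1 and doc_mg_l < 4.8 and urban_catchment < 0.04:
        return "low"
    return None


print(etsa_5_15(temp_c=11.0, ca_mg_l=70.0, ntot_mg_l=0.5, urban_catchment=0.1))  # high
print(tpc_5_15(nh4_mg_l=0.05, doc_mg_l=3.0, urban_catchment=0.01))             # low
```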

4. Discussion

This study provides new information on the ranges of hyporheic respiration rates and productivity, measured as microbial biomass, across several pre-Alpine catchments under a gradient of anthropogenic pressures. The decision tree models improved understanding of the causal relationship between multiple stressors and environmental factors on the one hand and hyporheic microbial metabolism and biomass on the other. They also provided threshold values of specific environmental factors below or above which an increase or decrease in respiration, potential respiration, and microbial biomass within the HZ can be expected. When developing the models, we considered measured variables of different types (land use, temperature, water chemistry and sediment structure). By applying attribute selection techniques, irrelevant or redundant variables having no or very little impact on the selected response variable were automatically removed. Temperature, land use and water chemistry, including elevated concentrations of sulphate (SO₄²⁻), nitrite (NO₂⁻), ammonium (NH₄⁺), potassium (K⁺), calcium (Ca²⁺), and/or dissolved organic carbon (DOC), were recognized as the most important factors for HZ microbial respiration (R, ETSA) and biomass (TPC).

M AN U

SC

RI PT

329

339 340 341
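The text does not spell out which attribute selection technique was applied; the reference list includes Hall (1999) on correlation-based feature subset selection (CFS) and Hall and Holmes (2003). As an illustration only, the sketch below shows the idea behind CFS: a subset of k predictors is scored by merit = k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean predictor-response correlation and r_ff the mean predictor-predictor correlation, so redundant predictors lower the score. Assumptions: a numeric response, |Pearson r| in place of the symmetrical uncertainty that CFS uses on discretized attributes, a greedy forward search, and invented toy data.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """CFS-style merit of a predictor subset, using |Pearson r| as correlation."""
    k = len(subset)
    r_cf = mean(abs(pearson(features[f], target)) for f in subset)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(features[a], features[b])) for a, b in pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target, min_gain=1e-9):
    """Greedily add the predictor that most increases merit; stop when merit stops rising."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, target), f) for f in remaining)
        if merit <= best + min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Toy data: temp_copy is a rescaled duplicate of temp (redundant), noise is unrelated.
features = {
    "temp": [1, 2, 3, 4, 5, 6],
    "temp_copy": [2, 4, 6, 8, 10, 12],
    "noise": [5, 1, 4, 4, 1, 5],
}
target = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
print(forward_select(features, target))
```

The redundant copy adds no merit once its twin is selected, and the unrelated predictor lowers it, so both are dropped, which is the behaviour the paragraph attributes to attribute selection.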

4.1 Decision tree model development and validation

A comparison of the models built by a) dividing the data set into training and testing subsets based on catchment units ("train-test" method) and b) stratified 10-fold cross-validation ("CV") shows that the generated models performed well. Only modest variation in predictive performance was observed when data from different catchments were used as the testing data sets. In aquatic ecology, the most widely used approach to evaluating decision tree models is 10-fold cross-validation (Dakou et al., 2007; Gal et al., 2013). However, when working with larger data sets from several catchments, it is important to test whether the catchment, as a random variable, affects the modelling results. This study demonstrates the important effect of catchment selection on model performance and shows that predictive performance can be improved by combining data from different catchments to form larger data sets.
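The two validation schemes differ only in how samples are assigned to test sets. A minimal sketch in plain Python, assuming each sample carries a catchment name and a class label: the first function holds out one catchment at a time (the "train-test" scheme described above), the second builds stratified folds that preserve class proportions. The function names are hypothetical and the tree learner itself is omitted; each (train, test) index pair would be passed to whatever learner is used.

```python
import random
from collections import defaultdict


def leave_one_catchment_out(catchments):
    """Yield (held_out, train_idx, test_idx), holding out one catchment per split."""
    for held_out in sorted(set(catchments)):
        test = [i for i, c in enumerate(catchments) if c == held_out]
        train = [i for i, c in enumerate(catchments) if c != held_out]
        yield held_out, train, test


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) for k folds that keep class proportions.

    Indices of each class are shuffled and dealt round-robin across the folds;
    the deal position carries over between classes so fold sizes stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    for fold in folds:
        test = sorted(fold)
        held = set(test)
        train = [i for i in range(len(labels)) if i not in held]
        yield train, test


catchments = ["Kokra"] * 4 + ["Sora"] * 3 + ["Gradaščica"] * 5
for name, train, test in leave_one_catchment_out(catchments):
    print(name, len(train), len(test))
```

Holding out whole catchments tests transfer to an unseen catchment, whereas stratified folds mix samples from all catchments in every fold; comparing the two is what exposes the catchment effect discussed in the paragraph.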

When CV was applied to the whole data set, the models predicting ETSA and TPC from 20-40 cm depth data performed slightly better than those using data from 5-15 cm depth. This suggests that additional environmental factors or stressors not included in the models, such as surface flow velocity, shear stress, permeability and hydraulic conductivity, may influence the measured indicators at 5-15 cm depth. Hydrological parameters are important drivers of HZ processes (Boulton et al., 1998) and should be included in future models to obtain better predictions. Despite the high complexity and heterogeneity of the data, the models proved accurate for all three response variables at both depths. The only exception was the prediction of R at 20-40 cm depth, for which the data set was smaller because of missing data. In general, complex ecological data sets from spatially and temporally dynamic environments with a hierarchical organization, in which the catchment is a major unit, are difficult to analyse using standard statistical methods, whose assumptions (e.g., homoscedasticity, independent and normally distributed residuals, no multicollinearity) have to be fulfilled (Downes et al., 2002). In contrast, ML tools can handle noisy data sets from complex and dynamic domains (Gal et al., 2013).
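Model accuracy is reported in Table 3 as the percentage of correctly classified instances (CCI) together with Cohen's kappa, which discounts the agreement expected by chance from the observed and predicted class frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of both measures; the function name and the class labels in the example are invented for illustration.

```python
from collections import Counter


def cci_and_kappa(observed, predicted):
    """Return (% correctly classified instances, Cohen's kappa)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n  # observed agreement
    obs, pred = Counter(observed), Counter(predicted)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    p_e = sum(obs[c] * pred[c] for c in obs) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")  # undefined if p_e == 1
    return 100 * p_o, kappa


observed = ["low", "low", "medium", "high", "high", "medium", "low", "high"]
predicted = ["low", "medium", "medium", "high", "high", "low", "low", "high"]
print(cci_and_kappa(observed, predicted))
```

With three roughly balanced classes, a CCI of 75% corresponds to a kappa of about 0.62, which is why both values are worth reporting side by side.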

The best models were built using R as the response variable. Respiration is a good functional indicator of natural dynamics (Doering et al., 2011) and of anthropogenic stress such as eutrophication (Hill et al., 2000; Janssens et al., 2001). Because R depends directly on temperature and nutrient availability, it represents an immediate response of the microbial community to actual, short-term environmental conditions (Simčič et al., 2015). The models with the least predictive power were those using TPC as the response variable. TPC reflects the biomass of microorganisms and the proteins of extracellular polymeric substances in the samples and is therefore a structural indicator of long-term ecosystem condition (Franken et al., 2001). The models using ETSA performed slightly better. ETSA measures the overall enzymatic activity (maximum reaction rate) of the respiratory electron transport system at a standard temperature and without substrate limitation, and therefore reflects environmental conditions over a longer term: ETSA reaches a new equilibrium only a few days after environmental conditions change (Simčič et al., 2015). This study finds that, for the HZ of gravel bed rivers, more accurate models can be built using response variables that reflect ecosystem function (R, ETSA) rather than structure (TPC). The high potential of functional indicators for detecting anthropogenic impacts has been emphasised many times (Sandin and Solimini, 2009; Palmer and Febria, 2012; von Schiller et al., 2017). These models can be used to model respiration, potential respiration and protein content within the HZ of other subalpine catchments, provided that these share characteristics with the five studied catchments (comparable proportions of land use and similar ranges of environmental factors). Based on the environmental parameters measured in any such catchment, the magnitude of respiration and microbial biomass can be estimated. The models can also be used to predict changes in the magnitude of respiration and biomass when an environmental factor changes, as long as its value stays within the range of values used to build the models in this study.
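The applicability condition above (new predictor values must lie within the ranges used for training, summarised in Table 1) can be checked mechanically before a model is applied to another catchment. A minimal sketch, assuming each sample is a dict mapping predictor names to values; the function names, predictor keys and numbers in the example are hypothetical.

```python
def training_ranges(rows):
    """Per-predictor (min, max) over the training samples (a list of dicts)."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range(sample, ranges):
    """Sorted names of predictors that are missing from `sample` or fall outside the training range."""
    flagged = []
    for name, (lo, hi) in ranges.items():
        value = sample.get(name)
        if value is None or not (lo <= value <= hi):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical training data (temperature in °C, NH4+ in mg L-1).
train = [{"Temp": 4.0, "NH4": 0.02}, {"Temp": 21.0, "NH4": 1.7}, {"Temp": 12.0, "NH4": 0.3}]
ranges = training_ranges(train)
print(out_of_range({"Temp": 25.0, "NH4": 0.5}, ranges))
```

A non-empty result signals extrapolation beyond the data the trees were built on, where the thresholds in the trees carry no support.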

4.2 Between-stressor and stressor-ecological indicators linkages


This study finds that water temperature is an important predictor of the functional indicators (R, ETSA) but is irrelevant for predicting the response of the structural indicator (TPC). This was also shown by simple regression plots: the relationships between temperature and the response variables were significant, but the correlation was weaker for TPC, indicating that temperature influences microbial biomass through complex interactions with other environmental factors. In general, increasing temperature accelerates chemical reactions and enhances biological processes such as metabolic rate, microbial growth and activity (Davidson and Janssens, 2006; Mora-Gómez et al., 2016). Based on previous studies (Simčič and Mori, 2007; Mori et al., 2017; Debeljak et al., 2017), temperature was expected to be an important predictor of R but not of ETSA, which reflects environmental conditions over a longer term, and its effect was expected to decrease with depth (Hester et al., 2009). According to Hill et al. (2000), sediment R is significantly related to temperature and to several chemical variables. They emphasized that the extent to which temperature influences R differs among water bodies, and concluded that the temperature-R relationship is not a simple causality and that more complex models and further research are necessary for making solid predictions. Among the reasons are that the effect of temperature on microbial activity is resource dependent and that microbial communities are functionally adapted or acclimated to in situ temperature (Hall et al., 2010). An important finding of this study is that the relationship between temperature and R depends on the temperature range. At extremely low water temperatures, a simple temperature-R relationship is observed, while at higher temperatures, complex interactions between temperature and dissolved ions influence the intensity of R. This suggests that extremely low temperatures act as the only limiting factor. However, when water temperature is in the optimal range (i.e., 10-30°C; e.g., Mora-Gómez et al., 2016), microbial community respiration also responds to changes in water chemistry, such as nitrite, potassium, calcium and sulphate concentrations.

Sulphate appears to be associated with decreased respiration at two thresholds (6.5 and 17.3 mg L-1), depending on temperature. At low temperatures, R is medium, but when the sulphate concentration exceeds the respective threshold, R is low. Similarly, at optimal water temperatures, R is high, but when sulphate exceeds the respective threshold, R takes medium values. This indicates the importance of temperature when assessing the influence of sulphate on hyporheic respiration.
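Thresholds such as these are chosen during tree induction. The reference list cites C4.5 (Quinlan, 1993), but the text here does not describe the split criterion, so the following is only a simplified sketch assuming an entropy-based criterion: for one numeric predictor, it tries a cut between each pair of adjacent distinct values and keeps the one with the largest information gain. C4.5 itself adds refinements, such as the gain ratio criterion, that are omitted here, and it reports an observed data value rather than a midpoint. The function names and data are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split value <= t vs value > t
    with maximal information gain; candidates are midpoints between
    consecutive distinct sorted values. Returns (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo_v, hi_v = pairs[i - 1][0], pairs[i][0]
        if lo_v == hi_v:
            continue  # cannot cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = ((lo_v + hi_v) / 2, gain)
    return best


# Invented data: a numeric predictor (mg L-1) and a respiration class per sample.
conc = [2.0, 3.0, 4.0, 10.0, 12.0, 15.0]
r_class = ["medium", "medium", "medium", "low", "low", "low"]
print(best_threshold(conc, r_class))
```

In a full tree learner this search runs for every predictor at every node, and the chosen cuts become the threshold values quoted in the text.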

In contrast to previous studies (e.g., Mori et al., 2017; Debeljak et al., 2017), this study showed that temperature is also important for ETSA. The models for 5-15 cm depth show that temperature was the most important predictor of ETSA. The large temperature range and the structurally similar habitats used in this study are the most probable reasons for this. As for R, interactions with water chemistry parameters were important for the model, but catchment urban land use was an additional important predictor of ETSA at 5-15 cm. Land use modifies the in-stream factors controlling river metabolism through increased runoff of nutrients, sediment and pollutants from agricultural and urban sources (Ponsatí et al., 2016). These impacts are confounded with natural catchment characteristics (climate, geology, soil, vegetation type) that also alter in-stream abiotic properties (Allan, 2004). A single-river study found a significant relationship between ETSA and nutrients (NO3-) but no influence of temperature, while the impact of land use was not studied (Simčič and Mori, 2007).

The proportion of forest in the catchment was the most important predictor in the ETSA model for 20-40 cm depth, which was more complex than the model for 5-15 cm depth. Together with a lower proportion of forest in the catchment, temperature and sulphate were important for predicting ETSA rates. Wherever natural forest is removed from the riparian zone and the whole catchment, streams and rivers are usually warmer during summer, and primary production usually increases owing to the lack of shade (Allan, 2004). This in turn leads to increased nutrient input to the HZ and increased microbial activity. Here, similarly to the model for R at 5-15 cm, sulphate in combination with land use suppressed or induced ETSA rates, depending on water temperature. These results show the importance of anthropogenic sulphate inputs for river metabolism, as sulphate can act either as an inhibitor or as a stimulator of metabolism.

For the models predicting microbial biomass, estimated as TPC, a combination of NH4+, DOC and catchment urban land use was important at 5-15 cm depth. Nutrients such as NH4+ or DOC can stimulate TPC when they exceed a certain threshold, or be associated with low TPC when they are below a certain level, especially in combination with a low proportion of urban land use in the catchment. Similar to ETSA at 20-40 cm, land use combined with season, NH4+ and Ca2+ were the main predictors of TPC at 20-40 cm depth. A study by Hendricks (1996) partly reflects the patterns found in this study. It demonstrated a significant effect of season, depth and zone (upwelling, downwelling) on microbial biomass in the hyporheic zone, and also found an inconsistent, season-dependent microbial response to increased DOC.

5. Conclusions

This study contributes new knowledge about catchment-scale patterns and drivers of respiration and microbial biomass in the hyporheic zone. Decision trees based on data from five gravel bed rivers confirmed the important role of temperature and of nutrient inputs from anthropogenic activities in hyporheic processes and structure, and provided new information on the importance of land use and of the interactions between stressors and environmental factors for microbial activity and biomass. As demonstrated, the choice of measure (functional or structural) and of sampling depth is important for explaining the relationships between the environment and biological responses. In general, temperature, alone or in interaction with other stressors indicating point or diffuse pollution, is one of the most important predictors of functional measures (respiration and respiratory electron transport system activity). For the structural measure (microbial biomass), catchment land use and nutrients are critical.

A highly relevant finding for the study area is that an individual stressor, such as sulphate or nitrite, can act as a stimulator or an inhibitor of biological processes when it exceeds certain threshold values. Moreover, when combined with other environmental variables and stressors, it can have the same effect even when it falls below the defined threshold value. However, larger data sets covering wider geographical ranges are needed to confirm these patterns. These findings demonstrate that it is important to consider interactions between stressors when developing management plans for freshwater ecosystems, and suggest that climate change, by accelerating biological processes through increased temperatures, will have a more pronounced effect on ecosystem functioning than on structure.

6. Acknowledgements

The study was funded by the Slovenian Research Agency (ARRS) (project L2-6778; programme P1-0255 and the programme for young researchers) and partly by the European Communities 7th Framework Programme under Grant Agreement no. 603629-ENV-2013-6.2.1-Globaqua. We thank Bor Kranjc, Žiga Ogorevc, Maja Opalički Slabe, Andrej Peternel, Tomaž Jagar and Allen Wei Liu for help during fieldwork, Andreja Jerebic and Maryline Pflieger for chemical analyses, Rok Ciglič for land use analyses and David Kocman for graphical support.

7. References

Allan, J.D., 2004. Landscapes and riverscapes: the influence of land use on stream ecosystems. Annu. Rev. Ecol. Evol. Syst. 35, 257–284. https://doi.org/10.1146/annurev.ecolsys.35.120202.110122.

497

Aristi, I., von Schiller, D., Arroita, M., Barceló, D., Ponsatí, L., García-Galán, M. J., Sabater,

498

S., Elosegi, A., Acuña, V. 2015. Mixed effects of effluents from a wastewater treatment

499

plant on river ecosystem metabolism: subsidy or stress? Freshwat. Biol. 60, 1398-1410.

500

https://doi.org/10.1111/fwb.12576. 20

ACCEPTED MANUSCRIPT 501

Battin, T. J., Luyssaert, S., Kaplan, L. A., Aufdenkampe, A. K., Richter, A., Tranvik, L. J.

502

(2009). The boundless carbon cycle. Nature Geoscience, 2, 598-600.

503

doi:10.1038/ngeo618

504

Boets, P., Lock, K., Goethals, P.L.M., 2013. Modelling habitat preference, abundance and species richness of alien macrocrustaceans in surface waters in Flanders (Belgium)

506

using decision trees. Ecol. Inform. 17, 78-81.

507

https://doi.org/10.1016/j.ecoinf.2012.06.001.

509

Bou, C., Rouch, R., 1967. Un nouveau champ de recherches sur la faune aquatiqu souterraine.

SC

508

RI PT

505

– C. R. Acad. Sci. Paris 265: 369 – 370.

Boulton, A. J., Findlay, S., Marmonier, P., Stanley, E. H., Valett, H. M., 1998. The functional

511

significance of the hyporheic zone in streams and rivers. Ann Rev Ecol Syst, 29, 59-81.

512

10.1146/annurev.ecolsys.29.1.59

513

M AN U

510

Bratko, I., 1989. Machine learning. In: Gilhooly, K.J. (Ed.), Human and machine problem solving. Pelnum Press, New York and London, pp. 265-287.

515

https://doi.org/10.1007/978-1-4684-8015-3.

516

TE D

514

Dakou, E., D'heygere, T., Dedecker, A., Goethals, P., Lazaridou-Dimitriadou, M., De Pauw, N., 2007. Decision tree models for prediction of macroinvertebrate taxa in the River Axios (Northern Greece). Aquat. Ecol. 41, 399–411. https://doi.org/10.1007/s10452-006-9058-y.

Davidson, E.A., Janssens, I.A., 2006. Temperature sensitivity of soil carbon decomposition and feedbacks to climate change. Nature 440, 165–173. https://doi.org/10.1038/nature04514.

Debeljak, B., Simčič, T., Ciglič, R., Pflieger, M., Mori, N., 2017. Spatio-temporal variation in microbial respiration in the shallow hyporheic zone of pre-Alpine rivers related to catchment land use. Fundam. Appl. Limnol. 190, 265–277. https://doi.org/10.1127/fal/2017/0962.

Doering, M., Uehlinger, U., Ackermann, T., Woodtli, M., Tockner, K., 2011. Spatiotemporal heterogeneity of soil and sediment respiration in a river-floodplain mosaic (Tagliamento, NE Italy). Freshwat. Biol. 56, 1297–1311. https://doi.org/10.1111/j.1365-2427.2011.02569.x.

Downes, B.J., Barmuta, L.A., Fairweather, P.G., Faith, D.P., Keough, M.J., Lake, P.S., Mapstone, B.D., Quinn, G.P., 2002. Monitoring Ecological Impacts: Concepts and Practice in Flowing Waters. Cambridge University Press, New York, 434 pp.

Džeroski, S., Grbovic, J., Demsar, D., 2000. Predicting chemical parameters of river water quality from bioindicator data. Appl. Intell. 13, 7–17. https://doi.org/10.1023/A:1008323212047.

European Union, 2000. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Official Journal of the European Communities, L 327/1, 22.12.2000.

Feld, C.K., Segurado, P., Gutierrez-Canovas, C., 2016. Analysing the impact of multiple stressors in aquatic biomonitoring data: a cookbook with applications in R. Sci. Total Environ. 573, 1320–1339. https://doi.org/10.1016/j.scitotenv.2016.06.243.

Ferreira, V., Chauvet, E., 2011. Synergistic effects of water temperature and dissolved nutrients on litter decomposition and associated fungi. Glob. Chang. Biol. 17, 551–564. https://doi.org/10.1111/j.1365-2486.2010.02185.x.

Franken, R.J.M., Storey, R.G., Williams, D.D., 2001. Biological, chemical and physical characteristics of downwelling and upwelling zones in the hyporheic zone of a north-temperate stream. Hydrobiologia 444, 183–195. https://doi.org/10.1023/A:1017598005228.

Gal, G., Škerjanec, M., Atanasova, N., 2013. Fluctuations in water level and the dynamics of zooplankton: a data-driven modelling approach. Freshwat. Biol. 58, 800–816. https://doi.org/10.1111/fwb.12087.

Hadwen, W.L., Fellows, C.S., Westhorpe, D.P., Rees, G.N., Mitrovic, S.M., Taylor, B., Baldwin, D.S., Silvester, E., Croome, R., 2010. Longitudinal trends in river functioning: patterns of nutrient and carbon processing in three Australian rivers. River Res. Appl. 26, 1129–1152. https://doi.org/10.1002/rra.1321.

Hall, E.K., Singer, G.A., Kainz, M.J., Lennon, J.T., 2010. Evidence for a temperature acclimation mechanism in bacteria: an empirical test of a membrane-mediated trade-off. Funct. Ecol. 24, 898–908. https://doi.org/10.1111/j.1365-2435.2010.01707.x.

Hall, M.A., 1999. Correlation based feature subset selection for machine learning. PhD Thesis, University of Waikato, Hamilton, New Zealand, 198 pp.

Hall, M.A., Holmes, G., 2003. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans. Knowl. Data Eng. 15, 1437–1447. https://doi.org/10.1109/TKDE.2003.1245283.

Hendricks, S.P., 1996. Bacterial biomass, activity, and production within the hyporheic zone of a north-temperate stream. Arch. Hydrobiol. 136, 467–487.

Hester, E.T., Doyle, M.W., Poole, G.C., 2009. The influence of in-stream structures on summer water temperatures via induced hyporheic exchange. Limnol. Oceanogr. 54, 355–367. https://doi.org/10.4319/lo.2009.54.1.0355.

Hill, B.H., Hall, R.K., Husby, P., Herlihy, A.T., Dunne, M., 2000. Interregional comparisons of sediment microbial respiration in streams. Freshwat. Biol. 44, 213–222. https://doi.org/10.1046/j.1365-2427.2000.00555.x.

Jackson, M.C., Loewen, C.J.G., Vinebrooke, R.D., Chimimba, C.T., 2016. Net effects of multiple stressors in freshwater ecosystems: a meta-analysis. Glob. Chang. Biol. 22, 180–189. https://doi.org/10.1111/gcb.13028.

Janssens, I.A., Lankreijer, H., Matteucci, G., Kowalski, A.S., Buchmann, N., Epron, D., Pilegaard, K., Kutsch, W., Longdoz, B., Grünwald, T., Montagnani, L., Dore, S., Rebmann, C., Moors, E.J., Grelle, A., Rannik, Ü., Morgenstern, K., Oltchev, S., Clement, R., Guðmundsson, J., Minerbi, S., Berbigier, P., Ibrom, A., Moncrieff, J., Aubinet, M., Bernhofer, C., Jensen, N.O., Vesala, T., Granier, A., Schulze, E.D., Lindroth, A., Dolman, A.J., Jarvis, P.G., Ceulemans, R., Valentini, R., 2001. Productivity overshadows temperature in determining soil and ecosystem respiration across European forests. Glob. Chang. Biol. 7, 269–278. https://doi.org/10.1046/j.1365-2486.2001.00412.x.

Kaandorp, V.P., Molina-Navarro, E., Andersen, H.E., Bloomfield, J.P., Kuijper, M.J.M., de Louw, P.G.B., 2018. A conceptual model for the analysis of multi-stressors in linked groundwater-surface water systems. Sci. Total Environ. 627, 880–895. https://doi.org/10.1016/j.scitotenv.2018.01.259.

Komac, M., 2005. Statistics of the Geological map of Slovenia at scale 1:250.000.

Krause, S., Lewandowski, J., Grimm, N.B., Hannah, D.M., Pinay, G., McDonald, K., Martí, E., Argerich, A., Pfister, L., Klaus, J., Battin, T., Larned, S.T., Schelker, J., Fleckenstein, J., Schmidt, C., Rivett, M.O., Watts, G., Sabater, F., Sorolla, A., Turk, V., 2017. Ecohydrological interfaces as hot spots of ecosystem processes. Water Resour. Res. 53, 6359–6376. https://doi.org/10.1002/2016WR019516.

Lowry, O.H., Rosebrough, N.J., Farr, A.L., Randall, R.J., 1951. Protein measurement with the Folin phenol reagent. J. Biol. Chem. 193, 265–275.

Matthaei, C.D., Piggott, J.J., Townsend, C.R., 2010. Multiple stressors in agricultural streams: interactions among sediment addition, nutrient enrichment and water abstraction. J. Appl. Ecol. 47, 639–649. https://doi.org/10.1111/j.1365-2664.2010.01809.x.

Mora-Gómez, J., Freixa, A., Perujo, N., Barral-Fraga, L., 2016. Limits of the biofilm concept and types of aquatic biofilms. In: Romaní, A.M., Guasch, H., Balaguer, M.D. (Eds.), Aquatic Biofilms: Ecology, Water Quality and Wastewater Treatment. Caister Academic Press, Norfolk, UK, pp. 3–28.

Mori, N., Simčič, T., Brancelj, A., Robinson, C.T., Doering, M., 2017. Spatio-temporal heterogeneity of actual and potential respiration in two contrasting floodplains. Hydrol. Process. 31, 2622–2636. https://doi.org/10.1002/hyp.11211.

Nogaro, G., Datry, T., Mermillod-Blondin, F., Foulquier, A., Montuelle, B., 2013. Influence of hyporheic zone characteristics on the structure and activity of microbial assemblages. Freshwat. Biol. 58, 2567–2583. https://doi.org/10.1111/fwb.12233.

Nõges, P., Argillier, C., Borja, Á., Garmendia, J.M., Hanganu, J., Kodeš, V., Pletterbauer, F., Sagouis, A., Birk, S., 2017. Quantified biotic and abiotic responses to multiple stress in freshwater, marine and ground waters. Sci. Total Environ. 540, 43–52. https://doi.org/10.1016/j.scitotenv.2015.06.045.

Orghidan, T., 1959. Ein neuer Lebensraum des Unterirdischen Wassers der hyporheischen Biotope. Arch. Hydrobiol. 55, 392–414.

Packard, T.T., 1971. The measurement of respiratory electron transport activity in marine phytoplankton. J. Mar. Res. 29, 235–244.

Palmer, M.A., Febria, C.M., 2012. The heartbeat of ecosystems. Science 336, 1393–1394. https://doi.org/10.1126/science.1223250.

Palmer, M.A., Bernhardt, E.S., Allan, J.D., Lake, P.S., Alexander, G., Brooks, S., Carr, J., Clayton, S., Dahm, C.N., Follstad Shah, J., 2005. Standards for ecologically successful river restoration. J. Appl. Ecol. 42, 208–217. https://doi.org/10.1111/j.1365-2664.2005.01004.x.

Pavlin, M., Birk, S., Hering, D., Urbanič, G., 2011. The role of land use, nutrients, and other stressors in shaping benthic invertebrate assemblages in Slovenian rivers. Hydrobiologia 678, 137–153. https://doi.org/10.1007/s10750-011-0836-8.

Piggott, J.J., Townsend, C.R., Matthaei, C.D., 2015. Reconceptualizing synergism and antagonism among multiple stressors. Ecol. Evol. 5, 1538–1547. https://doi.org/10.1002/ece3.1465.

Ponsatí, L., Corcoll, N., Petrović, M., Picó, Y., Ginebreda, A., Tornés, E., Guasch, H., Barceló, D., Sabater, S., 2016. Multiple-stressor effects on river biofilms under different hydrological conditions. Freshwat. Biol. 61, 2102–2115. https://doi.org/10.1111/fwb.12764.

Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 1, 81–106. https://doi.org/10.1007/BF00116251.

Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco, CA, USA.

Rosa, J., Ferreira, V., Canhoto, C., Graça, M.A.S., 2013. Combined effects of water temperature and nutrients concentration on periphyton respiration: implications of global change. Int. Rev. Hydrobiol. 98, 14–23. https://doi.org/10.1002/iroh.20120151.

Sandin, L., Solimini, A.G., 2009. Freshwater ecosystem structure-function relationships: from theory to application. Freshwat. Biol. 54, 2017–2024. https://doi.org/10.1111/j.1365-2427.2009.02313.x.

Simčič, T., Mori, N., 2007. Intensity of mineralization in the hyporheic zone of the prealpine river Bača (West Slovenia). Hydrobiologia 586, 221–234. https://doi.org/10.1007/s10750-007-0621-x.

Simčič, T., Mori, N., Hossli, C., Robinson, C.T., Doering, M., 2015. The response in floodplain respiration of an Alpine river to experimental inundation under different temperature regimes. Hydrol. Process. 29, 5438–5450. https://doi.org/10.1002/hyp.10584.

Storey, R.G., Howard, K.W.F., Williams, D.D., 2003. Factors controlling riffle-scale hyporheic exchange flows and their seasonal changes in a gaining stream: a three-dimensional groundwater flow model. Water Resour. Res. 39, 1084-2000. https://doi.org/10.1029/2002WR001367.

Uehlinger, U., Naegeli, M., Fisher, S.G., 2002. A heterotrophic desert stream? The role of sediment stability. West. N. Am. Nat. 62, 466–473.

Vinebrooke, R.D., Cottingham, K.L., Norberg, J., Scheffer, M., Dodson, S.I., Maberly, S.C., Sommer, U., 2004. Impacts of multiple stressors on biodiversity and ecosystem functioning: the role of species co-tolerance. Oikos 104, 451–457. https://doi.org/10.1111/j.0030-1299.2004.13255.x.

Volf, G., Atanasova, N., Kompare, B., Precali, R., Ožanić, N., 2011. Descriptive and prediction models of phytoplankton in the northern Adriatic. Ecol. Model. 222, 2502–2511. https://doi.org/10.1016/j.ecolmodel.2011.02.013.

von Schiller, D., Acuña, V., Aristi, I., Arroita, M., Basaguren, A., Bellin, A., Boyero, L., Butturini, A., Ginebreda, A., Kalogianni, E., Larrañaga, A., Majone, B., Martínez, A., Monroy, S., Muñoz, I., Paunović, M., Pereda, O., Petrovic, M., Pozo, J., Rodríguez-Mozaz, S., Rivas, D., Sabater, S., Sabater, F., Skoulikidis, N., Solagaistua, L., Vardakas, L., Elosegi, A., 2017. River ecosystem processes: a synthesis of approaches, criteria of use and sensitivity to environmental stressors. Sci. Total Environ. 596, 465–480. https://doi.org/10.1016/j.scitotenv.2017.04.081.

Witten, I.H., Frank, E., Hall, M.A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, Burlington, MA, USA. https://doi.org/10.1016/B978-0-12-374856-0.00018-3.

FIGURES

Figure 1. Map of the study area indicating sampling sites and land use in the five studied catchments (Gradaščica, Kamniška Bistrica, Kokra, Tržiška Bistrica, Selška Sora).

Figure 2. Relationships of temperature, proportion of urban land use and ammonium concentration with the response variables (R, ETSA, TPC) at 5-15 cm depth (left) and 20-40 cm depth (right).

Figure 3. Decision trees with respiration (R) as the response variable for a) the hyporheic zone at 5-15 cm depth and b) the hyporheic zone at 20-40 cm depth. The values in the leaves indicate correctly/incorrectly classified instances.

Figure 4. Decision trees with respiratory potential (ETSA) as the response variable for a) the hyporheic zone at 5-15 cm depth and b) the hyporheic zone at 20-40 cm depth. The values in the leaves indicate correctly/incorrectly classified instances.

Figure 5. Decision trees with total protein content (TPC) as the response variable for a) the hyporheic zone at 5-15 cm depth and b) the hyporheic zone at 20-40 cm depth. The values in the leaves indicate correctly/incorrectly classified instances.

Table 1. List of predictive variables (attributes) with abbreviations, units and ranges included in modelling. Stressors, i.e., variables exceeding or being below natural ranges are marked with +. Units

F_cat F_buf A_cat A_buf U_cat U_buf O_cat O_buf

proportion proportion proportion proportion proportion proportion proportion proportion

0.54 0.00 0.08 0.00 0.02 0.00 0.02 0.04

-

0.82 0.90 0.32 0.84 0.08 0.93 0.16 0.34

°C µS cm-1

3.5 186 5.15 2157 0.6 0.4 0.0 0.2 0.0 0.1 0.1 0.0 0.0 0.6 0.0 0.0 0.0 0.1 0.0 4.6 1.39 0.0 0.0 24.3 0.2 0.1

-

22.3 1050 8.6 5922 13.2 6.6 0.25 51.1 0.3 11.5 41.2 1.7 12.0 12.8 2.1 177.5 66.0 53.0 3.2 74.6 158 478.5 620.8 969.6 227.2 34.0

+ + + + + + + + + + + + + + + + +

+

Temp Cond pH Alk Oxy Ntot Ptot DOC NO2- NO3- SO42- NH4+ Na+ Cl- K+ Ca2+ Mg2+ FS FOM POM PumpT GS4-5 GS2-4 GS0.2-2 GS0.06-0.2 GS<0.06

+ + + + +

+ + + + + + +

Ranges

Abbreviations Seas

PREDICTIVE VARIABLES Season (summer, winter, spring) Catchment scale variables Forest land use (catchment scale) Forest land use (250 m buffer zone) Agricultural land use (catchment scale) Agricultural land use (250 m buffer zone) Urban land use (catchment scale) Urban land use (250 m buffer zone) Other land use (catchment scale) Other land use (250 m buffer zone) Reach scale variables Water temperature Water conductivity pH Alkalinity Oxygen concentrations Total nitrogen in water Total phosphorus in water Dissolved organic carbon Nitrite Nitrate Sulphate Ammonium Sodium Chloride Potassium Calcium Magnesium Fine suspended sediment Fine organic matter Particulate organic matter Pumping time (only for 20-40 cm depth) Sediments of grain size 4-5 mm Sediments of grain size 2-4 mm Sediments of grain size 0.2-2 mm Sediments of grain size 0.063-0.2 mm Sediments of grain size <0.063 mm

mEq L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 mg L-1 g DW L-1 g AFDM L-1 g AFDM kg DWsed-1 s (10 L)-1 g DW kg DWsed-1 g DW kg DWsed-1 g DW kg DWsed-1 g DW kg DWsed-1 g DW kg DWsed-1
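The + marking in Table 1 rests on a simple rule: a variable counts as a stressor when its observed values go above or below its natural range. A minimal Python sketch of that rule follows. The observed ranges are copied from Table 1, but the natural reference ranges (and the names Range, OBSERVED, NATURAL and is_stressor) are hypothetical placeholders for illustration only, since the reference values used by the authors are not given in this part of the manuscript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float


# Observed ranges copied from Table 1.
OBSERVED = {
    "Temp": Range(3.5, 22.3),  # °C
    "Oxy": Range(0.6, 13.2),   # mg L-1
    "pH": Range(5.15, 8.6),
}

# HYPOTHETICAL natural reference ranges, for illustration only (not from the study).
NATURAL = {
    "Temp": Range(0.0, 25.0),
    "Oxy": Range(4.0, 14.0),
    "pH": Range(6.5, 8.8),
}


def is_stressor(observed: Range, natural: Range) -> bool:
    """True if any observed value lies below or above the natural range."""
    return observed.low < natural.low or observed.high > natural.high


flags = {name: is_stressor(OBSERVED[name], NATURAL[name]) for name in OBSERVED}
print(flags)
```

Checking only the observed minimum and maximum is sufficient, because a set of values leaves a reference interval if and only if its extremes do.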

Table 2. List of response variables (attributes) with units and ranges included in modelling. Variables were measured over three seasons (summer, winter, spring) and at two HZ depths. R – respiration at in situ temperatures; ETSA – respiratory potential at standard temperature (15°C); TPC – total protein content.

Response variable  Units                  Low min     Low max     Medium min  Medium max  High min    High max
R (5-15 cm)        µL O2 g DWsed-1 h-1    0.0         0.1         0.1         0.6         0.6         1.2
R (20-40 cm)       µL O2 g DWsed-1 h-1    0.4         1.6         1.6         2.2         2.2         3.0
ETSA (5-15 cm)     µL O2 g DWsed-1 h-1    0.0         0.3         0.3         0.6         0.6         2.8
ETSA (20-40 cm)    µL O2 g DWsed-1 h-1    0.0         0.2         0.2         0.3         0.3         3.3
TPC (5-15 cm)      µg protein g DWsed-1   20.9        123.7       123.7       167.1       167.1       468.9
TPC (20-40 cm)     µg protein g DWsed-1   87.2        206.0       206.0       285.8       285.8       1693.2
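Table 2 shows that each continuous response was split into three classes (low, medium, high) for the classification trees, with the upper bound of one class equal to the lower bound of the next. The Python sketch below applies those boundaries to new measurements. The table does not say which class a value lying exactly on a boundary belongs to, so the choice made here (each class includes its upper bound) is an assumption, and the names THRESHOLDS and discretise are illustrative.

```python
from bisect import bisect_left

# Class boundaries from Table 2: (upper bound of "low", upper bound of "medium").
THRESHOLDS = {
    "R (5-15 cm)": (0.1, 0.6),
    "R (20-40 cm)": (1.6, 2.2),
    "ETSA (5-15 cm)": (0.3, 0.6),
    "ETSA (20-40 cm)": (0.2, 0.3),
    "TPC (5-15 cm)": (123.7, 167.1),
    "TPC (20-40 cm)": (206.0, 285.8),
}
CLASSES = ("low", "medium", "high")


def discretise(response: str, value: float) -> str:
    """Return the class of a measured value.

    Assumption: a value equal to a boundary goes to the lower class,
    i.e. each class includes its upper bound.
    """
    # bisect_left counts how many boundaries are strictly below `value`.
    return CLASSES[bisect_left(THRESHOLDS[response], value)]


print([discretise("R (5-15 cm)", v) for v in (0.05, 0.1, 0.3, 0.9)])
# ['low', 'low', 'medium', 'high']
```

Only the two inner boundaries are needed for classification; the minimum of the low class and the maximum of the high class in Table 2 simply record the observed extremes.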

Table 3. Results of model validation using two different techniques, “division to training and testing data” and cross validation (CV), for three response variables measured at two depths. Numbers indicate the percentage of correctly classified instances (CCI). Catchment names in the first column indicate which part of the data set was used for testing the constructed decision tree models. The models for R at a depth of 20-40 cm were validated only on the whole data set, due to the limited number of instances (N=18).

Column groups: 5-15 cm depth (train CCI, test/CV CCI, Cohen’s kappa); 20-40 cm depth (train CCI, test/CV CCI, Cohen’s kappa).

Row groups: Respiration - R; Respiratory potential - ETSA; Total protein content - TPC. Each group lists Gradaščica, Kamniška Bistrica, Kokra, Sora, Tržiška Bistrica and All data.

59 67 68 62 65 65

67 76 83 69 89 82

0.44 0.63 0.73 0.52 0.80 0.74

49 42 67 49 56 57

82 79 80 87 78 86

56 60 59 57 57 59

53 56 50 44 61 50

69

38

0.07

0.24 0.17 0.52 0.27 0.23 0.33

76 66 74 63 65 68

47 43 61 44 44 60

0.17 0.15 0.40 0.18 0.17 0.39

0.27 0.32 0.28 0.18 0.33 0.25

65 66 61 66 70 70

34 60 22 32 29 55

0 0.25 -0.18 -0.21 0 0.33
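The two performance measures in Table 3 can be computed from paired lists of observed and predicted classes: CCI is the share of matching pairs, and Cohen's kappa corrects that share for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is derived from the class frequencies of both lists. A minimal, dependency-free Python sketch follows; the function names are illustrative, and this is not the software used in the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(observed: Sequence[Hashable], predicted: Sequence[Hashable]) -> int:
    """Validate inputs and return the number of instances."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    return len(observed)


def cci_percent(observed, predicted) -> float:
    """Percentage of correctly classified instances (CCI)."""
    n = _check(observed, predicted)
    hits = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * hits / n


def cohens_kappa(observed, predicted) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = _check(observed, predicted)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_freq, pred_freq = Counter(observed), Counter(predicted)
    # Chance agreement: probability that both lists pick the same class independently.
    p_e = sum(obs_freq[c] * pred_freq[c] for c in obs_freq) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: one class used throughout both lists
    return (p_o - p_e) / (1.0 - p_e)


observed = ["low", "low", "medium", "high", "high", "medium"]
predicted = ["low", "medium", "medium", "high", "low", "medium"]
print(round(cci_percent(observed, predicted), 1))  # 66.7
print(round(cohens_kappa(observed, predicted), 2))  # 0.5
```

A kappa near 0, or below 0, as in some rows of Table 3, means the model agrees with the observed classes no better than, or worse than, chance, even when the CCI looks moderate.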


Highlights

Effects of multiple stressors on the hyporheic zone were studied using machine learning.

The biological response in the hyporheic zone was well predicted by decision tree models.

Models with respiration as the response variable had the highest predictive performance.

Temperature, land use and water quality jointly defined the hyporheic zone response.

The models provided new knowledge on interactions among stressors.


Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: