Recalibration and cross-validation of pesticide trapping equations for vegetative filter strips (VFS) using additional experimental data

Recalibration and cross-validation of pesticide trapping equations for vegetative filter strips (VFS) using additional experimental data

Science of the Total Environment 647 (2019) 534–550 Contents lists available at ScienceDirect Science of the Total Environment journal homepage: www...

4MB Sizes 0 Downloads 32 Views

Science of the Total Environment 647 (2019) 534–550

Contents lists available at ScienceDirect

Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

Recalibration and cross-validation of pesticide trapping equations for vegetative filter strips (VFS) using additional experimental data Stefan Reichenberger a,⁎, Robin Sur b, Carolin Kley b, Stephan Sittig a, Sebastian Multsch a a b

knoell Germany GmbH, Konrad-Zuse-Ring 25, 68163 Mannheim, Germany Bayer AG, 40789 Monheim, Germany

H I G H L I G H T S

G R A P H I C A L

A B S T R A C T

• Lack of parsimonious mechanistic approaches for modelling pesticide trapping in filter strips • Sabbagh regression equation needed broader data basis and rigorous evaluation • Experimental data basis has been considerably widened • Suitability of Sabbagh equation for modelling pesticide trapping in vegetative filter strips has been confirmed • Regression-free mass balance approach is viable alternative

a r t i c l e

i n f o

Article history: Received 4 May 2018 Received in revised form 23 July 2018 Accepted 30 July 2018 Available online 02 August 2018 Editor: Ouyang Wei Keywords: Vegetated filter strips Pesticides Mitigation Trapping efficiency Cross-validation DREAM

a b s t r a c t Vegetative filter strips (VFS) are widely used for mitigating pesticide inputs into surface waters via surface runoff and erosion. To simulate the effectiveness of VFS the model VFSMOD is frequently used. While VFSMOD simulates infiltration and sedimentation mechanistically, the reduction of pesticide load in surface runoff by the VFS is calculated with the empirical Sabbagh equation. This multiple regression equation has not been widely accepted by regulatory authorities, because its reliability has not been sufficiently demonstrated yet. A major drawback is the small number of calibration data points (n = 47). To corroborate and improve the predictive capability of the Sabbagh equation, additional experimental VFS data were compiled from the available literature. The enlarged dataset (n = 244) was used to recalibrate the Sabbagh equation, the recently proposed Chen equation and a set of “reduced” Sabbagh equations with fewer independent variables, with ordinary least squares (OLS) regression and to test an alternative, regression-free mass balance approach. The Sabbagh equation fitted the dataset slightly better than the Chen equation (coefficient of determination R2 = 0.82 vs. 0.79). The purely predictive mass balance approach performed slightly worse (Nash-Sutcliffe Efficiency NSE = 0.74), but significantly better than the Sabbagh and Chen equations with their old coefficients. In a k-fold cross validation analysis to assess the predictive capability of the various regression equations, both the full Sabbagh and the reduced Sabbagh equations with two or more variables outperformed the Chen equation. Finally, a maximum-likelihood-based calibration and uncertainty analysis were conducted for the Sabbagh equation using the DREAM_ZS algorithm and two different likelihood functions. The DREAM simulations corroborated the parameter values obtained with OLS regression. The study confirmed the suitability of the Sabbagh equation for regulatory modelling of pesticide trapping in VFS. However, the regression-free mass balance approach turned out to be a viable alternative. © 2018 Published by Elsevier B.V.

⁎ Corresponding author. E-mail address: [email protected] (S. Reichenberger).

https://doi.org/10.1016/j.scitotenv.2018.07.429 0048-9697/© 2018 Published by Elsevier B.V.

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

1. Introduction Modern agriculture often relies on the use of pesticides in order to maintain a high production level. However, a fraction of the pesticide amount applied to agricultural fields can reach surface waters and groundwater via a number of different pathways and can cause harmful effects to the environment (e.g. Flury, 1996). One major input pathway of pesticides into surface waters is surface runoff from agricultural fields (Wauchope, 1996). Total pesticide losses from fields via surface runoff are usually around 0.5% or less of the amount applied, although losses of up to 5% have been observed under worst-case conditions (Wauchope, 1978). There are a number of mitigation measures available to reduce transfer of pesticides to surface waters via surface runoff and erosion (Reichenberger et al., 2007; Catalogne and Le Hénaff, 2016). Of these, the most widely implemented mitigation measures are vegetative filter strips (VFS) (Brown et al., 2012). These are densely vegetated areas designed to intercept surface runoff, often located at the downslope field border (Brown et al., 2012; Muñoz-Carpena et al., 2010). The dense vegetation–soil system reduces pesticide loads in surface runoff by increasing i) infiltration that reduces total surface runoff volume (and dissolved pesticides), ii) surface roughness that reduces surface runoff velocity and produces settling of sediment and sediment-bound pesticides, and iii) contact between dissolved pollutants and soil and vegetation surfaces that enhances their removal from runoff (Muñoz-Carpena et al., 2018; Lacas et al., 2005). The effectiveness of VFS in reducing surface runoff volumes and associated eroded sediment and pesticide loads has been demonstrated in general, but also found very variable (Reichenberger et al., 2007). Both hydrological considerations (e.g. Schulz, 2004) and experimental VFS studies (e.g. Poletika et al., 2009) have shown that the most important factor influencing VFS efficiency for a given runoff event is the hydraulic load, i.e. the volume of incoming surface runoff and rainfall divided by the overflown area of the VFS. Hence, to simulate the efficiency of VFS in reducing surface runoff and pesticide inputs into surface water in a risk assessment context, fixed reduction fractions (e.g. as a function of the VFS length) are not suitable, because they will underestimate VFS efficiency for small runoff events and overestimate it for large ones. Instead, an event-based, dynamic model is needed. The most widely used dynamic, event-based model to simulate the effectiveness of VFS in reducing surface runoff volumes, eroded sediment and pesticide loads is VFSMOD (Muñoz-Carpena and Parsons, 2014). While VFSMOD simulates infiltration and sedimentation mechanistically, the reduction of pesticide load in surface runoff by the VFS (ΔP) is calculated with the empirical multiple regression equation of Sabbagh et al. (2009). This equation uses the following inputs: observed or, in case of VFSMOD, simulated reduction of total inflow (surface runoff volume + precipitation on the VFS; ΔQ) and eroded sediment load (ΔE), absolute surface runoff volume and eroded sediment load entering the VFS, linear adsorption coefficient Kd of the pesticide, and the clay content of the field soil (as a proxy for the clay content of the eroded sediment). For regulatory modelling purposes, i.e. modelling in the context of pesticide authorization, the low data requirements of the Sabbagh equation are a major advantage compared with complex, physically-based pesticide trapping models (e.g. Perez-Ovilla, 2010). However, the Sabbagh equation has not been widely accepted by regulatory authorities, because its reliability has not been sufficiently demonstrated yet. A major drawback is the small number of experimental data points (n = 47) which were used for calibration. Hence, evaluation against additional experimental data is necessary in order to confirm or disprove the suitability of the Sabbagh equation for regulatory modelling of pesticide trapping in VFS. Moreover, Chen et al. (2016) proposed an alternative regression equation with a different structure based on 181 experimental data points. This equation uses fewer independent variables, but has more parameters than the Sabbagh equation due to incorporating interactions between variables. The authors claimed that their approach was superior to the Sabbagh equation both in terms of predictive power and model structure / scientific foundation.

535

The objective of the present study was to corroborate and improve the predictive capability of the Sabbagh equation by i) broadening the underlying experimental database, ii) comparing the performance of the Sabbagh equation with other pesticide trapping equations, notably the Chen equation and an alternative, regression-free mass balance approach (Reichenberger et al., 2017), iii) rigorously testing its predictive capability, and iv) exploring different methods for parameter estimation. 2. Theory and calculations In this section, the three equations mentioned above are described. 2.1. Sabbagh equation The Sabbagh equation for pesticide trapping efficiency with the coefficients fitted by Sabbagh et al. (2009) is given as   ΔP ¼ 24:79 þ 0:54 ΔQ þ 0:52 ΔE−2:42 ln F ph þ 1 −0:89%C

ð1Þ

where ΔP = relative reduction (%) of total pesticide load, ΔQ = relative reduction (%) of total water inflow, ΔE = relative reduction (%) of incoming sediment load, Fph = phase distribution coefficient (ratio of dissolved and particle-bound pesticide mass in inflow), and %C = clay content (%) of field topsoil (as a proxy for the clay content of the eroded sediment). The phase distribution coefficient is given by the following equation: F ph ¼

Qi K d Ei

ð2Þ

where Qi = total water inflow into the VFS (run-on + rainfall + snowwmelt (L)), Ei = incoming sediment load (kg), Kd = linear sorption coefficient (L/kg). Consequently, the Sabbagh equation has five regression parameters and six independent variables: ΔQ, ΔE, Qi, Ei, Kd and %C. If one expresses Kd as a function of generic Koc (linear adsorption coefficient normalized to organic carbon) and the organic carbon content of the field topsoil, the number of independent variables increases to seven. The Sabbagh equation is not applicable to situations with sediment-free run-on (Ei = 0), because then Fph becomes infinity and ΔE becomes NaN (Not a Number). 2.2. Chen equation The Chen equation with the original coefficients fitted by Chen et al. (2016) is given as: ð3Þ

ΔP ¼ 101−ð8:06−0:07 ΔQ þ 0:02 ΔE þ 0:05%C −2:17 Cat þ 0:02 ΔQ Cat−0:0003 ΔQ ΔEÞ

2

where Cat is a categorical variable with Cat = 1 for Koc N 9000 L/kg and Cat = 0 for Koc ≤ 9000 L/kg. Note that for the Chen equation the fit is not performed against ΔP directly, but against the transformed variable (101 − ΔP)0.5. The main idea behind the Chen equation was to increase predictive performance by i) replacing the continuous phase distribution variable Fph with a categorical adsorption variable, and ii) incorporating interactions between explanatory variables (Chen et al., 2016). The equation is not process-based, but purely statistical. The Chen equation has seven regression parameters, but only four independent variables: ΔQ, ΔE, %C and Koc. The actual mass distribution between the liquid and the solid phase of the surface runoff is not taken into account. The Chen equation is not applicable either to situations with sediment-free run-on (Ei = 0), because then ΔE becomes NaN. It has to be noted, though, that Chen et al. (2016) used at least three studies

536

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

with sediment-free simulated run-on for the derivation of their equation: Misra et al. (1996), Seybold et al. (2001), and Klöppel et al. (1997).

Moreover, unlike the other two equations, the mass balance approach can be used in situations with sediment-free run-on (Ei = 0): It can be shown that Eq. (9) collapses to ΔP = ΔQ for Ei = Ef = 0.

2.3. Regression-free mass balance approach 3. Materials and methods The mass balance approach “dilution + constant particle-bound concentration” (Reichenberger et al., 2017) was inspired by the following equation in Muñoz-Carpena et al. (2015): Si ≈ S f ≈ S o

ð4Þ

This equation reflects the assumption that, for the short duration of a typical surface runoff event (minutes to hours), the particle-bound (adsorbed to suspended sediment) pesticide concentration in surface runoff will not change significantly during the event so that the incoming (Si), filter-trapped (Sf) and outgoing (So) particle-bound pesticide concentrations are equivalent (Muñoz-Carpena et al., 2015). This means that neither sorptive exchange of pesticide between surface runoff water and suspended particles nor selective sedimentation (fine particles, which often have higher OM content and thus adsorb pesticides more strongly than coarse particles, settle more slowly than coarse particles) will have a significant effect on particle-bound pesticide concentrations. The mass balance approach of Reichenberger et al. (2017) furthermore assumes instantaneous and complete mixing of incoming runon and incoming rainfall on the VFS: C0 ¼ Ci

Vi Qi

ð5Þ

where Ci = dissolved pesticide concentration in run-on (mg L−1), Vi = incoming run-on volume (L), and C′ = dissolved pesticide concentration after the mixing of run-on and rainfall (mg L−1). This implies the prerequisite that the time lag between rainfall and run-on is short. With the above assumption that the particle-bound concentration does not change significantly during the event, the particle-bound pesticide concentration after mixing S′ (mg kg−1) is given as: S 0 ¼ Si ¼ K d C i

ð6Þ

Assuming that infiltration and sedimentation are the only relevant mechanisms of pesticide trapping (i.e. sorption of pesticides to grass or VFS soil is not taken into account) one can calculate the pesticide mass trapped in the VFS (mf; mg) and the pesticide mass leaving the filter strip (mo; mg) as: 

 m f ¼ min mi ; V f C 0 þ E f S0

ð7Þ

mo ¼ mi −m f

ð8Þ

where min is the minimum operator, Vf = volume of water infiltrated in the VFS (L), Ef = total sediment load retained in the VFS (kg), and mi = total pesticide mass entering the VFS (mg). The approach can be recast as a single equation for ΔP (cf. SI):    ΔQ ΔE min ðV i þ K d Ei Þ; Vi þ K d Ei ΔP 100% 100% ¼ ðV i þ K d Ei Þ 100%

ð9Þ

Eq. (9) expresses ΔP as a function of Vi, Kd, Ei, ΔE and ΔQ. Hence, the mass balance approach “dilution + constant particle-bound concentration” has five independent variables (six if one expresses Kd as a function of generic Koc and the organic carbon content of the field topsoil or the eroded sediment). In contrast to the Sabbagh and Chen equations, there is no dependence on clay content.

3.1. Data search First, a literature search was performed using i) the search engines Google Scholar and ScienceDirect for the years 2007–2016. The following combinations of keywords were used: i) (grassed) buffer (strip), ii) vegetated buffer, iii) (vegetative) filter strip, iv) (vegetated) filter strip, v) bande (enherbée), zone tampon, vi) vfs. In a second step, the literature already available and filed on the first author's PC (PhD theses, peer-reviewed papers, literature reviews, conference material) was searched using the same keywords as above. Subsequently, the datasets used by Sabbagh et al. (2009) for calibration and validation of the Sabbagh equation were reexamined. Finally, a number of VFS studies conducted for regulatory purposes (e.g. White et al., 2016) were examined. 3.2. Suitability check and data handling The identified VFS datasets and their individual data points were thoroughly checked for their suitability for calibration and evaluation of the Sabbagh equation. The following experimental data are necessary in order to test the Sabbagh equation, on an event basis: • • • •

precipitation volume on the VFS volume of surface runoff water leaving the field (or a control plot) mass of eroded sediment leaving the field mass of pesticide leaving the field (measured in both in dissolved and particle-bound phase, or in the unfiltered bulk surface runoff) • volume of surface runoff leaving the VFS • mass of eroded sediment leaving the VFS • mass of pesticide leaving the VFS (in both phases).

Moreover, the equation requires: • Kd of the compound for the field topsoil • clay and organic carbon (OC) content of the field topsoil.

Since the Sabbagh equation uses the phase distribution coefficient Fph (Eq. (2)), the incoming surface runoff must contain eroded sediment, otherwise Fph becomes infinite. Hence, studies with simulated sediment-free run-on (e.g. Krutz et al. (2003)) are not usable for calibrating or evaluating the Sabbagh equation. Individual data points (unique combinations of site, event, treatment and compound) needed to fulfil all of the following usability criteria: 1) measured pesticide concentrations in VFS inflow and VFS outflow available (for both dissolved and particle-bound phase, or in the unfiltered bulk surface runoff) and ≥LOQ (limit of quantification) 2) 0 ≤ ΔQ b 100 (data points with complete infiltration are not usable) 3) observed ΔE existing and ≥0 4) observed ΔP existing and ≥0. In a first step also studies were included in which pesticides had only been analyzed in the dissolved phase (e.g. Syversen and Bechmann (2004)). For these studies, particle-bound concentrations for the individual data points were back-calculated assuming sorption equilibrium. However, the back-calculation introduced too much uncertainty in the “observed” ΔP values. Hence, we finally discarded these studies.

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

The collected data included two studies with snowmelt-induced surface runoff events. However, in the end there were only six data points from Larsbo et al. (2016) that fulfilled the above usability criteria. We therefore excluded snowmelt-induced events from the present study. The calibration and validation data points used by Sabbagh et al. (2009) were thoroughly re-examined. Existing errors were corrected and invalid / not usable data points (cf. Reichenberger and Pires (2014)) were excluded. Finally, after completion of the calibration dataset, Koc values were updated and harmonised, so that for each substance always the same Koc value was used. 3.3. Calibration against the full dataset and prediction The consolidated experimental dataset was used to recalibrate the Sabbagh and Chen equations, as well as a set of “reduced” Sabbagh equations with fewer independent variables. Recalibration of the equations was done by ordinary least squares (OLS) regression and conducted with the Python package Scikit-learn (Pedregosa et al., 2011). Three statistical tests were performed for each fit: the Shapiro-Wilk test for normality as well as the Breusch-Pagan test and the White test for homoscedasticity, i.e. constant variance of errors (cf. Supplementary information). Additionally, the following equations were applied in predictive mode to the consolidated test dataset: i) the regression-free mass balance approach (Reichenberger et al., 2017), ii) the original Sabbagh equation (Sabbagh et al., 2009), iii) the original Chen equation (Chen et al., 2016), and iv) observed ΔQ as a predictor of ΔP. For the calibration, four goodness-of-fit measures were computed: i) r2 (squared Pearson correlation coefficient between observed and fitted ΔP values), ii) the Coefficient of Determination R2, iii) RMSE (root mean squared error), and iv) PBIAS (percent bias; cf. Moriasi et al. (2007). The four measures are given as 0

12

  n  ∑i¼1 Oi −O P i −P B C r ¼ @  2 0:5 2 0:5 A n  n  ∑i¼1 P i −P ∑i¼1 Oi −O 2

Pearson r2 should only be used together with other measures and a visual examination of the prediction. 3.4. Cross-validation analysis The predictive capability of a regression model is usually assessed by testing it against independent data, i.e. data points which were not used for calibration. For instance, Sabbagh et al. (2009) calibrated their equation against 47 data points from 5 studies and tested it against data from 5 other field studies. However, if experimental data are scarce and originate only from a few field studies, it is often advantageous to use all available data points for calibration, in order to capture as much variability in experimental conditions as possible (cf. Chen et al. (2016)). This implies, in turn, that no independent data points are left for model evaluation. In this case, the predictive capability of the regression model can be assessed with e.g. a cross-validation analysis. Hence, since all available 244 data points had been used for calibration and no independent data points were left for testing, we performed a k-fold cross-validation analysis (analogously to Chen et al. (2016)) to assess the predictive capability of the Sabbagh and Chen equations. Analogously to the calibration for the whole test dataset, the crossvalidation analysis was also performed for the “reduced” Sabbagh equations. The basic principles of k-fold cross-validation analysis are explained in the Supplementary information. To ensure comparability between equations the cross-validation analysis was performed for all equations with the same random samples. The k-fold cross validation analysis was conducted with Scikit-learn (Pedregosa et al., 2011). Three predictive accuracy measures were computed: i) predictive r2 (Pearson r2 between the measured and the predicted ΔP values of the validation dataset), ii) predictive squared correlation coefficient Q2 (Consonni et al., 2010), and iii) RMSEP. The predictive squared correlation coefficient Q2 (Eq. (6) in Consonni et al. (2010)) is defined as: hP

i PRESS ðOi −P i Þ2 =nval nval ¼ 1− MSEP i Q ¼ 1− hP  ¼ 1− 2 ncal TSScal variancecal  O − O =ncal i cal i¼1 ncal 2

ð10Þ

537

nval i¼1

ð14Þ

where Oi = observed value of the ith data point, Pi = predicted value for Pn

2

ðO −P i Þ R2 ¼ 1− P i¼1  i  n  2 i¼1 Oi −O sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Pn 2 i¼1 ðOi −P i Þ RMSE ¼ n PBIAS ¼

Pn i¼1 ðOi −P i Þ 100% P n i¼1 ðOi Þ

ð11Þ

ð12Þ

ð13Þ

where Oi = observed value of the ith data point, Pi = fitted value for the ith data point, O is the mean of the observed data, P is the mean of the fitted data, and n = number of calibration data points. In the case of a linear regression, r2 and R2 are identical. For the predictions, the following measures of predictive accuracy were computed: i) predictive r2 (Pearson r2 between observed and predicted ΔP values), ii) the Nash-Sutcliffe coefficient of efficiency NSE (Nash and Sutcliffe, 1970), iii) RMSEP (root mean squared error of prediction), and iv) PBIAS. NSE and RMSEP are computed identically as R2 (Eq. (11)) and RMSE (Eq. (12)), respectively. However, while R2 and RMSE compare observed and fitted values, NSE and RMSEP compare observed and predicted values. It has also to be noted that Pearson r2 only evaluates linear relationships between observed and predicted data, not how well the observations are matched (cf. Legates and McCabe (1999)): One can have r2 = 1 and be way off the 1:1 line. Hence,

the ith data point, O is the mean of the observed data, ncal = number of calibration data points, nval = number of validation data points, PRESS = sum of squares of the prediction errors, TSScal = total sum of squares of the calibration data, and MSEP = mean squared error of prediction. Q2 can assume values between 1 and −∞. It is structurally similar to the Nash-Sutcliffe coefficient NSE (Nash and Sutcliffe, 1970) which is often used for evaluating the performance of hydrological models. However, in Q2 the numerator and the denominator refer to two different samples (the validation and the calibration dataset, respectively), while in the NSE numerator and denominator refer to the same sample. While Q2 and RMSEP evaluate the ability of the predicted values to reproduce the observations, predictive r2 only evaluates the correlation between predicted and observed values. Hence, Q2 and RMSEP are more relevant indicators of predictive accuracy than predictive r2. The cross-validation analysis was implemented as follows: 1) The full dataset was randomly divided into k subsets; one subset was defined as the validation data, and the k-1 others as calibration data. This was done using the function RepeatedKfold of Scikit-learn (http://scikit-learn.org/stable/modules/cross_ validation.html#repeated-k-fold). The procedure was repeated for 6 different values of k and m = 50 iterations each. The resulting (2 + 4 + 6 + 8 + 10 + 12) × 50 = 2100 pairs of calibration and validation datasets were stored in a .csv file. The .csv file was used for all equations. 2) For each of the 2100 pairs of calibration and validation datasets, the Sabbagh equation was fitted against the calibration dataset (k − 1

538

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

subsets lumped together). The values, standard errors and p-values (of the T statistic) of the fitted regression coefficients were stored. The fitted regression coefficients were applied to predict ΔP for the validation dataset. For each pair, a number of indicators were calculated (Table S11) and a plot of observed vs. predicted ΔP was generated for the validation dataset. 3) Same procedure as in 2) for the “reduced” Sabbagh equations 4) Same procedure as in 2) for the Chen equation. Since for the Chen equation the fit is not performed against observed ΔP directly, but against the transformed variable (101 − ΔP)0.5, a backtransformation of observed and predicted values to ΔP has to be performed before calculating predictive r2, sums of squares and derived indicators (MSEP, RSMEP, Q2). 5) For each equation, the indicators calculated for the 2100 pairs were averaged over two different levels a) over the k folds of each iteration → 6 × 50 = 300 records b) over the k folds and m iterations for each value of k → 6 records. 3.5. Calibration and estimation of confidence intervals with DREAM The easiest and most common technique to fit a linear regression model is ordinary least squares (OLS) regression. However, OLS regression relies on some assumptions on the errors of the observed values (error = deviation between observed value and the unobservable true value of the quantity of interest) which are not always met, notably i) normal distribution, and ii) homoscedasticity (homogeneity of variance). Normally distributed errors are not required for the OLS fit itself, but for the subsequent statistical tests: Nonnormally distributed errors will lead to invalid significance levels and confidence intervals. If the errors are heteroscedastic (i.e. not homoscedastic), this means that the regression coefficients obtained with OLS will not be the best estimators (i.e. the ones with the lowest sum of squared residuals). Hence, in order to check the results of the OLS regression and in order to obtain reliable confidence intervals for the model parameters, a maximum-likelihood-based calibration and uncertainty analysis were performed for the Sabbagh equation using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm implemented in MATLAB, in its variant DREAM_ZS (Vrugt, 2016). DREAM is a multi-chain Markov Chain Monte Carlo (MCMC) simulation algorithm which yields posterior distributions of the calibrated parameters according to Bayesian inference (Vrugt, 2016). DREAM_ZS was run for the consolidated test dataset (n = 244, with 10,000 iterations each) with two different likelihood functions: a) a simple log-likelihood function (LL) based on squared residuals; this likelihood function assumes homoscedasticity (homogeneity of variance) of total model residuals; b) a generalized likelihood function (GL) which allows for heteroscedasticity (i.e. non-homogeneity of variance) of total model residuals (Schoups and Vrugt, 2010). In the case of nonuniqueness (equifinality), two DREAM simulations may not yield exactly the same best parameter sets. Therefore 10 DREAM simulations were performed for each variant, with 10,000 iterations each. Out of the 10 best parameter sets (i.e. the ones with the highest likelihood) obtained with the 10 DREAM simulations, the best one was selected. In the generalized likelihood function of Schoups and Vrugt (2010) heteroscedasticity is explicitly accounted for by assuming that the standard deviation of the total model residuals (consisting of measurement errors, input errors and model structural errors) increases or decreases linearly with the measured variable (in our case ΔP): σ t ¼ σ 0 þ σ 1 ΔP

ð15Þ

where σ0 and σ1 are parameters to be inferred from the data. The statistical distributions of the parameters σ0 and σ1 were estimated simultaneously with the model parameters in the 10,000 model iterations. The boundaries of the uniform prior parameter distributions were simply chosen as sufficiently wide and are given in Table 1. They also apply to the estimated posterior distributions. Convergence was monitored with the Gelman-Rubin convergence diagnostic, indicating convergence for a value b 1.2 in all 3 Markov chains (Gelman and Rubin, 1992; Vrugt et al., 2009). Convergence was reached after ca. 1800 simulations for LL and ca. 2000 simulations for GL. The last 2000 of the 10,000 simulations, i.e. after convergence, were used for calculating the posterior distributions of model parameters and data points. Confidence intervals for the model parameters were calculated as the 5th and 95th percentiles of the posterior distribution of each parameter. Predictive uncertainty estimates for the individual data points were calculated as the 2.5th and 97.5th percentiles of the posterior distribution of simulated ΔP for each data point. For comparison, DREAM_ZS was also run for the “reduced” Sabbagh equations with two or more independent variables (LL option only). 4. Results and discussion 4.1. Data collection Many datasets had to be discarded for one of the following reasons: a) suspended sediment loads had not been measured, e.g. because quantities were too small for a reliable quantification (e.g. Otto et al. (2008), Otto et al. (2012), Lacas et al. (2012)), or b) suspended sediment loads had been measured, but not reported (e.g. Webster and Shaw (1996), Caron et al. (2010), Lafrance et al. (2013)), or c) sediment-free simulated-run-on was used (e.g. Seybold et al. (2001), Mersie et al. (2003), Misra et al. (1996), Krutz et al. (2003)), or d) pesticides were only analyzed in the dissolved phase (e.g. Syversen and Bechmann (2004)). The consolidated test dataset included 244 data points from 14 studies and 13 different experimental sites (Table 2). Out of these 244 data points, 34 and 103 data points had already been used by Sabbagh et al. (2009) for calibration and validation, respectively (13 calibration and 27 validation data points of Sabbagh et al. (2009) were considered invalid by us and removed). The other 107 data points were collected by us from the available literature. Among the 13 sites, 6 were from the United States, 4 from France, 2 from Germany and one from Australia; 11 sites originated from a temperate and 2 from a subtropical climate. The ranges of clay and organic matter contents of the field soils were 11–48% and 1.4–7%, respectively. Most of the soils had a high silt Table 1 Boundaries of the uniform prior distributions of the investigated parameters of the Sabbagh equation. Regression parameter

Corresponding variable

Parameter name

Minimum valuea

Maximum valuea

Unit

Variable name

Unit

a (intercept) b c d

−50 0.1 0.1 −5

100 5 5 20

% – – %

– % % –

e σ0 (heteroscedasticity intercept) σ1 (heteroscedasticity slope)

−5 0 −1

20 100 1

– % –

– ΔQ ΔE ln (Fph + 1) %C – ΔP

% – %

a The boundaries of the prior parameter distributions were simply chosen as sufficiently wide.

Bignan Mead, Nebraska

AUS

FR USA

DE

USA

Popov et al. (2006)b, Popov (2005) Réal (1997)e Schmitt et al. (1999)a

Spatz (1999)

Tingle et al. (1998)b

g

f

e

d

c

b

a

St. Paul, Minnesota

Brooksville, Mississippi

Klein-hohenheim, BW

Surface runoff generation

Simulated run-on + simulated rainfall on VFS Simulated run-on, no rainfall on VFS

Natural rainfall

1994–1996 Simulated rainfall on source area, not on VFS 2015 Simulated run-on + simulated rainfall on VFS

1994

1994–1995 Natural rainfall 1996 Simulated run-on, no rainfall on VFS

2004?

1995

1993–1994 Natural rainfall 1995 Simulated run-on, no rainfall on VFS 1991 Simulated rainfall on source area, not on VFS 1999 Natural rainfall 1997 Natural rainfall 1994 Natural rainfall 1994 Natural rainfall 1995 Natural rainfall 1997–1999 Natural rainfall

Period

Chemicals

4.0

4.6

20.1 3.0–6.0 6.0–18.0 6.0–18.0 6.0–18.0 6.0–12.0

Atrazine, metolachlor

Atrazine, chlorpyrifos

Acetochlor, atrazine, chlorpyrifos Diflufenican, isoproturon Atrazine, DEA, DIA, lindane Atrazine, DEA, DIA, lindane Diflufenican, isoproturon Metolachlor, pendimethalin, terbuthylazine

20.1 Atrazine, cyanazine, metolachlor 20.1 Atrazine, chlorpyrifos, metolachlor 4.6–13.7 Atrazine

m

Buffer strip lengths VL

Grass

7.6–15.2 Tebuconazole, trichlorfon equivalents

Grass 12.0 Diflufenican, isoproturon Grass; grass + shrub + 7.5–15.0 Alachlor, atrazine, permethrin tree Grass 5.0–20.0 Desethyl-terbuthylazine, pendimethalin, terbuthylazine Grass 0.5–4.0 Metolachlor, metribuzin

Grass

Grass Grass Grass Grass Grass Grass; untreated field margin Grass

Grass Grass Grass

Buffer strip type

Used by Sabbagh et al. (2009) for calibration. Used by Sabbagh et al. (2009) for validation. The same experimental site and device was used in all three studies. Data do not refer to single events, but combine several events that occurred in one season: Bignan: 3 events; La Jaillière: 3 events; Plélo: 8 events. Data belong to the same experimental site as the data published in Patty et al. (1997), but to a later period. In the case of simulated run-on, this corresponds to the soil material that was added to the run-on as suspended sediment. One data point is a unique combination of site, event, treatment and substance; replicates have been averaged.

White et al. (2016); Rice et al. USA (2017)

Liverpool plains, NSW

USA

Poletika et al. (2009)

Sioux County, Iowa

USA FR FR FR FR DE

Boyd et al. (2003)c Lecomte (1999) Patty et al. (1997)b,d Patty et al. (1997)b,d Patty et al. (1997)b,d Pätzold et al. (2007)b

Ames, Iowa Ames, Iowa Spindletop research farm, Kentucky Ames, Iowa Bourg Dun Bignan La Jaillière Plélo Velbert-Neviges, NRW

USA USA USA

b,c

Country Site

Arora et al. (1996) Arora et al. (2003)a,c Barfield et al. (1998)a

Reference

Table 2 Overview of studies and sites comprising the consolidated dataset.

3.3

3.2

1.4

7 3.0

6.0

4.4

5.2 1.6 7 2 3 2.9

5.2 5.2 3.1

%

OM content of field topsoilf

12

48

24

16 30

40

29

21 11 16 20 12 25

21 21 20

%

Clay content of field topsoilf

10

20

31

9 18

12

8

6 43 10 8 6 14

33 6 10

Nb usable data pointsg S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550 539

540

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

Table 3 Substances included in the 244 calibration data points, and Koc values used during calibration and cross-validation. Compound

Number of VFS data points

Koc (L/kg)

Source

Remarks

Acetochlor Alachlor Atrazine Chlorpyrifos Cyanazine Desethyl atrazine (DEA) Desethyl terbuthylazine Deisopropyl atrazine (DIA) Diflufenican Isoproturon Lindane Metolachlor Metribuzin Pendimethalin Permethrin Tebuconazole Terbuthylazine Trichlorfon equivalents

2 6 46 8 11 5 2 5 29 29 3 35 10 18 6 5 19 5

204 140 174 8151 190 110 78 130 3186 81 1270 226.1 37.92 13,792 73,000 769 231 7.4

EFSA (2011a) ARS-PPDB (2006) Lewis et al. (2016) European Commission (2005) Lewis et al. (2016) Lewis et al. (2016) EFSA (2011b) Lewis et al. (2016) EFSA (2007) EFSA (2015) Lewis et al. (2016) European Commission (2004) Lewis et al. (2016) EFSA (2016) ARS-PPDB (2006) EFSA (2014) EFSA (2011b) Murphy and Arthur (2011)

Kfoc, arithmetic mean (n = 9) Kdoc, recalculated arithmetic mean (n = 13) Kfoc, mean (?) from literature data (n = 13) Kfoc or Kdoc, arithmetic mean (n = 14) Kdoc; could not be traced further back Kdoc; could not be traced further back Kfoc, arithmetic mean (n = 9) Kdoc; could not be traced further back Kfoc, median (n = 10) Kfoc, arithmetic mean and median (n = 9) Kdoc; could not be traced further back (EU dossier) Kfoc; median (n = 9)a Kfoc, arithmetic mean from EU dossier (n = 7) Kfoc, arithmetic mean (n = 9) Kdoc, recalculated arithmetic mean (n = 7) Kfoc, arithmetic mean (n = 9) Kfoc, arithmetic mean (n = 9) Kfoc, arithmetic mean (n = 5)

a

Data from S-metolachlor.

content, reflecting the susceptibility of silty soils to infiltration excess runoff. In summary, the 13 field sites seem sufficiently representative for VFS in temperate climates, but not for VFS in tropical climates. Most experiments were conducted in the 1990s, with only two exceptions (Popov et al., 2006; White et al., 2016). The test dataset contained 18 different substances with a Koc range of 7.4–73,000 L/kg (Table 3). There were 46 data points for substances with a Koc ≤ 100 L/kg, 134 data points for 100 L/kg b Koc ≤ 1000 L/kg, 40 data points for 1000 L/kg b Koc ≤ 10,000 L/kg, and 24 data points for Koc N 10,000 L/kg. Incoming run-on volumes normalized to VFS area totalled between 0.05 and 5500 mm (median = 36 mm), while incoming total inflow (run-on + rainfall) ranged from 5 to 5600 mm (median = 73 mm). Eroded sediment inputs normalized to VFS area ranged from 10−7 to 2 10 kg m−2 (median = 0.14 kg m−2) in the consolidated test dataset. Hence, although several studies had to be discarded because of missing or zero eroded sediment input, the dataset can be considered sufficiently diverse with regard to eroded sediment inputs. As already mentioned in Section 3.2, one data point is defined as a unique combination of site, treatment (e.g. type of VFS, VFS length and width, source/strip ratio), event and substance. Consequently, if several compounds had been monitored simultaneously for one site/event/ treatment combination, the same combination entered the test dataset several times. This leads to a higher weight for the hydrology of the affected site/event/treatment combinations. However, we consider it rather an advantage to have data for several compounds for the same site/event/treatment/combination, because this allows separating the effect of compound properties on trapping efficiency from other factors. 4.2. New regression coefficients The following new regression coefficients were obtained with OLS for the Sabbagh equation based on 244 data points (for further details of the fit, cf. Supplementary information):   ΔP ¼ −11:5142 þ 0:5949 ΔQ þ 0:4892 ΔE−0:3753 ln F ph þ 1 þ 0:2039%C ð16Þ Compared with the original formulation of Sabbagh et al. (2009) (Eq. (1)), the regression coefficients have changed substantially: i) the intercept is now negative, but closer to zero than before; ii) the influence of the phase distribution term ln(Fph + 1) has strongly decreased; and iii) the effect of the clay content of the field topsoil is much weaker now than before, but positive. There is no immediate explanation why the pesticide trapping efficiency should increase with increasing clay

content of the field topsoil. Probably the positive value of the coefficient reflects higher-order interactions of clay content with other variables. The Shapiro-Wilk-test indicated non-normally distributed residuals, while both White and Breusch-Pagan tests indicate heteroscedasticity (cf. Section 10.3.2 in the SI). As a consequence of heteroscedasticity, the coefficients obtained with OLS will still be unbiased, but not be the best possible estimators (i.e. yielding the fit with the minimum squared residuals), because the Gauss-Markov theorem does not apply any more. Hence, the predictive capability of the Sabbagh equation could possibly further be increased by using other calibration methods, e.g. heteroscedasticity-consistent standard errors or maximum-likelihoodbased methods where heteroscedasticity is accounted for in the likelihood function (cf. Section 3.5). Leaving out one, two or three of the less important independent variables yielded the following reduced Sabbagh equations: ΔP ¼ −12:6211 þ 0:5763 ΔQ þ 0:4862 ΔE þ 0:2305%C 

ð17Þ

 ΔP ¼ −6:2680 þ 0:5728 ΔQ þ 0:5071 ΔE−0:4611 ln F ph þ 1

ð18Þ

ΔP ¼ −6:8053 þ 0:5456 ΔQ þ 0:5063 ΔE

ð19Þ

ΔP ¼ 17:0577 þ 0:8046 ΔQ

ð20Þ

Leaving out the phase distribution variable ln(Fph + 1) led to only very little change in the regression coefficients for the remaining variables and the intercept (Eq. (17)) compared with the full Sabbagh equation (Eq. (16)). It has to be noted that Eq. (19) has some structural similarity to the mass balance approach (Eq. (9)), which can also be written as a function of ΔQ and ΔE. For the Chen equation, the following new regression coefficients were obtained with OLS based on the 244 data points: ΔP ¼ 101−ð10:441−0:0165 ΔQ −0:0062 ΔE−0:0179%C

ð21Þ

2

−1:7045 Cat þ 0:0184 ΔQ Cat−0:0006 ΔQ ΔEÞ

Compared with the original regression by Chen et al. (2016), the influence of most factors has decreased, while the regression coefficient of ΔQ ΔE has doubled. Moreover, the counteracting effect of the two terms containing ΔE has disappeared. While in the original fit p-values for all regression coefficients were b0.05, now p-values for ΔQ, ΔE and ΔQ Cat are N0.05. Together with the warning from Scikit-learn about potential multicollinearity (cf. SI) the non-significant regression coefficients for ΔQ, ΔE and ΔQ Cat indicate overparameterisation of the Chen equation, which in turn will impair its predictive capability.

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

541

In contrast to the test result for the Sabbagh equation, for the Chen equation the White test suggested that homoscedasticity is given (cf. Section 10.3.2 in the Supplementary information).

4.3. Performance of the different equations In this section the performances of the recalibrated Sabbagh and Chen equations, the mass balance approach and the other equations mentioned in Section 3.3 for the consolidated test dataset are compared. The goodness of fit of the calibrated equations as well as the predictive performance of the mass balance approach and other predictively applied equations are summarized in Table 4. Ordinary least squares (OLS) regression against the full dataset yielded a coefficient of determination R2 = 0.82 for the Sabbagh equation and R2 = 0.79 for the Chen equation (Table 4). For the reduced Sabbagh equations with 2 or 3 independent variables, R2 lay between the values for the full Sabbagh and the Chen equation. Applying the regression-free mass balance approach to the test dataset yielded a NSE (mathematically equivalent to the R2 for the fitted equations) of 0.74. Using the relative reduction of total water inflow ΔQ as predictor for ΔP still yielded NSE = 0.65. Applying the Sabbagh and Chen equations with the old coefficients in predictive mode yielded only NSE = 0.53 and 0.56, respectively. The indicator RMSE(P) yielded the same performance ranking as R2 and NSE. With regard to percent bias (PBIAS), the new Sabbagh equation and the reduced Sabbagh equations with 3 independent variables performed best with almost zero bias. The mass balance approach showed a slight underestimation of ΔP (PBIAS = 1.49), while the new Chen equation showed a slight overestimation (PBIAS = −1.59). Applying the Chen equation with the old coefficients yielded the largest bias (PBIAS = −6.549). Visual examination of ΔP measured vs. ΔP fitted/predicted for the Sabbagh and Chen equations and the mass balance approach (Fig. 1) confirms the general performance ranking from the calculated measures: All three equations match measured ΔP relatively well, but scatter increases in the order Sabbagh new b Chen new b mass balance approach.

Fig. 1. Measured ΔP vs. ΔP fitted with the Sabbagh and Chen equations (OLS regression) and ΔP predicted with the mass balance approach for the full test dataset (n = 244).

The visual fit of the reduced Sabbagh equations with three or two variables (Eqs. (17)–(19)) is only slightly worse than for the full equation (Fig. 2). However, leaving out the phase distribution variable ln (Fph + 1) led to more pronounced positive outliers. The variant with only ΔQ as independent variable (Eq. (20)) yielded a significantly worse fit than the other reduced equations and generally overestimated ΔP in the low range of measured values. In Fig. 3 observed ΔP, ΔP refitted with the Sabbagh and Chen equations and ΔP predicted with the mass balance approach are plotted against observed ΔQ on the x axis. It turns out that for the mass balance approach, predicted ΔP was often very close to ΔQ, which means that Kd × Ei was ≪Vi (in other words, the particle-bound pesticide mass in the run-on entering the VFS was negligibly small compared with the

Table 4 Performance of the various pesticide trapping efficiency equations for the consolidated test dataset (n = 244). Calibration Equation

Pearson r2a

Coefficient of determination R2b,c

RMSEd

PBIASe

Sabbagh Sabbagh reduced (ΔQ, ΔE, clay) Sabbagh reduced (ΔQ, ΔE, ln(Fph + 1)) Sabbagh reduced (ΔQ, ΔE) Sabbagh reduced (ΔQ only) Chen

0.819 0.815 0.812 0.806 0.706 0.798f

0.819 0.815 0.812 0.806 0.706 0.793f,g

10.132 10.245 10.330 10.502 12.914 10.846f

0.00094 0.00634 0.00091 0.00565 0.00456 −1.588f

Prediction Equation

Pearson r2a,h

Nash-Sutcliffe coefficient (NSE)c

RMSEPd

PBIAS

Mass balance approach Old Sabbaghi Old Chenj ΔQ as predictor of ΔP

0.765 0.576 0.621 0.706

0.741 0.528 0.559 0.652

12.128 16.371 15.821 14.058

1.493 2.136 −6.549 3.531

a

Between fitted/predicted and observed ΔP. For linear regression, R2 is equal to Pearson r2. c R2 and NSE are computed identically. However, NSE compares predicted and observed values, while R2 compares fitted and observed values. d Root mean squared error RMSE and root mean squared error of prediction RSMEP are computed identically. However, RMSEP compares predicted and observed values, while RMSE compares fitted and observed values. e Percent bias; cf. Moriasi et al. (2007). Positive values indicate model underestimation bias, and negative values indicate model overestimation bias. f After back-transforming (101 − ΔP)0.5 to ΔP. g Here, r2 and R2 are not the same. The reason is that the linear regression was not performed against ΔP, but against a transformed variable. h Pearson r2 only evaluates linear relationships between observed and predicted data, not how well the observations are matched (cf. Legates and McCabe (1999)). i Eq. (1) applied predictively. j Eq. (3) applied predictively. b

542

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

Fig. 2. Measured ΔP vs. ΔP fitted with the full and the four reduced Sabbagh equations (OLS regression) for the full test dataset (n = 244).

dissolved pesticide mass). However, ΔP predicted with the mass balance approach was also substantially higher than ΔQ for a number of data points with ΔQ between 30 and 70% and high Kd (N100 L/kg). Observed ΔP showed the largest scatter around the 1:1 line, while ΔP for the refitted Sabbagh equation stayed within 20% of ΔQ with only a few exceptions. It becomes evident from Fig. 3 that observed ΔQ is not a worst-case predictor of ΔP. This is because ΔQ denotes the reduction of total inflow, i.e. including rainfall. Hence, for surface runoff events caused by natural rainfall observed ΔQ can be substantially higher than observed ΔP. Fig. 4 shows ΔP predicted with the mass balance approach and ΔQ on the y axis against observed ΔP on the x axis. That is, for each observed data point one can see the contribution of ΔQ to ΔP predicted with the mass balance approach. In the vast majority of cases where predicted ΔP differed from ΔQ, the additional consideration of sedimentation in the mass balance approach led to a better prediction of ΔP than considering only infiltration (using ΔQ as a predictor of ΔP).

Fig. 4. Measured pesticide trapping efficiency ΔP vs. ΔP predicted with the mass balance approach and measured ΔQ for the full test dataset (n = 244).

4.4. Outlier analysis To obtain a better understanding about the applicability limits of each equation, an outlier analysis was performed for the refitted Sabbagh and Chen equations and the mass balance approach. For each equation, all data points with an absolute deviation between fitted (or predicted, respectively) ΔP and measured ΔP of N20% (cf. Table 5 and Fig. 1) were identified and examined. A short summary is given below. For more detailed information the reader is referred to the Supplementary Information. The Sabbagh equation was the least outlier-prone of the three equations (cf. Table 5). All 10 positive outliers (overestimation of pesticide trapping efficiency ΔP by N20%) could be explained by uncertainty in the experimental data. For the 8 negative outliers (underestimation of ΔP) at least potential explanations were found (e.g. large uncertainty in measured ΔE due to very low eroded sediment loads).

Fig. 3. Measured reduction of total water inflow ΔQ vs. measured pesticide trapping efficiency ΔP, ΔP fitted with the Sabbagh and Chen equations and ΔP predicted with the mass balance approach for the full test dataset (n = 244).

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550 Table 5 Number of outliers for the three equations applied to the consolidated test dataset (n = 244).

ΔP overestimated by N20%c ΔP underestimated by N20% Max. overestimation of ΔP (%)c Max. underestimation of ΔP (%) a b c

Sabbagh eq.a

Chen eq.a

Mass balance approachb

10 8 36.4 33.6

13 6 44.0 36.1

11 16 53.9 37.6

Fit (ordinary least squares regression). Independent prediction. % is the unit of ΔP.

The Chen equation exhibited substantially more positive (13) than negative outliers (6). It was found less robust with respect to measurement errors and less able to fit data points with unusual values or combinations of values of the independent variables than the Sabbagh equation. The mass balance approach was the most outlier-prone of the three equations (Table 5). However, it should be noted that the application of this simplified mechanistic approach to the test dataset was purely predictive, while the other two equations were fitted to this dataset. The mass balance approach showed more negative (16) than positive (11) outliers. The mass balance approach relies on three key assumptions (cf. Section 2.3): i) instantaneous and complete mixing of incoming run-on and incoming rainfall on the VFS (which requires that the time lag between rainfall and run-on is short); ii) constant particle-bound pesticide concentration in surface runoff during the event (cf. Muñoz-Carpena et al., 2015); and iii) infiltration and sedimentation are the only relevant pesticide trapping mechanisms in the VFS, i.e. sorption of dissolved pesticide to soil or plant material in the VFS is negligible. Consequently, the equation will probably not perform well if one or more of these assumptions are not met. Moreover, in cases where Kd × Ei is not ≪Vi, the mass

543

balance approach will be susceptible to uncertainties in Kd (e.g. if the local Koc is unknown and different from the generic Koc), Ei and ΔE (due to measurement errors of eroded sediment loads or differences between control plots and plots with VFS). The outlier analysis corroborated these presumptions. However, also the Sabbagh and Chen equations are susceptible to uncertainties in ΔE.

4.5. Cross-validation In the following only the main outcomes of the cross-validation analysis are presented. Detailed results are given in the Supplementary information. The main outcomes are summarized briefly in the following. For both Sabbagh and Chen equations, arithmetic mean and median regression coefficients over the 2100 individual tests were very similar to those obtained by OLS fitting to the whole dataset (cf. Table S12 vs. Eq. (16); Table S13 vs. Eq. (21), respectively). The standard deviations and coefficients of variation (CV) of the regression coefficients in Tables S12 and S13 also give an indication to what extent the regression coefficients may change if new experimental data are added to the full dataset. For the Sabbagh equation, CV were rather small for the coefficients of the two most important variables ΔQ (3.9%) and ΔE (6.7%). The largest CV was observed for the coefficient of the phase distribution term ln(Fph + 1) (26%). This suggests that the coefficients of the Sabbagh equation (and thus predicted ΔP values) are not likely to change substantially if new data points are added to the calibration dataset. For the Chen equation, coefficients of variation were on average larger (2.6%–66%) than for the Sabbagh equation. For instance, the coefficient of the key variable ΔQ ΔE had a CV of 16.7%. It also needs to be noted that changes in the regression coefficients of the Chen equation do not propagate linearly to ΔP values (cf. Eq. (21)). In conclusion, predictions with the Chen equation are likely to change substantially more strongly than predictions with the Sabbagh equation if new data are added to the calibration dataset.

Fig. 5. Example plot for the Sabbagh equation (results for 1 of 2100 individual cross-validation tests). The graph shows observed vs. predicted ΔP values for the validation dataset, while the window shows the results of the calibration.

544

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

Fig. 6. Example plot for the Chen equation (results for 1 of 2100 individual cross-validation tests). The graph shows observed vs. predicted ΔP values for the validation dataset, while the window shows the results of the calibration. Measured and predicted values were back-transformed from (101 − ΔP)0.5 to ΔP before plotting.

Also for the reduced Sabbagh equations, mean and median regression coefficients over the 2100 individual tests were similar to those obtained by OLS fitting to the whole dataset (cf. Table S14 vs. Eqs. (17)– (20)). CV of the regression coefficients of the independent variables and standard deviations of the intercept were similar to those obtained for the full Sabbagh equation. 2100 plots were produced for each equation. As examples the plots for the first of the 2100 individual tests for the Sabbagh and Chen equations are shown in Figs. 5 and 6, respectively. Mean and median values (over the 2100 individual tests) of goodness-of-fit indicators for the calibration (r2, adjusted r2) and predictive accuracy measures (Q2, predictive Pearson r2, RMSEP) for the validation are given in Table 6 for the Sabbagh and Chen equations and in Table 7 for the reduced Sabbagh equations. Similarly to the performance evaluation for the whole test

Table 6 Performance indicators averaged over the 2100a individual cross-validation tests. Indicator

Sabbagh equation

Chen equation

Difference Sabbagh − Chen

Mean

Median

Mean

Median

Mean

Median

Calibration Pearson r2 Adjusted r2

0.820 0.816

0.820 0.816

0.792b 0.786b

0.792b 0.785b

0.027 0.030

0.028 0.030

Prediction Q2 Pearson r2 RMSEP

0.806 0.799 10.286

0.813 0.815 10.307

0.766c 0.771c 11.212c

0.777c 0.788c 11.289c

0.040 0.029 −0.926

0.032 0.024 −0.851

a

6 different levels of k (k = 2, 4, 6, 8, 10, 12); 50 iterations per level of k. These measures were calculated during the linear regression and therefore refer to the transformed variable (101 − ΔP)0.5. c The predictive accuracy indicators for the Chen equation were calculated after backtransforming (101 − ΔP)0.5 to ΔP. b

dataset (cf. Section 4.3) Q2 (Eq. (14)) and RMSEP are more relevant indicators of predictive accuracy than Pearson r2. For all calculated indicators, mean and median values were better for the Sabbagh equation than for the Chen equation. This means that on average, the Sabbagh equation fitted the calibration subsets and predicted the validation subsets better than the Chen equation. The reduced Sabbagh equation with the variables ΔQ, ΔE and clay content performed slightly worse in the calibration than the full Sabbagh equation, but equally well in the prediction. All reduced Sabbagh equations with more than one independent variable performed better in both calibration and prediction than the Chen equation. For the Sabbagh equation, homoscedasticity was not given for most of the 2100 regressions performed (cf. Table S12). Consequently, the regression coefficients obtained with ordinary least squares regression will not be the best possible estimators (i.e. with the lowest sum of squared residuals). The transformation of ΔP to (101 − ΔP)0.5 used in the Chen equation significantly reduced heteroscedasticity (mean and median p values of the White test were N0.05; cf. Table S13). However, despite stronger heteroscedasticity, the Sabbagh equation showed consistently higher predictive accuracy than the Chen equation (Table 6). At the level of individual iterations (unique combinations of number of folds k and iteration ID m), the Sabbagh equation performed better than the Chen equation for all 300 combinations of k and m, both for calibration and for validation (cf. Tables S15 and S16). Moreover, on average, PRESS (sum of squares of the prediction errors) had a lower mean, standard deviation and coefficient of variation over the k folds (1 fold = 1 possible realization of arranging k random subsets into a calibration dataset of k-1 subsets and a test dataset of 1 subset) of an iteration for the Sabbagh equation than for the Chen equation (Table S17). That is, the predictions of the Sabbagh equation were not only better, but also more consistent between folds than those of the Chen equation.

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

545

Table 7 Performance indicators averaged over the 2100a individual cross-validation tests for the reduced Sabbagh equations. Reduced Sabbagh variants Indicator

ΔQ, ΔE, clay

ΔQ, ΔE, ln(Fph + 1)

ΔQ, ΔE

ΔQ

Mean

Median

Mean

Median

Mean

Median

Mean

Median

Calibration Pearson r2 Adj. r2

0.815 0.813

0.815 0.813

0.812 0.810

0.813 0.810

0.806 0.804

0.806 0.804

0.706 0.705

0.706 0.704

Prediction Q2 Pearson r2 RMSEP

0.806 0.798 10.279

0.814 0.814 10.310

0.800 0.793 10.443

0.808 0.809 10.426

0.798 0.789 10.487

0.806 0.805 10.510

0.698 0.702 12.811

0.715 0.716 12.715

a

6 different levels of k (k = 2, 4, 6, 8, 10, 12); 50 iterations per level of k.

At the level of the different values of k (number of folds), the Sabbagh equation was found to perform better than the Chen equation for all 6 values of k, both for calibration (Table S18) and for validation (Table S20). Moreover, for each value of k, PRESS had a lower mean, standard deviation and coefficient of variation over the 50 iterations for the Sabbagh equation than for the Chen equation (Table S22). Thus, the predictions of the Sabbagh equation were not only better, but also more consistent between iterations than those of the Chen equation. The results of the cross-validation analysis can be summarized as follows: • Despite stronger heteroscedasticity (heterogeneity of variance of errors), the Sabbagh equation clearly outperformed the Chen equation in terms of predictive accuracy. • The reduced Sabbagh equation with the variables ΔQ, ΔE and clay content performed slightly worse in the calibration than the full Sabbagh equation, but equally well in the prediction. This suggests that the additional independent variable ln(Fph + 1) present in the full Sabbagh equation does not improve predictive capability. • All reduced Sabbagh equations with more than one independent variable performed better in both calibration and prediction than the Chen equation. This means that even with only ΔQ and ΔE as independent variables one can achieve good predictive accuracy. • Predictive accuracy was not only better, but also more consistent between folds and between iterations for the Sabbagh than for the Chen equation. • It remains to be examined more closely which number of folds (k) is best suited for cross-validation analyses like the present one. • If new data points are added to the calibration dataset, the coefficients of the Sabbagh equation (and thus predicted ΔP values) are not likely to change substantially. Larger changes in regression coefficients and predictions are to be expected for the Chen equation.

Table 8 Results of the parameter estimation (log-likelihood function) for the Sabbagh equation using the DREAM_ZS algorithm. Parameter a (intercept) b c d e

Independent variable

Best estimatora

Lower bounda,b

Upper bounda,b

ΔQ ΔE ln(Fph + 1) %C

−11.36603 0.59606 0.48860 −0.36398 0.19723

−16.83219 0.52303 0.41960 −0.63749 0.087747

−5.67267 0.65841 0.56165 −0.084400 0.31024

a Results are given for the run with the highest likelihood of the best parameter set out of ten trials. b 90-% confidence intervals of the last 20% of the parameter estimations.

• In summary, the cross-validation exercise corroborates the predictive capability of the Sabbagh equation and suggests that it is more suitable for predictive modelling of pesticide trapping efficiency of VFS than the Chen equation.

4.6. DREAM In this section the main results of the parameter estimation and uncertainty analysis with DREAM for the Sabbagh equation are reported. The best parameter estimates and the confidence intervals for each parameter for the two DREAM simulation variants (log-likelihood (LL) and generalized likelihood function (GL) according to Schoups and Vrugt (2010)) are given in Tables 8 and 9, respectively. The DREAM calibration with LL yielded a best parameter set almost identical to the one obtained with OLS (Eq. (16)). This is not surprising because both methods try to minimize squared residuals. Parameter confidence intervals for LL (Table 8) were rather small, except for parameter d (coefficient for the phase distribution variable ln(Fph + 1)). The best parameter set obtained with GL (Table 9) was relatively similar to LL, except for a lower (more negative) value of d (−0.61 vs. −0.37) and a higher value of c, the coefficient for ΔE (0.57 vs. 0.49). Compared with LL, parameter confidence intervals for the calibration with GL were larger for parameters a (intercept) and c, but smaller for d and e (coefficient for the clay content). The best estimates of the heteroscedasticity parameters σ0 and σ1 were 43.8334 and −0.4167, respectively. The upper confidence bound of the slope σ1 was also negative. Hence, DREAM estimated that the error variance decreases with increasing measured ΔP. Posterior parameter distributions are shown in Fig. 7 for LL and in Fig. 8 for GL. For both DREAM simulation variants, the posterior distributions of all parameters and also of the residuals were single-peaked and relatively symmetrical. However, the distributions of the two most

Table 9 Results of the parameter estimation (generalized likelihood function according to Schoups and Vrugt (2010)) for the Sabbagh equation using the DREAM_ZS algorithm. Parameter

Independent variable

a (intercept) b ΔQ c ΔE d ln(Fph + 1) e %C σ0 (heteroscedasticity intercept) σ1 (heteroscedasticity slope)

Best estimatora

Lower bounda,b

Upper bounda,b

−13.47536 0.55435 0.56813 −0.61473 0.18550 40.16430 −0.37420

−22.60519 0.48721 0.46641 −0.79255 0.09688 29.03007 −0.45636

−3.65922 0.61913 0.66041 −0.47832 0.25874 47.82303 −0.24809

a Results are given for the run with the highest likelihood of the best parameter set out of ten trials. b 90-% confidence intervals of the last 20% of the parameter estimations.

546

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

Fig. 7. Posterior parameter distribution obtained with DREAM-ZS (log-likelihood function) for the 5 model parameters of the Sabbagh equation (90-% confidence intervals, together with the best estimators as the red square markers), the dynamics of the convergence (Gelman-Rubin convergence diagnostic (Vrugt et al., 2009)), measured ΔP vs. simulated ΔP, and the residuals measured vs. simulated ΔP. Results given for the run with the highest likelihood of the best parameter set out of ten trials.

important parameters b and c (the coefficients of ΔQ and ΔE) were less peaky for GL than for LL, indicating a higher degree of equifinality for GL. The posterior correlation matrices of the Sabbagh model parameters are presented in Tables S23 (LL) and S24 (GL) in the Supplementary information. For both GL and LL, parameter correlations between the 5 parameters of the Sabbagh equation a, b, c, d, and e were acceptable (the strongest correlations observed were r = −0.64 for LL and r = − 0.73 for GL). For the GL variant, correlations between the 5 parameters and the heteroscedasticity parameters σ0 and σ1 were also acceptable. However, the correlation coefficient between σ0 and σ1 was close to −1, indicating overparameterisation/nonuniqueness. Total predictive uncertainty intervals for each data point are shown in Fig. 9 for the log-likelihood function and in Fig. 10 for the generalized likelihood function. Predictive uncertainty intervals tended to be larger for the GL option than for the LL option, notably for small measured ΔP values and negative outliers. However, there were also some data points (mostly positive outliers) where the GL option yielded smaller uncertainty intervals. The goodness of fit of the two DREAM variants is compared in Table 10. The best parameter set obtained with the LL variant has the same r2, R2 and RMSE values as the OLS fit (cf. Table 4), but a slightly worse PBIAS (−0.123% vs. 0.00094%). The GL variant performed slightly better than LL in terms of PBIAS, but slightly worse for r2, R2 and RMSE. A comparison of the visual fit of the three variants (Fig. 11) revealed that the GL variant yielded different simulated ΔP values than LL and OLS for most data points. GL gave a smaller overestimation for some positive outliers than LL and OLS did, but a much larger underestimation for

two negative outliers from Réal (1997), which were characterized by large uncertainties in ΔE (cf. SI). In summary, the quality of the visual fit was not discernibly better or worse for one variant than for the others; however, in terms of the four objective goodness-of-fit measures the OLS regression performed best and the GL variant worst. One explanation for the worse performance of the GL variant despite accounting for heteroscedasticity may be that the likelihood function of Schoups and Vrugt (2010) assumes that the variance of the total model residuals is a linear function of the measured variable (Eq. (15)). This makes sense for hydrological variables such as discharge, where the measurement error typically increases with the flow (Schoups and Vrugt, 2010). However, the pesticide trapping efficiency ΔP is a normalized variable which is bound between 0 and 100%. The observational error of ΔP (and of its key predictor variables ΔQ and ΔE) should be smallest at the extremes of ΔP (complete infiltration and sedimentation for ΔP = 100% or complete inundation of the VFS for ΔP approaching zero) and largest in the middle range. In summary, it seems that the generalized likelihood function of Schoups and Vrugt (2010) is not suitable for fitting pesticide trapping efficiency of VFS for the following reasons: i) overparameterisation, ii) assumption of linearly increasing measurement error not applicable, iii) objectively worse fit than OLS (r2, R2, RMSE, PBIAS). However, there would be scope for rerunning DREAM with a new likelihood function where the variance of the total model residuals is described as being largest in the middle and zero at the extremes of ΔP. The results of the DREAM simulations for the reduced Sabbagh equations are given in the Supplementary information (Section 10.6;

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

547

Fig. 8. Posterior distribution functions obtained with DREAM (generalized likelihood function according to Schoups and Vrugt (2010)) for the 5 model parameters of the Sabbagh equation (90-% confidence intervals, together with the best estimators as the red square markers), the heteroscedasticity parameters σ0 and σ1, the dynamics of the convergence (Gelman-Rubin convergence diagnostic (Vrugt et al., 2009)), measured ΔP vs. simulated ΔP, and the residuals of measured vs. simulated ΔP. Results given for the run with the highest likelihood of the best parameter set out of ten trials.

Fig. 9. Total predictive uncertainty intervals estimated with DREAM (log-likelihood function) for the Sabbagh equation. Total predictive uncertainty intervals are given by the 2.7th and 97.5th percentiles of the posterior distribution of simulated ΔP for each data point. Results given for the run with the highest likelihood of the best parameter set out of ten trials.

Fig. 10. Total predictive uncertainty intervals estimated with DREAM (generalized likelihood function according to Schoups and Vrugt (2010)) for the Sabbagh equation. Total predictive uncertainty intervals are given by the 2.5th and 97.5th percentiles of the posterior distribution of simulated ΔP for each data point. Results given for the run with the highest likelihood of the best parameter set out of ten trials.

548

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

Table 10 Performance of the Sabbagh equation for the best parameter sets of the two DREAM simulations. Calibration Likelihood function

Pearson r2a

Coefficient of determination R2

RMSE

Log-likelihood Generalized likelihood with heteroscedasticityb

0.819 0.815

0.819 0.815

10.132 −0.123 10.247 0.0330

a b

PBIAS

Between simulated and observed ΔP. Schoups and Vrugt (2010).

Tables S25–S31; Figs. S12–S17). The regression coefficients obtained with DREAM for the reduced Sabbagh equations were very similar to those obtained with OLS regression (cf. Eqs. (17)–(19)). Pearson r2, R2 and RMSE were also almost identical between DREAM and OLS (cf. Table 4). Compared with the predictive uncertainty intervals obtained for the full Sabbagh equation (LL option), predictive uncertainty intervals as determined with DREAM were similar for the variant “ΔQ, ΔE and ln(Fph + 1)”, and slightly smaller for some positive outliers for “ΔQ, ΔE and %C”, and “ΔQ and ΔE”. However, it has to be noted that the predictive uncertainty intervals for individual data points are a measure of precision, not of accuracy. While the reduced Sabbagh equations did not perform significantly worse than the full Sabbagh equation in the DREAM simulations (cf. Table S25), there was no indication of equifinality either for the full Sabbagh equation (LL option). Hence, in situations where the two less important predictor variables clay content and ln(Fph + 1) are available, they should also be used. However, if one or both of these variables are not available, the reduced Sabbagh equations become an attractive option. The following conclusions can be drawn from the DREAM simulations • The DREAM simulations with the LL option have corroborated the parameter values of the Sabbagh equation obtained with OLS regression. Moreover, they provided estimates of the posterior probability distributions of the 5 Sabbagh model parameters. This enabled the

derivation of the corresponding confidence intervals and estimating the predictive uncertainty for the individual data points. • The DREAM simulations with the GL option yielded slightly worse results than the OLS and LL options. This suggests that it is preferable not to account for heteroscedasticity than to do it wrongly. • The DREAM simulations for the reduced Sabbagh equations corroborated the parameter values obtained with OLS regression and did not yield larger predictive uncertainty intervals than for the full Sabbagh equation. The reduced Sabbagh equations (Eqs. (17)–(19)) can be recommended for situations where clay content and/or ln (Fph + 1) are not available.

5. Conclusions and outlook In this study, the experimental data basis of the Sabbagh equation has been considerably widened in terms of experimental conditions and substance properties. The new, consolidated test dataset is made publicly available to help advance the modelling of pesticide trapping efficiency of vegetative filter strips. The study confirmed the suitability of the Sabbagh equation for modelling pesticide trapping in VFS. The new parameter set obtained with ordinary least squares (OLS) regression has been corroborated by both the cross-validation analysis and the maximum-likelihood-based calibration and uncertainty analysis conducted with DREAM_ZS. It can therefore be recommended for use in regulatory modelling with VFSMOD. The new parameter set constitutes a major improvement in terms of predictive capability and statistical justification compared with the original coefficients of Sabbagh et al. (2009). Of course, once the experimental database is extended with new studies, refitting the Sabbagh or any other regression equation will again yield different coefficients. Consequently, fitted/predicted ΔP values for the same data points will change, which is undesirable from a regulatory point of view. Moreover, there is always the possibility of introducing a bias into the database due to the substance properties or experimental conditions (e.g. hydraulic load on the VFS) reflected in new datasets. However, the larger the underlying database gets, the smaller the changes resulting from including additional data will be. The results of the cross-validation analysis suggest indeed that the coefficients of the Sabbagh equation (and thus predicted ΔP values) are not likely to change substantially when new data points are added to the calibration dataset. In contrast to the empirical Sabbagh and Chen equations, the mechanistic mass balance approach “dilution + constant particle-bound concentration” does not need calibration. Hence, it will always yield the same results for a given experimental data point provided that the same Kd value is used. However, this also means that predicted ΔP values do not improve when new data points become available. Because of its advantage of being mechanistic and its overall good predictive performance, the mass balance approach can be recommended as a viable alternative to the Sabbagh equation for regulatory modelling of pesticide trapping in VFS. There is also scope for improvement: Possibly further relevant processes, such as sorption of dissolved pesticide to soil or plant material in the VFS, can be included in the mass balance approach in a simple, yet process-based manner. Acknowledgements This research was funded by Bayer AG, Monheim, Germany. Thanks are due to Rafael Muñoz-Carpena for valuable discussions and to two anonymous reviewers for their constructive criticism.

Fig. 11. Measured ΔP vs. fitted ΔP for three calibrated versions of the Sabbagh equation. OLS = ordinary least squares; DREAM lik2 = DREAM with log-likelihood function; DREAM lik14 = DREAM with generalized likelihood function according to Schoups and Vrugt (2010). The DREAM-estimated parameter sets correspond to the run with the highest likelihood of the best parameter sets out of ten trials.

Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi. org/10.1016/j.scitotenv.2018.07.429.

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

References Arora, K., Mickelson, S.K., Baker, J.L., Tierney, D.P., Peters, C.J., 1996. Herbicide retention by vegetative buffer strips from runoff under natural rainfall. Trans. ASAE 39, 2155–2162. Arora, K., Mickelson, S.K., Baker, J.L., 2003. Effectiveness of vegetated buffer strips in reducing pesticide transport in simulated runoff. Trans. ASAE 46, 635. ARS-PPDB, 2006. The ARS Pesticide Properties Database. Available at. https://www.ars. usda.gov/northeast-area/beltsville-md/beltsville-agricultural-research-center/adaptive-cropping-systems-laboratory/docs/ppd/pesticide-properties-database/ USDA, Beltsville, MD (verified 29 Nov 2017, not updated any more). Barfield, B.J., Blevins, R.L., Fogle, A.W., Madison, C.E., Inamdar, S., Carey, D.I., Evangelou, V.P., 1998. Water quality impacts of natural filter strips in karst areas. Trans. ASAE 41, 371. Boyd, P.M., Baker, J.L., Mickelson, S.K., Ahmed, S.I., 2003. Pesticide transport with surface runoff and subsurface drainage through a vegetative filter strip. Trans. ASAE 46, 675. Brown, C., Balderacchi, M., Beinum, W.V., Capri, E., Trevisan, M., 2012. Definition of vegetative filter strip scenarios for Europe. Final Rep Univ York Heslington York UK. Caron, E., Lafrance, P., Auclair, J.-C., Duchemin, M., 2010. Impact of grass and grass with poplar buffer strips on atrazine and metolachlor losses in surface runoff and subsurface infiltration from agricultural plots. J. Environ. Qual. 39, 617–629. Catalogne C., Le Hénaff G. (coordinateurs), 2016. Guide d’aide à l’implantation des zones tampons pour l’atténuation des transferts de contaminants d’origine agricole. Élaboré dans le cadre du groupe technique Zones tampons. Agence française pour la biodiversité, collection Guides et protocoles, 64 pages. ISBN web-pdf: 978-2-37785006-8. ISBN print: 978-2-37785-018-1. Chen, H., Grieneisen, M.L., Zhang, M., 2016. Predicting pesticide removal efficacy of vegetated filter strips: a meta-regression analysis. Sci. Total Environ. 548, 122–130. Consonni, V., Ballabio, D., Todeschini, R., 2010. Evaluation of model predictive ability by external validation techniques. J. Chemom. 24, 194–201. EFSA, 2007. Conclusion regarding the peer review of the pesticide risk assessment of the active substance diflufenican. EFSA Sci. Rep. 122, 1–84. EFSA, 2011a. Conclusion on the peer review of the pesticide risk assessment of the active substance acetochlor. EFSA J. 9, 2143. https://doi.org/10.2903/j.efsa.2011.2143. EFSA, 2011b. Conclusion on the peer review of the pesticide risk assessment of the active substance terbuthylazine. EFSA J. 9, 1969. https://doi.org/10.2903/j.efsa.2011.1969. EFSA, 2014. Conclusion on the peer review of the pesticide risk assessment of the active substance tebuconazole. EFSA J. 12, 3485. https://doi.org/10.2903/j. efsa.2014.3485. EFSA, 2015. Conclusion on the peer review of the pesticide risk assessment of the active substance isoproturon. EFSA J. 13, 4206. https://doi.org/10.2903/j.efsa.2015.4206. EFSA, 2016. Peer review of the pesticide risk assessment of the active substance pendimethalin. EFSA J. 14, 4420. https://doi.org/10.2903/j.efsa.2016.4420. European Commission, 2004. Review report for the active substance S-Metolachlor. http://ec.europa.eu/food/plant/pesticides/eu-pesticides-database/public/?event= activesubstance.detail&language=EN&selectedID=1855. European Commission, 2005. Review report for the active substance chlorpyrifos. http:// ec.europa.eu/food/plant/pesticides/eu-pesticides-database/public/?event= activesubstance.detail&language=EN&selectedID=1130. Flury, M., 1996. Experimental evidence of transport of pesticides through field soils—a review. J. Environ. Qual. 25, 25–45. Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences. Stat. Sci. 457–472. Klöppel, H., Kördel, W., Stein, B., 1997. Herbicide transport by surface runoff and herbicide retention in a filter strip - rainfall and runoff simulation studies. Chemosphere 35, 129–141. Krutz, L.J., Senseman, S.A., Dozier, M.C., Hoffman, D.W., Tierney, D.P., 2003. Infiltration and adsorption of dissolved atrazine and atrazine metabolites in buffalograss filter strips. J. Environ. Qual. 32, 2319–2324. Lacas, J.-G., Voltz, M., Gouy, V., Carluer, N., Gril, J.-J., 2005. Using grassed strips to limit pesticide transfer to surface water: a review. Agron. Sustain. Dev. 25, 253–266. Lacas, J.-G., Carluer, N., Voltz, M., 2012. Efficiency of a grass buffer strip for limiting diuron losses from an uphill vineyard towards surface and subsurface waters. Pedosphere 22, 580–592. Lafrance, P., Caron, E., Bernard, C., 2013. Impact of grass filter strips length on exported dissolved masses of metolachlor, atrazine and deethylatrazine: a four-season study under natural rain conditions. Soil Use Manag. 29, 87–97. Larsbo, M., Sandin, M., Jarvis, N., Etana, A., Kreuger, J., 2016. Surface runoff of pesticides from a clay loam field in Sweden. J. Environ. Qual. 45, 1367–1374. Lecomte, V., 1999. Transfert de produits phytosanitaires par le ruissellement et l'érosion de la parcelle au bassin versant. Processus, déterminisme et modélisation spatiale. ENGREF, Paris (231 pp.). Legates, D.R., McCabe, G.J., 1999. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35, 233–241. Lewis, K.A., Tzilivakis, J., Warner, D.J., Green, A., 2016. An international database for pesticide risk assessments and management. Hum. Ecol. Risk Assess. Int. J. 22, 1050–1064. Mersie, W., Seybold, C.A., McNamee, C., Lawson, M.A., 2003. Abating endosulfan from runoff using vegetative filter strips: the importance of plant species and flow rate. Agric. Ecosyst. Environ. 97, 215–223. Misra, A.K., Baker, J.L., Mickelson, S.K., Shang, H., 1996. Contributing area and concentration effects on herbicide removal by vegetative buffer strips. Trans. ASAE 39, 2105–2111. Moriasi, D.N., Arnold, J.G., Van Liew, M.W., Bingner, R.L., Harmel, R.D., Veith, T.L., 2007. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 50, 885–900.

549

Muñoz-Carpena, R., Parsons, J.E., 2014. VFSMOD-W Vegetative Filter Strips Modelling System. Model documentation & user's manual. Version 6.x. http://abe.ufl.edu/carpena/ files/pdf/software/vfsmod/VFSMOD_UsersManual_v6.pdf. Muñoz-Carpena, R., Fox, G.A., Sabbagh, G.J., 2010. Parameter importance and uncertainty in predicting runoff pesticide reduction with filter strips. J. Environ. Qual. 39, 630–641. Muñoz-Carpena, R., Ritter, A., Fox, G.A., Perez-Ovilla, O., 2015. Does mechanistic modeling of filter strip pesticide mass balance and degradation processes affect environmental exposure assessments? Chemosphere 139, 410–421. Muñoz-Carpena, R., Lauvernet, C., Carluer, N., 2018. Shallow water table effects on water, sediment, and pesticide transport in vegetative filter strips–part 1: nonuniform infiltration and soil water redistribution. Hydrol. Earth Syst. Sci. 22, 53. Murphy, I., Arthur, E., 2011. Trichlorfon: Adsorption and Desorption on Four Soils and One Sediment. Project Number: MEDLY006 (Unpublished study prepared by Bayer CropScience). Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I—a discussion of principles. J. Hydrol. 10, 282–290. Otto, S., Vianello, M., Infantino, A., Zanin, G., Di Guardo, A., 2008. Effect of a full-grown vegetative filter strip on herbicide runoff: maintaining of filter capacity over time. Chemosphere 71, 74–82. Otto, S., Cardinali, A., Marotta, E., Paradisi, C., Zanin, G., 2012. Effect of vegetative filter strips on herbicide runoff under various types of rainfall. Chemosphere 88, 113–119. Patty, L., Real, B., Joël Gril, J., 1997. The use of grassed buffer strips to remove pesticides, nitrate and soluble phosphorus compounds from runoff water. Pest Manag. Sci. 49, 243–251. Pätzold, S., Klein, C., Brümmer, G.W., 2007. Run-off transport of herbicides during natural and simulated rainfall and its reduction by vegetated filter strips. Soil Use Manag. 23, 294–305. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830. Perez-Ovilla, O., 2010. Modeling Runoff Pollutant Dynamics Through Vegetative Filter Strips: A Flexible Numerical Approach. University of Florida, U.S.A. (PhD thesis, 195 pp.). Poletika, N.N., Coody, P.N., Fox, G.A., Sabbagh, G.J., Dolder, S.C., White, J., 2009. Chlorpyrifos and atrazine removal from runoff by vegetated filter strips: experiments and predictive modeling. J. Environ. Qual. 38, 1042–1052. Popov, V.H., 2005. Catchment Approach to Managing Pesticides in the Envirionment—A Case Study With the Herbicide Atrazine. University of Western Sydney, Australia (PhD Dissertation, 260 pp.). Popov, V.H., Cornish, P.S., Sun, H., 2006. Vegetated biofilters: the relative importance of infiltration and adsorption in reducing loads of water-soluble herbicides in agricultural runoff. Agric. Ecosyst. Environ. 114, 351–359. Réal, B., 1997. Étude de l'efficacité de dispositifs enherbés. Campagnes 1993-94, 1994-95, 1995-96. I.T.C.F. - AGENCES DE L'EAU (21 pp.). Reichenberger, S., Pires, J., 2014. VFSMOD simulation study for UBA project, WP 2 – model coupling plus study results. https://www.researchgate.net/publication/265173454_ VFSMOD_simulation_study_for_UBA_project_GERDA_WP2_model_coupling_plus_ results_20140831_minor_update_with_some_typos_removed_and_some_unclear_ expressions_fixed (99 pp.). Reichenberger, S., Bach, M., Skitschak, A., Frede, H.-G., 2007. Mitigation strategies to reduce pesticide inputs into ground- and surface water and their effectiveness; a review. Sci. Total Environ. 384, 1–35. https://doi.org/10.1016/j.scitotenv.2007.04.046. Reichenberger, S., Sur, R., Kley, C., 2017. Evaluation of pesticide trapping efficiency equations for vegetative filter strips (VFS) using additional experimental data. SETAC Eur. Annu. Meet. 2017 07–11 May 2017 Bruss. Belg. Poster Present. Rice, P., Xu, T., White, J.W., Horgan, B., Williams, J., Coody, P., Arthur, E., McConnell, L., 2017. Quantification of turfgrass buffer performance in reducing transport of pesticides in surface runoff. 254th Am. Chem. Soc. Natl. Meet. 20–24 Aug 2017 Wash. DC USA Poster Present. Sabbagh, G.J., Fox, G.A., Kamanzi, A., Roepke, B., Tang, J.-Z., 2009. Effectiveness of vegetative filter strips in reducing pesticide loading: quantifying pesticide trapping efficiency. J. Environ. Qual. 38, 762–771. Schmitt, T.J., Dosskey, M.G., Hoagland, K.D., 1999. Filter strip performance and processes for different vegetation, widths, and contaminants. J. Environ. Qual. 28, 1479–1489. Schoups, G., Vrugt, J.A., 2010. A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour. Res. 46, W10531. Schulz, R., 2004. Field studies on exposure, effects, and risk mitigation of aquatic nonpoint-source insecticide pollution. J. Environ. Qual. 33, 419–448. Seybold, C., Mersie, W., Delorem, D., 2001. Removal and degradation of atrazine and metolachlor by vegetative filter strips on clay loam soil. Commun. Soil Sci. Plant Anal. 32, 723–737. Spatz, R., 1999. Rückhaltevermögen von Pufferstreifen für pflanzenschutzmittelbelasteten Oberflächenabfluss. Shaker Verlag, Aachen (Dissertation, Institut für Phytomedizin, Universität Hohenheim, 176 pp.). Syversen, N., Bechmann, M., 2004. Vegetative buffer zones as pesticide filters for simulated surface runoff. Ecol. Eng. 22, 175–184. Tingle, C.H., Shaw, D.R., Boyette, M., Murphy, G.P., 1998. Metolachlor and metribuzin losses in runoff as affected by width of vegetative filter strips. Weed Sci. 475–479. Vrugt, J.A., 2016. Markov chain Monte Carlo simulation using the DREAM software package: theory, concepts, and MATLAB implementation. Environ. Model. Softw. 75, 273–316. Vrugt, J.A., Ter Braak, C.J.F., Diks, C.G.H., Robinson, B.A., Hyman, J.M., Higdon, D., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-

550

S. Reichenberger et al. / Science of the Total Environment 647 (2019) 534–550

adaptive randomized subspace sampling. Int. J. Nonlinear Sci. Numer. Simul. 10, 273–290. Wauchope, R.D., 1978. The pesticide content of surface water draining from agricultural fields—a review 1. J. Environ. Qual. 7, 459–472. Wauchope, R.D., 1996. Pesticides in runoff: measurement, modeling, and mitigation. J. Environ. Sci. Health B 31, 337–344.

Webster, E.P., Shaw, D.R., 1996. Impact of vegetative filter strips on herbicide loss in runoff from soybean (Glycine max). Weed Sci. 662–671. White, J.W., Kochiss, C.S., Xu, T., 2016. Quantification of Turf Grass Buffer Performance in Reducing Transport of Trichlorfon and Tebuconazole in Surface Runoff. Bayer CropScience, Research Triangle Park, NC, USA 27709 (172 pp.).