Journal of Cleaner Production 222 (2019) 310e323
Contents lists available at ScienceDirect
Journal of Cleaner Production journal homepage: www.elsevier.com/locate/jclepro
Uncertainty analysis of the relationship between discharge and nitrate concentration in the Lower Illinois River using a copula model Daeryong Park a, Momcilo Markus b, Kichul Jung a, Myoung-Jin Um c, * a
Department of Civil and Environmental Engineering, Konkuk University, Seoul, 05029, Republic of Korea Prairie Research Institute, University of Illinois, Champaign, IL, 61820, USA c Department of Civil Engineering, Kyonggi University, Suwon, 16227, Republic of Korea b
a r t i c l e i n f o
a b s t r a c t
Article history: Received 19 March 2018 Received in revised form 20 February 2019 Accepted 4 March 2019 Available online 8 March 2019
The complex relationship between water quality and river discharge has long been an active area of research. This research applies bivariate statistics to river nitrate concentration and discharge, and develops a procedure for selecting the best bivariate distribution for the chosen models. Nitrate concentration (NO3) data from five stations on the Lower Illinois River, USA, collected from 1972 to 2012 were used for demonstration of this procedure. To explore the statistical relationship between discharge and nitrate concentration, three copula statistical models were applied to determine optimal bivariate distributions. The generalized Pareto (GPA), generalized extreme value (GEV), and lognormal two- (LN2) and three-parameter (LN3) distributions were used as marginal distributions, and were combined with the Clayton, Frank, and Gumbel copula models. The probability plot correlation coefficient (PPCC) test and the Sn statistic were used to select the most suitable copula models. As a result, the GPA distributions for streamflow and nitrate concentration in the Frank copula model were more accurate for representing the data from the two mainstem stations, Havana (D-31) and Valley City (D-32), and those from the Oakford station along the Sangamon River (E-25). However, the Clayton copula model using NL3 for representing streamflow and GPA for representing nitrate concentration was more accurate for stations along the Spoon (DJ-08) and La Moine Rivers (DG-01). This study enabled the development of a procedure for determining an optimal copula for application in the environmental sciences and also provided a case study for applying copula models using discharge and nutrient concentration as hydrologic and water quality variables, respectively. © 2019 Elsevier Ltd. All rights reserved.
Keywords: Copula model Discharge Nitrate concentration Lower Illinois river
1. Introduction Nutrient concentration and load of water have been found to be highly correlated with flow characteristics. Many studies have suggested using a “rating-curve” to estimate sediment and nutrient concentration from continuous streamflow data (Cohn et al., 1992; Cohn, 1995, 2005). The rating curve method can represent seasonal trends in parameters from serial correlations in the logtransformed relationship between nutrient concentration and discharge. The relationship between water quality and streamflow has also been investigated and the correlation between the logtransformed sediment concentration and streamflow has been suggested to be linear (Antonopoulos et al., 2001; Syvitski et al.,
* Corresponding author. E-mail address:
[email protected] (M.-J. Um). https://doi.org/10.1016/j.jclepro.2019.03.034 0959-6526/© 2019 Elsevier Ltd. All rights reserved.
2000; Vogel et al., 2003). In addition, several studies have stated that the relationship between nutrient concentration and streamflow of each river station matches their respective lognormal distribution (Ditoro, 1984; Van Buren et al., 1997; Vogel et al., 2005). Vogel et al. (2005) investigated the relationship between discharge and constituent concentration and load, and represented the concentration and streamflow at four river sites in Ohio, USA, using the bivariate twoparameter lognormal (LN2) and three-parameter lognormal (LN3) distributions. They then used the probability plot correlation coefficient (PPCC) test to verify these two distributions; their results suggested that the concentration and discharge at three sites, excluding Grand River, corresponded with the LN2 distribution. Conventional joint distributions contain certain restrictions on the types of marginal distributions, which indicate the probability distribution of the variables in the subset and the dependent variables. This study used copula functions to resolve those restrictions
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
311
Fig. 1. Process for selecting an appropriate copula model for discharge and nitrate concentration in this study.
on joint distributions. Copulas define functions that join or “couple” multivariate distribution functions to their one-dimensional marginal counterparts (Nelson, 2007). Copula functions enable joint behaviors to be modeled based on the marginal functions from the untransformed data. Copulas have recently been used as a mathematical tool for hydrological analysis and applications, such as analyzing precipitation, drought, and flooding (De Michele and Salvadori, 2003; De Michele et al., 2005; Kao and Govindaraju, 2007; Fu and Butler, 2014; Sraj et al., 2015). De Michele et al. (2005) applied bivariate copulates considering flood peaks and volumes to evaluate flooding at a dam in Italy. Furthermore, Kao and Govindaraju (2007) conducted a bivariate frequency distribution using precipitation data in Indiana, USA. They applied copula models to rainfall depth, storm event duration, and peak intensity of rainfall to explain dependencies between these variables. Fu and Butler (2014) used the Clayton, Gumbel, and Frank copula functions to rainfall data to assess the drainage systems of a town in Scotland. Recent studies have applied copula models to more accurately
evaluate nutrient and streamflow variables. Chowdhary et al. (2008) applied copula functions to represent the effects of several best management practice (BMP) scenarios, selecting the total suspended solids (TSS) and turbidity in place of nutrients, and used the Archimedean family of copula functions by applying the LN3 distribution for nutrients. They also used the three-parameter Weibull and Gamma distributions for flow before and after the implementation of each BMP scenario. Bezak et al. (2014) applied trivariate copula functions to model the relationship between flow volume, peak flow frequency, and suspended sediment concentration. They fit six trivariate symmetric and asymmetric Archimedean copulas and four marginal distributions to data from one station in Slovenia and five stations in the USA, and found that the Gumbel-Hougaard combination was the optimal copula function for modeling all three pairs of variables at the selected stations. Wang et al. (2017) applied copula models to represent the relationship between discharge and total nitrogen (TN) or total phosphorus (TP) content of the Miyun Reservoir, China. For this
Table 1 Summary of the applied univariate distributions. Probability models GPA GEV
LN2 LN3
Probability density function 1 vða x0 Þ 1=v1 1 u u 2 1 13 o 1 o n 1n v v f ðaÞ ¼ exp4 1 ða x0 Þ v 5 1 ða x0 Þ v u u u # " 2 1 ðln a mÞ f ðaÞ ¼ pffiffiffiffiffiffi exp 2s 2 as 2p # " 1 ðlnða x0 Þ mÞ2 pffiffiffiffiffiffi exp f ðaÞ ¼ 2s 2 ða x0 Þs 2p
f ðaÞ ¼
Note: u, v, x0 , m; and s denote the scale, shape, and location parameters, mean, and standard deviation, respectively.
312
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
relationship, the priority management intervals of inflows were determined considering multiple variables. Liu et al. (2018) used one station to determine the relationship between water quality parameters (ammonia nitrogen, NH3eN, permanganate index, and CODMn) and discharge in the Yi River Basin, China. However, the application of copula functions to investigate the relationship between water quality and hydrological factors has rarely been studied. To accomplish this, a method to determine optimal marginal distributions for copula functions and the optimal joint distributions of nutrient concentration and discharge was required. The objective of this study is to interpret the relationship between discharge, as the hydrologic variable, and nitrate concentration, as the water quality variable. This study incorporates discharge and nitrate concentration into copula models and suggests the most suitable and accurate models for matching the observed data from the lower Illinois River basin. The daily streamflow and total nitrogen concentration recorded at five stations on the Lower Illinois River, USA, from 1972 to 2012, were considered. One of the major nutrient sources in the Lower Illinois River Basin is nitrate-N from agricultural land, as this is widely used for corn and soybean production (Markus et al., 2014; McIsaac et al., 2016; Panno et al., 2008). 2. Methods 2.1. Copula model
study selected the optimal statistics for the streamflow and nitrogen data by arbitrarily selecting the four most common marginal distributions to streamflow and nitrogen (Castillo et al., 2005; Kroll and Vogel, 2002; Morgan et al., 2011; Smith, 1989), which were the generalized Pareto (GPA), generalized extreme value (GEV), and two- (LN2) and three-parameter (LN3) lognormal distributions (Table 1). The GPA (Pickands, 1975) is a three-parameter distribution that includes the characteristics of the uniform, exponential, and Pareto distributions. The GPA distribution has been used previously to assess hydrological extreme values (Hosking and Wallis, 1987, Rasmussen et al., 1994, De Michele and Salvadori, 2003; Malamud and Turcottte, 2006; Li et al., 2014; among others). The GEV (Jenkinson, 1955) is a three-parameter distribution that inchet, Gumbel, and Weibull cludes the characteristics of the Fre distributions, and has been widely applied in environmental and hydrology data (El Adlouni et al., 2007; Katz et al., 2002; Min et al., 2011; Ouarda et al., 2001; Stedinger et al., 1993). Like the GEV, the LN2 and LN3 distributions have been also widely applied to hydrologic datasets, and can suitably fit water quality distributions (Meybeck, 1993; Novotny, 2004; Mitchell, 2005; Wang et al., 2013; Geng et al., 2015). In statistics, the L-moments proposed by Hosking (1990) denote the expectations of the linear function of probability-weighted moments (PWMs), and are represented as follows:
ck ¼ n1
A copula model provides the joint probabilistic distributions of multivariate variables (Bargaoui and Bardossy, 2015). A copula model using univariate marginal distributions was introduced following the method of Sklar (1959):
Jða; bÞ ¼ PfAðaÞ; BðbÞg; a; b 2 ℝ
(1)
where AðaÞ and BðbÞ indicate the univariate (marginal) distributions. When P : ½0; 12 approaches ½0; 1, Jða; bÞ indicates a copula. Approaches for analyzing a copula model were described by Joe (1997) and Nelsen (2007). The L-moment and the maximum likelihood (ML) methods are used to estimate parameters of the marginal distributions and the copula models, respectively (Genest et al., 1995, 2006; Silva and Lopes, 2008). Um et al. (2017) suggested processes for selecting an optimal copula model, and these processes were followed for discharge and nitrate concentration as in Fig. 1. 2.1.1. Marginal distributions Numerous statistical distributions have been applied to environmental and hydrological data to consider the probabilities of discharge, pollutant concentration, and pollutant load (discharge concentration) (Hosking and Wallis, 1986; Cohn et al., 1992; Vogel and Wilson, 1996). Copula functions require identification of suitable marginal distributions for target variables. This
n1 k
1 X n i1 xi k
(2)
i¼kþ1
where n is the sample size, xi denotes the data in ascending order, and ck indicates the unbiased estimators of the PWMs. The Lmoment is estimated as follows:
liþ1 ¼ ð 1Þim
i m
iþm bm ; i ¼ 0; 1; /; n 1 m
(3)
The coefficient of variation is defined as l2 =l1 , and the L-moment ratios are defined as li =l2 . The L-moment ratio diagrams using the Lskewness and L-kurtosis indicate an appropriate distribution for the studied data (Hosking and Wallis, 1987; Hosking, 1990).
2.1.2. Copulas Here, Archimedean copulas, which are an associative class of copulas that allow modeling dependence in randomly high dimensions with one parameter, which governs the strength of dependence, were used (Nelson, 2007). Archimedean copulas are estimated using marginal distributions and can exhibit different tail dependencies, such as the upper tail, lower tail, or both of these dependencies with magnitudes (Hofert and Macheler, 2011). Based on the work of Hofert and Macheler (2011) and Um et al. (2017), Archimedean copulas can be described as follows:
Table 2 Summary of the three copulas assessed in this study (Genest and Favre, 2007). Model Clayton
Kendall's tau,
Parameter,
t
g
Generator, fðgÞ
ðmaxfag þ bg 1; 0gÞ1=g
t¼
g 1
ðt g 1Þ=g
3 1 ðga 1Þðgb 1Þ5 ln4 ln g g1
t ¼ 1 ½1 D1 ðgÞ g
expð ½ðln uÞg þ ðln vÞg 1=g Þ
t¼
2
Frank
Gumbel Note: D1 ðgÞ ¼
Bivariate copula, Pða; b; gÞ
Rg 0
ðt=gÞ=ðet 1Þdt denotes the first Debye function.
g gþ2 4
g2ℝ
g1 g
g1
log
egt 1 eg 1
jlogðtÞjg
!
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
313
Fig. 2. Selected stations from the Lower Illinois River Basin.
Table 3 Summary and basic statistics of nitrate concentration and discharge at the five stations (Markus et al., 2014). EPA station No.
D-31
D-32
DG-01
DJ-08
E-25
USGS station No. Stream gauge Drainage area (km2)
05570500 Illinois River at Havana 46,845
05586100 Illinois River at Valley City 68,465
05585000 La Moine River at Ripley 3,310
05570000 Spoon River at Seville 4,237
05583000 Sangamon River at Oakford 13,038
Data
Q* (m3/s)
Date Number Average Min Max Standard deviation Skewness Kurtosis
1978.1e2008.5 274 550.78 4.13 147.74 0.57 1978.28 10.40 378.40 1.81 1.39 0.38 1.53 0.47
N (mg/L)
Q (m3/s)
N (mg/L)
1974.12e2012.9 352 698.05 3.93 95.43 0.18 2831.69 9.68 549.16 1.77 1.25 0.33 1.13 0.51
Q (m3/s)
N (mg/L)
1972.2e2012.9 467 30.67 2.89 0.08 0.01 506.87 11.00 64.29 2.30 3.87 0.48 17.99 0.69
Q (m3/s)
N (mg/L)
1977.12e2012.6 301 42.37 4.73 0.57 0.01 778.71 13.00 75.54 3.23 4.90 0.06 34.07 1.09
Q (m3/s)
N (mg/L)
1974.10e2012.8 312 125.26 4.33 5.97 0.01 1277.09 12.00 183.95 2.64 3.21 0.23 13.34 0.96
Note: Q and N are discharge and nitrate concentration, respectively. *Flows for 1978e2007 are estimated by hydrological modeling (Lian et al., 2010).
314
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
Fig. 3. Relationship between discharge and nitrate concentration at each site (Figures on the left are normal scale and those on the right are log-scale).
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
Pða; bjgÞ ¼ f1 ðfðajgÞ þ fðbjgÞ
(4)
where a and b are independent observations (discharge and nitrate concentration), f is the Archimedean generator, and g is a copula parameter. Genest and MacKay (1986) used the Archimedean generator to derive Kendall's tau (t):
ð1
.
0
t ¼ 1 þ 4 fðxÞ f ðxÞdx
(5)
0
where fðxÞ is the first derivative of fðxÞ. The Clayton, Frank, and Gumbel-Hougaard Archimedean copulas are commonly used for hydrologic and environmental data (Clayton, 1978; Frank, 1979; Gumbel, 1960; Hougaard, 1986). Kendall's tau and the parameters for the three Archimedean copulas are presented in Table 2. The relationship between discharge and nitrate concentration is represented in this study using these three copula functions.
315
2.2.2. Selection of copula models The Sn statistic, which is based on the bootstrap method, was used to fit the most suitable copula models for discharge and nitrate concentration, as follows:
Sn ¼
n X n i iþ1 i þn K qn ; K qn ; n K 2n 3 n n n i¼1 n X i iþ1 i K 2 qn ; K 2 qn ; Kn n n n i¼1
(8)
where n is the sample size and Kn ðtÞ and Kqn ðtÞ are empirical and theoretical distributions. According to Genest and Favre (2007), using 10,000 bootstrap datasets allows the p-value to be estimated, the selection of an appropriate copula model from the candidate models was enabled. A copula model is more suitable if the Sn statistic value of the model is smaller than that of the other copula models. 2.3. Study stations and data selection
2.1.3. Copula parameters In this study, the inference functions for margins (IFM) method to estimate the copula parameters, which uses the ML method for estimating the three copula parameters by calculating the likelihood functions after the parameters of the marginal distributions are estimated, was adopted as follows:
LðqÞ ¼
n X
o n b b BðbÞ log Cq AðaÞ;
(6)
i¼1
b b where n is the sample size, and AðaÞ and BðbÞ denote the marginal distributions depending on discharge and nitrate concentration. LðqÞ is presented as qIFM because LðqÞ is the ML (Joe, 2014).
The Lower Illinois River Basin (Fig. 2), located in the northern region of the Mississippi River Basin, was selected for this study. Agricultural watersheds of the Lower Illinois River Basin cause high nitrate concentration that have affected the water quality of streams and rivers (Guo et al., 2002; Lian et al., 2010; Markus et al., 2014). Agriculture is the dominant land use type in the entire Lower Illinois River Basin watershed, consisting mainly of row crops (corn and soybeans). Five sites were selected, and are managed by the Illinois Environmental Protection Agency (IEPA) (IEPA, 2007; Markus et al., 2014). Among these five stations, two sites, D-31
2.2. Copula model selection methods This study analyzed discharge and nitrate concentration from the five afore-mentioned stations on the Lower Illinois River and used the GPA, GEV, LN2, and LN3 distributions to calculate marginal distributions of discharge and nitrate concentration. This study applied the Clayton, Frank, and Gumbel copulas to determine appropriate copula models. The goodness-of-fit method verifies the selection of an appropriate copula model between discharge and nitrate concentration. The probability plot correlation coefficient (PPCC) test and the Sn statistic values were used to determine the goodness-of-fit at each station. 2.2.1. Marginal distribution The PPCC test was used to calculate the correlation coefficient between the observed and modeled data, and to then fit the univariate distribution of variables (Filliben, 1975). The coefficient is estimated as follows:
Pn i¼1 ðxi xÞðmi mÞ r ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Pn Pn i¼1 ðxi xÞ i¼1 ðmi mÞ
(7)
where xi and x indicate the ordered and mean values of the observed data, respectively, and mi and m denote the ordered and mean values of the corresponding model percentiles, respectively. The method proposed by Vogel and Kroll (1989) was followed to determine the critical values of the marginal distributions in the PPCC test.
Fig. 4. L-skewness and L-kurtosis diagrams. (a) Discharge, (b) Nitrate concentration.
316
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
Fig. 5. Comparison of quantile plots by the reduced variate (The left figures denote the variations in discharge and the right figures explain those of nitrate).
6.964 6.814 5.701 6.779 6.623 6.588 5.540 6.565 6.811 6.829 6.067 6.806 6.957 6.844 5.758 6.811 6.529
Gumbel Frank Clayton
0.987 2.290 0.129 2.297 1.016 2.328 0.141 2.339 0.969 2.135 0.192 2.144 1.004 2.321 0.137 2.330 1.422 1.476 1.641 1.403 1.634 1.452 1.613 1.385 1.607 1.512 1.679 1.502 1.673 1.493 1.662 1.423 1.654 1.550
Gumbel
E-25
Frank
5.539 5.131 4.371 5.066 5.238 4.897 4.152 4.843 5.547 5.220 4.735 5.163 5.592 5.197 4.479 5.133 5.019
(Havana) and D-32 (Valley City), are located along the mainstem of the Illinois River, and the remaining three sites are located in tributaries: E-25 (the Sangamon River at Oakford), DG-01 (the La Moine River at Ripley), and DJ-08 (the Spoon River at Seville) (Fig. 2 and Table 3). Station E-25 covers 19% of the Lower Illinois River Basin area. The Spoon River basin, which includes station DJ-08, is predominantly used for agricultural purposes (82.1 percent). There are between 274 and 467 datasets available for each station from 1974 to 2012, as the IEPA monitoring project collected data at four to six-week intervals during this period. This study used the streamflow data for station D-31 from Lian et al. (2010), who used Hydrologic Simulation ProgrameFORTRAN (HSPF) to generate daily flows for the Illinois River Basin, and these values were adopted and statistically verified from a study by Markus et al. (2014). The basic statistics of discharge and nitrate concentration at the five stations are summarized in Table 3, and the relationship between discharge and nitrate concentration on the normal and log-scales for each station are presented in Fig. 3.
317
1.533 1.866 1.671 1.864 1.519 1.842 1.663 1.842 1.583 1.920 1.774 1.920 1.546 1.884 1.689 1.882 1.750
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
2.186 2.036 0.044 2.020 2.148 2.032 0.042 2.019 1.602 1.558 0.062 1.551 2.178 2.044 0.047 2.029 1.475 LN3
LN2
GEV
Note: Q and N are discharge and nitrate concentration, respectively.
GPA
Average
Clayton Gumbel
1.354 1.398 1.307 1.396 1.332 1.378 1.305 1.376 1.445 1.500 1.461 1.498 1.407 1.453 1.344 1.450 1.400 4.485 4.055 3.557 4.051 4.243 3.895 3.478 3.891 4.706 4.408 4.319 4.399 4.742 4.289 3.805 4.284 4.163
Frank Clayton
2.120 1.899 0.070 1.910 2.104 1.935 0.076 1.944 1.568 1.569 0.130 1.568 2.221 2.001 0.077 2.012 1.450 1.349 1.628 1.624 1.631 1.331 1.613 1.601 1.617 1.352 1.634 1.635 1.638 1.335 1.622 1.609 1.626 1.553
Gumbel Frank
4.731 4.645 4.363 4.633 4.446 4.445 4.187 4.440 4.447 4.444 4.248 4.438 4.535 4.511 4.245 4.504 4.454 0.340 1.221 0.517 1.231 0.295 1.351 0.533 1.363 0.315 1.256 0.550 1.266 0.292 1.354 0.527 1.366 0.861
Clayton Gumbel Clayton N
GPA GEV LN2 LN3 GPA GEV LN2 LN3 GPA GEV LN2 LN3 GPA GEV LN2 LN3
Q
Frank
1.362 1.588 1.573 1.591 1.372 1.604 1.587 1.608 1.368 1.604 1.581 1.607 1.373 1.609 1.590 1.612 1.539
DG-01 D-32 D-31
In this study, PPCC tests were applied to determine the most
Marginal distribution
3.2. Selection of the most suitable copula models
Table 4 Estimated copula parameters depending on the combination of the marginal distributions and copula models at five stations.
In this study, the GPA, GEV, LN2, and LN3 distributions were used to determine the most suitable copula models for assessing discharge and nitrate concentration with L-moments. The Lskewness and L-kurtosis for the observed discharge and nitrogen data with the GPA, GEV, LN2, and LN3 distribution lines are described in Fig. 4. When the L-skewness and L-kurtosis line or point of a distribution of the four theoretical distributions is close to the L-skewness and L-kurtosis point of the observed data, the distribution is suggested to appropriately represent the data in Fig. 4. The observed discharge data at D-31, D-32, and E-25 were close to the GPA distribution, and DJ-08 were similar to the GPA and LN3 distributions, but the observed discharge data at DG-01 were close to the LN3 distribution (Fig. 4). The tail shapes of the low discharges at all five stations (Fig. 5) were similar to the GPA distribution. However, the high discharges did not follow any marginal distributions, excepting DJ-08. The nitrate concentration data from D-31 and D-32 were very close to the GEV distribution, and those at DG-01, DJ-08, and E-25 were relatively close to the GPA, excluding their upper tails. Generally, the fitting shapes of the GPA distribution were similar for discharge, excluding the upper tail. The fit of the GPA distribution was more appropriate for the nitrate concentration data from DG-01, DJ-08, and E-25, while that of the GEV distribution was close to the observed data from D-31 and D-32. The GPA distribution was appropriately fitted to discharge and nitrate concentration for six of the ten plots as shown in Fig. 5. The three Archimedean copulas and the copula parameters of four marginal distributions were estimated for discharge and nitrate concentration data from the five stations are presented in Table 4. Consequently, the copula parameters of the 240 (4 marginal distributions for discharge 4 marginal distributions for nitrate concentration 3 copulas 5 stations) copula models using the four marginal distributions for discharge and nitrogen concentration, three copulas, and five stations are estimated in Table 4. The estimated parameters for the Clayton, Frank, and Gumbel copula models were 0.751e1.422, 4.135e6.529, and 1.400e1.750, respectively, as shown in Table 4. The range of the Frank copula parameters was wider than that of the parameters of the Clayton or Gumbel copulas.
0.169 0.826 0.436 0.832 0.236 1.189 0.653 1.197 0.225 1.204 0.643 1.211 0.228 1.166 0.635 1.173 0.751
3.1. Evaluation of copula models
4.396 4.209 3.877 4.197 4.252 4.142 3.823 4.135 4.352 4.204 3.857 4.195 4.316 4.179 3.852 4.170 4.135
DJ-08
3. Results
318
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
Fig. 6. PPCC test results for the marginal distributions of discharge and nitrate concentration (Blue: Observed data, Red: Critical values). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
319
Fig. 7. Dependencies of Sn statistics on the three copula models.
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
320
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
suitable copula models among the four marginal distributions using the monitoring data. The critical values of the PPCC test were compared with the PPCC results of the monitoring data to assess the validity of the marginal distributions. The critical values and the PPCC results of discharge and nitrate concentration monitoring data based upon the four marginal distributions at the five stations are represented in Fig. 6. Nitrate concentration characteristics of the Lower Illinois River are presumed to differ from those of other known rivers or watersheds (Fig. 6). Discharge and nitrate concentration at D-31 and D-32 were acceptable for all four marginal distributions. Similarly, the nitrate concentration at DG-01, DJ-08, and E-25 were acceptable for all four marginal distributions, excluding LN2 at DG-01. Nitrate concentration at DG-01, DJ-08, and E-25 were only acceptable for GPA and GEV and were not acceptable for LN2 and LN3. This result differs from known information that states that water quality follows LN2 (Meybeck, 1993; Novotny, 2004; Mitchell, 2005; Wang et al., 2013; Geng et al., 2015). This result indicates that distributions other than LN2 for nitrate concentration may be more suitable depending on land use, rainfall, discharge or other watersheds characteristics. Sn statistics were applied to assess the goodness-of-fit (GOF) for the combinations of incorporating the four marginal distributions into the three copula models at five stations. The most suitable
copula model has the lowest value of Sn statistics (Genest et al., 2009). In this study, the 240 copula models were compared based on the four marginal distributions of discharge and nitrogen concentration, three copulas, and five stations, and the Sn statistic results were used to determine which copulas were appropriate for representing discharge and nitrate concentration from the five stations. The null hypothesis is that there is no difference in the means between discharge or nitrate concentration data for each station. All p-values exceeded 0.05 for all stations. The three copula models at D-31 exhibited the lowest Sn statistics, indicating that these were the most appropriate copula models for the LN3-GEV (Q-N), GPA-GPA, and GPA-GEV distributions in Fig. 7. D-31 and D-32 exhibit different Sn static values depending on the copulas. However, DG-01, DJ-08, and E-25 exhibited relatively similar Sn statistic values for all three copulas. For all five stations, the GPA and LN3 for discharge and GPA and GEV for nitrate exhibit small Sn statistic values. In this study, the three lowest Sn statistic values were selected for each of the copulas from each station. The three lowest Sn statistic values for all five stations, which represent the Frank copula incorporated into the GPA-GPA (Q-N) marginal distributions at stations D-31, D-32, and E-25, are compared in Fig. 8. However, the Clayton copula with the LN3-GPA (Q-N) marginal distributions was more appropriate for analyzing
Fig. 8. Comparison of the minimum Sn statistics of the three copula models.
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
Fig. 9. Nitrate concentration (mg/L) with conditional copulas (Figures on the left are normal scale and those on the right are the log-scales of discharge on the x-axis).
321
322
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323
the relationship between Q and N at DG-01 and DJ-08. 3.3. Conditional nitrate concentration percentile based on discharge In this study, the most suitable copulas for incorporation into the IFM method to provide the conditional nitrate concentration percentiles using discharge at the five stations were suggested as follows (Brechmann et al., 2013; Um et al., 2017):
v Qc ¼ Cq ðbjA ¼ aÞ ¼ Cq ða; bÞ ðA ¼ aÞ va
(9)
where A denotes the marginal discharge distribution, Qc indicates the conditional percentile of the nitrate concentration, and a and b indicate the non-exceedance probability of nitrate concentration and discharge, respectively. The conditional nitrate concentration percentiles for the corresponding non-exceedance probability depends on discharge, as shown in Fig. 9. The non-exceedance probabilities range from 0.2 to 1.0 (Fig. 9). For example, at station D-31, the non-exceedance probability of the relationship between 1000 cm and 7 mg/L is 0.7. That is, the nitrate concentration based on both the 1000 cm and 0.7 non-exceedance probability at station D-31 is 7 mg/L. At the lower discharge, stations DG-01 and DJ-08 exhibited higher nitrate concentration percentiles than the others, indicating that the cumulative frequencies of high nitrate concentration under the low-flow conditions at DG-01 and DJ-08 were relatively higher than those at the other stations. At a higher discharge, D-31 and D32 exhibit lower nitrate concentration percentiles than the other stations. This signifies that the cumulative frequencies of the low nitrate concentration under high-discharge conditions at D-31 and D-32 are relatively higher than those at the other stations. Consequently, for all five stations, the non-exceedance probability of the nitrate concentration depending on discharge (Fig. 9) corresponds well with some relationship (Fig. 3). 4. Conclusions This study investigated the probability of discharge being related to the nitrate concentration. The extent to which nitrate concentration depended on discharge was estimated using the probabilistic non-exceedance probability concept with the most suitable copula models for each station. Most previous studies regarding the relationship between discharge and nitrate concentration were followed a deterministic approach. For this purpose, the copula models by incorporating the IFM method were applied and the discharge and nitrate concentration at five stations of the Lower Illinois River basin, USA were investigated. This study utilized the observed daily discharge and nitrate concentration data from 1972 to 2012. Stations D-31 and D-32 are located along the mainstem, while DG-01, DJ-08, and E-25 are located in tributaries of the Lower Illinois River. The GPA, GEV, LN2, and LN3 distributions and Clayton, Frank, and Gumbel copulas were applied as marginal distributions and Archimedean copula models, respectively, to represent the probabilistic relationship between discharge and nitrate concentration. GOF tests (the PPCC test and Sn statistic) were applied to each copula model to determine which copula models were appropriate for representing discharge and nitrate concentration. L-skewness and L-kurtosis diagrams and extreme value plots were used to assess the marginal distributions of discharge and nitrate concentration at each station. The GPA distribution for discharge and the GPA and GEV distributions were generally wellfitted, excluding the higher-end tail. The GEV distribution provided a good fit for nitrate concentration data from D-31 and D-32, while the GPA distribution was a better fit for DG-01, DJ-08, and E-25
stations, including in the lower-end tail. In addition, the PPCC test was employed and the results of the test were with the critical values to determine the most suitable copula models. The results showed that the three copulas could be applied to all marginal distributions, excluding the LN2 distribution, at DG-01 station, and the LN2 and LN3 distributions at DG-01, DJ-08, and E-25 stations. Finally, the Frank copula model with the GPA-GPA (Q-N) distributions produced the most accurate results for D-31, D-32, and E25, while the Clayton copula model with the LN3-GPA (Q-N) distributions was the optimal model for DG-01 and DJ-08. Using the LN2 distribution as the marginal distribution for the nitrate concentration produced high Sn statistic values; however, the LN2 distribution has been typically used to the distribution of water quality pollutants. Furthermore, the conditional nitrate concentration percentiles for each station were calculated to assess the suitability of the copula models. The nitrate concentration percentiles for stations D31 and D-32 indicated that the probabilities of low nitrate concentration occurring under high-discharge at these stations were relatively higher than those at the other stations. Moreover, for stations DG-01 and DJ-08, the probability of a high nitrate concentration occurring was higher than that for the other stations, even under low discharge. In this study, a method for applying copula models to discharge and nitrate concentration and an approach to selecting the most appropriate copula model for estimating the probability of the nitrate concentration depending on discharge was developed. The results of this study will enable decision makers and users to estimate nitrate concentration based on discharge or vice versa using a probability as opposed to a deterministic approach. Future studies could extend this approach to estimate nitrate concentration with other variables, such as land use, precipitation and seasonality. Acknowledgment This paper was supported by Konkuk University in 2017. References Antonopoulos, V.Z., Papamichail, D.M., Mitsiou, K.A., 2001. Statistical and trend analysis of water quality and quantity data for the Strymon River in Greece. Hydrol. Earth Syst. Sci. 5 (4), 679e691. Bargaoui, Z.K., Bardossy, A., 2015. Modeling short duration extreme precipitation patterns using copula and generalized maximum pseudo-likelihood estimation with censoring. Adv. Water Resour. 84, 1e13. Bezak, N., Mikos, M., Sraj, M., 2014. Trivariate frequency analyses of peak discharge, hydrograph volume and suspended sediment concentration data using copulas. Water Resour. Manag. 28 (8), 2195e2212. Brechmann, E.C., Hendrich, K., Czado, C., 2013. Conditional copula simulation for systemic risk stress testing. Insur. Math. Econ. 53 (3), 722e732. Castillo, E., 2005. Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken, N.J. Chowdhary, H., Deng, Z.Q., Singh, V.P., 2008. Assessing the effectiveness of agricultural BMPs using the copula approach. In: World Environmental and Water Resources Congress. Ahupua'A. Clayton, D.G., 1978. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65 (1), 141e151. Cohn, T.A., 1995. Recent advances in statistical methods for the estimation of sediment and nutrient transport in rivers. Rev. Geophys. 33 (S2), 1117e1123. Cohn, T.A., 2005. Estimating contaminant loads in rivers: an application of adjusted maximum likelihood to type 1 censored data. Water Resour. Res. 41 (7). Cohn, T.A., Caulder, D.L., Gilroy, E.J., Zynjuk, L.D., Summers, R.M., 1992. The validity of a simple statistical model for estimating fluvial constituent loads: an empirical study involving nutrient loads entering Chesapeake Bay. Water Resour. Res. 28 (9), 2353e2363. De Michele, C., Salvadori, G., 2003. A Generalized Pareto intensity-duration model of storm rainfall exploiting 2-Copulas. J. Geophys. Res. D Atmos. 108 (D2). De Michele, C., Salvadori, G., Canossi, M., Petaccia, A., Rosso, R., 2005. Bivariate statistical approach to check adequacy of dam spillway. J. Hydrol. Eng. 10 (1), 50e57.
D. Park et al. / Journal of Cleaner Production 222 (2019) 310e323 Di Toro, D.M., 1984. Probability Model of Stream Quality Due to Runoff. J. Environ. Eng. 110 (3), 607e628. e, B., 2007. Generalized maximum El Adlouni, S., Ouarda, T.B., Zhang, X., Roy, R., Bobe likelihood estimators for the nonstationary generalized extreme value model. Water Resour. Res. 43 (3), W03410. Filliben, J.J., 1975. The probability plot correlation coefficient test for normality. Technometrics 17 (1), 111e117. Frank, M.J., 1979. On the simultaneous associativity ofF (x, y) and xþy F (x,, y). Aequationes Math. 19 (1), 194e226. Fu, G.T., Butler, D., 2014. Copula-based frequency analysis of overflow and flooding in urban drainage systems. J. Hydrol. 510, 49e58. Genest, C., Favre, A.C., 2007. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 12 (4), 347e368. Genest, C., Ghoudi, K., Rivest, L.P., 1995. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82 (3), 543e552. Genest, C., MacKay, R.J., 1986. Archimedean copulas and two-dimensional law diennes et families de lois families whose margins are given (Copules archime es). Can. J. Stat. 14 (2), 145e159. bidimensionnelles dont les marges sont donne Genest, C., Quessy, J.F., Remillard, B., 2006. Goodness-of-fit procedures for copula models based on the probability integral transformation. Scand. J. Stat. 33 (2), 337e366. Genest, C., Remillard, B., Beaudoin, D., 2009. Goodness-of-fit tests for copulas: a review and a power study. Insur. Math. Econ. 44 (2), 199e213. Geng, R., Wang, X., Sharpley, A.N., Meng, F., 2015. Spatially-distributed costeeffectiveness analysis framework to control phosphorus from agricultural diffuse pollution. PLoS One 10 (8), e0130607. Gumbel, E.J., 1960. Extreme value distributions in several dimensions (Distributions des valeurs extremes en plusiers dimensions), vol 9. Publ. Inst. Statist. Univ., Paris, pp. 171e173. Guo, Y.P., Markus, M., Demissie, M., 2002. Uncertainty of nitrate-N load computations for agricultural watersheds. Water Resour. Res. 38 (10). Hofert, M., Machler, M., 2011. Nested Archimedean copulas meet R: the nacopula package. J. Stat. Softw. 39 (9), 1e20. Hosking, J., Wallis, J., 1986. Paleoflood hydrology and flood frequency analysis. Water Resour. Res. 22 (4), 543e550. Hosking, J.R., 1990. L-moments: analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B 105e124. Hosking, J.R., Wallis, J.R., 1987. Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29 (3), 339e349. Hougaard, P., 1986. A class of multivariate failure time distributions. Biometrika 73 (3), 671e678. Illinois EPA (IEPA), 2007. Water Monitoring Strategy, 2007-2012, IEPA/BOW/07-005. Bureau of Water, Division of Water Pollution Control. Springfield, Illinois. Jenkinson, A.F., 1955. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. R. Meteorol. Soc. 81 (348), 158e171. Joe, H., 1997. Multivariate Models and Dependence Concepts, first ed. Chapman & Hall, London; New York. Joe, H., 2014. Dependence Modeling with Copulas. Chapman and Hall/CRC. Kao, S.C., Govindaraju, S., 2007. A bivariate frequency analysis of extreme rainfall with implications for design. J. Geophys. Res. D Atmos. 112 (D13). Katz, R.W., Parlange, M.B., Naveau, P., 2002. Statistics of extremes in hydrology. Adv. Water Resour. 25 (8e12), 1287e1304. Kroll, C.N., Vogel, R.M., 2002. Probability distribution of low streamflow series in the United States. J. Hydrol. Eng. 7 (2), 137e146. Li, Z.L., Li, C.C., Xu, Z.X., Zhou, X., 2014. Frequency analysis of precipitation extremes in Heihe River basin based on generalized Pareto distribution. Stoch. Environ. Res. Risk Assess. 28 (7), 1709e1721. Lian, Y.Q., Chan, I.C., Xie, H., Demissie, M., 2010. Improving HSPF modeling accuracy from FTABLES: case study for the Illinois River basin. J. Hydrol. Eng. 15 (8), 642e650. Liu, Y., Cao, S., Zhang, X., Li, F., Li, X., 2018. Joint improvement of river water quality indicators based on a multivariate joint probability distribution of the discharge and water quality. Nord. Hydrol. https://doi.org/10.2166/wst.2016.553. Malamud, B.D., Turcotte, D.L., 2006. The applicability of power-law frequency
323
statistics to floods. J. Hydrol. 322 (1e4), 168e180. Markus, M., Demissie, M., Short, M.B., Verma, S., Cooke, R.A., 2014. Sensitivity analysis of annual nitrate loads and the corresponding trends in the lower Illinois river. J. Hydrol. Eng. 19 (3), 533e543. McIsaac, G.F., David, M.B., Gertner, G.Z., 2016. Illinois river nitrate-nitrogen concentrations and loads: long-term variation and association with watershed nitrogen inputs. J. Environ. Qual. 45 (4), 1268e1275. Meybeck, M., 1993. C, N, P and S in Rivers: from Sources to Global Inputs, Interactions of C, N, P and S Biogeochemical Cycles and Global Change. Springer, Berlin Heidelberg, pp. 163e193. Min, S.K., Zhang, X.B., Zwiers, F.W., Hegerl, G.C., 2011. Human contribution to moreintense precipitation extremes. Nature 470 (7334), 378e381. Mitchell, G., 2005. Mapping hazard from urban non-point pollution: a screening model to support sustainable urban drainage planning. J. Environ. Manag. 74 (1), 1e9. Morgan, E.C., Lackner, M., Vogel, R.M., Baise, L.G., 2011. Probability distributions for offshore wind speeds. Energy Convers. Manag. 52 (1), 15e26. Nelsen, R.B., 2007. An Introduction to Copulas, second ed. Springer, New York. Novotny, V., 2004. Simplified databased total maximum daily loads, or the world is log-normal. J. Environ. Eng. ASCE 130 (6), 674e683. Ouarda, T.B.M.J., Girard, C., Cavadias, G.S., Bobee, B., 2001. Regional flood frequency estimation with canonical correlation analysis. J. Hydrol. 254 (1e4), 157e173. Panno, S.V., Kelly, W.R., Hackley, K.C., Hwang, H.H., Martinsek, A.T., 2008. Sources and fate of nitrate in the Illinois River basin, Illinois. J. Hydrol. 359 (1e2), 174e188. Pickands III, J., 1975. Statistical inference using extreme order statistics. Ann. Stat. 3 (1), 119e131. Rasmussen, P., 1994. The POT Method for Flood Estimation: a Review, Extreme Values: Flood and Droughts, Stochastic and Statistical Methods in Hydrology and Environmental Engineering, pp. 15e23. Silva, R.D., Lopes, H.F., 2008. Copula, marginal distributions and model selection: a Bayesian note. Stat. Comput. 18 (3), 313e320. Sklar, M., 1959. Distribution Functions with N Dimensions and their Margins N Dimensions Et Leurs Marges), 8. Publ. Inst. Statist. partition A (Fonctions De Re Univ. Paris, pp. 229e231. Smith, R.L., 1989. Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Stat. Sci. 4 (4), 367e377. Sraj, M., Bezak, N., Brilly, M., 2015. Bivariate flood frequency analysis using the copula function: a case study of the Litija station on the Sava River. Hydrol. Process. 29 (2), 225e238. Stedinger, J.R., Vogel, R.M., Foufoula-Georgiou, E., 1993. Frequency analysis of extreme events. In: Maidment, D.R. (Ed.), Handbook of Hydrology. McGraw-Hill, New York. Syvitski, J.P., Morehead, M.D., Bahr, D.B., Mulder, T., 2000. Estimating fluvial sediment transport: the rating parameters. Water Resour. Res. 36 (9), 2747e2760. Um, M.J., Joo, K., Nam, W., Heo, J.H., 2017. A comparative study to determine the optimal copula model for the wind speed and precipitation of typhoons. Int. J. Climatol. 37 (4), 2051e2062. Van Buren, M., Watt, W., Marsalek, J., 1997. Application of the log-normal and normal distributions to stormwater quality parameters. Water Res. 31 (1), 95e104. Vogel, R.M., Kroll, C.N., 1989. Low-flow frequency analysis using probability-plot correlation coefficients. J. Water Resour. Plan. Manag. 115 (3), 338e357. Vogel, R.M., Rudolph, B.E., Hooper, R.P., 2005. Probabilistic behavior of water-quality loads. J. Environ. Eng. ASCE 131 (7), 1081e1089. Vogel, R.M., Stedinger, J.R., Hooper, R.P., 2003. Discharge indices for water quality loads. Water Resour. Res. 39 (10). Vogel, R.M., Wilson, I., 1996. Probability distribution of annual maximum, mean, and minimum streamflows in the United States. J. Hydrol. Eng. 1 (2), 69e76. Wang, Y., Wang, P., Bai, Y.J., Tian, Z.X., Li, J.W., Shao, X., Mustavich, L.F., Li, B.L., 2013. Assessment of surface water quality via multivariate statistical techniques: a case study of the Songhua River Harbin region, China. J. Hydro-Environ. Res. 7 (1), 30e40. Wang, X., Zang, N., Liang, P., Cai, Y., Li, C., Yang, Z., 2017. Identifying priority management intervals of discharge and TN/TP concentration with copula analysis for Miyun Reservoir inflows, North China. Sci. Total Environ. 609, 1258e1269.