Developing a dual entropy-transinformation criterion for hydrometric network optimization based on information theory and copulas


Journal Pre-proof
Heshu Li, Dong Wang, Vijay P. Singh, Yuankun Wang, Jianfeng Wu, Jichun Wu, Ruimin He, Ying Zou, Jiufu Liu, Jianyun Zhang

PII: S0013-9351(19)30610-3
DOI: https://doi.org/10.1016/j.envres.2019.108813
Reference: YENRS 108813
To appear in: Environmental Research
Received Date: 12 September 2018
Revised Date: 4 October 2019
Accepted Date: 7 October 2019

Please cite this article as: Li, H., Wang, D., Singh, V.P., Wang, Y., Wu, J., Wu, J., He, R., Zou, Y., Liu, J., Zhang, J., Developing a dual entropy-transinformation criterion for hydrometric network optimization based on information theory and copulas, Environmental Research (2019), doi: https://doi.org/10.1016/j.envres.2019.108813. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Inc.

Title: Developing a dual entropy-transinformation criterion for hydrometric network optimization based on information theory and Copulas

Authors: Heshu Li1, Dong Wang*1, Vijay P. Singh2, Yuankun Wang1, Jianfeng Wu*1, Jichun Wu1, Ruimin He3, Ying Zou3, Jiufu Liu3, Jianyun Zhang3

1 Key Laboratory of Surficial Geochemistry, Ministry of Education, Department of Hydrosciences, School of Earth Sciences and Engineering, State Key Laboratory of Pollution Control and Resource Reuse, Nanjing University, Nanjing, P.R. China

2 Department of Biological and Agricultural Engineering, Zachry Department of Civil Engineering, Texas A & M University, College Station, TX 77843, USA

3 Nanjing Hydraulic Research Institute, Nanjing, P.R. China

(*Corresponding author: Dong Wang, [email protected]; Jianfeng Wu, [email protected].)


Abstract: Hydrometric information collected by monitoring networks is fundamental for effective management of water resources. In recent years, entropy-based multi-objective criteria have been developed for the evaluation and optimization of hydrometric networks, and copula functions have been frequently used in hydrological frequency analysis to model multivariate dependence structures. This study developed a dual entropy-transinformation criterion (DETC) to identify and prioritize significant stations and to generate candidate network optimization solutions. Through a tradeoff weight, the criterion integrates an entropy index computed with a mathematical floor function and a transinformation index computed with copula entropy. The best-fitting copula models were selected from three Archimedean copula families, i.e., Gumbel, Frank, and Clayton. DETC was applied to a streamflow monitoring network in the Fenhe River basin and to two rainfall monitoring networks in the Beijing Municipality and the Taihu Lake basin, which cover different network classes, network scales, and climate types. DETC was assessed with the commonly used dual entropy-multiobjective optimization (DEMO) criterion and compared with a minimum transinformation (MinT) based criterion for network optimization. Results showed that DETC could effectively prioritize stations according to their significance and incorporate decision preferences on information content and information redundancy. Comparison of the isohyet maps of two rainstorm events between DETC and MinT showed that DETC had an advantage in restoring the spatial distribution of precipitation.

Key words: Information theory; Copula functions; Hydrometric networks; Multi-objective optimization


1. Introduction

Hydrometric information, which is mainly collected by monitoring networks, constitutes the fundamental input for the planning, design, operation, and management of water resources systems (Li et al., 2012). An adequately designed hydrometric network with optimally sited monitoring gauges is required to collect the needed information efficiently. A number of methods and criteria have been developed for hydrometric network evaluation and design. The WMO (2008) minimum network density recommendation, based on physiographic units, has been used as a benchmark in several studies (Coulibaly et al., 2013; Leach et al., 2015; Mishra and Coulibaly, 2010; Samuel et al., 2013). Mishra and Coulibaly (2009) reviewed the methodological development in surface water network design and categorized the commonly used approaches. Chacon-Hurtado et al. (2017) reviewed existing applications in rainfall and streamflow sensor network design and classified the methods for network evaluation as: (1) statistics-based methods; (2) information-theory-based methods; (3) methods based on expert recommendations; and (4) other methods, such as fractal-based methods. Among these approaches, entropy-based methods have proved efficient. The concept of entropy proposed by Shannon (1948) defines information and quantifies the uncertainty of a random variable or its underlying distribution. With the entropy concept, the spatial and temporal variability of hydrometric time series can be investigated (Zhang et al., 2016). It is a nonparametric method that makes no assumption about a specific distribution.
Entropy-based approaches have been widely used in the evaluation and design of hydrometric networks, including rainfall gauge networks (Chebbi et al., 2011; Chen et al., 2008; Ridolfi et al., 2011; Su and You, 2014; Volkmann et al., 2010), streamflow gauge networks (Alfonso et al., 2012; Leach et al., 2015; Li et al., 2012; Mishra and Coulibaly, 2010; Mishra et al., 2014; Samuel et al., 2013), water level monitoring networks (Alfonso et al., 2010, 2014; Leach et al., 2016), and water quality monitoring networks (Mogheir and Singh, 2002; Mogheir et al., 2009; Ozkul et al., 2000). Keum et al. (2017) summarized the common entropy terms and recent applications of entropy to water monitoring network design. Most models developed so far have combined entropy-based methods with multi-objective optimization for network design and have proposed different criteria based on entropy terms such as joint entropy, transinformation, and total correlation. Yang and Burn (1994) defined a directional information transfer index (DIT) as a measure of the information flow between gauging stations. Alfonso et al. (2010) applied DIT, together with a total correlation index, to develop a method for the design of water level monitoring networks in polders (WMPs). Chen et al. (2008) used minimum transinformation (MinT) as a criterion for prioritizing rainfall gauges. Mishra and Coulibaly (2010) used a transinformation index (TI) to identify important stations as well as critical areas, and Mishra et al. (2014) used seasonal streamflow information (SSI) to evaluate the effect of seasonality. In recent years, a multi-objective optimization criterion combining joint entropy and total correlation, also known as dual entropy-multiobjective optimization (DEMO), has been widely used in hydrometric network evaluation (Alfonso et al., 2012, 2014; Keum et al., 2018; Leach et al., 2015, 2016; Li et al., 2012; Ridolfi et al., 2014; Samuel et al., 2013). Li et al. (2012) incorporated transinformation, proposed the maximum information minimum redundancy (MIMR) criterion, and compared it with Alfonso's WMPs criterion (Alfonso et al., 2010) in two cases. Samuel et al. (2013) proposed a combined regionalization and dual entropy-multiobjective optimization (CRDEMO) method for designing optimal minimum hydrometric networks. Generally, network optimization aims at increasing total information and decreasing overlapping information. A fundamental premise of the entropy-based approach is that stations should share as little information as possible, i.e., they should be as independent of each other as possible (Leach et al., 2015; Mishra et al., 2014). Transinformation (or mutual information) is a typical measure of the information shared between two random variables. However, mutual information remains difficult to calculate because it requires estimating multivariate joint distributions. Although this problem can be addressed by distribution assumptions, kernel density estimation, or histogram-based techniques, each of these methods has its own merits and drawbacks, and none has been proved better than the others. Copula functions, introduced by Sklar (1959), are multivariate distribution functions with uniform margins; they provide a way of constructing joint distributions of random variables independently of their marginal distributions through dependence modelling (Favre et al., 2004; Nelsen, 2006; Vandenberghe et al., 2011).
Currently, copulas are widely applied in hydrological frequency analysis, including drought modeling (Shiau, 2006; Song and Singh, 2010a, 2010b; Xu et al., 2015), intensity-duration modeling of rainfall extremes (Kao and Govindaraju, 2007; Li et al., 2019; Michele and Salvadori, 2003; Vandenberghe et al., 2010, 2011; Zhang and Singh, 2007), and peak-volume-duration modeling of floods (Favre et al., 2004; Genest et al., 2007; Tong et al., 2014; Zhang and Singh, 2006). In these studies, copula functions from the Archimedean family, e.g., the Gumbel, Frank, and Clayton copulas, are the most frequently used. Recent hydrologic studies have dealt with the integration of entropy and copulas. A detailed introduction can be found in Hao and Singh (2015), who reviewed two broad branches of integration. A straightforward integration of entropy and copulas is to derive the joint distribution function based on the principle of maximum entropy (POME) or minimum cross-entropy (Hao and Singh, 2012, 2015; Kong et al., 2015; Liu et al., 2017). Another, more intensive coupling of entropy and copulas is the estimation of mutual information with the copula function (Xu et al., 2017). Ma and Sun (2011) proved that mutual information is actually negative copula entropy and provided a simple method for estimating mutual information. Chen et al. (2014) provided a computation technique based on sample estimation. Copula entropy, as a nonparametric method, avoids the estimation of the joint probability density function, which makes measuring the dependence of variables simple and less computationally burdensome (Hao and Singh, 2015).

In this study, we adopted copula entropy to calculate a transinformation index that measures the information shared among hydrometric stations. Combined with a marginal entropy index, a dual entropy-transinformation criterion (DETC) was developed to identify significant stations for network evaluation and optimization. A streamflow monitoring network in the Fenhe River basin and two rainfall monitoring networks, in the Beijing Municipality and the Taihu Lake basin respectively, were used as study areas to illustrate the application of DETC. The paper is organized as follows. Section 2 introduces the basic entropy and copula terms. Section 3 describes the establishment of DETC and its assessment using the dual entropy-multiobjective optimization (DEMO) criterion for streamflow and rainfall monitoring networks. Section 4 presents the application of DETC through three case studies. Conclusions are drawn in Section 5.

2. Entropy and copula concepts

2.1. Entropy concept

The entropy concept in information theory was defined by Shannon (1948) as a measure of the uncertainty of random variables, which is also regarded as their information content (Cover and Thomas, 2006). Suppose X denotes a hydrometric variable, e.g., a rainfall or streamflow variable recorded by a monitoring station over a period of time. The marginal entropy of X is defined as:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i) \tag{1}$$

where p(x_i) is the probability of the value x_i, and n is the number of elementary events (bins). The unit of entropy depends on the base of the logarithm, e.g., bit for base 2 and nat for base e. Equation (1) can be extended to measure the joint entropy H(X, Y) of two random variables X and Y:

$$H(X,Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i,y_j)\log p(x_i,y_j) \tag{2}$$

where p(x_i, y_j) is the joint probability of X and Y, which have m and n events, respectively. Concerning the shared (overlapping) information between two random variables, transinformation (or mutual information) is defined as:

$$T(X,Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)} \tag{3}$$

Transinformation is a general measure that captures both linear and nonlinear dependence (Li et al., 2012). It can be interpreted as the information about X (or Y) inferred from Y (or X), i.e., the reduction of uncertainty in X (or Y) given Y (or X), whereas the remaining uncertainty in X (or Y) given Y (or X) is defined by the conditional entropy:

$$H(X\mid Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i,y_j)\log p(x_i\mid y_j) \tag{4}$$

and

$$H(X\mid Y) = H(X) - T(X,Y) \tag{5}$$
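As a minimal sketch, assuming the series have already been discretized (e.g., with the floor function of Section 3.1), Eqs. (1)-(5) can be evaluated with plug-in relative-frequency probabilities. The function names are illustrative and not from the study; the code relies on the identities T(X,Y) = H(X) + H(Y) - H(X,Y) and H(X|Y) = H(X,Y) - H(Y), which are algebraically equivalent to Eqs. (3) and (4).

```python
import math
from collections import Counter


def entropy(xs, base=2.0):
    """Marginal entropy H(X), Eq. (1), with probabilities estimated as relative frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(xs).values())


def joint_entropy(xs, ys, base=2.0):
    """Joint entropy H(X, Y), Eq. (2): entropy of the paired observations (x_i, y_i)."""
    return entropy(list(zip(xs, ys)), base)


def transinformation(xs, ys, base=2.0):
    """T(X, Y) = H(X) + H(Y) - H(X, Y), an identity equivalent to Eq. (3)."""
    return entropy(xs, base) + entropy(ys, base) - joint_entropy(xs, ys, base)


def conditional_entropy(xs, ys, base=2.0):
    """H(X | Y) = H(X, Y) - H(Y), an identity equivalent to Eq. (4)."""
    return joint_entropy(xs, ys, base) - entropy(ys, base)


# Two short, already-discretized station series (hypothetical values).
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 1, 1, 0, 1, 1, 1]
h_x, t_xy, h_x_given_y = entropy(x), transinformation(x, y), conditional_entropy(x, y)
```

With base 2 the results are in bits; passing `base=math.e` gives nats, matching the unit discussion after Eq. (1).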

For multiple variables X_1, X_2, …, X_N representing N monitoring stations, equation (2) extends naturally to the multivariate joint entropy:

$$H(X_1, X_2, \ldots, X_N) = -\sum_{i_1}\sum_{i_2}\cdots\sum_{i_N} p(x_{1,i_1}, x_{2,i_2}, \ldots, x_{N,i_N})\log p(x_{1,i_1}, x_{2,i_2}, \ldots, x_{N,i_N}) \tag{6}$$

where p(x_{1,i_1}, x_{2,i_2}, …, x_{N,i_N}) is the joint probability of the values x_{1,i_1}, x_{2,i_2}, …, x_{N,i_N}.

To assess the redundant information among stations, total correlation is defined as (McGill, 1954; Watanabe, 1960):

$$C(X_1, X_2, \ldots, X_N) = \sum_{i=1}^{N} H(X_i) - H(X_1, X_2, \ldots, X_N) \tag{7}$$
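The same plug-in estimator extends to Eqs. (6) and (7). The sketch below assumes discrete, time-aligned series and uses hypothetical helper names; it computes the multivariate joint entropy and the total correlation of a set of stations.

```python
import math
from collections import Counter


def joint_entropy(*series, base=2.0):
    """Multivariate joint entropy H(X1, ..., XN), Eq. (6), from aligned discrete series."""
    n = len(series[0])
    counts = Counter(zip(*series))  # each key is one joint state (x1_t, ..., xN_t)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def total_correlation(*series, base=2.0):
    """C(X1, ..., XN) = sum_i H(Xi) - H(X1, ..., XN), Eq. (7)."""
    marginal_sum = sum(joint_entropy(s, base=base) for s in series)
    return marginal_sum - joint_entropy(*series, base=base)


s1 = [0, 0, 1, 1]
s2 = [0, 1, 0, 1]
s3 = list(s1)  # an exact duplicate of s1, i.e., a fully redundant station
c_all = total_correlation(s1, s2, s3)
```

The duplicated station adds its full marginal entropy to the first term of Eq. (7) but nothing to the joint entropy, so it shows up entirely as redundancy.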

Total correlation is the multivariate extension of mutual information and can be calculated using the grouping property (Alfonso et al., 2010). Total correlation is always non-negative and equals zero if and only if all variables are independent.

2.2. Archimedean copulas

The copula concept was first introduced by Sklar (1959) for dependence modeling; readers may refer to the textbook by Nelsen (2006) for details of copula theory and related models. The present study focuses on the bivariate case. Given two continuous random variables X and Y, whose marginal CDF values are uniformly distributed on the unit interval [0,1], the joint cumulative distribution function (CDF) can be formulated as:

$$F_{X,Y}(x,y) = C[u,v], \quad u, v \in [0,1] \tag{8}$$

where u = F_X(x) and v = G_Y(y) represent the marginal CDFs of X and Y, and C: [0,1]^2 → [0,1] represents the copula function. Analogous to the probability density function, a copula density function can be defined as:

$$c(u,v) = \frac{\partial^2 C(u,v)}{\partial u\,\partial v} \tag{9}$$

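It can easily be deduced that:

$$c(u,v) = \frac{f_{X,Y}(x,y)}{f_X(x)\,g_Y(y)} \tag{10}$$

where f_{X,Y}(x, y) is the joint probability density function, and f_X(x) and g_Y(y) are the marginal probability density functions of X and Y.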

Specifically, an Archimedean copula is constructed from a generating function φ:

$$C(u,v) = \varphi^{-1}[\varphi(u) + \varphi(v)] \tag{11}$$

where φ: (0,1] → [0,∞) is continuous, strictly decreasing, and convex, and satisfies φ(1) = 0.

A generator uniquely determines an Archimedean copula. The three most commonly used Archimedean copula families are the Gumbel, Frank, and Clayton copulas. Information on these copula families, including their mathematical expressions and generator functions, is presented in Table 1.

Insert Table 1 here.
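Table 1 is not reproduced in this section, so the sketch below uses the standard textbook generators of the three families (Nelsen, 2006) to show how Eq. (11) builds each copula; the test checks the result against the familiar closed forms. The parameter ranges and the Kendall's τ inversions for Gumbel, θ = 1/(1 - τ), and Clayton, θ = 2τ/(1 - τ), are standard results; the Frank inversion involves a Debye function and is omitted here. Names such as `GENERATORS` and `archimedean_cdf` are illustrative only.

```python
import math

# Generator phi(t, theta) and inverse phi^{-1}(s, theta) for each family.
# Parameter ranges assumed: Gumbel theta >= 1, Clayton theta > 0, Frank theta != 0.
GENERATORS = {
    "gumbel": (lambda t, th: (-math.log(t)) ** th,
               lambda s, th: math.exp(-s ** (1.0 / th))),
    "clayton": (lambda t, th: (t ** -th - 1.0) / th,
                lambda s, th: (1.0 + th * s) ** (-1.0 / th)),
    "frank": (lambda t, th: -math.log(math.expm1(-th * t) / math.expm1(-th)),
              lambda s, th: -math.log1p(math.exp(-s) * math.expm1(-th)) / th),
}


def archimedean_cdf(family, u, v, th):
    """C(u, v) = phi^{-1}[phi(u) + phi(v)], Eq. (11)."""
    phi, phi_inv = GENERATORS[family]
    return phi_inv(phi(u, th) + phi(v, th), th)


def theta_from_tau(family, tau):
    """Kendall's tau inversion for the two families with simple closed forms."""
    if family == "gumbel":
        return 1.0 / (1.0 - tau)
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    raise ValueError("no closed-form tau inversion in this sketch for " + family)


c_example = archimedean_cdf("clayton", 0.3, 0.6, theta_from_tau("clayton", 0.6))
```

Because each generator satisfies φ(1) = 0, every copula built this way has uniform margins, C(u, 1) = u, which is a quick sanity check on any new generator.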

Methods for estimating copula parameters include fully parametric methods (the classical maximum likelihood estimator, MLE, and the inference function for margins, IFM), semiparametric methods, and nonparametric methods (Tong et al., 2014). The IFM method decomposes the estimation of the marginal distribution parameters α and β and the copula parameter θ into two stages. The semiparametric method, derived from IFM, replaces the parametric marginal CDFs with empirical CDFs. In addition, for the three Archimedean families mentioned above, θ can be estimated directly through Kendall's τ (see Table 1). To ensure computational accuracy, the MLE method was adopted in this study. For the bivariate case, the log-likelihood function of the MLE method can be expressed as:

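$$L(\Theta) = \sum_{i=1}^{n} \ln f(x_i, y_i) = \sum_{i=1}^{n} \ln c\left[F_X(x_i;\alpha), F_Y(y_i;\beta); \theta\right] + \sum_{i=1}^{n} \ln\left[f_X(x_i;\alpha)\, f_Y(y_i;\beta)\right] \tag{12}$$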

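where Θ = (α, β; θ) denotes the parameter vector, in which α and β are the parameters of the marginal distributions F_X(x) and F_Y(y), θ is the copula parameter, and n denotes the sample size. According to the extremum principle, the estimator of Θ is calculated as:

$$\hat{\Theta}_{MLE} = \arg\max_{\Theta} L(\Theta) \tag{13}$$

and

$$\frac{\partial L(\Theta)}{\partial \alpha} = 0, \quad \frac{\partial L(\Theta)}{\partial \beta} = 0, \quad \frac{\partial L(\Theta)}{\partial \theta} = 0 \tag{14}$$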
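The study fits margins and copula jointly by MLE (Eq. 12). As a simpler, hedged illustration, the sketch below uses the semiparametric variant mentioned above: the margins are replaced by rank-based pseudo-observations, so only the copula term of Eq. (12) depends on θ, and the first-order conditions of Eq. (14) are replaced by a one-dimensional golden-section search. Only the Clayton family is coded; all function names are hypothetical, and the sampler exists only to generate test data.

```python
import math
import random


def clayton_log_density(u, v, th):
    """log c(u, v) of the Clayton copula (th > 0), using a log-sum-exp guard against overflow."""
    a, b = -th * math.log(u), -th * math.log(v)  # log(u^-th), log(v^-th)
    m = max(a, b)
    log_s = m + math.log(math.exp(a - m) + math.exp(b - m) - math.exp(-m))  # log(u^-th + v^-th - 1)
    return math.log1p(th) - (1.0 + th) * (math.log(u) + math.log(v)) - (2.0 + 1.0 / th) * log_s


def pseudo_obs(x):
    """Empirical CDF values rank / (n + 1) (semiparametric margins; ties not handled)."""
    n = len(x)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        u[i] = rank / (n + 1)
    return u


def fit_clayton_mle(x, y, lo=1e-3, hi=30.0, tol=1e-6):
    """Maximize sum_i log c(u_i, v_i; theta), the copula term of Eq. (12), by golden-section search."""
    u, v = pseudo_obs(x), pseudo_obs(y)

    def loglik(th):
        return sum(clayton_log_density(a, b, th) for a, b in zip(u, v))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loglik(c), loglik(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loglik(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loglik(d)
    return (a + b) / 2.0


def simulate_clayton(n, th, seed=0):
    """Conditional-inversion sampler for the Clayton copula (test data only)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = 1.0 - rng.random(), 1.0 - rng.random()  # values in (0, 1]
        v = ((w ** (-th / (1.0 + th)) - 1.0) * u ** -th + 1.0) ** (-1.0 / th)
        pairs.append((u, v))
    return pairs


pairs = simulate_clayton(500, 3.0, seed=42)
theta_hat = fit_clayton_mle([p[0] for p in pairs], [p[1] for p in pairs])
```

Golden-section search assumes the log-likelihood is unimodal on the bracket; a full IFM or MLE implementation would instead optimize α, β, and θ together with a general-purpose optimizer.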

Goodness-of-fit tests help choose the best-fitting copula model. Commonly used methods include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are formulated as:

$$\mathrm{AIC} = n\ln(\mathrm{MSE}) + 2m \tag{15}$$

$$\mathrm{BIC} = n\ln(\mathrm{MSE}) + m\ln n \tag{16}$$

where

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(p_{ei} - p_i\right)^2 \tag{17}$$

is the mean square error, p_i denotes the theoretical joint probability, p_ei denotes the empirical joint probability, n denotes the sample size, and m denotes the number of parameters (Zhang and Singh, 2006; Genest and Favre, 2007; Song et al., 2012).

2.3. Copula entropy

Copula entropy, as defined by Nelsen (2006), provides a method for the estimation of transinformation. Let c(u_1, u_2, …, u_N) be the copula density function of random variables X_1, X_2, …, X_N with marginal CDF values u_1, u_2, …, u_N; the corresponding copula entropy can then be expressed as:

$$H_c(u_1, u_2, \ldots, u_N) = -\int_0^1 \cdots \int_0^1 c(u_1, u_2, \ldots, u_N)\log c(u_1, u_2, \ldots, u_N)\, du_1\, du_2 \cdots du_N \tag{18}$$

Ma and Sun (2011) proved that the transinformation of random variables equals their negative copula entropy:

$$T(X_1, X_2, \ldots, X_N) = -H_c(u_1, u_2, \ldots, u_N) \tag{19}$$

In the bivariate case, this becomes:

$$T(X,Y) = -H_c(u,v) \tag{20}$$

Further, Chen et al. (2014) provided a method for sample estimation:

$$H_c(u,v) = -\mathbb{E}\left[\log c(u,v)\right] \approx -\frac{1}{n}\sum_{i=1}^{n}\log c(u_i,v_i) \tag{21}$$

where n denotes the sample size.

3. Dual entropy-transinformation criterion (DETC)

This study developed a dual entropy-transinformation criterion (DETC) to rank the stations in a hydrometric network; its validity was assessed with the dual entropy-multiobjective optimization (DEMO) criterion (Alfonso et al., 2012; Samuel et al., 2013).

3.1. Hydrometric network evaluation based on DETC

The DETC criterion combines two indexes, the entropy index E and the transinformation index T, which are further integrated through a tradeoff weight λ into an index D. Let X_1, X_2, …, X_N denote N hydrometric stations within the study area. Indexes E, T, and D can be computed for each station, and the station ranks are obtained accordingly. A DETC-based evaluation of the network is conducted following the steps below:

Computation of entropy index E.

The value of index E of a station X_i is computed as the Min-Max normalization of its marginal entropy H(X_i), that is:

$$E_i = \mathrm{norm}[H(X_i)] = \frac{H(X_i) - \min_j H(X_j)}{\max_j H(X_j) - \min_j H(X_j)}, \quad i, j = 1, 2, \ldots, N \tag{22}$$

Prior to the calculation of marginal entropy, a mathematical floor function formulated as

$$x' = a\left\lfloor \frac{2x + a}{2a} \right\rfloor \tag{23}$$

was used for data discretization, where ⌊·⌋ represents the conventional mathematical floor function. Thus, a continuous value x is converted to its nearest lowest integer multiple of a constant a (Alfonso et al., 2010). Li et al. (2012) presented the advantages of the mathematical floor function and guidance for determining an appropriate a. A station with a higher index E value has larger information content and should rank higher in the network.
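A minimal sketch of the discretization and the entropy index, assuming Eq. (23) has the form x' = a⌊(2x + a)/(2a)⌋. Read literally, this expression maps x to the nearest multiple of a (a value exactly halfway rounds up). Function names are illustrative; a guard returns E = 0 for every station when all entropies are equal, a case Eq. (22) leaves undefined.

```python
import math
from collections import Counter


def floor_discretize(x, a):
    """Eq. (23) as assumed here: x' = a * floor((2x + a) / (2a))."""
    return a * math.floor((2 * x + a) / (2 * a))


def marginal_entropy(xs, base=2.0):
    """H(X), Eq. (1), from relative frequencies of the discretized values."""
    n = len(xs)
    return -sum(c / n * math.log(c / n, base) for c in Counter(xs).values())


def entropy_index(stations, a):
    """Index E, Eq. (22): min-max normalized entropy of each discretized station series."""
    h = [marginal_entropy([floor_discretize(x, a) for x in s]) for s in stations]
    lo, hi = min(h), max(h)
    return [(hk - lo) / (hi - lo) if hi > lo else 0.0 for hk in h]


stations = [[0, 1, 2, 3], [0, 0, 1, 1], [5, 5, 5, 5]]
sensitivity = {a: entropy_index(stations, a) for a in (1, 2, 5, 10, 20)}
```

Looping over a = 1, 2, 5, 10, 20, as in the sensitivity analysis of Section 4.1.2, shows how a larger a merges values into fewer bins and can make the E values of different stations indistinguishable.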

Computation of transinformation index T.

The index T of station X_i measures the transinformation it shares with the other (N-1) stations. A station with a lower T value shares less overlapping information with the other stations and is thus more significant, deserving a higher rank. To obtain T, a simulated observation series of X_i, denoted X_i^MLT, is first calculated through multiple linear regression on the observations from the other (N-1) stations. A stepwise technique can be applied in this process to avoid overfitting. Index T is then expressed as the Min-Max normalization of the transinformation between X_i and X_i^MLT:

$$T_i = \mathrm{norm}[T(X_i, X_i^{MLT})] = \frac{T(X_i, X_i^{MLT}) - \min_j T(X_j, X_j^{MLT})}{\max_j T(X_j, X_j^{MLT}) - \min_j T(X_j, X_j^{MLT})}, \quad i, j = 1, 2, \ldots, N \tag{24}$$

Specifically, T(X_i, X_i^MLT) is calculated with copula entropy.
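The index-T step can be sketched as follows, under several simplifying assumptions that are not from the study: plain least squares without stepwise selection, a single Clayton copula with θ obtained from Kendall's τ rather than the MLE-selected best-fitting family, and rank-based pseudo-observations for u and v. The transinformation is then the sample mean of log c(u_i, v_i), i.e., the negative of the copula-entropy estimate in Eq. (21), in nats. All names are hypothetical.

```python
import math
import random


def ols_fitted(target, predictors):
    """Fitted values of target ~ b0 + sum_k b_k * predictors[k] by ordinary least squares."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(target))]
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, target))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    coef = [aug[i][k] / aug[i][i] for i in range(k)]
    return [sum(c * x for c, x in zip(coef, row)) for row in rows]


def kendall_tau(x, y):
    """Kendall's tau-a by pairwise comparison, O(n^2)."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def pseudo_obs(x):
    """Empirical CDF values rank / (n + 1); ties are not handled in this sketch."""
    n = len(x)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        u[i] = rank / (n + 1)
    return u


def clayton_log_density(u, v, th):
    """log c(u, v) of the Clayton copula (th > 0) with a log-sum-exp guard."""
    a, b = -th * math.log(u), -th * math.log(v)
    m = max(a, b)
    log_s = m + math.log(math.exp(a - m) + math.exp(b - m) - math.exp(-m))
    return math.log1p(th) - (1.0 + th) * (math.log(u) + math.log(v)) - (2.0 + 1.0 / th) * log_s


def copula_transinformation(x, y):
    """T(X, Y) = -H_c(u, v), estimated as (1/n) sum_i log c(u_i, v_i), in nats."""
    tau = min(kendall_tau(x, y), 0.99)
    if tau <= 1e-6:  # this sketch does not model null or negative dependence
        return 0.0
    th = 2.0 * tau / (1.0 - tau)  # Clayton tau inversion
    u, v = pseudo_obs(x), pseudo_obs(y)
    return sum(clayton_log_density(a, b, th) for a, b in zip(u, v)) / len(u)


def station_transinformation(stations, i):
    """T(X_i, X_i^MLT): station i against its regression estimate from the other stations."""
    others = [s for k, s in enumerate(stations) if k != i]
    return copula_transinformation(stations[i], ols_fitted(stations[i], others))


rng = random.Random(1)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x0 = [a + b + rng.gauss(0.0, 0.5) for a, b in zip(x1, x2)]
t0 = station_transinformation([x0, x1, x2], 0)
```

Applying `station_transinformation` to every station and then min-max normalizing the results gives index T of Eq. (24).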

Computation of integrated index D.

The index D is constructed as an objective function integrating E and T with a tradeoff weight λ:

$$D'_i = \lambda E_i - (1-\lambda)\, T_i, \quad \lambda \in [0,1], \quad i = 1, 2, \ldots, N \tag{25}$$

To convert the values into the interval [0,1], D_i is calculated as the Min-Max normalization of D'_i:

$$D_i = \mathrm{norm}[D'_i] = \frac{D'_i - \min_j D'_j}{\max_j D'_j - \min_j D'_j}, \quad i, j = 1, 2, \ldots, N \tag{26}$$
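Once the raw entropies H(X_i) and transinformations T(X_i, X_i^MLT) are available, Eqs. (22) and (24)-(26) reduce to a few lines. The sketch below uses illustrative names and, by assumption, maps a constant input to zeros; it returns D for a given λ and sorts the stations by it.

```python
def minmax(values):
    """Min-max normalization of Eqs. (22), (24) and (26); constant input maps to zeros."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in values]


def detc_scores(entropies, transinfos, lam):
    """Index D per station: D' = lam * E - (1 - lam) * T (Eq. 25), then normalized (Eq. 26)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    e, t = minmax(entropies), minmax(transinfos)
    d_raw = [lam * ei - (1.0 - lam) * ti for ei, ti in zip(e, t)]
    return minmax(d_raw)


def rank_stations(names, d):
    """Station names ordered from highest to lowest D (most to least significant)."""
    return [name for name, _ in sorted(zip(names, d), key=lambda p: p[1], reverse=True)]


names = ["S1", "S2", "S3"]
h_raw = [1.0, 2.0, 3.0]  # hypothetical H(X_i) values
t_raw = [0.9, 0.1, 0.5]  # hypothetical T(X_i, X_i^MLT) values
sequences = {lam: rank_stations(names, detc_scores(h_raw, t_raw, lam))
             for lam in (0.1, 0.3, 0.5, 0.7, 0.9)}
```

Evaluating the scores over several λ values in this way yields one priority sequence per λ, which is what Step 1 of Section 3.2 requires.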

A station with a higher D value contains relatively more marginal information and less overlapping information, and is therefore more significant in the network. Conversely, a lower D value implies that duplicated information makes up a larger proportion of a station's information content. Therefore, the DETC criterion can be used to rank stations according to their priorities, as well as to identify critical areas where existing stations should be retained, additional stations can be located, and redundant stations can be removed. The tradeoff weight λ reflects the decision preference between increasing information content and cutting information redundancy, and it directly influences the optimization solutions.

3.2. Assessment of DETC with the dual entropy-multiobjective optimization (DEMO) criterion

The dual entropy-multiobjective optimization (DEMO) criterion was used in this study to assess DETC. Basically, the DEMO criterion regards a network with more total information and less total correlation as more effective, measured respectively by joint entropy and total correlation:

$$H = H(X_1, X_2, \ldots, X_N) \tag{28}$$

$$C = C(X_1, X_2, \ldots, X_N) \tag{29}$$

Generally, increasing joint entropy (H) and decreasing total correlation (C) are conflicting objectives, and the ideal configuration of N stations should fulfill both simultaneously (Alfonso et al., 2012; Samuel et al., 2013). For comparison, we adopted another criterion for hydrometric network optimization that also ranks stations according to their priorities, namely the minimum transinformation (MinT) based criterion proposed by Chen et al. (2008). For a network containing N monitoring stations, the MinT criterion proceeds as follows: (1) the station with the maximum marginal entropy H(x_i) is selected as the first station x_1; (2) the station sharing the minimum transinformation with x_1, i.e., the minimum T(x_1, x_i), is ranked as the second station x_2; (3) analogously, the ith (2 ≤ i ≤ N) station is determined as the one with the minimum mutual information T(x_1, x_2, …, x_{i-1}; x_i).
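The three MinT steps translate into a greedy loop. This is a sketch under the assumption that all series are discretized and aligned in time; the multivariate transinformation of step (3) is computed as H(S) + H(x_i) - H(S, x_i), where S is the set of stations already selected, and ties are broken by dictionary order. Names are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(series_list, base=2.0):
    """H(X1, ..., Xk) of aligned discrete series (Eq. 6); one series gives its marginal entropy."""
    n = len(series_list[0])
    counts = Counter(zip(*series_list))
    return -sum(c / n * math.log(c / n, base) for c in counts.values())


def mint_rank(stations):
    """Greedy MinT ranking of {name: discretized series}."""
    remaining = dict(stations)
    # Step (1): the first station has the maximum marginal entropy.
    first = max(remaining, key=lambda k: joint_entropy([remaining[k]]))
    order, selected = [first], [remaining.pop(first)]
    while remaining:
        h_sel = joint_entropy(selected)

        def t_with_selected(name):
            # T(x_1, ..., x_{i-1}; x_i) = H(selected) + H(x_i) - H(selected, x_i)
            x = remaining[name]
            return h_sel + joint_entropy([x]) - joint_entropy(selected + [x])

        # Steps (2)-(3): next station shares the least information with those already chosen.
        nxt = min(remaining, key=t_with_selected)
        order.append(nxt)
        selected.append(remaining.pop(nxt))
    return order


demo_stations = {
    "A": [0, 1, 2, 3, 0, 1, 2, 3],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],  # fully determined by A
    "C": [0, 1, 0, 1, 1, 0, 1, 0],  # independent of A in this sample
}
order = mint_rank(demo_stations)
```

Because the selection only ever minimizes shared information after the first pick, this loop illustrates why MinT tends to favor low-redundancy stations over high-information ones.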

In this study, we assessed the station ranks generated by DETC and MinT using the DEMO indexes H and C, following the steps below:

Step 1. Rank the monitoring stations according to the D values of the DETC criterion, with the tradeoff weight λ set to different values, e.g., λ = 0.1, 0.3, 0.5, 0.7, 0.9, and obtain the corresponding station priority sequences;



Step 2. Rank the stations according to MinT criterion and obtain the corresponding sequence;



Step 3. Network optimization solutions are obtained by selecting the first n stations in each sequence in Step 1 and Step 2 (n can be a certain proportion of the total number of all stations, e.g., 20%, 33%, 50%, 67%, 80%, etc.);



Step 4. Compute the joint entropy (H) and total correlation (C) values of each optimization solution in Step 3 and compare the results generated by DETC and MinT.

The flowchart of the process is shown in Fig. 1.

Insert Fig. 1 here.
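Steps 3 and 4 can be sketched as follows: take the first n stations of a priority sequence for each fraction and evaluate Eqs. (28) and (29) on that subset. Rounding n to the nearest integer (at least one station) is an assumption, since the text gives only the percentages; names are illustrative.

```python
import math
from collections import Counter


def joint_entropy(series_list, base=2.0):
    """H of aligned discrete series (Eq. 6); a single series gives its marginal entropy."""
    n = len(series_list[0])
    counts = Counter(zip(*series_list))
    return -sum(c / n * math.log(c / n, base) for c in counts.values())


def demo_indices(series_list):
    """DEMO indices of a station subset: joint entropy H (Eq. 28) and total correlation C (Eq. 29)."""
    h = joint_entropy(series_list)
    c = sum(joint_entropy([s]) for s in series_list) - h
    return h, c


def assess_ranking(order, stations, fractions=(0.2, 0.33, 0.5, 0.67, 0.8)):
    """For each fraction, keep the first n ranked stations and return (n, H, C)."""
    results = {}
    for f in fractions:
        n = max(1, round(f * len(order)))
        subset = [stations[name] for name in order[:n]]
        results[f] = (n, *demo_indices(subset))
    return results


stations = {
    "A": [0, 1, 2, 3, 0, 1, 2, 3],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 1, 0, 1, 0],
}
res = assess_ranking(["A", "C", "B"], stations, fractions=(0.2, 0.5, 1.0))
```

Running `assess_ranking` once on a DETC sequence and once on the MinT sequence gives the paired (H, C) values compared in Step 4.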

4. Application

In this section, we applied DETC to a streamflow gauge network in the Fenhe River basin, a rainfall monitoring network in Beijing Municipality, and a rainfall monitoring network in the Taihu Lake basin. The three cases represent different optimization scenarios in terms of network classification (streamflow and rainfall), network scale (12 streamflow monitoring stations in Case 1, 18 rainfall gauge stations in Case 2, and 95 rainfall gauge stations in Case 3), and climatic features (Case 1 and Case 2 for semi-humid climate in northern China and Case 3 for humid climate in southern China). In all cases, the networks need to be evaluated and optimized to provide accurate hydrometric information economically.

4.1. Case 1: Optimization of a streamflow monitoring network in the Fenhe River basin

4.1.1. Study area and data

The Fenhe River is located in Shanxi Province, China, and is the second largest tributary of the Yellow River. Its basin covers a drainage area of 38,728 km², and the river has a total length of 710 km and an annual runoff volume of 2.286×10⁹ m³. The locations of the Fenhe River basin and the 12 streamflow monitoring stations (numbered S01-S12) are shown in Fig. 2. The upper basin suffers from serious soil and water losses due to densely distributed gullies; the middle basin, confined by high mountains and deep gullies, has an unstable river channel. The lower basin has a gentle gradient and open land suitable for agricultural irrigation. The Fenhe River basin is a semi-arid area with a temperate monsoon climate. The annual mean temperature is 9–12 °C. The annual precipitation varies from 300 mm to 750 mm, 60–80% of which falls in the flood season (June to September), causing a high risk of hydrogeological disasters. Precipitation shows obvious spatial variation, declining from south to north. Monthly runoff data of the 12 stations in the flood season (April to September) of 5 years (2008-2012) were used for analysis.

Insert Fig. 2 here.

4.1.2. Parameter sensitivity analysis and copula selection

The entropy index E was calculated using the mathematical floor function, and copula modeling was conducted to obtain the index T. Index D was obtained by integrating E and T with a tradeoff weight λ. In this section, we analyze the sensitivity of the index values to the floor function parameter a, the copula family selected, and the weight parameter λ. The determination of parameter a in the mathematical floor function for data discretization remains an open problem in entropy computation. In this study, a sensitivity analysis was performed to find an optimal a value. Taking a = 1, 2, 5, 10, and 20, the index E values and the corresponding ranks of the 12 stations are presented in Table 2. As can be seen, the ranks of the 12 stations were generally stable as a varied. According to Li et al. (2012), a reasonable a value should guarantee that all candidate stations have significant and distinguishable information contents. The table shows that as a increased, the E values became less differentiated and equal values became more frequent among stations: three 0.00 values for a = 5; three 0.00 values and two 0.36 values for a = 10; and five 0.00 values, two 0.14 values, and two 1.00 values for a = 20. For a > 5, the filtering effect of discretization is obvious, e.g., the E values of stations S08, S10, and S12 were all 0.00. Based on Table 2, a = 1 was adopted for subsequent analysis to ensure distinguishable entropy values among stations.

Insert Table 2 here.

The maximum likelihood estimator (MLE) method was used for copula parameter estimation. The best-fitting copula was selected from the three Archimedean families using goodness-of-fit tests with the AIC and BIC criteria. The AIC and BIC values of the candidate copula families are compared in Table 3. AIC and BIC yielded consistent results: the Gumbel copula was the best-fitting copula for nine stations, the Frank copula was the best fit for the other three stations, and the Clayton copula was not selected for any station. With the copula function selected, the index T values were computed through copula entropy. Table 4 compares the T values of each station calculated using the corresponding best-fitting copula with those calculated using the other copulas. As can be seen, the Gumbel and Frank copulas generated similar T values and station ranks, while the Clayton copula showed a relatively large discrepancy, especially for stations S02, S03, S05, and S11, indicating that the effect of copula selection on index calculation is non-negligible.

Insert Table 3 here. Insert Table 4 here.
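The AIC/BIC selection of Eqs. (15)-(17) can be sketched as below. The empirical joint probability p_ei is assumed here to be the empirical copula, i.e., the fraction of sample points with u_j ≤ u_i and v_j ≤ v_i; this section does not specify the plotting-position formula used. Candidate CDFs (with parameters already fitted) and their parameter counts are supplied by the caller, and all names are hypothetical. When every candidate has the same number of parameters, as with the three one-parameter families here, AIC and BIC rank the candidates identically, because n ln(MSE) is then the only term that differs.

```python
import math


def empirical_joint_prob(u, v):
    """p_ei: fraction of sample points with u_j <= u_i and v_j <= v_i (empirical copula)."""
    n = len(u)
    return [sum(1 for uj, vj in zip(u, v) if uj <= ui and vj <= vi) / n
            for ui, vi in zip(u, v)]


def aic_bic(p_theory, p_emp, n_params):
    """Eqs. (15)-(17); the MSE must be strictly positive for the logarithm."""
    n = len(p_emp)
    mse = sum((pe - p) ** 2 for pe, p in zip(p_emp, p_theory)) / n
    aic = n * math.log(mse) + 2 * n_params
    bic = n * math.log(mse) + n_params * math.log(n)
    return aic, bic


def select_copula(u, v, candidates):
    """candidates: {name: (cdf(u, v) with fitted parameters, number of parameters)}.
    Returns the name with the lowest AIC and the (AIC, BIC) pair of every candidate."""
    p_emp = empirical_joint_prob(u, v)
    scores = {name: aic_bic([cdf(a, b) for a, b in zip(u, v)], p_emp, m)
              for name, (cdf, m) in candidates.items()}
    best = min(scores, key=lambda name: scores[name][0])
    return best, scores


# Toy usage with two parameter-free reference copulas on perfectly concordant points.
u_demo = [0.2, 0.5, 0.8]
v_demo = [0.2, 0.5, 0.8]
best, scores = select_copula(u_demo, v_demo, {
    "comonotone": (lambda a, b: min(a, b), 0),
    "independence": (lambda a, b: a * b, 0),
})
```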

The tradeoff weight λ of the DETC criterion reflects the decision preference between increasing information content and cutting redundancy. Fig. 3 shows the variation of index D as λ changes. The stations showed different degrees of sensitivity to the tradeoff weight. Stations S01, S05, and S06 generally held stable D values as λ increased, while stations S03, S07, S09, and S11 showed increasing D values and stations S04, S08, S10, and S12 showed decreasing D values. When λ was low, e.g., 0.1, stations that shared less overlapping information were preferred and scored high on the D index, including stations S01, S04, S06, and S10. Conversely, for λ = 0.9, stations S01, S07, S09, and S11 reached high D values. Notably, S01, S05, and S06 remained highly ranked as λ varied from 0 to 1, indicating that these three stations were more significant in the network.

Insert Fig. 3 here.

4.1.3. Network optimization and criterion assessment

The network evaluation results of the two criteria, i.e., the DETC criterion with varying λ values (λ = 0.1, 0.3, 0.5, 0.7, 0.9) and the MinT criterion, are illustrated in Fig. 4, in which stations ranked high (No. 1-No. 3) and low (No. 10-No. 12) in each scenario are highlighted in color. When the weight parameter λ was small (e.g., λ = 0.1), stations located on the upper reaches or tributaries of the Fenhe River, such as S04 and S10, received high priority, meaning that these stations were useful for cutting information redundancy; stations located on the lower reaches or the main stream, such as S09 and S11, were the least significant, meaning that these stations were important mainly for acquiring more information content. The case of a large λ (e.g., λ = 0.9) was just the opposite: stations on the main stream, such as S07, S09, and S11, were ranked high, while those on tributaries, such as S08, S10, and S12, received low ranks. However, a balance was seen when λ = 0.5, i.e., with no preference for the entropy index E or the transinformation index T, and stations of high and low priority showed no preference for specific locations along the river. By contrast, the MinT criterion prioritized stations on the tributaries, e.g., S10 and S12, while those on the main stream, e.g., S05, S07, and S11, were ranked low. This can be explained by the fact that the MinT criterion basically aims at selecting stations with less overlapping information rather than more information content.

Insert Fig. 4 here.

Network optimization results were assessed by the DEMO criterion following the steps in Section 3.2, as shown in Table 5. For simplicity, we assessed five optimization solutions whose station numbers were 20%, 33%, 50%, 67%, and 80% of the total number of stations, respectively. For each percentage, the corresponding number of top-ranked stations was selected to form the station combination of an optimization solution, and the joint entropy (H) and total correlation (C) values were calculated for each scenario. DETC showed a clear advantage in reaching large H, i.e., total information, at the cost of also reaching large C, i.e., total redundancy. DETC produced the largest H values for four of the five solutions (and tied with MinT at 80%) and the largest C values for all five solutions. By comparison, the MinT criterion tended to generate solutions with less information content and, naturally, less information redundancy, as most of its H and C values were lower than those of DETC. However, as the network grew, the index values of MinT increased; e.g., when 80% of the stations were selected, its H reached the maximum value (3.25) and its C (5.03) exceeded those of DETC with λ = 0.1 and 0.3 (5.00). The results indicated that DETC could produce optimization solutions with reasonable information content and redundancy at an appropriate station number.
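The assessment step can be sketched as follows: choose the solution sizes, keep the top-k ranked stations, discretize their series, and compute H and C = (sum of marginal entropies) − H. This is a minimal sketch under stated assumptions. The quantizer a·floor((2x + a)/(2a)), which rounds each value to the nearest multiple of a, is an assumed form of the "mathematical floor function"; the log base (bits here) is a choice the section does not state; and the helper names are hypothetical. The size rule, rounding 1/5, 1/3, 1/2, 2/3 and 4/5 of the network half up, reproduces the station numbers listed in Tables 5-7 (2-10 of 12, 4-14 of 18, 19-76 of 95).

```python
import math
from collections import Counter
from fractions import Fraction

import numpy as np

# Solution sizes used in Tables 5-7: 1/5, 1/3, 1/2, 2/3 and 4/5 of the network.
SHARES = (Fraction(1, 5), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(4, 5))


def subset_sizes(n_total):
    """Number of stations per solution, rounding half up with exact fractions."""
    return [math.floor(share * n_total + Fraction(1, 2)) for share in SHARES]


def floor_quantize(x, a=1.0):
    """Assumed floor-function quantizer: round each value to the nearest multiple of a."""
    x = np.asarray(x, dtype=float)
    return a * np.floor((2.0 * x + a) / (2.0 * a))


def shannon_entropy(symbols, base=2.0):
    """Plug-in Shannon entropy of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def joint_entropy(q, base=2.0):
    """H of a quantized (n_samples, n_stations) array: each row is one joint symbol."""
    return shannon_entropy(map(tuple, np.asarray(q)), base)


def total_correlation(q, base=2.0):
    """C = sum of marginal entropies minus joint entropy (Watanabe, 1960)."""
    q = np.asarray(q)
    marginal_sum = sum(shannon_entropy(q[:, j].tolist(), base) for j in range(q.shape[1]))
    return marginal_sum - joint_entropy(q, base)


def assess_top_k(series, ranking, k, a=1.0, base=2.0):
    """(H, C) of the k highest-ranked stations; ranking lists column indices by priority."""
    columns = list(ranking[:k])
    q = floor_quantize(np.asarray(series, dtype=float)[:, columns], a)
    return joint_entropy(q, base), total_correlation(q, base)
```

Because C grows with every overlapping station while H saturates, comparing both at equal k is what separates an information-seeking ranking from a redundancy-avoiding one.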

Insert Table 5 here.

4.2. Case 2: Optimization of a rainfall monitoring network in Beijing Municipality

4.2.1. Study area and data

Beijing Municipality lies in the North China Plain and covers an area of 16,410.54 km², of which 62% is plain and 38% is mountainous or hilly. There are 18 rainfall monitoring stations in total (numbered G01-G18; Fig. 5). The elevation of the area decreases from northwest to southeast. The area has a typical semi-humid continental monsoon climate, with an annual mean temperature of 10-12 °C and annual precipitation of 585 mm, around 80% of which is concentrated in summer (June, July and August). Rivers originating in the northwestern mountains meander through the plain area and flow into the Bohai Sea. Daily precipitation data from 1981 to 2010 were used to generate consecutive multi-day rainfall series for network evaluation. To obtain the series, for each station, rainfall records separated by an interval of one day or longer, i.e., no precipitation recorded for 24 consecutive hours, were regarded as individual rainfall events, and the daily precipitation amounts belonging to the same rainfall event were summed to form the multi-day rainfall series.
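The event-aggregation rule can be sketched for a single station as follows. This is a minimal sketch, assuming a day counts as wet when its depth exceeds a threshold (0 mm here, a hypothetical choice) and that missing days have been handled beforehand; the function name is illustrative.

```python
def multiday_event_totals(daily_precip, wet_threshold=0.0):
    """Split one station's daily record into rainfall events and return each event's total.

    A day is wet if its depth exceeds wet_threshold (hypothetical default: 0 mm);
    any dry day closes the current event, matching the 24-hour no-rain separation.
    """
    totals = []
    current = 0.0
    in_event = False
    for depth in daily_precip:
        if depth > wet_threshold:
            current += depth
            in_event = True
        elif in_event:
            totals.append(current)
            current, in_event = 0.0, False
    if in_event:  # the record ends during an event
        totals.append(current)
    return totals


print(multiday_event_totals([0.0, 2.0, 3.5, 0.0, 0.0, 5.0, 0.0, 1.2]))  # [5.5, 5.0, 1.2]
```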

Insert Fig. 5 here.

4.2.2. Parameter sensitivity analysis

Following DETC, the indexes E and T were computed for each rainfall monitoring station. In this case, we adopted a = 1 as the floor function parameter, since Case 1 showed that it generated the most distinguishable entropy values. According to the AIC and BIC criteria, the Gumbel copula was the best-fitted model for all 18 stations and was therefore adopted to compute transinformation. With the E and T index values determined, the network evaluation and optimization results depend on the tradeoff weight parameter λ. Fig. 6 shows how the index D values vary with λ. Stations G03, G07 and G18 maintained stable and relatively high D values, and station G08 maintained low D values; these stations were less sensitive to parameter selection. Stations G01, G05, G09, G12, G14 and G17 showed an increasing trend of D as λ became larger, indicating their importance for improving the information content. Stations G02 and G11 showed a decreasing trend of D, which can be explained by their greater role in cutting information redundancy.
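Transinformation between two stations can be obtained from the Gumbel copula, because mutual information equals the negative copula entropy, i.e., the integral of c ln c over the unit square (Ma and Sun, 2011). The sketch below is illustrative only. It estimates θ by inverting Kendall's τ (τ = 1 − 1/θ, Table 1), which is a swapped-in method-of-moments shortcut; the study ranked fits by AIC/BIC, and its estimator is not described in this section. It then integrates numerically with a midpoint rule. How pairwise values are aggregated and normalized into the index T is not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a by direct pair counting (O(n^2); no tie correction)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    concordance = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    # every unordered pair appears twice in the full matrix; the diagonal is zero
    return concordance.sum() / (n * (n - 1))


def gumbel_theta_from_tau(tau):
    """Invert Kendall's tau = 1 - 1/theta for the Gumbel family (Table 1)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("the Gumbel copula only covers 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


def gumbel_density(u, v, theta):
    """Gumbel copula density c(u, v) = d^2 C / (du dv) for theta >= 1."""
    x, y = -np.log(u), -np.log(v)
    s = x**theta + y**theta
    a = s ** (1.0 / theta)
    cdf = np.exp(-a)
    return cdf * (x * y) ** (theta - 1.0) / (u * v) * s ** (1.0 / theta - 2.0) * (a + theta - 1.0)


def gumbel_transinformation(theta, n_grid=400):
    """Mutual information (nats) as the integral of c*ln(c) on the unit square, midpoint rule."""
    g = (np.arange(n_grid) + 0.5) / n_grid
    u, v = np.meshgrid(g, g)
    c = gumbel_density(u, v, theta)
    return float(np.mean(c * np.log(c)))


# Usage with two paired event series x and y (hypothetical data):
# theta = gumbel_theta_from_tau(kendall_tau(x, y))
# mi = gumbel_transinformation(theta)
```

At θ = 1 the Gumbel copula reduces to independence (c = 1), so the estimate is zero; it grows with θ as upper-tail dependence strengthens.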

Insert Fig. 6 here.

4.2.3. Network optimization and criterion assessment

Fig. 7 illustrates the optimization results of DETC with λ = 0.1, 0.3, 0.5, 0.7 and 0.9 and of the MinT criterion, with the high- and low-ranked stations highlighted. DETC prioritized the stations located in the northern and western mountainous areas, e.g., G03 and G18, while those located in the southeastern urban areas, e.g., G08, G12 and G15, were regarded as less significant. Station G04, located in the mountain-plain transition area, was also ranked high. Exceptions include G11, whose rank changed from the top to the bottom as λ shifted from 0.1 to 0.9, and G17, which was ranked high although it is located in the plain area. By contrast, the significant stations identified by the MinT criterion showed no correlation with topography: the prioritized stations G10 and G08 are located on flat terrain, and the insignificant stations it identified, i.e., G01, G17 and G18, differed markedly from those identified by DETC.

Insert Fig. 7 here.

Table 6 shows the assessment of the optimization results with the DEMO criterion. With the station numbers set to 20%, 33%, 50%, 67% and 80% of the total stations in the network, joint entropy (H) and total correlation (C) were computed for the optimization solutions produced by DETC with different λ values and by MinT. Analogous to Case 1, DETC produced the maximum H values for most solutions, while MinT produced all the minimum C values. This indicates the advantage of DETC in reaching large information content, as well as its weakness in cutting redundancy. Notably, as the station number increased, the MinT solutions also yielded high joint entropy values (e.g., 3.24 at 80%, equal to DETC), indicating good performance for optimizing relatively large networks.

Insert Table 6 here.

4.3. Case 3: Evaluation of a rainfall network in the Taihu Lake Basin

4.3.1. Study area and data

The study area is located in the Taihu Lake basin and covers an area of 13,480 km², with 95 rainfall monitoring stations (numbered G01-G95) distributed over mountains, hills and plains (Fig. 8). The area is crisscrossed by stream channels. Rivers originate from the mountains and drain independently into Taihu Lake or the plain. The basin has a subtropical monsoon climate with four distinct seasons, an average annual temperature of 15-17 °C, and annual precipitation of 1177 mm. The summer monsoon brings plentiful precipitation, and the resulting rainstorms can cause urban flooding. As a result of the combined effect of climate and terrain, precipitation decreases from the southern mountain area to the northern plain area. Daily precipitation data from 2006 to 2012 at the 95 rainfall gauges were used to generate consecutive multi-day rainfall series for network optimization, following the same data processing procedure as in Case 2.

Insert Fig. 8 here.

4.3.2. Parameter sensitivity analysis

To apply DETC to the rainfall monitoring network in this case, the index E and T values were first computed with the floor function parameter set to a = 1 and with the Gumbel copula as the best-fitted copula. The sensitivity of index D to the tradeoff weight parameter λ was then analyzed, and the results are shown in Fig. 9. Most stations had varying D values as λ increased. Stations with large variation included G25, G36-G43 and G45-G55, whose D increased, and G76-G78 and G84-G87, whose D decreased. It can be inferred that the former stations were critical to enlarging the total information content and the latter were useful for reducing redundancy. By comparison, stations G09-G12, G19, G44, G68-G72, G79-G82 and G93-G95 showed little variation and held relatively stable, medium D values. The results indicated that the choice of λ was crucial for the evaluation process and the optimization solutions.
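The direction of each curve in the sensitivity plots can be read from the slope of D with respect to λ. The derivation below assumes the hypothetical linear form D_i = λE_i + (1 − λ)(1 − T_i) for station i; the paper's own definition of D is given in an earlier section and may differ.

```latex
D_i(\lambda) = \lambda E_i + (1-\lambda)(1 - T_i)
\quad\Longrightarrow\quad
\frac{\partial D_i}{\partial \lambda} = E_i - (1 - T_i) = E_i + T_i - 1
```

Under this form, D_i moves linearly from 1 − T_i at λ = 0 to E_i at λ = 1. It rises with λ when E_i + T_i > 1 and falls when E_i + T_i < 1, while stations with E_i + T_i close to 1 keep an almost constant D. For example, with the Case 1 values in Tables 2 and 4, S09 (E = 1.00, T = 0.83) would rise and S04 (E = 0.38, T = 0.12) would fall.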

Insert Fig. 9 here.

4.3.3. Network optimization and criterion assessment

As the rainfall monitoring stations are densely distributed in this area, we regionalized the index values using kriging interpolation to present an overview of their spatial variation. Fig. 10 presents the distributions of indexes E and T. The variation of E was generally consistent with the elevation of the study area. The southern mountain and central hill areas had high E values, which can be explained by the undulating topography causing larger variation in precipitation and thus larger marginal entropy. The northern plain area, as expected, presented low E values, since precipitation there was relatively stable. By contrast, the T values did not follow the terrain: the area with high T values stretched from the southern mountains to the northern plain, and in the middle of the study area the high-value zone was shifted westward compared with that of the E index. Note that the stations in this region were more concentrated, which might cause information redundancy. From these results, it can be deduced that gauges with more marginal information generally also contained more duplicated information, although exceptions might occur as a result of the network configuration.
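The regionalization step can be sketched with ordinary kriging. This is a minimal sketch, assuming an isotropic exponential variogram with hypothetical sill and range values (the variogram model fitted in the study is not reported in this section) and planar station coordinates; the function names are illustrative. The code solves the standard system [[Γ, 1], [1ᵀ, 0]]·[w; μ] = [γ₀; 1] for every target point.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=10.0, nugget=0.0):
    """Isotropic exponential variogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h > 0.0, gamma, 0.0)


def ordinary_kriging(xy, values, targets, **variogram_params):
    """Ordinary kriging predictions at target points from station coordinates and index values."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = values.size
    # Left-hand side: station-to-station semivariances bordered by the unbiasedness constraint
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exponential_variogram(dist, **variogram_params)
    lhs[n, n] = 0.0
    # Right-hand side: station-to-target semivariances, one column per target
    dist0 = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=-1)  # (n, m)
    rhs = np.ones((n + 1, targets.shape[0]))
    rhs[:n, :] = exponential_variogram(dist0, **variogram_params)
    weights = np.linalg.solve(lhs, rhs)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ values
```

With a zero nugget the predictor honors the data exactly at the stations, and because the weights sum to one, a constant index field is reproduced everywhere.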

Insert Fig. 10 here.

Fig. 11 presents the distribution of index D with tradeoff weight λ = 0.1, 0.5 and 0.9, representing different decision preferences. For λ = 0.1, the D values increased from the southwestern mountain area to the northeastern plain area, meaning that the northeastern stations were more important for cutting duplicated information. For λ = 0.9, when the need for improving information content was strengthened, the variation of D was just the opposite, i.e., decreasing from south to north, indicating the significance of the stations in the mountain area. The case of λ = 0.5 was a transitional state between the two, since equal weight was given to the entropy and transinformation indexes: high D values appeared not only in the southwestern mountain area but also in the middle and northeastern areas near Taihu Lake, while low D values appeared in the northwestern plain area, reflecting the balancing effect of indexes E and T. The stations and areas highlighted by DETC indicate either high importance, meaning that additional gauges may be needed there, or low importance, meaning that redundant stations may be removed.

Insert Fig. 11 here.

For further analysis, the optimization results obtained from DETC and the MinT criterion were assessed with DEMO. Analogous to Cases 1 and 2, candidate optimization solutions containing 20%, 33%, 50%, 67% and 80% of the 95 stations were generated by selecting the top-ranked stations under each criterion, and the joint entropy (H) and total correlation (C) of each solution were computed. The results are shown in Table 7. While H and C increased with the station number, DETC yielded the maximum H values for four of the five solution groups, i.e., 5.22, 5.29, 5.34 and 5.39, obtained with λ = 0.3 or 0.5. The maximum H for the 80% group, however, was yielded by the MinT criterion (5.42). This is consistent with the former two cases, i.e., MinT has the advantage of reaching high joint entropy at a large station number. As expected, the minimum C values of all five groups were given by MinT. The results indicate that, for a given number of selected stations, DETC tended to provide more total information content, while MinT tended to retain less redundant information. This is useful for selecting an appropriate method to evaluate and rank gauges according to specific optimization requirements.
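The comparison above amounts to picking, for each solution group, the criterion with the largest H and the one with the smallest C. The short sketch below applies that rule to the values transcribed from Table 7, keeping ties. The labels such as "DETC(0.3)" are illustrative shorthand for DETC with λ = 0.3.

```python
CRITERIA = ["DETC(0.1)", "DETC(0.3)", "DETC(0.5)", "DETC(0.7)", "DETC(0.9)", "MinT"]

# Joint entropy (H) and total correlation (C) transcribed from Table 7 (Case 3).
H = {
    "20%": [4.94, 5.13, 5.22, 5.17, 5.15, 4.97],
    "33%": [5.13, 5.29, 5.27, 5.23, 5.23, 5.04],
    "50%": [5.29, 5.34, 5.34, 5.32, 5.31, 5.19],
    "67%": [5.37, 5.39, 5.38, 5.35, 5.35, 5.30],
    "80%": [5.39, 5.41, 5.41, 5.39, 5.38, 5.42],
}
C = {
    "20%": [58.78, 60.02, 63.67, 65.06, 65.38, 57.27],
    "33%": [102.72, 104.82, 109.48, 112.20, 112.37, 100.79],
    "50%": [157.28, 160.64, 165.56, 168.57, 168.86, 155.61],
    "67%": [209.41, 212.61, 217.46, 220.13, 220.36, 208.47],
    "80%": [256.37, 257.84, 262.86, 263.71, 264.02, 255.35],
}


def winners(table, best):
    """Per solution group, the criteria attaining the best value (ties kept)."""
    result = {}
    for group, values in table.items():
        target = best(values)
        result[group] = [name for name, v in zip(CRITERIA, values) if v == target]
    return result


max_h = winners(H, max)  # most total information
min_c = winners(C, min)  # least redundancy
for group in H:
    print(group, "max H:", max_h[group], "min C:", min_c[group])
```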

Insert Table 7 here.

To present the effectiveness of the optimization solutions produced by the different criteria intuitively, we plotted isohyet maps of two rainstorm events using the precipitation data recorded by the significant (highly ranked) stations prioritized by DETC (λ = 0.5) and by MinT. Figs. 12 and 13 show the isohyet maps of two rainstorm events that occurred on July 13 and August 8, 2012, with maximum daily precipitation of 76.5 mm and 396.5 mm, respectively. Each figure presents scenarios corresponding to the original monitoring network of 95 gauges and to the prioritized 1/3 (n = 32) or 2/3 (n = 63) of the stations obtained by DETC or MinT. Fig. 12(a) shows a decreasing trend of precipitation from west to east. In the southwestern mountain area, the isohyet maps of DETC generally restored the precipitation distribution at n = 32 and became more accurate at n = 63. In the hill and plain areas in the north, DETC showed discrepancies with the original map at n = 32 but improved considerably at n = 63. MinT, on the contrary, restored the contours well in the northern area at both n = 32 and n = 63 but deviated clearly from the original map in the southern mountain area even as n increased. This indicates that DETC performed better than MinT in restoring the precipitation distribution, especially at a low station number. A more obvious contrast is shown in Fig. 13, where the two rainstorm centers over the southwestern and middle mountain areas were well restored with the stations prioritized by DETC. By contrast, large discrepancies were seen on the isohyet maps of MinT regarding both the rainstorm centers and the contour values, although MinT still had an advantage over DETC in the northern plain area. Nevertheless, given that locating rainstorm centers and measuring extreme rainfall intensity are crucial for hydrologic forecasting, flood control and risk analysis, DETC could provide a more reasonable station optimization solution for collecting essential information. Based on the foregoing analysis, this can be explained by DETC tending to acquire more total information, while MinT tends to cut overlapping information.

Insert Fig. 12 here.

Insert Fig. 13 here.

5. Conclusions

This study developed a dual entropy-transinformation criterion (DETC) for hydrometric network evaluation and optimization. DETC ranks the stations according to their significance in the network and provides a reference for network configuration. Two indexes, the entropy index E and the transinformation index T, which measure the information content and the information redundancy of each station, respectively, were integrated by a tradeoff weight λ into a final index D. Index E was computed through a mathematical floor function and index T through copula entropy, with the best-fitted copula selected from three Archimedean copula families, i.e., Gumbel, Frank and Clayton. The application of DETC was illustrated through three case studies: a streamflow monitoring network in the Fenhe River basin, a small rainfall monitoring network in Beijing, and a large rainfall monitoring network in the mountain-hill-plain transition area of the Taihu Lake basin, covering different network optimization scenarios in terms of network type, network scale and climate. Optimization results were assessed by the commonly used dual entropy-multiobjective optimization (DEMO) criterion with the joint entropy (H) and total correlation (C) indexes. We also compared DETC with the minimum transinformation based optimization criterion proposed by Chen et al. (2008), i.e., the MinT criterion. The following conclusions were drawn from the case studies:

- The evaluation results of DETC were sensitive to the tradeoff weight λ, which adjusts the decision preference between information content and information redundancy.
- The optimization solutions produced by DETC tended to acquire more information content, while those produced by MinT tended to avoid information redundancy.
- DETC performed better at restoring the isohyet maps, especially the rainstorm centers in the mountain areas, while MinT performed better in the plain areas.

Generally, DETC assigns each station a rank indicating its significance in the network, which balances its marginal entropy against its transinformation with other stations, and provides a way of identifying critical areas where additional stations should be located, as well as redundant stations that can be removed. DETC was shown to be reliable and effective through the assessment with DEMO and the comparison with MinT. Further studies may incorporate physical (e.g., seasonality) and social (e.g., cost) factors to make the criterion more practical for decision makers.


Acknowledgments

This research was financially supported by the National Key Research and Development Program of China (2017YFC1502704, 2016YFC0401501) and the National Natural Science Fund of China (No. 41571017, 51679118, 91647203). The work was made possible by datasets from the National Earth System Science Data Sharing Infrastructure, the Annual Hydrological Report compiled by the Ministry of Water Resources (MWR) of China, and the National Meteorological Information Center of the China Meteorological Administration.


References

Alfonso, L., He, L., Lobbrecht, A., & Price, R. (2012). Information theory applied to evaluate the discharge monitoring network of the Magdalena River. J. Hydroinform. 15 (1), 211-228.
Alfonso, L., Lobbrecht, A., & Price, R. (2010). Information theory-based approach for location of monitoring water level gauges in polders. Water Resour. Res. 46 (3), 374-381.
Chacon-Hurtado, J. C., Alfonso, L., & Solomatine, D. P. (2017). Rainfall and streamflow sensor network design: A review of applications, classification, and a proposed framework. Hydrol. Earth Syst. Sci. 21 (6), 1-33.
Chebbi, A., Bargaoui, Z. K., & Cunha, M. D. C. (2011). Optimal extension of rain gauge monitoring network for rainfall intensity and erosivity index interpolation. J. Hydrol. Eng. 16 (8), 665-676.
Chen, Y. C., Wei, C., & Yeh, H. C. (2008). Rainfall network design using kriging and entropy. Hydrol. Process. 22 (3), 340-346.
Chen, L., Ye, L., Singh, V., Zhou, J., & Guo, S. (2014). Determination of input for artificial neural networks for flood forecasting using the copula entropy method. J. Hydrol. Eng. 19 (11).
Coulibaly, P., Samuel, J., Pietroniro, A., & Harvey, D. (2013). Evaluation of Canadian national hydrometric network density based on WMO 2008 standards. Can. Water Resour. J. 38 (2), 159-167.
Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory, 2nd ed. John Wiley & Sons, Inc.
Favre, A. C., Adlouni, S. E., Perreault, L., Thiémonge, N., & Bobée, B. (2004). Multivariate hydrological frequency analysis using copulas. Water Resour. Res. 40 (1), 290-294.
Genest, C., & Favre, A. C. (2007). Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 12 (4), 347-368.
Genest, C., Favre, A. C., Béliveau, J., & Jacques, C. (2007). Metaelliptical copulas and their use in frequency analysis of multivariate hydrological data. Water Resour. Res. 43 (9), 223-236.
Hao, Z., & Singh, V. P. (2012). Entropy-copula method for single-site monthly streamflow simulation. Water Resour. Res. 48 (6), 6604.
Hao, Z., & Singh, V. P. (2015). Integrating entropy and copula theories for hydrologic modeling and analysis. Entropy 17 (4), 2253-2280.
Kao, S., & Govindaraju, R. S. (2007). A bivariate frequency analysis of extreme rainfall with implications for design. J. Geophys. Res.-Atmos. 112, D13119.
Keum, J., Coulibaly, P., Razavi, T., Tapsoba, D., Gobena, A., Weber, F., & Pietroniro, A. (2018). Application of SNODAS and hydrologic models to enhance entropy-based snow monitoring network design. J. Hydrol. 561, 688-701.
Keum, J., Kornelsen, K., Leach, J., & Coulibaly, P. (2017). Entropy applications to water monitoring network design: A review. Entropy 19 (11), 613.
Kong, X. M., Huang, G. H., Fan, Y. R., & Li, Y. P. (2015). Maximum entropy-Gumbel-Hougaard copula method for simulation of monthly streamflow in Xiangxi River, China. Stoch. Environ. Res. Risk Assess. 29 (3), 833-846.
Leach, J. M., Coulibaly, P., & Guo, Y. (2016). Entropy based groundwater monitoring network design considering spatial distribution of annual recharge. Adv. Water Resour. 96, 108-119.
Leach, J. M., Kornelsen, K. C., Samuel, J., & Coulibaly, P. (2015). Hydrometric network design using streamflow signatures and indicators of hydrologic alteration. J. Hydrol. 529, 1350-1359.
Li, C., Singh, V. P., & Mishra, A. K. (2012). Entropy theory based criterion for hydrometric network evaluation and design: maximum information minimum redundancy. Water Resour. Res. 48 (48), 5521.
Li, H., Wang, D., Singh, V. P., Wang, Y., Wu, J., Wu, J., et al. (2019). Non-stationary frequency analysis of annual extreme rainfall volume and intensity using Archimedean copulas: A case study in eastern China. J. Hydrol. 571, 114-131.
Liu, D., Wang, D., Singh, V. P., Wang, Y., Wu, J., Wang, L., et al. (2017). Optimal moment determination in POME-copula based hydrometeorological dependence modelling. Adv. Water Resour. 105, 39-50.
Ma, J., & Sun, Z. (2011). Mutual information is copula entropy. Tsinghua Sci. Technol. 16 (1), 51-54.
McGill, W. (1954). Multivariate information transmission. Psychometrika 19 (2), 97-116.
Michele, C. D., & Salvadori, G. (2003). A generalized Pareto intensity-duration model of storm rainfall exploiting 2-copulas. J. Geophys. Res.-Atmos. 108 (D2), 171-181.
Mishra, A. K., & Coulibaly, P. (2009). Developments in hydrometric network design: a review. Rev. Geophys. 47 (2), 2415-2440.
Mishra, A. K., & Coulibaly, P. (2010). Hydrometric network evaluation for Canadian watersheds. J. Hydrol. 380 (3), 420-437.
Mishra, A. K., & Coulibaly, P. (2014). Variability in Canadian seasonal streamflow information and its implication for hydrometric network design. J. Hydrol. Eng. 19 (8), 05014003.
Mogheir, Y., & Singh, V. P. (2002). Application of information theory to groundwater quality monitoring networks. Water Resour. Manag. 16 (1), 37-49.
Mogheir, Y., Lima, J. L. M. P. D., & Singh, V. P. (2009). Entropy and multi-objective based approach for groundwater quality monitoring network assessment and redesign. Water Resour. Manag. 23 (8), 1603-1620.
Nelsen, R. B. (2006). An Introduction to Copulas. Springer, New York.
Ozkul, S., Harmancioglu, N. B., & Singh, V. P. (2000). Entropy-based assessment of water quality monitoring networks. J. Hydrol. Eng. 5 (1), 90-100.
Ridolfi, E., Alfonso, L., Baldassarre, G. D., Dottori, F., Russo, F., & Napolitano, F. (2014). An entropy approach for the optimization of cross-section spacing for river modelling. Hydrol. Sci. J. 59 (1), 126-137.
Ridolfi, E., Montesarchio, V., Russo, F., & Napolitano, F. (2011). An entropy approach for evaluating the maximum information content achievable by an urban rainfall network. Nat. Hazards Earth Syst. Sci. 11 (11), 2075-2083.
Samuel, J., Coulibaly, P., & Kollat, J. (2013). CRDEMO: combined regionalization and dual entropy-multiobjective optimization for hydrometric network design. Water Resour. Res. 49 (12), 8070-8089.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27 (3), 379-423.
Shiau, J. T. (2006). Fitting drought duration and severity with two-dimensional copulas. Water Resour. Manag. 20 (5), 795-815.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229-231.
Song, S., & Singh, V. P. (2010a). Meta-elliptical copulas for drought frequency analysis of periodic hydrologic data. Stoch. Environ. Res. Risk Assess. 24 (3), 425-444.
Song, S., & Singh, V. P. (2010b). Frequency analysis of droughts using the Plackett copula and parameter estimation by genetic algorithm. Stoch. Environ. Res. Risk Assess. 24 (5), 783-805.
Song, S., Cai, H., Jin, J., & Kang, Y. (2012). Copulas function and its application in hydrology. Science Press, Beijing.
Su, H. T., & You, J. Y. (2014). Developing an entropy-based model of spatial information estimation and its application in the design of precipitation gauge networks. J. Hydrol. 519, 3316-3327.
Tong, X., Wang, D., Singh, V. P., Wu, J. C., Chen, X., & Chen, Y. F. (2014). Impact of data length on the uncertainty of hydrological copula modeling. J. Hydrol. Eng. 05014019.
Vandenberghe, S., Verhoest, N. E. C., & Baets, B. D. (2010). Fitting bivariate copulas to the dependence structure between storm characteristics: a detailed analysis based on 105 year 10 min rainfall. Water Resour. Res. 46 (1), 489-496.
Vandenberghe, S., Verhoest, N. E. C., Onof, C., & De Baets, B. (2011). A comparative copula-based bivariate frequency analysis of observed and simulated storm events: a case study on Bartlett-Lewis modeled rainfall. Water Resour. Res. 47 (7), 197-203.
Volkmann, T. H. M., Lyon, S. W., Gupta, H. V., & Troch, P. A. (2010). Multicriteria design of rain gauge networks for flash flood prediction in semiarid catchments with complex terrain. Water Resour. Res. 46 (11), 2387-2392.
Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM J. Res. Dev. 4 (1), 66-82.
World Meteorological Organization (WMO) (2008). Guide to Hydrological Practices, Volume I: Hydrology - From Measurement to Hydrological Information. WMO-No. 168, 6th ed. World Meteorological Organization, Geneva, Switzerland.
Xu, P., Wang, D., Singh, V. P., Wang, Y., Wu, J., Wang, L., et al. (2017). A two-phase copula entropy-based multiobjective optimization approach to hydrometeorological gauge network design. J. Hydrol. 555, 228-241.
Xu, K., Yang, D., Xu, X., & Lei, H. (2015). Copula based drought frequency analysis considering the spatio-temporal variability in southwest China. J. Hydrol. 527, 630-640.
Yang, Y., & Burn, D. H. (1994). An entropy approach to data collection network design. J. Hydrol. 157 (1-4), 307-324.
Zhang, L., & Singh, V. P. (2006). Bivariate flood frequency analysis using the copula method. J. Hydrol. Eng. 11 (2), 150-164.
Zhang, L., & Singh, V. P. (2007). Bivariate rainfall frequency distributions using Archimedean copulas. J. Hydrol. 332 (1), 93-109.
Zhang, Q., Zheng, Y., Singh, V. P., Xiao, M., & Liu, L. (2016). Entropy based spatiotemporal patterns of precipitation regimes in the Huai River basin, China. Int. J. Climatol. 36 (5), 2335-2344.

List of Tables

Table 1. Information of three Archimedean copula families (Gumbel, Frank, and Clayton).
Table 2. Index E values and the corresponding ranks of the 12 streamflow monitoring stations computed with different floor function parameter a.
Table 3. AIC and BIC values of copula modelling corresponding to the 12 streamflow monitoring stations using three candidate copula families: Gumbel, Frank and Clayton.
Table 4. Index T values corresponding to the 12 streamflow monitoring stations computed with the best fitted copula and the candidate copulas.
Table 5. Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 1).
Table 6. Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 2).
Table 7. Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 3).

List of Figures

Fig. 1. Flowchart of network optimization and criterion assessment.
Fig. 2. Location of the Fenhe River basin and the streamflow monitoring network.
Fig. 3. Variation of index D values of DETC with the tradeoff weight parameter (Case 1).
Fig. 4. Network evaluation results produced by DETC with different tradeoff weight and the MinT criterion (Case 1).
Fig. 5. Location of the Beijing Municipality and the rainfall monitoring network.
Fig. 6. Variation of index D values of DETC with the tradeoff weight parameter (Case 2).
Fig. 7. Network evaluation results produced by DETC with different tradeoff weight and the MinT criterion (Case 2).
Fig. 8. Location of the study area and rainfall monitoring network of Case 3.
Fig. 9. Variation of index D values of DETC with the tradeoff weight parameter (Case 3).
Fig. 10. Distribution of indexes E and T values.
Fig. 11. Distribution of index D values corresponding to different tradeoff weight of DETC.
Fig. 12. Isohyet maps of the rainstorm event on July 13, 2012 plotted with data from: (a) the original monitoring network containing 95 gauges; (b) the prioritized 32 stations of DETC (λ = 0.5); (c) the prioritized 63 stations of DETC (λ = 0.5); (d) the prioritized 32 stations of MinT; and (e) the prioritized 63 stations of MinT.
Fig. 13. Isohyet maps of the rainstorm event on August 8, 2012 plotted with data from: (a) the original monitoring network containing 95 gauges; (b) the prioritized 32 stations of DETC (λ = 0.5); (c) the prioritized 63 stations of DETC (λ = 0.5); (d) the prioritized 32 stations of MinT; and (e) the prioritized 63 stations of MinT.

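Table 1
Information of three Archimedean copula families (Gumbel, Frank, and Clayton).

| Copula | C(u,v) | θ | Kendall's τ |
|---|---|---|---|
| Gumbel | $\exp\{-[(-\ln u)^{\theta} + (-\ln v)^{\theta}]^{1/\theta}\}$ | $[1, \infty)$ | $1 - \frac{1}{\theta}$ |
| Clayton | $(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ | $(0, \infty)$ | $\frac{\theta}{\theta + 2}$ |
| Frank | $-\frac{1}{\theta}\ln\left[1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right]$ | $\mathbb{R}\setminus\{0\}$ | $1 - \frac{4}{\theta}\left[1 - D_1(\theta)\right]$ |

Note: $D_1(\theta) = \frac{1}{\theta}\int_0^{\theta}\frac{t}{e^{t} - 1}\,dt$ is the first-order Debye function, and $D_1(-\theta) = D_1(\theta) + \frac{\theta}{2}$.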
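The τ-θ relations in Table 1 can be inverted to get θ from a sample's Kendall's τ. The sketch below assumes SciPy is available. It uses closed forms for Gumbel and Clayton and a root search for Frank, with a bracket of |θ| ≤ 100 chosen here for illustration. This method-of-moments inversion is shown only to make the table concrete and is not necessarily how the copulas in this study were fitted.

```python
import math

from scipy.integrate import quad
from scipy.optimize import brentq


def debye1(theta):
    """First-order Debye function D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt."""
    if theta == 0.0:
        return 1.0  # limit as theta -> 0
    integral, _ = quad(lambda t: t / math.expm1(t) if t != 0.0 else 1.0, 0.0, theta)
    return integral / theta


def kendall_tau(family, theta):
    """Kendall's tau as a function of theta (Table 1)."""
    if family == "gumbel":
        return 1.0 - 1.0 / theta
    if family == "clayton":
        return theta / (theta + 2.0)
    if family == "frank":
        return 1.0 - 4.0 / theta * (1.0 - debye1(theta))
    raise ValueError(f"unknown family: {family}")


def theta_from_tau(family, tau):
    """Invert Kendall's tau: closed form for Gumbel and Clayton, root finding for Frank."""
    if family == "gumbel":
        return 1.0 / (1.0 - tau)
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    if family == "frank":
        if tau == 0.0:
            raise ValueError("theta = 0 is excluded for the Frank copula")
        bracket = (1e-4, 100.0) if tau > 0 else (-100.0, -1e-4)
        return brentq(lambda th: kendall_tau("frank", th) - tau, *bracket)
    raise ValueError(f"unknown family: {family}")


for family in ("gumbel", "clayton", "frank"):
    print(family, theta_from_tau(family, 0.5))
```

Integrating from 0 to a negative θ lets the same Debye routine serve negative Frank parameters, so the identity in the table note holds without special handling.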

Table 2
Index E values and the corresponding ranks of the 12 streamflow monitoring stations computed with different floor function parameter a.

| Station ID | a=1 Value | a=1 Rank | a=2 Value | a=2 Rank | a=5 Value | a=5 Rank | a=10 Value | a=10 Rank | a=20 Value | a=20 Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| S01 | 0.86 | 5 | 0.78 | 4 | 0.71 | 4 | 0.46 | 6 | 0.60 | 4 |
| S02 | 0.69 | 7 | 0.58 | 8 | 0.50 | 8 | 0.36 | 7 | 0.14 | 6 |
| S03 | 0.69 | 8 | 0.62 | 7 | 0.52 | 7 | 0.36 | 7 | 0.14 | 6 |
| S04 | 0.38 | 9 | 0.37 | 9 | 0.18 | 9 | 0.00 | 9 | 0.00 | 8 |
| S05 | 0.87 | 4 | 0.76 | 5 | 0.68 | 5 | 0.50 | 4 | 0.42 | 5 |
| S06 | 0.80 | 6 | 0.67 | 6 | 0.55 | 6 | 0.49 | 5 | 0.00 | 8 |
| S07 | 0.96 | 3 | 1.00 | 1 | 1.00 | 1 | 0.95 | 3 | 0.83 | 3 |
| S08 | 0.29 | 10 | 0.24 | 10 | 0.00 | 10 | 0.00 | 9 | 0.00 | 8 |
| S09 | 1.00 | 1 | 0.99 | 2 | 0.91 | 2 | 0.97 | 2 | 1.00 | 1 |
| S10 | 0.00 | 12 | 0.00 | 12 | 0.00 | 10 | 0.00 | 9 | 0.00 | 8 |
| S11 | 0.99 | 2 | 0.89 | 3 | 0.86 | 3 | 1.00 | 1 | 1.00 | 2 |
| S12 | 0.18 | 11 | 0.10 | 11 | 0.00 | 10 | 0.00 | 9 | 0.00 | 8 |

Table 3
AIC and BIC values of copula modelling corresponding to the 12 streamflow monitoring stations using three candidate copula families: Gumbel, Frank and Clayton.

| Station ID | AIC Gumbel | AIC Frank | AIC Clayton | BIC Gumbel | BIC Frank | BIC Clayton |
|---|---|---|---|---|---|---|
| S01 | -18.43 | -14.62 | -6.42 | -17.03 | -13.22 | -5.02 |
| S02 | -60.99 | -54.50 | -20.31 | -59.59 | -53.10 | -18.91 |
| S03 | -65.13 | -59.64 | -30.01 | -63.73 | -58.24 | -28.61 |
| S04 | -16.33 | -14.78 | -8.21 | -14.93 | -13.38 | -6.81 |
| S05 | -34.19 | -34.75 | -21.20 | -32.79 | -33.34 | -19.79 |
| S06 | -21.76 | -19.04 | -12.37 | -20.36 | -17.64 | -10.97 |
| S07 | -58.79 | -55.26 | -40.93 | -57.39 | -53.86 | -39.52 |
| S08 | -17.66 | -20.20 | -15.93 | -16.25 | -18.80 | -14.53 |
| S09 | -67.60 | -65.13 | -43.29 | -66.20 | -63.73 | -41.88 |
| S10 | -7.48 | -8.30 | -3.36 | -6.07 | -6.90 | -1.95 |
| S11 | -67.75 | -61.12 | -48.83 | -66.34 | -59.72 | -47.43 |
| S12 | -34.67 | -32.37 | -24.40 | -33.26 | -30.97 | -23.00 |
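The selection rule behind Table 3 is to pick, for each station, the family with the smallest AIC or BIC. The sketch below applies it to the transcribed values. The aic and bic helpers are the standard definitions, with k the number of fitted parameters (1 for these one-parameter families) and n the sample size. As a check, the rule recovers the best-copula column of Table 4, with Frank for S05, S08 and S10 and Gumbel elsewhere.

```python
import math

FAMILIES = ("Gumbel", "Frank", "Clayton")

# (Gumbel, Frank, Clayton) values transcribed from Table 3 (Case 1).
AIC = {
    "S01": (-18.43, -14.62, -6.42), "S02": (-60.99, -54.50, -20.31),
    "S03": (-65.13, -59.64, -30.01), "S04": (-16.33, -14.78, -8.21),
    "S05": (-34.19, -34.75, -21.20), "S06": (-21.76, -19.04, -12.37),
    "S07": (-58.79, -55.26, -40.93), "S08": (-17.66, -20.20, -15.93),
    "S09": (-67.60, -65.13, -43.29), "S10": (-7.48, -8.30, -3.36),
    "S11": (-67.75, -61.12, -48.83), "S12": (-34.67, -32.37, -24.40),
}
BIC = {
    "S01": (-17.03, -13.22, -5.02), "S02": (-59.59, -53.10, -18.91),
    "S03": (-63.73, -58.24, -28.61), "S04": (-14.93, -13.38, -6.81),
    "S05": (-32.79, -33.34, -19.79), "S06": (-20.36, -17.64, -10.97),
    "S07": (-57.39, -53.86, -39.52), "S08": (-16.25, -18.80, -14.53),
    "S09": (-66.20, -63.73, -41.88), "S10": (-6.07, -6.90, -1.95),
    "S11": (-66.34, -59.72, -47.43), "S12": (-33.26, -30.97, -23.00),
}


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2.0 * k - 2.0 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for a model fitted to n observations."""
    return k * math.log(n) - 2.0 * log_likelihood


def best_family(scores):
    """Family with the smallest criterion value (lower is better for AIC and BIC)."""
    return min(zip(FAMILIES, scores), key=lambda pair: pair[1])[0]


for station in AIC:
    print(station, best_family(AIC[station]), best_family(BIC[station]))
```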

Table 4
Index T values corresponding to the 12 streamflow monitoring stations computed with the best fitted copula and the candidate copulas.

| Station ID | Best copula | Best copula Value | Best copula Rank | Gumbel Value | Gumbel Rank | Frank Value | Frank Rank | Clayton Value | Clayton Rank |
|---|---|---|---|---|---|---|---|---|---|
| S01 | Gumbel | 0.16 | 3 | 0.16 | 3 | 0.17 | 3 | 0.18 | 3 |
| S02 | Gumbel | 0.68 | 8 | 0.68 | 8 | 0.63 | 8 | 0.29 | 6 |
| S03 | Gumbel | 1.00 | 12 | 1.00 | 12 | 1.00 | 12 | 0.84 | 10 |
| S04 | Gumbel | 0.12 | 2 | 0.12 | 2 | 0.05 | 2 | 0.04 | 2 |
| S05 | Frank | 0.44 | 7 | 0.43 | 7 | 0.44 | 7 | 0.60 | 8 |
| S06 | Gumbel | 0.22 | 5 | 0.22 | 5 | 0.19 | 4 | 0.18 | 4 |
| S07 | Gumbel | 0.77 | 9 | 0.77 | 9 | 0.82 | 9 | 0.85 | 11 |
| S08 | Frank | 0.21 | 4 | 0.16 | 4 | 0.21 | 5 | 0.26 | 5 |
| S09 | Gumbel | 0.83 | 10 | 0.83 | 10 | 0.86 | 11 | 0.76 | 9 |
| S10 | Frank | 0.00 | 1 | 0.00 | 1 | 0.00 | 1 | 0.00 | 1 |
| S11 | Gumbel | 0.87 | 11 | 0.87 | 11 | 0.84 | 10 | 1.00 | 12 |
| S12 | Gumbel | 0.41 | 6 | 0.41 | 6 | 0.41 | 6 | 0.38 | 7 |

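Table 5
Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 1).

| Index | Number of stations | Percentage of stations | DETC λ=0.1 | DETC λ=0.3 | DETC λ=0.5 | DETC λ=0.7 | DETC λ=0.9 | MinT |
|---|---|---|---|---|---|---|---|---|
| Joint entropy (H) | 2 | 20% | 1.46 | 1.74 | 1.74 | 1.74 | 0.70 | 1.39 |
| Joint entropy (H) | 4 | 33% | 2.72 | 2.72 | 2.72 | 2.64 | 1.85 | 2.28 |
| Joint entropy (H) | 6 | 50% | 3.11 | 3.11 | 3.00 | 2.74 | 2.74 | 2.85 |
| Joint entropy (H) | 8 | 67% | 3.25 | 3.25 | 3.09 | 3.09 | 2.98 | 3.12 |
| Joint entropy (H) | 10 | 80% | 3.25 | 3.25 | 3.25 | 3.17 | 3.17 | 3.25 |
| Total correlation (C) | 2 | 20% | 0.13 | 0.12 | 0.12 | 0.12 | 0.53 | 0.05 |
| Total correlation (C) | 4 | 33% | 0.75 | 0.75 | 0.98 | 0.90 | 0.94 | 0.70 |
| Total correlation (C) | 6 | 50% | 2.36 | 2.36 | 2.11 | 2.15 | 2.15 | 1.73 |
| Total correlation (C) | 8 | 67% | 3.59 | 3.56 | 3.58 | 3.41 | 3.52 | 3.34 |
| Total correlation (C) | 10 | 80% | 5.00 | 5.00 | 5.03 | 5.10 | 5.10 | 5.03 |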

Table 6

63

Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 2).

Optimization criterion  Number of stations  Percentage of stations  DETC λ=0.1  DETC λ=0.3  DETC λ=0.5  DETC λ=0.7  DETC λ=0.9  MinT
Joint Entropy (H)       4                   20%                     3.02        3.02        3.02        3.02        2.92        2.94
                        6                   33%                     3.13        3.10        3.08        3.13        3.13        3.13
                        9                   50%                     3.24        3.16        3.24        3.21        3.21        3.17
                        12                  67%                     3.24        3.24        3.24        3.24        3.24        3.21
                        14                  80%                     3.24        3.24        3.24        3.24        3.24        3.24
Total Correlation (C)   4                   20%                     4.46        4.46        4.46        4.46        4.64        4.19
                        6                   33%                     7.52        7.83        8.05        8.04        8.09        7.39
                        9                   50%                     12.81       12.97       13.12       13.37       13.37       12.42
                        12                  67%                     18.06       18.18       18.46       18.63       18.63       17.67
                        14                  80%                     21.63       21.73       22.01       22.01       22.08       21.26


Table 7


Assessment of optimization results produced by DETC and MinT with the DEMO criterion (Case 3).

Optimization criterion  Number of stations  Percentage of stations  DETC λ=0.1  DETC λ=0.3  DETC λ=0.5  DETC λ=0.7  DETC λ=0.9  MinT
Joint Entropy (H)       19                  20%                     4.94        5.13        5.22        5.17        5.15        4.97
                        32                  33%                     5.13        5.29        5.27        5.23        5.23        5.04
                        48                  50%                     5.29        5.34        5.34        5.32        5.31        5.19
                        63                  67%                     5.37        5.39        5.38        5.35        5.35        5.30
                        76                  80%                     5.39        5.41        5.41        5.39        5.38        5.42
Total Correlation (C)   19                  20%                     58.78       60.02       63.67       65.06       65.38       57.27
                        32                  33%                     102.72      104.82      109.48      112.20      112.37      100.79
                        48                  50%                     157.28      160.64      165.56      168.57      168.86      155.61
                        63                  67%                     209.41      212.61      217.46      220.13      220.36      208.47
                        76                  80%                     256.37      257.84      262.86      263.71      264.02      255.35
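If the DEMO criterion is read in its usual form, maximizing joint entropy H while minimizing total correlation C, each row of Tables 5-7 can be treated as a small two-objective comparison. The sketch below applies a plain Pareto-dominance filter to the 32-station row of Table 7 (Case 3). This filter is an illustrative reading of the tabulated numbers, not the assessment procedure reported by the authors, and the names `dominates` and `non_dominated` are hypothetical.

```python
# (H, C) for the 32-station subsets of Case 3, copied from Table 7.
CASE3_32_STATIONS = {
    "DETC λ=0.1": (5.13, 102.72),
    "DETC λ=0.3": (5.29, 104.82),
    "DETC λ=0.5": (5.27, 109.48),
    "DETC λ=0.7": (5.23, 112.20),
    "DETC λ=0.9": (5.23, 112.37),
    "MinT": (5.04, 100.79),
}


def dominates(a, b):
    """True if design a is no worse than b on both objectives (higher H,
    lower C) and strictly better on at least one of them."""
    (h_a, c_a), (h_b, c_b) = a, b
    return h_a >= h_b and c_a <= c_b and (h_a > h_b or c_a < c_b)


def non_dominated(designs):
    """Names of the designs that no other design dominates.

    A design never dominates itself, because dominance requires a strict
    improvement on at least one objective.
    """
    return [
        name
        for name, score in designs.items()
        if not any(dominates(other, score) for other in designs.values())
    ]


print(non_dominated(CASE3_32_STATIONS))
```

For this row, λ = 0.5, 0.7 and 0.9 are each dominated by λ = 0.3, so the remaining choice is among λ = 0.1, λ = 0.3 and MinT, which trade some joint entropy for lower total correlation. Note that such a filter ignores the preference a decision maker expresses through λ; it only shows which settings are strictly inferior on both indices.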


Fig. 1. Flowchart of network optimization and criterion assessment.


Fig. 2. Location of the Fenhe River basin and the streamflow monitoring network.


Fig. 3. Variation of index D values of DETC with the tradeoff weight parameter (Case 1).


Fig. 4. Network evaluation results produced by DETC with different tradeoff weights and by the MinT criterion (Case 1).


Fig. 5. Location of the Beijing Municipality and the rainfall monitoring network.


Fig. 6. Variation of index D values of DETC with the tradeoff weight parameter (Case 2).


Fig. 7. Network evaluation results produced by DETC with different tradeoff weights and by the MinT criterion (Case 2).


Fig. 8. Location of the study area and rainfall monitoring network of Case 3.


Fig. 9. Variation of index D values of DETC with the tradeoff weight parameter (Case 3).


Fig. 10. Distribution of indexes E and T values.


Fig. 11. Distribution of index D values corresponding to different tradeoff weights of DETC.


Fig. 12. Isohyet maps of the rainstorm event on July 13, 2012, plotted with data from: (a) the original monitoring network of 95 gauges; (b) the top 32 stations prioritized by DETC (λ = 0.5); (c) the top 63 stations prioritized by DETC (λ = 0.5); (d) the top 32 stations prioritized by MinT; and (e) the top 63 stations prioritized by MinT.


Fig. 13. Isohyet maps of the rainstorm event on August 8, 2012, plotted with data from: (a) the original monitoring network of 95 gauges; (b) the top 32 stations prioritized by DETC (λ = 0.5); (c) the top 63 stations prioritized by DETC (λ = 0.5); (d) the top 32 stations prioritized by MinT; and (e) the top 63 stations prioritized by MinT.


Highlights:

• DETC was developed to prioritize stations for hydrometric network optimization.
• DETC incorporated decision preferences regarding total information and redundancy.
• DETC surpassed MinT in restoring the spatial distribution of precipitation.