Advances in Water Resources 32 (2009) 582–593
Input variable selection for water resources systems using a modified minimum redundancy maximum relevance (mMRMR) algorithm

Mohamad I. Hejazi, Ximing Cai *

Ven-Te Chow Hydrosystems Laboratory, Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, 205 N. Mathews Avenue, Urbana, IL 61801, United States
Article info

Article history: Received 3 September 2008; Received in revised form 20 January 2009; Accepted 20 January 2009; Available online 4 February 2009.

Keywords: Mutual information; Input selection; MRMR; mMRMR; Modeling
Abstract

Input variable selection (IVS) is a necessary step in modeling water resources systems. Neglecting this step may lead to unnecessary model complexity and reduced model accuracy. In this paper, we apply the minimum redundancy maximum relevance (MRMR) algorithm to identify the most relevant set of inputs in modeling a water resources system. We further introduce two modified versions of the MRMR algorithm (α-MRMR and β-MRMR), where α and β are correction factors that are found to increase and decrease as power-law functions, respectively, with the progress of the input selection algorithms and the increase of the number of selected input variables. We apply the proposed algorithms to 22 reservoirs in California to predict daily releases based on a set of 121 potential input variables. Results indicate that the two proposed algorithms are effective in selecting model inputs, as reflected in enhanced model performance. The α-MRMR and β-MRMR values exhibit strong negative correlations with model root-mean-square error (RMSE): input sets with higher values tend to yield lower RMSE.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Input selection is an initial, necessary step in any modeling exercise, and a proper selection largely dictates model accuracy. Selecting irrelevant inputs can degrade model accuracy and add unnecessary model complexity, reducing model reliability. In the hydrology and water resources arena, a hydrologist is often faced with the challenge of preprocessing a huge set of possible inputs to a hydrologic system. However, many hydrologic variables carry redundant information, and not all variables are relevant or even necessary [2]. A proper input selection mechanism is of particular importance when dealing with data-driven models with unknown sets of inputs. For example, Bowden et al. [2] illustrated the importance of input selection in the context of employing artificial neural network (ANN) models in water resources applications. They recognized input selection as one of the most important steps in the ANN model development process. They further stated that selecting only the most relevant input variables in a system could potentially reduce input dimensionality and consequently reduce computational complexity, ease the learning process of the ANN, improve accuracy, and enable better understanding of the 'true' driving forces of the modeled system. However, an effective method that can measure correlation (linear and non-linear) between a set of potential input variables and an output variable is still needed [7,2,9].
* Corresponding author. E-mail address: [email protected] (X. Cai).
doi:10.1016/j.advwatres.2009.01.009
Input variable selection (IVS) methods generally rely on one of the following techniques: experts' knowledge of the system, linear cross-correlation, heuristic techniques, sensitivity analysis, or information-theoretic measures [2, and references therein]. In this paper, we build on the use of information-theoretic measures to quantify the level of inference among several random variables. The goal is to craft an algorithm that correctly identifies the set of input variables that collectively possess the largest amount of information about the system being modeled, without including irrelevant input variables. In the rest of the paper, we first present background on IVS techniques. Following that, we lay out the methodology framework proposed in the paper and its merits, and apply the minimum redundancy maximum relevance (MRMR) algorithm to the synthetic test problems used by Sharma [15] and subsequently by Bowden et al. [2] in order to compare its performance with the two previous studies and to validate its selection accuracy. Next, we introduce a real-world case study to illustrate the performance of the newly proposed approach; finally, we discuss the results and their implications, followed by the conclusions.

2. Background

In hydrology, highly non-linear relationships have been reported in the literature in areas such as hydroclimatology [10], irrigation hydrology [22], contaminant hydrology [1], ecological processes [19], rainfall–runoff processes [13], surface fluxes and surface moisture availability [3], and infiltration, evaporation and
runoff processes [23]. Since relationships in hydrology are typically non-linear, using standard linear correlation methods may capture only linear relationships and misleadingly overlook the existence of non-linear relationships [7]. The use of information theory [14] – and more specifically mutual information – to measure linear and non-linear correlation between two variables is becoming increasingly popular due to its basis as an information-theoretic relevance criterion [6]. Estimating the mutual information between two random variables requires the determination of their joint probability density function with sufficient data. Existing methods of estimating the joint probability density function include building histograms (e.g., [8,12]) and using kernel estimation techniques (e.g., [15,2]). However, as the number of random variables increases, reliably estimating multidimensional mutual information quickly becomes computationally impractical [9,11]. Additionally, hydrologic and water resources systems often involve many input variables. Motivated by this shortcoming of computing mutual information among several random variables, research efforts have expanded to other techniques such as partial mutual information (PMI) [15,2], maximum dependence, maximum relevance (or average mutual information), minimum redundancy, and minimum redundancy maximum relevance [9]. We briefly describe each of these methods next.
2.1. Mutual information

Mutual information is a measure of the linear and non-linear dependence between variables. The mutual information between two random variables is a measure of the information one random variable explains about the other. It takes a minimum value of zero when no dependence exists between the two variables and a positive value when dependence (linear or non-linear) exists, with larger values indicating stronger dependence. The mutual information between two random variables X and Y can be quantified as shown in the following equation [4]:
I(X; Y) = \int\int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy    (1)

where x and y represent realizations of X and Y, I(X; Y) is the mutual information between the two random variables X and Y, p(x, y) is their joint probability density function, and p(x) and p(y) are the marginal probability density functions of X and Y, respectively. The mutual information between a set of input variables {X_i ∈ S_m, i = 1, ..., m} and an output variable Y can be estimated by Eq. (2), where S_m is the set of m input variables:

I(S_m; Y) = \int\int p(S_m, y) \log \frac{p(S_m, y)}{p(S_m)\, p(y)} \, dS_m \, dy = \int \cdots \int p(x_1, \ldots, x_m, y) \log \frac{p(x_1, \ldots, x_m, y)}{p(x_1) \cdots p(x_m)\, p(y)} \, dx_1 \cdots dx_m \, dy    (2)
Although mutual information is a robust measure of non-linear correlation between a set of random variables [6], its applicability diminishes beyond the two-variable case [11]. The challenge lies in the need to reliably estimate joint probabilities whose dimension equals the number of variables at stake. For instance, computing the mutual information among four variables requires one to determine a four-dimensional joint probability distribution.
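As a concrete illustration of Eq. (1), the sketch below estimates I(X; Y) from paired samples with a simple two-dimensional histogram, i.e., the plug-in approach mentioned above (e.g., [8,12]). This is not the authors' code; the bin count, sample size, and the synthetic dependent pair are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in (histogram) estimate of I(X;Y) in nats; bins=10 is an assumption."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                 # joint probabilities p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)            # dependent pair
print(mutual_information(x, y))                      # clearly positive
print(mutual_information(x, rng.normal(size=5000)))  # near zero
```

Kernel estimators (e.g., [15,2]) replace the histogram, but the principle is the same; either way the estimate degrades quickly as the dimension grows, which motivates the approximations discussed in Section 2.3.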
2.2. Partial mutual information

Recently, Sharma [15] introduced the concept of partial mutual information (PMI) to compute the partial amount of mutual information attained with the addition of a new input variable. The premise of his approach is to sequentially select an additional input variable while extracting the information explained by the previously selected variables using conditional expectations. The PMI method as described by Sharma [15] requires computing conditional expectations under a Gaussian assumption for the joint probability of the variables. This assumption is likely invalid for non-linear systems. Furthermore, the conditional expectation calculations involve inverting a matrix, which may become burdensome as the dimension of the input variables increases. More recently, Bowden et al. [2] used a general regression neural network (GRNN) as an approximate measure of the joint probability density function. The GRNN is a feed-forward artificial neural network (ANN) that can capture non-linear relationships between the inputs and outputs. Unlike standard ANNs, the GRNN does not require the prior selection of a network structure, and it is computationally efficient [2]. However, the PMI method with GRNN is inefficient when working with a relatively large set of input variables, since it would require running the GRNN between each of the remaining and each of the pre-selected variables at each iteration.

2.3. Maximum dependency, maximum relevance, and minimum redundancy

Another frontier in the advancement of IVS algorithms is establishing approximate methods to calculate mutual information among several random variables. Theoretically, computing mutual information among a set of variables is equivalent to measuring their dependency; thus, determining the set of input variables with the highest mutual information is a dependency maximization problem. This process can be implemented using the maximum dependency algorithm [9]. The maximum dependency algorithm computes the dependency as the mutual information between a set of m input variables (S_m) and the output variable Y in a forward fashion until the addition of any new input variable (X_i) becomes negligible:

maximize I(S_m; Y), \quad I = I(\{X_i,\ i = 1, \ldots, m\}; Y)    (3)

That is, the algorithm stops when no additional input variable yields an improvement in mutual information (I) higher than a prescribed threshold value ε. Sequentially, the algorithm finds the single most important input variable, then the second, third, etc., most important variables until the uncertainty reduction of the output variable Y becomes less than ε. This algorithm is not computationally reliable when it involves computing joint probability distributions of high dimensionality. To rectify this limit, a commonly used approximation of the maximum dependency algorithm is the maximum relevance (D) algorithm – also called average mutual information (AMI) [17] – as presented in Eq. (4). The maximum relevance algorithm works in a similar manner to the maximum dependency algorithm. It, however, assumes that the input variables {X_i ∈ S_m, i = 1, ..., m} are independent – an assumption often violated when autocorrelated temporal and spatial variants of hydrologic input variables are considered in the selection process. This algorithm does not account for any redundancy among the input variables and thus lacks the discriminative power to avoid selecting redundant input variables:

maximize D(S_m; Y), \quad D = \frac{1}{m} \sum_{X_i \in S_m} I(X_i; Y)    (4)

To overcome the shortcoming of not accounting for redundancy, Ding and Peng [5] introduced the minimum redundancy (R) algorithm, which seeks the most mutually exclusive set of input variables (Eq. (5)).
The minimum redundancy (R) algorithm alone is a poor estimate of mutual information, because a mutually exclusive set of input variables may very well possess no dependence with the output variable Y:

minimize R(S_m), \quad R = \frac{1}{m^2} \sum_{X_i, X_j \in S_m} I(X_i; X_j)    (5)

2.4. Minimum redundancy maximum relevance (MRMR)

A more effective approach is to combine the maximum relevance and minimum redundancy algorithms. Peng et al. [9] proposed combining the maximum relevance and minimum redundancy algorithms into a single maximization problem of the form in Eq. (6), where D and R are defined in Eqs. (4) and (5), respectively, and Φ is an approximation of the true mutual information value I:

\max \Phi(D, R), \quad \Phi = D - R    (6)

They named the combined method the minimum redundancy maximum relevance (MRMR) algorithm. They showed that it works efficiently even for a relatively large set of inputs and provided an analytical proof that a first-order MRMR model collapses to a maximum dependency problem. Although the method is an approximation, intuitively it should yield a better selection algorithm than either the maximum relevance or minimum redundancy algorithm alone. The maximum relevance term drives the selection process in favor of the most relevant set of variables with no attention to redundancy, while the minimum redundancy term favors input variables that are irrelevant to one another without any attention to how important they are to the output variable (e.g., Y). MRMR drives the selection process based on a balance between these two selection processes, favoring those variables that bring high relevance and low redundancy on average. This gives the MRMR algorithm a discriminative power that allows it to avoid selecting redundant variables. However, the relative importance of the R and D terms in Eq. (6) changes with the increase of the number of selected input variables. To explain why theoretically this should be true, we use Venn diagrams and simple algebra to illustrate the premise of a modified MRMR algorithm next. Additionally, in this paper, we use Venn diagrams to justify a new definition of the redundancy (R) term (Eq. (5)) in MRMR (Eq. (6)).

Fig. 1. Definition of mutual information in the context of Venn diagrams.

Fig. 2. Venn diagram representation of the mutual information between 2, 3, 4, and 5 random variables; here, Y is the dependent variable and X1, ..., X4 are the independent variables.
Table 1. Summary of the Venn areas corresponding to measures of mutual information (I) for the scenarios of 1, 2, 3, and 4 input variables.

Scenario 1: I(X1; Y) = AB
Scenario 2: I(X1, X2; Y) = AB + AC + ABC
Scenario 3: I(X1, X2, X3; Y) = AB + AC + AD + ABC + ABD + ACD + ABCD
Scenario 4: I(X1, X2, X3, X4; Y) = AB + AC + AD + AE + ABC + ABD + ACD + ABE + ACE + ADE + ABCD + ABCE + ABDE + ACDE + ABCDE
3. Modified minimum redundancy maximum relevance (mMRMR)

In this paper, we represent the concept of mutual information in the context of Venn diagrams. Venn diagrams were first introduced in 1880 by John Venn [21] and have seen wide use since then in many applications. As shown in Fig. 1, each circle represents the uncertainty contained in a random variable, and the overlapped areas represent the explained (shared) uncertainty based on knowledge of the other random variables. Mathematically, the mutual information, I, between variables X and Y is equal to the sum of the marginal entropies of the two variables, H(X) + H(Y), minus their joint entropy, H(Y, X) (Eq. (7)). Hence, the joint entropy term is represented by the union of the two Venn circles (Fig. 1). Similarly, the mutual information between a set of two random variables (X1, X2) and a third random variable (Y) can be computed based on Eq. (8), which includes a three-dimensional joint entropy term:
I(X; Y) = H(X) + H(Y) - H(Y, X)    (7)

I(X_1, X_2; Y) = H(Y) + H(X_1, X_2) - H(X_1, X_2, Y)    (8)
For simplicity, suppose Y is an output variable and X1, ..., X4 are four input variables, and the areas encompassed by variables Y, X1, X2, X3, and X4 are referred to as A, B, C, D, and E, respectively (Fig. 2). For Scenario 1 in Fig. 2, AB represents the mutual information between X1 and Y, where AB refers to the intersection between A and B only. For Scenario 2, however, AB + AC + ABC is the mutual information between the two input variables X1 and X2 and the output variable Y. Similarly, the areas of mutual information for Scenarios 3 and 4 are summarized in Table 1.
Table 2. Summary of the Venn areas corresponding to measures of mutual information (I), maximum relevance (D), minimum redundancy (R), and MRMR (Φ) for the three-random-variable scenario; D, R, and Φ are computed without normalizing the D and R quantities.

Term | R defined by Eq. (5) | R defined by Eq. (9)
D    | AB + AC + 2ABC                      | AB + AC + 2ABC
R    | B + C + AB + AC + 4BC + 4ABC        | BC + ABC
Φ    | -(B + C + 4BC + 2ABC)               | AB + AC + ABC - BC
I    | AB + AC + ABC                       | AB + AC + ABC

Note: R(Eq. (5)) = I(X1; X2) + I(X2; X1) + I(X1; X1) + I(X2; X2); R(Eq. (9)) = I(X1; X2).
Note that computing R based on Eq. (5), which includes self-entropies such as I(X_i; X_i), would exacerbate the MRMR approximation of I. For the sake of clarity, suppose D (Eq. (4)) and R (Eq. (5)) are total quantities instead of averaged quantities. For instance, for Scenario 2, I = AB + AC + ABC (Table 1). Self-entropies are incorporated in the definition of R (as defined in Eq. (5)) and subsequently in Φ (see Table 2). Then R expands from only two terms (BC and ABC) to include other terms such as B, C, AC, and AB. Specifically, R = B + C + AB + AC + 4BC + 4ABC and Φ = -(B + C + 4BC + 2ABC). Note that the terms B and C have no relevance to the uncertainty in the output variable Y, yet they appear in Φ. Furthermore, Φ does not include either AB or AC, and it also takes ABC as a negative quantity (Table 2). Thus, including the self-entropy terms in R (when i = j), and consequently in Φ, renders the approximation of I inappropriate. Since mutual information is a symmetric metric (I(X_i; X_j) = I(X_j; X_i)), we also exclude the symmetric terms (i ≥ j) when computing R. In this paper, we define R as indicated in Eq. (9). The redundancy (R) term in Eqs. (5) and (6) is normalized by the number of possible combinations. Hence, to account for the omitted self-entropy (i = j) and symmetric (i ≥ j) terms, the denominator term (m²) in Eq. (5) becomes 2(m² − m) in Eq. (9). This is the difference between our definition of MRMR and the definition presented by [9]:
minimize R(S_m), \quad R = \frac{1}{2(m^2 - m)} \sum_{X_i, X_j \in S_m,\, i < j} I(X_i; X_j)    (9)
Based on the modified definition of R, we compute the areas for the mutual information (I), total relevance (D), total redundancy (R), and MRMR (Φ) for the one-, two-, three- and four-input-variable scenarios (Table 3). In Table 3, when we have one input variable (X1), both D and Φ are perfect estimates of I. For the case of two input variables (X1, X2), D overestimates while Φ underestimates the value of I. However, we argue that the latter is a better approximation than the former for input variable selection. D overestimates I by double-counting the ABC region, thus misleadingly indicating that X1 and X2 explain more of the uncertainty in Y than they actually do. Φ, however, underestimates I by subtracting a new term (BC) which reflects the redundancy between the two input variables (X1 and X2). Although subtracting BC underestimates the 'true' measure of dependence, it favors selecting input variables that are as different as possible (i.e., with the lowest redundancy). This is favorable in statistical model input selection because it simplifies model structures and avoids irrational model functional forms. As the number of variables increases, more weight is given to avoiding redundancy among the selected variables. The addition of each new variable is assessed in terms of its dependency (e.g., mutual information) on the output variable Y and on the set of pre-selected input variables (X_i). In Table 3, Φ is computed (Eq. (6)) without normalizing the D and R quantities defined in Eqs. (4) and (9), respectively.

3.1. Merits of mMRMR

The MRMR algorithm assigns equal weights to the relevance (D) and redundancy (R) terms in Eq. (6); however, as illustrated in Table 3, the redundancy term is an overestimate of the 'true' redundancy. For instance, for the two-input-variable case (Scenario 2), only ABC is relevant redundancy while BC is irrelevant redundancy.
Table 3. Summary of the Venn areas corresponding to measures of mutual information (I), maximum relevance (D), minimum redundancy (R), and MRMR (Φ) for each of the four scenarios illustrated in Fig. 2; D, R, and Φ are computed without normalizing the D and R quantities; X_i are the input variables and Y is the dependent variable; values are coefficients multiplied by their corresponding areas; e.g., for Scenario 2, Φ = AB + AC − BC + ABC.
Here we define redundancy (or total redundancy, R) as the sum of relevant (RR) and irrelevant (IR) redundancies. We define relevant redundancy (RR) as the overlapped areas (information) between a set of input variables and an output variable, i.e., RR = Σ_i (X_i ∩ Y); we define irrelevant redundancy (IR) as the overlapped areas (information) among the input variables but not with the output variable, i.e., IR = Σ_i Σ_j (X_i ∩ X_j) − Σ_i (X_i ∩ Y). Accounting for the irrelevant redundancy terms grants the MRMR algorithm a tendency to favor uncorrelated variables. Ideally, the redundancy term (R) should include only the relevant redundancy (RR) terms and exclude the irrelevant redundancy (IR) terms. Thus, we introduce α as a correction factor of the redundancy term in the original form of the MRMR equation (Eq. (6)). We refer to the new form as the α-MRMR algorithm:

maximize \Phi_\alpha(S_m; Y), \quad \Phi_\alpha = D - \alpha R    (10)

where D and R are defined in Eqs. (4) and (9), respectively. Hence, α is a quantity ranging between zero and one. When α = 0, the α-MRMR algorithm collapses to a maximum relevance algorithm; when α = 1, the α-MRMR algorithm becomes equivalent to the MRMR algorithm. Furthermore, D and R are defined as average quantities of the relevance and redundancy that a set of inputs possesses. In that sense, α is a weight that indicates the relative importance of dependency and redundancy, on average, in the selection process, and it may vary as the number of selected variables increases. Therefore, one could repeat the α-MRMR run for different α values and different numbers of variables to select the 'best' set of values.

Factor α is only indicative of the relative importance of dependency and redundancy on average; it does not distinguish between the relevant (RR) and irrelevant (IR) redundancies. Thus, we introduce another formulation of mMRMR and, to avoid confusion with α-MRMR (Eq. (10)), we call the new form β-MRMR (Eq. (11)). Unlike the previous formulation (Eq. (10)), in Eq. (11) D and R are not normalized. Here again, we define β as the correction factor of the redundancy term (Eq. (12)). From the redundancy point of view, β = 0 implies that all captured redundancy is due to irrelevant redundancy (IR); β = 1 implies that all captured redundancy is due to relevant redundancy (RR):

maximize \Phi_\beta(S_m; Y), \quad \Phi_\beta = m D - \beta\, 2(m^2 - m) R = \sum_{X_i \in S_m} I(X_i; Y) - \beta \sum_{X_i, X_j \in S_m,\, i < j} I(X_i; X_j)    (11)

\beta = \frac{RR}{RR + IR} = \frac{RR}{R}    (12)

Theoretically, one could compute exactly the value of the β coefficient for a different number of variables. Based on the results in Table 3 and employing Eq. (12), the exact formula for each of the four discussed cases is determined (see Table 4). β is equal to one under the one-input-variable case (Scenario 1) but quickly diminishes as the number of variables considered increases. That is to say, the amount of irrelevant redundancy (IR) dominates the redundancy term R as the number of variables increases. This will be verified through a case study.
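To make the α-MRMR and β-MRMR criteria of Eqs. (10)–(12) concrete, the following sketch scores a candidate input set from precomputed mutual information values. It is a minimal illustration, not the authors' implementation; the pairwise MI arrays and the α and β values passed in are assumptions.

```python
import numpy as np
from itertools import combinations

def score_subset(mi_xy, mi_xx, subset, alpha=0.3, beta=0.2):
    """mi_xy[i] = I(X_i;Y); mi_xx[i, j] = I(X_i;X_j); subset = indices of S_m."""
    m = len(subset)
    D = np.mean([mi_xy[i] for i in subset])                # Eq. (4), average relevance
    pairs = list(combinations(subset, 2))                  # i < j only (no self or symmetric terms)
    pair_sum = sum(mi_xx[i, j] for i, j in pairs)
    R = pair_sum / (2 * (m ** 2 - m)) if pairs else 0.0    # Eq. (9), modified redundancy
    phi_alpha = D - alpha * R                              # Eq. (10), alpha-MRMR
    phi_beta = m * D - beta * pair_sum                     # Eq. (11), beta-MRMR (unnormalized D and R)
    return phi_alpha, phi_beta
```

In practice α and β would be set from relations such as those fitted later in Table 7, or tuned by re-running the selection for several candidate values.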
Table 4. Venn-based theoretical definitions of the β terms for Scenarios 1–4.

Scenario 1: β = 1
Scenario 2: β = ABC / (ABC + BC)
Scenario 3: β = (2ABCD + ABC + ABD + ACD) / (3ABCD + ABC + ABD + ACD + 3BCD + BC + BD + CD)
Scenario 4: β = [3ABCDE + 2(ABCD + ABCE + ABDE + ACDE) + (ABC + ABD + ABE + ACD + ACE + ADE)] / [6ABCDE + 3(ABCD + ABCE + ABDE + ACDE) + (ABC + ABD + ABE + ACD + ACE + ADE) + 6BCDE + 3(BCD + BCE + BDE + CDE) + (BC + BD + BE + CD + CE + DE)]
4. Feed-forward incremental modified minimum redundancy maximum relevance

In practice, one is often faced with the need to select from among a huge set of input variables, and enumerating all possible combinations may be computationally prohibitive. Although computing Eq. (6) is a quick calculation, the need to repeat the computation C(S, m) times (the number of ways of choosing m variables out of S candidates) can be computationally overwhelming. Suppose we have a set of 50 potential input variables and we want to select the best 10 variables; then we have to enumerate about 1.03 × 10^10 combinations (as the short snippet below illustrates). A more practical and computationally feasible approach is to follow a feed-forward incremental approach, by selecting the 'best' input variable and then selecting the next 'best' variable based on the previously selected variable, and so on, until any additional variable would add no value to explaining the uncertainty of Y.
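The enumeration count quoted above is simple to verify; the two-line check below is only an illustration of the arithmetic.

```python
import math
print(math.comb(50, 10))  # 10272278170, i.e., about 1.03 x 10^10 candidate sets
```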
The feed-forward incremental MRMR model form is presented in Eq. (13). This equation can be particularly useful when the number of variables is very large and one cannot afford to enumerate all possible combinations:

maximize \Phi^{incr}(S; Y), \quad \Phi^{incr} = I(X_j; Y) - \frac{1}{m - 1} \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (13)

Similarly, we introduce the feed-forward incremental mMRMR equations (α-MRMR and β-MRMR) as described in Eqs. (14) and (15). Hence, the values of α and β need to be pre-specified prior to running either of the two algorithms. In this study, we shed light on the nature of those coefficients as the number of input variables increases:

maximize \Phi_\alpha^{incr}(S; Y), \quad \Phi_\alpha^{incr} = I(X_j; Y) - \alpha \frac{1}{m - 1} \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (14)

maximize \Phi_\beta^{incr}(S; Y), \quad \Phi_\beta^{incr} = I(X_j; Y) - \beta \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (15)

Fig. 3. Schematic of the input selection algorithm (e.g., MRMR, α-MRMR, and β-MRMR) coupled with a modeling component (e.g., GRNN).
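A minimal sketch of the feed-forward incremental selection of Eqs. (13)–(15) is given below. It assumes the pairwise mutual information values have already been estimated (e.g., with the histogram sketch in Section 2.1) and that α or β, if used, are pre-specified; none of the names here come from the paper.

```python
import numpy as np

def incremental_select(mi_xy, mi_xx, n_select, alpha=None, beta=None):
    """Greedy forward selection; mi_xy[j] = I(X_j;Y), mi_xx[j, i] = I(X_j;X_i)."""
    candidates = list(range(len(mi_xy)))
    selected = []
    for _ in range(n_select):
        best_j, best_score = None, -np.inf
        for j in candidates:
            redundancy = sum(mi_xx[j, i] for i in selected)
            if beta is not None:                      # Eq. (15), beta-MRMR
                score = mi_xy[j] - beta * redundancy
            elif alpha is not None and selected:      # Eq. (14), alpha-MRMR
                score = mi_xy[j] - alpha * redundancy / len(selected)
            elif selected:                            # Eq. (13), plain incremental MRMR
                score = mi_xy[j] - redundancy / len(selected)
            else:                                     # first pick: maximum relevance only
                score = mi_xy[j]
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)      # len(selected) plays the role of m - 1 above
        candidates.remove(best_j)
    return selected
```

In the coupled scheme of Fig. 3, n_select would not be fixed in advance; one more variable is admitted per iteration until the percent improvement in the RMSE of the systems model falls below ε.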
In real applications, one does not have prior knowledge of the number of variables that explain the most about the uncertainty of a particular variable of interest. Thus, the selection algorithm needs to include a stopping criterion or mechanism to iteratively increase the number of input variables until a pre-specified condition has been satisfied. MRMR and its variants are selection algorithms that identify a set of variables but have no means of specifying a stopping point. A logical stopping mechanism is to couple the input selection algorithm (e.g., MRMR, α-MRMR, β-MRMR) with a simulation model of the system and iteratively evaluate the model performance with each additional input variable until a percent improvement in a prescribed modeling accuracy term, such as the root-mean-square error (RMSE), is satisfied (Fig. 3). The IVS algorithm stops when any additional input variable would not improve the RMSE value beyond a pre-specified threshold value, ε. As a demonstrating example, the general regression neural network (GRNN) is used as the systems model, as shown in Fig. 3.

5. The general regression neural network (GRNN)

The general regression neural network (GRNN) is a fast feed-forward artificial neural network (ANN) first developed by Specht [16]. It has a fixed structure and thus avoids the subjective selection of a network structure, runs very efficiently, and can capture linear and non-linear relationships [2]. It is commonly used as a universal approximator for smooth functions and can approximate any linear or non-linear relationship between a set of inputs and an output variable given enough data [16,2]. For more details about the derivation of the GRNN algorithm, readers are referred to Specht [16]. The only required input to simulate the GRNN is a value of the smoothing factor, σ, which is not known a priori and is commonly established with a trial-and-error process. When σ is small, only a few nearby neighbors play a role; when σ is large, distant neighbors also affect the estimate at X, yielding a smoother estimate. At the extreme, when σ is zero, Y becomes dependent solely on the closest X_i value; as σ approaches infinity, Y becomes the mean value of all Y_i. Hence, the choice of the smoothing parameter is important.
Fig. 4. Schematic of the assumption of equivalence between mutual information and GRNN (e.g., I ≈ RMSE⁻¹).
Here, we adopt a trial-and-error procedure to select a σ value that balances accuracy and smoothness, following the work of Bowden et al. [2]. In this paper, we use the GRNN as a measure of the true relationship between a set of input variables and an output variable; input variables with a lower RMSE are considered to possess a stronger relationship to the output variable. The assumption here is that the inverse of the RMSE can be taken as a surrogate of the mutual information measure (see Fig. 4). This assumption was previously used by Strobl and Forte [18]. They employed an ANN as a means to determine the most relevant factors in two drainage network derivation case studies, where the ANN gauged the relationship strength between a set of factors and a particular variable. In this paper, we replace the ANN with a GRNN since it is more efficient and does not require the specification of the neural network structure. Tomandl and Schober [20] summarize some of the deficiencies of traditional back-propagation neural networks which are overcome with the GRNN algorithm. They further established a modified version of GRNN for cases when datasets are not of equal length, but this is not a concern in this study.
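The following sketch shows a GRNN-style predictor in the spirit of Specht [16] – a kernel-weighted average of the training targets controlled by a single smoothing factor σ – together with the RMSE used as the (inverse) surrogate of mutual information. It is a simplified illustration under those assumptions, not the configuration used in the paper, and σ would be chosen by trial and error as described above.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """Predict each query point as a Gaussian-weighted mean of the training targets."""
    X_train = np.atleast_2d(X_train)
    preds = []
    for x in np.atleast_2d(X_query):
        d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distances to training points
        w = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights
        preds.append(np.dot(w, y_train) / (w.sum() + 1e-12))
    return np.array(preds)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```

A small σ reproduces nearest-neighbour behaviour, while a very large σ drives every prediction toward the mean of y_train, matching the limiting cases noted above.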
Table 5. Input selection results of the five test problems using the MRMR algorithm; the stopping criterion was based on a minimum percent improvement in RMSE (ε = 3%) and simulating the GRNN with a constant smoothing factor (σ) of 1; selected input variables are shown in gray.
6. Synthetic test problems

To illustrate the competence of MRMR in identifying the most relevant set of inputs, we apply it to the same test problems used by Sharma [15] and Bowden et al. [2] to evaluate the competence of the PMI algorithm as an input selection method. The test problems have known dependence attributes and are based on synthetic data. Sharma [15] used five autoregressive models; the first three are simple autoregressive models with varying orders of time and dependence, while the last two are non-linear threshold autoregressive models. Bowden et al. [2] tested against three of Sharma's test problems. Here we use all five test problems, namely:
(1) AR1:
x_t = 0.9 x_{t-1} + 0.866 e_t    (16)

(2) AR4:
x_t = 0.6 x_{t-1} - 0.4 x_{t-4} + e_t    (17)

(3) AR9:
x_t = 0.3 x_{t-1} - 0.6 x_{t-4} - 0.5 x_{t-9} + e_t    (18)

(4) TAR1 – threshold autoregressive order 1:
x_t = -0.9 x_{t-3} + 0.1 e_t if x_{t-3} ≤ 0; \quad x_t = 0.4 x_{t-3} + 0.1 e_t if x_{t-3} > 0    (19)

(5) TAR2 – threshold autoregressive order 2:
x_t = -0.5 x_{t-6} + 0.5 x_{t-10} + 0.1 e_t if x_{t-6} ≤ 0; \quad x_t = 0.8 x_{t-10} + 0.1 e_t if x_{t-6} > 0    (20)
In all test models, e_t is a Gaussian random variate with zero mean and unit standard deviation. For each of the models, 520 data points were generated and the first 20 points were discarded to reduce the effect of initialization. The first 15 lags of the series (i.e., x_{t-1}, x_{t-2}, ..., x_{t-15}) were taken as the potential candidate inputs. The MRMR algorithm was applied to each of the models, and the results are shown in Table 5. The results in Table 5 are based on σ = 1 and ε = 3%. The input variables were correctly identified for all five models (Eqs. (16)–(20)). The results are insensitive to the value of σ, but the number of inputs selected depends on the percent improvement threshold value of RMSE (ε). By trial and error, a value of 3% was found appropriate for these test problems. Hence, one should test several ε values to decide on the appropriate number of input variables.
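As an illustration of how such a test problem is set up, the sketch below generates a TAR2 series (Eq. (20)) and assembles the 15 lagged candidate inputs; the random seed and burn-in handling are assumptions, with the first 20 of 520 points discarded as described in the text.

```python
import numpy as np

def generate_tar2(n=520, burn_in=20, seed=1):
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)                     # Gaussian noise, zero mean, unit std
    x = np.zeros(n)
    for t in range(10, n):
        if x[t - 6] <= 0:
            x[t] = -0.5 * x[t - 6] + 0.5 * x[t - 10] + 0.1 * e[t]
        else:
            x[t] = 0.8 * x[t - 10] + 0.1 * e[t]
    return x[burn_in:]                         # discard initialization effects

x = generate_tar2()
max_lag = 15
inputs = np.column_stack([x[max_lag - k:-k] for k in range(1, max_lag + 1)])  # x_{t-1}..x_{t-15}
output = x[max_lag:]                                                          # x_t
```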
7. Case study

Twenty-two reservoirs in California with daily release, storage, and inflow data are used in this study. The data span January 1, 2004 to December 31, 2006 for all reservoirs. The reservoir selection process relies on the availability of observed release, storage, and inflow time series at the daily scale and on the continuity of the datasets. The output variable here is the release on a particular day (t). The input variables are a combination of past inflows, releases, and storages, and future inflows. Two categories of potential input variables are defined: a long set and a short set. The former is used to further illustrate the competence of the MRMR algorithm, while the latter is used to further investigate the nature of the α and β coefficients in the α-MRMR and β-MRMR algorithms (Eqs. (10) and (11)). The two input variable sets are selected with respect to their potential roles in reservoir operations, following the work of Hejazi et al. [8]. The long set consists of 121 input variables to explain the uncertainty in the current release.
The variables are: past releases (R_{t-1}, R_{t-2}, ..., R_{t-30}), storages (S_{t-1}, S_{t-2}, ..., S_{t-30}), and inflows (Q_{t-1}, Q_{t-2}, ..., Q_{t-30}), plus the current inflow (Q_t) and future inflows (Q_{t+1}, Q_{t+2}, ..., Q_{t+30}). The short set consists of nine input variables: past 1-day and 2-day releases (R_{t-1} and R_{t-2}), storages (S_{t-1} and S_{t-2}), and inflows (Q_{t-1} and Q_{t-2}), plus the current inflow (Q_t) and future 1-day and 2-day inflows (Q_{t+1} and Q_{t+2}). Note, inflow and release are defined as averaged values, while storage is defined as the end-of-period storage. Also, future inflows are taken as inputs under the assumption of a perfect forecast.

To confirm the quality of MRMR and the improved quality due to the use of the α-MRMR and β-MRMR algorithms, we apply all three algorithms to the long data set for all 22 reservoirs introduced above. The objective here is to investigate whether there exists a strong correlation between RMSE and the approximators of mutual information (i.e., Φ, Φ_α, and Φ_β). For each reservoir, we first randomly select a prescribed number of input variables (e.g., 1, 2, ..., 5) and then compute the corresponding RMSE and Φ, Φ_α, and Φ_β values. With 121 potential inputs, repeating the process for all possible combinations is unrealistic beyond the two-input scenario. Thus, we run the process for 1000 combinations of the inputs for each specified number of variables and for each reservoir. Based on each scenario of the 1000 simulations, Fig. 5 shows a strong negative correlation between RMSE and Φ, Φ_α, and Φ_β, respectively; data are shown for five reservoirs only, but the rest exhibit similar trends. There clearly exists a strong inverse relationship between RMSE and each of Φ, Φ_α, and Φ_β, making the approximate methods appropriate surrogates for the former. In other words, variables that yield high values of Φ, Φ_α, or Φ_β tend to induce low RMSE values. In the case of one input variable, Φ, Φ_α, and Φ_β all converge to the exact value of mutual information; thus, the correlation with RMSE is strongest. With more than one variable, Φ, Φ_α, and Φ_β are only approximations of the true dependence, leading to lower correlations. When introducing a correction factor to the MRMR equation (e.g., α-MRMR, Eq. (10); β-MRMR, Eq. (11)), a gain is attained in the correlation strength between Φ_α (or Φ_β) and RMSE. Fig. 5 shows that α-MRMR and β-MRMR outperform MRMR by yielding stronger correlations with RMSE. Furthermore, although α-MRMR and β-MRMR have different definitions of the correction factor, they produce the same gain in competence. Hence, whether we optimize the value of α or β, we attain the same improvement in correlation, e.g., Correl(RMSE, Φ_α) = Correl(RMSE, Φ_β). Further tests should be conducted with other cases in order to justify the advantage of the two formulations.

To further illustrate the advantage of adopting α-MRMR or β-MRMR, we show the results from an individual reservoir, Trinity Lake Reservoir. Fig. 6 shows the improvement in correlation with RMSE through the narrowed spread of the data. Selected inputs that exhibit high values of Φ_α and Φ_β are likely to yield low RMSE values. The milder slope of the relation between RMSE and Φ_β may reflect greater robustness of Φ_β over Φ_α. A small discrepancy in the value of Φ_β would yield a smaller increase in RMSE than a small discrepancy in Φ_α. Thus, Φ_α and Φ_β reflect different levels of sensitivity to model accuracy (RMSE).

Optimal values of α and β that yield the strongest correlation with RMSE for each of the 22 reservoirs with the long set are summarized in Table 6A. The value of α increases, and β decreases, with the increase of the number of input variables. However, since we only test 1000 randomly selected combinations out of a much larger pool of possible combinations, the optimal values of α and β may be influenced by the randomness of the selected combinations. To further confirm the sensitivity of α and β to the increase in the number of input variables, we determine the values of α and β based on the previously described short set of input variables (Table 6B).

Fig. 5. Comparison of the correlation measure between RMSE and Φ (MRMR), Φ_α (α-MRMR), and Φ_β (β-MRMR) averaged over all 22 reservoirs; data based on the long set and 1000 simulations.

Fig. 6. Comparison of the relationship between RMSE and Φ (MRMR), Φ_α (α-MRMR), and Φ_β (β-MRMR); data are based on Trinity Lake Reservoir results and 1000 simulations.
Table 6A. Summary of optimal values of α that yield the strongest correlation with RMSE for each of the 22 reservoirs, for the long and short sets (columns give the number of input variables, 1–6).

Reservoir | Long set: 1, 2, 3, 4, 5, 6 | Short set: 1, 2, 3, 4, 5, 6
1  | 0, 0.065, 0.120, 0.153, 0.190, 0.240 | 0, 0.065, 0.116, 0.145, 0.163, 0.163
2  | 0, 0.171, 0.271, 0.314, 0.368, 0.382 | 0, 0.172, 0.310, 0.427, 0.521, 0.585
3  | 0, 0.170, 0.317, 0.395, 0.509, 0.546 | 0, 0.144, 0.312, 0.510, 0.777, 1.000
4  | 0, 0.213, 0.345, 0.415, 0.501, 0.500 | 0, 0.198, 0.356, 0.490, 0.606, 0.701
5  | 0, 0.125, 0.234, 0.332, 0.314, 0.399 | 0, 0.101, 0.154, 0.212, 0.261, 0.283
6  | 0, 0.066, 0.093, 0.118, 0.127, 0.144 | 0, 0.926, 0.508, 0.354, 0.253, 0.171
7  | 0, 0.256, 0.376, 0.436, 0.517, 0.558 | 0, 0.201, 0.347, 0.527, 0.746, 1.000
8  | 0, 0.840, 1.000, 1.000, 1.000, 1.000 | 0, 0.216, 0.262, 0.284, 0.281, 0.264
9  | 0, 0.252, 0.480, 0.602, 0.718, 0.748 | 0, 0.182, 0.312, 0.395, 0.455, 0.509
10 | 0, 0.272, 0.455, 0.533, 0.624, 0.623 | 0, 0.292, 0.613, 0.901, 1.000, 0.952
11 | 0, 0.319, 0.376, 0.393, 0.430, 0.438 | 0, 0.238, 0.215, 0.300, 0.407, 0.525
12 | 0, 0.281, 0.789, 0.781, 0.748, 0.619 | 0, 0.151, 0.235, 0.200, 0.191, 0.198
13 | 0, 0.147, 0.248, 0.320, 0.422, 0.422 | 0, 0.153, 0.271, 0.374, 0.455, 0.504
14 | 0, 0.169, 0.311, 0.348, 0.347, 0.350 | 0, 0.083, 0.137, 0.165, 0.188, 0.213
15 | 0, 0.210, 0.327, 0.426, 0.480, 0.580 | 0, 0.318, 0.253, 0.320, 0.436, 0.581
16 | 0, 0.212, 0.357, 0.437, 0.543, 0.564 | 0, 0.234, 0.418, 0.564, 0.677, 0.755
17 | 0, 0.155, 0.269, 0.348, 0.382, 0.400 | 0, 0.141, 0.314, 0.593, 1.000, 1.000
18 | 0, 0.161, 0.368, 0.578, 0.752, 0.953 | 0, 0.128, 0.276, 0.443, 0.647, 0.939
19 | 0, 0.237, 0.496, 0.655, 0.814, 0.938 | 0, 0.186, 0.389, 0.717, 1.000, 1.000
20 | 0, 0.250, 0.398, 0.495, 0.549, 0.585 | 0, 0.387, 0.571, 0.766, 0.936, 1.000
21 | 0, 0.136, 0.235, 0.300, 0.337, 0.385 | 0, 0.204, 0.286, 0.380, 0.472, 0.557
22 | 0, 0.047, 0.086, 0.123, 0.155, 0.173 | 0, 0.068, 0.130, 0.176, 0.218, 0.261
Table 6B. Summary of optimal values of β that yield the strongest correlation with RMSE for each of the 22 reservoirs, for the long and short sets (columns give the number of input variables, 1–6).

Reservoir | Long set: 1, 2, 3, 4, 5, 6 | Short set: 1, 2, 3, 4, 5, 6
1  | 1, 0.065, 0.060, 0.051, 0.048, 0.048 | 1, 0.065, 0.058, 0.048, 0.041, 0.033
2  | 1, 0.171, 0.136, 0.105, 0.092, 0.076 | 1, 0.172, 0.155, 0.142, 0.130, 0.117
3  | 1, 0.170, 0.159, 0.132, 0.127, 0.109 | 1, 0.144, 0.156, 0.170, 0.194, 0.244
4  | 1, 0.213, 0.172, 0.138, 0.125, 0.100 | 1, 0.198, 0.178, 0.163, 0.151, 0.140
5  | 1, 0.125, 0.117, 0.111, 0.078, 0.080 | 1, 0.101, 0.077, 0.071, 0.065, 0.057
6  | 1, 0.066, 0.047, 0.039, 0.032, 0.029 | 1, 0.926, 0.254, 0.118, 0.063, 0.034
7  | 1, 0.256, 0.188, 0.145, 0.129, 0.112 | 1, 0.201, 0.174, 0.176, 0.187, 0.206
8  | 1, 0.840, 1.000, 1.000, 0.495, 0.480 | 1, 0.216, 0.131, 0.095, 0.070, 0.053
9  | 1, 0.252, 0.240, 0.201, 0.180, 0.150 | 1, 0.182, 0.156, 0.132, 0.114, 0.102
10 | 1, 0.272, 0.228, 0.178, 0.156, 0.125 | 1, 0.292, 0.307, 0.300, 0.251, 0.190
11 | 1, 0.319, 0.188, 0.131, 0.107, 0.088 | 1, 0.238, 0.107, 0.100, 0.102, 0.105
12 | 1, 0.281, 0.394, 0.260, 0.187, 0.124 | 1, 0.151, 0.118, 0.067, 0.048, 0.040
13 | 1, 0.147, 0.124, 0.107, 0.106, 0.084 | 1, 0.153, 0.136, 0.125, 0.114, 0.101
14 | 1, 0.169, 0.156, 0.116, 0.087, 0.070 | 1, 0.083, 0.069, 0.055, 0.047, 0.043
15 | 1, 0.210, 0.164, 0.142, 0.120, 0.116 | 1, 0.318, 0.126, 0.107, 0.109, 0.116
16 | 1, 0.212, 0.178, 0.146, 0.136, 0.113 | 1, 0.234, 0.209, 0.188, 0.169, 0.151
17 | 1, 0.155, 0.134, 0.116, 0.095, 0.080 | 1, 0.141, 0.157, 0.198, 0.279, 0.497
18 | 1, 0.161, 0.184, 0.193, 0.188, 0.191 | 1, 0.128, 0.138, 0.148, 0.162, 0.188
19 | 1, 0.237, 0.248, 0.218, 0.204, 0.188 | 1, 0.186, 0.194, 0.239, 0.336, 0.615
20 | 1, 0.250, 0.199, 0.165, 0.137, 0.117 | 1, 0.387, 0.286, 0.255, 0.234, 0.212
21 | 1, 0.136, 0.117, 0.100, 0.084, 0.077 | 1, 0.204, 0.143, 0.127, 0.118, 0.111
22 | 1, 0.047, 0.043, 0.041, 0.039, 0.035 | 1, 0.068, 0.065, 0.059, 0.055, 0.052
In the case of the short set, enumerating all possible combinations is feasible, although this is impossible with the long list of 121 input variables. Note, α and β behave differently as the number of variables increases; the value of α generally increases, and β generally decreases, with the increase of the number of variables for each individual reservoir (Tables 6A and 6B). When averaged over all 22 reservoirs, the mean optimal values of α and β that yield the strongest correlation with RMSE increase and decrease, respectively, with the increase of the number of variables (Fig. 7). Recall, α is a measure of the relative importance of the redundancy term in the MRMR equation (Eq. (10)), while β is a measure of the relevant redundancy as a fraction of the total redundancy (Eq. (11)). Hence, when dealing with the single-variable case, α approaches zero and β approaches one. As the number of variables increases, the importance of the redundancy term in α-MRMR increases, as depicted by higher α values. Similarly, as the number of variables increases, the fraction of relevant redundancy to total redundancy decreases, as depicted by the lower β values. This is clearly shown in Fig. 8. A fitting curve is imposed on top of each of the long and short sets, as shown in Fig. 8. Note, although both α and β follow a power-law form, they exhibit opposite trends with the increase of input variables. Table 7 summarizes the fitting parameters associated with each of the two sets and each of the two correction factors, α and β. Hence, each of the two fitting models of α and β requires a single fitting parameter, a (see Table 7).

To further illustrate the capability of MRMR and its variants as input selection algorithms, we apply the algorithms to a single reservoir (Trinity Lake Reservoir) and utilize the fitting models in Table 7. We assume that the stopping criterion is based on a minimum improvement in RMSE of 1% (i.e., ε = 1%), and we also assume that the systems model is a regular back-propagation ANN with 50% of the data used in training and the remaining 50% in testing to avoid overfitting. Starting with the 121 potential inputs coupled with an ANN model with a single hidden layer and a single hidden node, we employ the five scenarios described in Table 8 to evaluate the performance of the input selection algorithms in identifying input variables.
Fig. 7. Mean values of α and β averaged over all 22 reservoirs; long set: 121 potential input variables and 1000 simulations; short set: nine mixed variables; with the increase of input variables, α increases while β decreases.
Fig. 9 compares the ANN model performance under each of the five scenarios. The baseline scenario is used as a reference point with a fixed number of inputs set a priori. The MRMR and α-MRMR scenarios identify fewer input variables but induce a worse RMSE than the baseline scenario. The β-MRMR and the 'optimal' scenarios (Scenarios 4 and 5, respectively) find combinations of four and three input variables which yield lower RMSE values than the baseline scenario. The β-MRMR identifies a combination of inputs that yields a better model performance with fewer inputs than the baseline scenario. This is important, because it supports the claim that input selection algorithms can reduce unnecessary model complexity and improve competence. Table 9 lists the set of inputs identified by each of the five scenarios. Note, MRMR, α-MRMR, and β-MRMR share the same three input variables (R_{t-1}, R_{t-2}, R_{t-3}) and only differ in one variable (Table 9). These three scenarios are approximation algorithms of mutual information; hence, they would not necessarily end up with identical solutions.
Fig. 8. Fitted models to the α and β values attained from the long and short sets.
Table 7. Summary of model fitting parameters and statistics for the long and short set cases; x is the number of input variables considered at iteration i.

Algorithm | Model            | Dataset | a      | R²
α-MRMR    | α = 1 − x_i^(−a) | Long    | 0.4074 | 0.9956
α-MRMR    | α = 1 − x_i^(−a) | Short   | 0.4251 | 0.9577
β-MRMR    | β = x_i^(−a)     | Long    | 1.5742 | 0.9639
β-MRMR    | β = x_i^(−a)     | Short   | 1.6627 | 0.9638
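Because both fitted models in Table 7 are power laws in the number of input variables, the single parameter a can be recovered by a log–log least-squares fit. The sketch below does this for β = x^(−a) using illustrative mean values, not the paper's data; α can be handled the same way by regressing log(1 − α) on log(x).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])                           # number of input variables
beta_mean = np.array([1.0, 0.26, 0.17, 0.13, 0.11, 0.09])  # hypothetical averaged beta values
slope, intercept = np.polyfit(np.log(x), np.log(beta_mean), 1)
a_beta = -slope                                            # compare with the a values in Table 7
print(a_beta)
```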
8. Conclusions

Input selection can be a challenging step in modeling, and correctly identifying the most relevant inputs to describe a water resources system can lead to improved model accuracy without inducing unnecessary model complexity. In this paper, we propose algorithms that approximate mutual information to pinpoint the set of inputs that contains the greatest amount of information about the uncertainty of a system. Specifically, we employ the MRMR algorithm and further introduce two variants of MRMR (α-MRMR and β-MRMR) as IVS algorithms.

α-MRMR and β-MRMR both outperform the MRMR algorithm in yielding stronger correlations with the RMSE obtained from simulating a GRNN model. α increases while β decreases, approximately in a power-law form, as the number of input variables increases. When dealing with a single input variable, both α-MRMR and β-MRMR become equivalent to the mutual information (maximum dependence) between two random variables, and α becomes zero while β becomes one. The optimal α and β values have been tested for 22 reservoirs and two sets of inputs (long and short); further research is needed to identify any other factors that may influence the rate of change of α and β with the increase in the number of input variables. Thus, further research may be directed to other data sets to justify whether the detected functional forms of the two correction factors are universal.

Fig. 9. Comparison among the various input selection algorithms (MRMR, α-MRMR, and β-MRMR) in identifying the most important set of inputs for an ANN model of the daily release of Trinity Lake Reservoir; inputs are selected from the long set of inputs (121); the values of α and β are determined from Table 4; the baseline scenario is based on a pre-selected set of five input variables; the 'optimal' scenario is based on simulating all possible combinations incrementally to find the global best set of inputs; here, ε = 1%.
Table 8. Summary of scenarios in selecting the most relevant inputs to predict the daily release of the Trinity Lake Reservoir.

Scenario 1 – Baseline: Selecting five inputs (the one with the highest mutual information from each of the five categories of input variables, i.e., past release, past storage, and past, current, and future inflow).
Scenario 2 – MRMR: Inputs selected incrementally with the MRMR algorithm and ε = 1%.
Scenario 3 – α-MRMR: Inputs selected incrementally with the α-MRMR algorithm and ε = 1%.
Scenario 4 – β-MRMR: Inputs selected incrementally with the β-MRMR algorithm and ε = 1%.
Scenario 5 – 'Optimal': Inputs selected by testing all combinations of inputs with the ANN to find the global optimal set of inputs; stopping with ε = 1%.
Table 9. Summary of the selected input variables to predict the daily release of the Trinity Lake Reservoir.

Scenario 1 – Baseline: 5 inputs: Q_t, R_{t-1}, S_{t-24}, Q_{t-24}, Q_{t+30}
Scenario 2 – MRMR: 4 inputs: R_{t-1}, Q_{t-1}, R_{t-3}, R_{t-2}
Scenario 3 – α-MRMR: 3 inputs: R_{t-1}, R_{t-3}, R_{t-2}
Scenario 4 – β-MRMR: 4 inputs: R_{t-1}, S_{t-24}, R_{t-3}, R_{t-2}
Scenario 5 – 'Optimal': 3 inputs: R_{t-1}, S_{t-30}, R_{t-3}
Also, since the fitted models for α and β require just a single fitting parameter (a), one may simply re-perform the modeling exercise with various values of a until satisfactory model accuracy is achieved.

The stopping criterion adopted in this study is based on whether a small threshold ε in RMSE improvement is met. The RMSE value for model predictions is obtained using a GRNN model. The GRNN, however, can be replaced by any other model. Although the percent improvement threshold stopping criterion suffices in this study, additional research on constructing a more objective stopping criterion may be beneficial.

In this study, an important assumption is that the GRNN-based measure is qualitatively equivalent to the mutual information between a set of variables and another variable (Fig. 4). The GRNN may pose some limitations, as it requires the selection of a smoothing factor and the availability of sufficient data to accurately represent the 'true' relationship among the variables. To avoid the necessity of this assumption, further research may be directed at using synthetic data with known mutual information.

The two proposed IVS algorithms (α-MRMR and β-MRMR) may be employed in any modeling exercise. Their premise lies in their ability to identify the most relevant input variables to a modeled system at a very efficient rate. In doing so, modelers can construct models that are as complex as necessary, leaving out any unnecessary inputs.

Acknowledgements

The first author wishes to extend his thanks to Dr. Momcilo Markus for his insightful discussions at the early stage of this work. The authors are very grateful to Benjamin Ruddell for his constructive comments and proofreading of the manuscript. The authors also wish to thank Dr. Hayri Önal, Yonas Demissie, and Jiing-Yun You for reviewing an earlier version of the manuscript. Partial financial support for this research was provided by US National Science Foundation (NSF) grant CBET 0747276.
References

[1] Ataie-Ashtiani B, Hassanizadeh SM, Celia MA. Effects of heterogeneities on capillary pressure–saturation–relative permeability relationships. J Contam Hydrol 2002;56:175–92.
[2] Bowden GV, Dandy GC, Maier HR. Input determination for neural network models in water resources applications. Part 1 – Background and methodology. J Hydrol 2005;301:75–92.
[3] Carlson TN, Arthur ST. The impact of land use–land cover changes due to urbanization on surface microclimate and hydrology: a satellite perspective. Glob Planet Change 2000;25:49–65.
[4] Cover TM, Thomas JA. Elements of information theory. New York, USA: John Wiley and Sons Inc.; 1991.
[5] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the second IEEE computational systems bioinformatics conference; 2003. p. 523–8.
[6] Francios D, Rossi F, Wertz V, Verleysen M. Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 2007;70:1276–88.
[7] Harnold TI, Sharma A, Sheather S. Selection of kernel bandwidth for measuring dependence in hydrologic time series using the mutual information criteria. Stoch Environ Res Risk Assess 2001;15:310–24.
[8] Hejazi MI, Cai X, Ruddell BL. The role of hydrologic information in reservoir operation – learning from historical releases. Adv Water Resour 2008;31:1636–50.
[9] Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and minimum redundancy. IEEE Trans Pattern Anal Mach Intell 2005;27(8):1226–38.
[10] Poveda G, Mesa OJ. Feedbacks between hydrological processes in tropical South America and large-scale ocean–atmospheric phenomena. J Climate 1997;10:2690–702.
[11] Rossi F, Lendasse A, Francios D, Wertz V, Verleysen M. Mutual information for the selection of relevant variables in spectrometric nonlinear modeling. Chemomet Intell Lab Syst 2006;80:215–26.
[12] Ruddell BL, Kumar P. Ecohydrologic process networks. Part 1 – Identification. Water Resour Res, doi:10.1029/2008WR007279.
[13] Sankarasubramanian A, Vogel RM. Hydroclimatology of the continental United States. Geophys Res Lett 2003;30(7):1363.
[14] Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948;27:379–423, 623–56.
[15] Sharma A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management. Part 1 – A strategy for system predictor identification. J Hydrol 2000;239:232–9.
[16] Specht DF. A general regression neural network. IEEE Trans Neural Network 1991;2(6):568–76.
[17] Stokelj T, Paravan D, Golob R. Enhanced artificial neural network inflow forecasting algorithm for run-of-river hydropower plants. J Water Res Plan Manage 2002;128(6):415–23.
[18] Strobl RO, Forte F. Artificial neural network exploration of the influential factors in drainage network derivation. Hydrol Process 2007;21:2965–78.
[19] Tana CO, Bekliogluc M. Modeling complex nonlinear responses of shallow lakes to fish and hydrology using artificial neural networks. Ecol Model 2006;196:183–94.
[20] Tomandl D, Schober A. A modified general regression neural network (MGRNN) with new, efficient training algorithms as a robust 'black box' tool for data analysis. Neural Network 2001;14:1023–34.
[21] Venn J. On the diagrammatic and mechanical representation of propositions and reasonings. Philos Mag J Sci 1880;9(59).
[22] Wallender WW, Grismer ME. Irrigation hydrology: crossing scales. J Irrig Drain Eng 2002;128(4):203–11.
[23] Wood EF. Effects of soil moisture aggregation on surface evaporative fluxes. J Hydrol 1997;190:397–412.