Advances in Water Resources 32 (2009) 582–593
Input variable selection for water resources systems using a modified minimum redundancy maximum relevance (mMRMR) algorithm

Mohamad I. Hejazi, Ximing Cai *

Ven-Te Chow Hydrosystems Laboratory, Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, 205 N. Mathews Avenue, Urbana, IL 61801, United States
Article info

Article history: Received 3 September 2008; Received in revised form 20 January 2009; Accepted 20 January 2009; Available online 4 February 2009.

Keywords: Mutual information; Input selection; MRMR; mMRMR; Modeling
Abstract

Input variable selection (IVS) is a necessary step in modeling water resources systems. Neglecting this step may lead to unnecessary model complexity and reduced model accuracy. In this paper, we apply the minimum redundancy maximum relevance (MRMR) algorithm to identify the most relevant set of inputs in modeling a water resources system. We further introduce two modified versions of the MRMR algorithm (α-MRMR and β-MRMR), where α and β are correction factors that are found to increase and decrease as power-law functions, respectively, with the progress of the input selection algorithms and the increase of the number of selected input variables. We apply the proposed algorithms to 22 reservoirs in California to predict daily releases based on a set of 121 potential input variables. Results indicate that the two proposed algorithms are effective in selecting model inputs, as reflected in enhanced model performance. The α-MRMR and β-MRMR values exhibit strong negative correlations with model root-mean-square error (RMSE): input sets with higher values tend to yield lower RMSE.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Input selection is an initial, necessary step in any modeling exercise, and a proper selection largely dictates model accuracy. Selecting irrelevant inputs can degrade model accuracy and add unnecessary model complexity, reducing model reliability. In the hydrology and water resources arena, a hydrologist is often faced with the challenge of preprocessing a huge set of possible inputs to a hydrologic system. However, many hydrologic variables carry redundant information, and not all variables are relevant or even necessary [2]. A proper input selection mechanism is of particular importance when dealing with data-driven models with unknown sets of inputs. For example, Bowden et al. [2] illustrated the importance of input selection in the context of employing artificial neural network (ANN) models in water resources applications. They recognized input selection as one of the most important steps in the ANN model development process. They further stated that selecting only the most relevant input variables in a system could potentially reduce input dimensionality and consequently reduce computational complexity, ease the learning process of the ANN, improve accuracy, and enable better understanding of the 'true' driving forces of the modeled system. However, an effective method that can measure correlation (linear and non-linear) between a set of potential input variables and an output variable is still needed [7,2,9].
* Corresponding author. E-mail address: [email protected] (X. Cai).
doi:10.1016/j.advwatres.2009.01.009
Input variable selection (IVS) methods generally rely on one of the following techniques: experts' knowledge of the system, linear cross-correlation, heuristic techniques, sensitivity analysis, or information-theoretic measures [2, and references therein]. In this paper, we build on the use of information-theoretic measures to quantify the level of inference among several random variables. The goal is to craft an algorithm that correctly identifies the set of input variables that collectively possess the largest amount of information about the system being modeled, without including irrelevant input variables. In the rest of the paper, we first present background on IVS techniques. Following that, we lay out the methodology framework proposed in the paper and its merits, and apply the minimum redundancy maximum relevance (MRMR) algorithm to the synthetic test problems used by Sharma [15] and subsequently by Bowden et al. [2] in order to compare its performance with the two previous studies and to validate its selection accuracy. Next, we introduce a real-world case study to illustrate the performance of the newly proposed approach; finally, we discuss the results and their implications, followed by the conclusions.

2. Background

In hydrology, highly non-linear relationships have been reported in the literature in areas such as hydroclimatology [10], irrigation hydrology [22], contaminant hydrology [1], ecological processes [19], rainfall–runoff processes [13], surface fluxes and surface moisture availability [3], and infiltration, evaporation and
runoff processes [23]. Since relationships in hydrology are typically non-linear, using standard linear correlation methods may capture only linear relationships and misleadingly overlook the existence of non-linear relationships [7]. The use of information theory [14] – and more specifically mutual information – to measure linear and non-linear correlation between two variables is becoming increasingly popular due to its basis as an information-theoretic relevance criterion [6]. Estimating the mutual information between two random variables requires the determination of their joint probability density function with sufficient data. Existing methods of estimating the joint probability density function include building histograms (e.g., [8,12]) and using kernel estimation techniques (e.g., [15,2]). However, as the number of random variables increases, reliably estimating multidimensional mutual information quickly becomes computationally impractical [9,11]. Additionally, hydrologic and water resources systems often involve many input variables. Motivated by this shortcoming of computing mutual information among several random variables, research efforts have expanded to other techniques such as partial mutual information (PMI) [15,2], maximum dependence, maximum relevance (or average mutual information), minimum redundancy, and minimum redundancy maximum relevance [9]. We briefly describe each of these methods next.
2.1. Mutual information

Mutual information is a measure of the linear and non-linear dependence between variables. The mutual information between two random variables is a measure of the information one random variable explains about the other. It takes a minimum value of zero when no dependence exists between the two variables and a positive value when dependence (linear or non-linear) exists, with larger values indicating stronger dependence. The mutual information between two random variables X and Y can be quantified as shown in the following equation [4]:
I(X; Y) = \int\int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy    (1)

where x and y represent realizations of X and Y, I(X; Y) is the mutual information between the two random variables X and Y, p(x, y) is their joint probability density function, and p(x) and p(y) are the marginal probability density functions of X and Y, respectively. The mutual information between a set of input variables {X_i ∈ S_m, i = 1, ..., m} and an output variable Y can be estimated by Eq. (2), where S_m is the set of m input variables:

I(S_m; Y) = \int\int p(S_m, y) \log \frac{p(S_m, y)}{p(S_m)\, p(y)} \, dS_m \, dy = \int \cdots \int p(x_1, \ldots, x_m, y) \log \frac{p(x_1, \ldots, x_m, y)}{p(x_1) \cdots p(x_m)\, p(y)} \, dx_1 \cdots dx_m \, dy    (2)
Although mutual information is a robust measure of non-linear correlation between a set of random variables [6], its applicability diminishes beyond the two-variable case [11]. The challenge lies in the need to reliably estimate joint probabilities whose dimension equals the number of variables at stake. For instance, computing the mutual information among four variables requires one to determine a four-dimensional joint probability distribution.
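As a concrete illustration of Eq. (1), the sketch below estimates I(X; Y) from paired samples with a simple two-dimensional histogram, i.e., the plug-in approach mentioned above (e.g., [8,12]). This is not the authors' code; the bin count, sample size, and the synthetic dependent pair are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in (histogram) estimate of I(X;Y) in nats; bins=10 is an assumption."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                 # joint probabilities p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)            # dependent pair
print(mutual_information(x, y))                      # clearly positive
print(mutual_information(x, rng.normal(size=5000)))  # near zero
```

Kernel estimators (e.g., [15,2]) replace the histogram, but the principle is the same; either way the estimate degrades quickly as the dimension grows, which motivates the approximations discussed in Section 2.3.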
2.2. Partial mutual information

Recently, Sharma [15] introduced the concept of partial mutual information (PMI) to compute the partial amount of mutual information attained with the addition of a new input variable. The premise of his approach is to sequentially select an additional input variable while extracting the information explained by the previously selected variables using conditional expectations. The PMI method as described by Sharma [15] requires computing conditional expectations under a Gaussian assumption for the joint probability of the variables. This assumption is likely invalid for non-linear systems. Furthermore, the conditional expectation calculations involve inverting a matrix, which may become burdensome as the dimension of the input variables increases. More recently, Bowden et al. [2] used a general regression neural network (GRNN) as an approximate measure of the joint probability density function. The GRNN is a feed-forward artificial neural network (ANN) that can capture non-linear relationships between the inputs and outputs. Unlike standard ANNs, the GRNN does not require the prior selection of a network structure, and it is computationally efficient [2]. However, the PMI method with GRNN is inefficient when working with a relatively large set of input variables, since it would require running the GRNN between each of the remaining and each of the pre-selected variables at each iteration.

2.3. Maximum dependency, maximum relevance, and minimum redundancy

Another frontier in the advancement of IVS algorithms is establishing approximate methods to calculate mutual information among several random variables. Theoretically, computing mutual information among a set of variables is equivalent to measuring their dependency; thus, determining the set of input variables with the highest mutual information is a dependency maximization problem. This process can be implemented using the maximum dependency algorithm [9]. The maximum dependency algorithm computes the dependency as the mutual information between a set of m input variables (S_m) and the output variable Y in a forward fashion until the addition of any new input variable (X_i) becomes negligible:

maximize I(S_m; Y), \quad I = I(\{X_i,\ i = 1, \ldots, m\}; Y)    (3)

That is, the algorithm stops when no additional input variable yields an improvement in mutual information (I) higher than a prescribed threshold value ε. Sequentially, the algorithm finds the single most important input variable, then the second, third, etc., most important variables until the uncertainty reduction of the output variable Y becomes less than ε. This algorithm is not computationally reliable when it involves computing joint probability distributions of high dimensionality. To rectify this limit, a commonly used approximation of the maximum dependency algorithm is the maximum relevance (D) algorithm – also called average mutual information (AMI) [17] – as presented in Eq. (4). The maximum relevance algorithm works in a similar manner to the maximum dependency algorithm. It, however, assumes that the input variables {X_i ∈ S_m, i = 1, ..., m} are independent – an assumption often violated when autocorrelated temporal and spatial variants of hydrologic input variables are considered in the selection process. This algorithm does not account for any redundancy among the input variables and thus lacks the discriminative power to avoid selecting redundant input variables:

maximize D(S_m; Y), \quad D = \frac{1}{m} \sum_{X_i \in S_m} I(X_i; Y)    (4)

To overcome the shortcoming of not accounting for redundancy, Ding and Peng [5] introduced the minimum redundancy (R) algorithm, which seeks the most mutually exclusive set of input variables (Eq. (5)).
The minimum redundancy (R) algorithm alone is a poor estimate of mutual information, because a mutually exclusive set of input variables may very well possess no dependence with the output variable Y:

minimize R(S_m), \quad R = \frac{1}{m^2} \sum_{X_i, X_j \in S_m} I(X_i; X_j)    (5)

2.4. Minimum redundancy maximum relevance (MRMR)

A more effective approach is to combine the maximum relevance and minimum redundancy algorithms. Peng et al. [9] proposed combining the maximum relevance and minimum redundancy algorithms into a single maximization problem of the form in Eq. (6), where D and R are defined in Eqs. (4) and (5), respectively, and Φ is an approximation of the true mutual information value I:

\max \Phi(D, R), \quad \Phi = D - R    (6)

They named the combined method the minimum redundancy maximum relevance (MRMR) algorithm. They showed that it works efficiently even for a relatively large set of inputs and provided an analytical proof that a first-order MRMR model collapses to a maximum dependency problem. Although the method is an approximation, intuitively it should yield a better selection algorithm than either the maximum relevance or minimum redundancy algorithm alone. The maximum relevance term drives the selection process in favor of the most relevant set of variables with no attention to redundancy, while the minimum redundancy term favors input variables that are irrelevant to one another without any attention to how important they are to the output variable (e.g., Y). MRMR drives the selection process based on a balance between these two selection processes, favoring those variables that bring high relevance and low redundancy on average. This gives the MRMR algorithm a discriminative power that allows it to avoid selecting redundant variables. However, the relative importance of the R and D terms in Eq. (6) changes with the increase of the number of selected input variables. To explain why theoretically this should be true, we use Venn diagrams and simple algebra to illustrate the premise of a modified MRMR algorithm next. Additionally, in this paper, we use Venn diagrams to justify a new definition of the redundancy (R) term (Eq. (5)) in MRMR (Eq. (6)).

Fig. 1. Definition of mutual information in the context of Venn diagrams.

Fig. 2. Venn diagram representation of the mutual information between 2, 3, 4, and 5 random variables; here, Y is the dependent variable and X1, ..., X4 are the independent variables.
Table 1. Summary of the Venn areas corresponding to measures of mutual information (I) for the scenarios of 1, 2, 3, and 4 input variables.

Scenario 1: I(X1; Y) = AB
Scenario 2: I(X1, X2; Y) = AB + AC + ABC
Scenario 3: I(X1, X2, X3; Y) = AB + AC + AD + ABC + ABD + ACD + ABCD
Scenario 4: I(X1, X2, X3, X4; Y) = AB + AC + AD + AE + ABC + ABD + ACD + ABE + ACE + ADE + ABCD + ABCE + ABDE + ACDE + ABCDE
3. Modified minimum redundancy maximum relevance (mMRMR)

In this paper, we represent the concept of mutual information in the context of Venn diagrams. Venn diagrams were first introduced in 1880 by John Venn [21] and have seen wide use since then in many applications. As shown in Fig. 1, each circle represents the uncertainty contained in a random variable, and the overlapped areas represent the explained (shared) uncertainty based on knowledge of the other random variables. Mathematically, the mutual information, I, between variables X and Y is equal to the sum of the marginal entropies of the two variables, H(X) + H(Y), minus their joint entropy, H(Y, X) (Eq. (7)). Hence, the joint entropy term is represented by the union of the two Venn circles (Fig. 1). Similarly, the mutual information between a set of two random variables (X1, X2) and a third random variable (Y) can be computed based on Eq. (8), which includes a three-dimensional joint entropy term:
I(X; Y) = H(X) + H(Y) - H(Y, X)    (7)

I(X_1, X_2; Y) = H(Y) + H(X_1, X_2) - H(X_1, X_2, Y)    (8)
For simplicity, suppose Y is an output variable and X1, ..., X4 are four input variables, and the areas encompassed by variables Y, X1, X2, X3, and X4 are referred to as A, B, C, D, and E, respectively (Fig. 2). For Scenario 1 in Fig. 2, AB represents the mutual information between X1 and Y, where AB refers to the intersection between A and B only. For Scenario 2, however, AB + AC + ABC is the mutual information between the two input variables X1 and X2 and the output variable Y. Similarly, the areas of mutual information for Scenarios 3 and 4 are summarized in Table 1.
Table 2. Summary of the Venn areas corresponding to measures of mutual information (I), maximum relevance (D), minimum redundancy (R), and MRMR (Φ) for the three-random-variable scenario; D, R, and Φ are computed without normalizing the D and R quantities.

Term | R defined by Eq. (5) | R defined by Eq. (9)
D    | AB + AC + 2ABC                      | AB + AC + 2ABC
R    | B + C + AB + AC + 4BC + 4ABC        | BC + ABC
Φ    | -(B + C + 4BC + 2ABC)               | AB + AC + ABC - BC
I    | AB + AC + ABC                       | AB + AC + ABC

Note: R(Eq. (5)) = I(X1; X2) + I(X2; X1) + I(X1; X1) + I(X2; X2); R(Eq. (9)) = I(X1; X2).
Note that computing R based on Eq. (5), which includes self-entropies such as I(X_i; X_i), would exacerbate the MRMR approximation of I. For the sake of clarity, suppose D (Eq. (4)) and R (Eq. (5)) are total quantities instead of averaged quantities. For instance, for Scenario 2, I = AB + AC + ABC (Table 1). Self-entropies are incorporated in the definition of R (as defined in Eq. (5)) and subsequently in Φ (see Table 2). Then R expands from only two terms (BC and ABC) to include other terms such as B, C, AC, and AB. Specifically, R = B + C + AB + AC + 4BC + 4ABC and Φ = -(B + C + 4BC + 2ABC). Note that the terms B and C have no relevance to the uncertainty in the output variable Y, yet they appear in Φ. Furthermore, Φ does not include either AB or AC, and it also takes ABC as a negative quantity (Table 2). Thus, including the self-entropy terms in R (when i = j), and consequently in Φ, renders the approximation of I inappropriate. Since mutual information is a symmetric metric (I(X_i; X_j) = I(X_j; X_i)), we also exclude the symmetric terms (i ≥ j) when computing R. In this paper, we define R as indicated in Eq. (9). The redundancy (R) term in Eqs. (5) and (6) is normalized by the number of possible combinations. Hence, to account for the omitted self-entropy (i = j) and symmetric (i ≥ j) terms, the denominator term (m²) in Eq. (5) becomes 2(m² − m) in Eq. (9). This is the difference between our definition of MRMR and the definition presented by [9]:
minimize R(S_m), \quad R = \frac{1}{2(m^2 - m)} \sum_{X_i, X_j \in S_m,\, i < j} I(X_i; X_j)    (9)
Based on the modified definition of R, we compute the areas for the mutual information (I), total relevance (D), total redundancy (R), and MRMR (Φ) for the one-, two-, three- and four-input-variable scenarios (Table 3). In Table 3, when we have one input variable (X1), both D and Φ are perfect estimates of I. For the case of two input variables (X1, X2), D overestimates while Φ underestimates the value of I. However, we argue that the latter is a better approximation than the former for input variable selection. D overestimates I by double-counting the ABC region, thus misleadingly indicating that X1 and X2 explain more of the uncertainty in Y than they actually do. Φ, however, underestimates I by subtracting a new term (BC) which reflects the redundancy between the two input variables (X1 and X2). Although subtracting BC underestimates the 'true' measure of dependence, it favors selecting input variables that are as different as possible (i.e., with the lowest redundancy). This is favorable in statistical model input selection because it simplifies model structures and avoids irrational model functional forms. As the number of variables increases, more weight is given to avoiding redundancy among the selected variables. The addition of each new variable is assessed in terms of its dependency (e.g., mutual information) on the output variable Y and on the set of pre-selected input variables (X_i). In Table 3, Φ is computed (Eq. (6)) without normalizing the D and R quantities defined in Eqs. (4) and (9), respectively.

3.1. Merits of mMRMR

The MRMR algorithm assigns equal weights to the relevance (D) and redundancy (R) terms in Eq. (6); however, as illustrated in Table 3, the redundancy term is an overestimate of the 'true' redundancy. For instance, for the two-input-variable case (Scenario 2), only ABC is relevant redundancy while BC is irrelevant redundancy.
Table 3. Summary of the Venn areas corresponding to measures of mutual information (I), maximum relevance (D), minimum redundancy (R), and MRMR (Φ) for each of the four scenarios illustrated in Fig. 2; D, R, and Φ are computed without normalizing the D and R quantities; X_i are the input variables and Y is the dependent variable; values are coefficients multiplied by their corresponding areas; e.g., for Scenario 2, Φ = AB + AC − BC + ABC.
Here we define redundancy (or total redundancy, R) as the sum of relevant (RR) and irrelevant (IR) redundancies. We define relevant redundancy (RR) as the overlapped areas (information) between a set of input variables and an output variable, i.e., RR = Σ_i (X_i ∩ Y); we define irrelevant redundancy (IR) as the overlapped areas (information) among the input variables but not with the output variable, i.e., IR = Σ_i Σ_j (X_i ∩ X_j) − Σ_i (X_i ∩ Y). Accounting for the irrelevant redundancy terms grants the MRMR algorithm a tendency to favor uncorrelated variables. Ideally, the redundancy term (R) should include only the relevant redundancy (RR) terms and exclude the irrelevant redundancy (IR) terms. Thus, we introduce α as a correction factor of the redundancy term in the original form of the MRMR equation (Eq. (6)). We refer to the new form as the α-MRMR algorithm:

maximize \Phi_\alpha(S_m; Y), \quad \Phi_\alpha = D - \alpha R    (10)

where D and R are defined in Eqs. (4) and (9), respectively. Hence, α is a quantity ranging between zero and one. When α = 0, the α-MRMR algorithm collapses to a maximum relevance algorithm; when α = 1, the α-MRMR algorithm becomes equivalent to the MRMR algorithm. Furthermore, D and R are defined as average quantities of the relevance and redundancy that a set of inputs possesses. In that sense, α is a weight that indicates the relative importance of dependency and redundancy, on average, in the selection process, and it may vary as the number of selected variables increases. Therefore, one could repeat the α-MRMR run for different α values and different numbers of variables to select the 'best' set of values.

Factor α is only indicative of the relative importance of dependency and redundancy on average; it does not distinguish between the relevant (RR) and irrelevant (IR) redundancies. Thus, we introduce another formulation of mMRMR and, to avoid confusion with α-MRMR (Eq. (10)), we call the new form β-MRMR (Eq. (11)). Unlike the previous formulation (Eq. (10)), in Eq. (11) D and R are not normalized. Here again, we define β as the correction factor of the redundancy term (Eq. (12)). From the redundancy point of view, β = 0 implies that all captured redundancy is due to irrelevant redundancy (IR); β = 1 implies that all captured redundancy is due to relevant redundancy (RR):

maximize \Phi_\beta(S_m; Y), \quad \Phi_\beta = m D - \beta\, 2(m^2 - m) R = \sum_{X_i \in S_m} I(X_i; Y) - \beta \sum_{X_i, X_j \in S_m,\, i < j} I(X_i; X_j)    (11)

\beta = \frac{RR}{RR + IR} = \frac{RR}{R}    (12)

Theoretically, one could compute exactly the value of the β coefficient for a different number of variables. Based on the results in Table 3 and employing Eq. (12), the exact formula for each of the four discussed cases is determined (see Table 4). β is equal to one under the one-input-variable case (Scenario 1) but quickly diminishes as the number of variables considered increases. That is to say, the amount of irrelevant redundancy (IR) dominates the redundancy term R as the number of variables increases. This will be verified through a case study.
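To make the α-MRMR and β-MRMR criteria of Eqs. (10)–(12) concrete, the following sketch scores a candidate input set from precomputed mutual information values. It is a minimal illustration, not the authors' implementation; the pairwise MI arrays and the α and β values passed in are assumptions.

```python
import numpy as np
from itertools import combinations

def score_subset(mi_xy, mi_xx, subset, alpha=0.3, beta=0.2):
    """mi_xy[i] = I(X_i;Y); mi_xx[i, j] = I(X_i;X_j); subset = indices of S_m."""
    m = len(subset)
    D = np.mean([mi_xy[i] for i in subset])                # Eq. (4), average relevance
    pairs = list(combinations(subset, 2))                  # i < j only (no self or symmetric terms)
    pair_sum = sum(mi_xx[i, j] for i, j in pairs)
    R = pair_sum / (2 * (m ** 2 - m)) if pairs else 0.0    # Eq. (9), modified redundancy
    phi_alpha = D - alpha * R                              # Eq. (10), alpha-MRMR
    phi_beta = m * D - beta * pair_sum                     # Eq. (11), beta-MRMR (unnormalized D and R)
    return phi_alpha, phi_beta
```

In practice α and β would be set from relations such as those fitted later in Table 7, or tuned by re-running the selection for several candidate values.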
Table 4. Venn-based theoretical definitions of the β terms for Scenarios 1–4.

Scenario 1: β = 1
Scenario 2: β = ABC / (ABC + BC)
Scenario 3: β = (2ABCD + ABC + ABD + ACD) / (3ABCD + ABC + ABD + ACD + 3BCD + BC + BD + CD)
Scenario 4: β = [3ABCDE + 2(ABCD + ABCE + ABDE + ACDE) + (ABC + ABD + ABE + ACD + ACE + ADE)] / [6ABCDE + 3(ABCD + ABCE + ABDE + ACDE) + (ABC + ABD + ABE + ACD + ACE + ADE) + 6BCDE + 3(BCD + BCE + BDE + CDE) + (BC + BD + BE + CD + CE + DE)]
4. Feed-forward incremental modified minimum redundancy maximum relevance

In practice, one is often faced with the need to select from among a huge set of input variables, and enumerating all possible combinations may be computationally prohibitive. Although computing Eq. (6) is a quick calculation, the need to repeat the computation C(S, m) times (the number of ways of choosing m variables out of S candidates) can be computationally overwhelming. Suppose we have a set of 50 potential input variables and we want to select the best 10 variables; then we have to enumerate about 1.03 × 10^10 combinations (as the short snippet below illustrates). A more practical and computationally feasible approach is to follow a feed-forward incremental approach, by selecting the 'best' input variable and then selecting the next 'best' variable based on the previously selected variable, and so on, until any additional variable would add no value to explaining the uncertainty of Y.
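The enumeration count quoted above is simple to verify; the two-line check below is only an illustration of the arithmetic.

```python
import math
print(math.comb(50, 10))  # 10272278170, i.e., about 1.03 x 10^10 candidate sets
```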
The feed-forward incremental MRMR model form is presented in Eq. (13). This equation can be particularly useful when the number of variables is very large and one cannot afford to enumerate all possible combinations:

maximize \Phi^{incr}(S; Y), \quad \Phi^{incr} = I(X_j; Y) - \frac{1}{m - 1} \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (13)

Similarly, we introduce the feed-forward incremental mMRMR equations (α-MRMR and β-MRMR) as described in Eqs. (14) and (15). Hence, the values of α and β need to be pre-specified prior to running either of the two algorithms. In this study, we shed light on the nature of those coefficients as the number of input variables increases:

maximize \Phi_\alpha^{incr}(S; Y), \quad \Phi_\alpha^{incr} = I(X_j; Y) - \alpha \frac{1}{m - 1} \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (14)

maximize \Phi_\beta^{incr}(S; Y), \quad \Phi_\beta^{incr} = I(X_j; Y) - \beta \sum_{X_i \in S_{m-1}} I(X_j; X_i)    (15)

Fig. 3. Schematic of the input selection algorithm (e.g., MRMR, α-MRMR, and β-MRMR) coupled with a modeling component (e.g., GRNN).
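A minimal sketch of the feed-forward incremental selection of Eqs. (13)–(15) is given below. It assumes the pairwise mutual information values have already been estimated (e.g., with the histogram sketch in Section 2.1) and that α or β, if used, are pre-specified; none of the names here come from the paper.

```python
import numpy as np

def incremental_select(mi_xy, mi_xx, n_select, alpha=None, beta=None):
    """Greedy forward selection; mi_xy[j] = I(X_j;Y), mi_xx[j, i] = I(X_j;X_i)."""
    candidates = list(range(len(mi_xy)))
    selected = []
    for _ in range(n_select):
        best_j, best_score = None, -np.inf
        for j in candidates:
            redundancy = sum(mi_xx[j, i] for i in selected)
            if beta is not None:                      # Eq. (15), beta-MRMR
                score = mi_xy[j] - beta * redundancy
            elif alpha is not None and selected:      # Eq. (14), alpha-MRMR
                score = mi_xy[j] - alpha * redundancy / len(selected)
            elif selected:                            # Eq. (13), plain incremental MRMR
                score = mi_xy[j] - redundancy / len(selected)
            else:                                     # first pick: maximum relevance only
                score = mi_xy[j]
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)      # len(selected) plays the role of m - 1 above
        candidates.remove(best_j)
    return selected
```

In the coupled scheme of Fig. 3, n_select would not be fixed in advance; one more variable is admitted per iteration until the percent improvement in the RMSE of the systems model falls below ε.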
In real applications, one does not have prior knowledge of the number of variables that explain the most about the uncertainty of a particular variable of interest. Thus, the selection algorithm needs to include a stopping criterion or mechanism to iteratively increase the number of input variables until a pre-specified condition has been satisfied. MRMR and its variants are selection algorithms that identify a set of variables but have no means of specifying a stopping point. A logical stopping mechanism is to couple the input selection algorithm (e.g., MRMR, α-MRMR, β-MRMR) with a simulation model of the system and iteratively evaluate the model performance with each additional input variable until a percent improvement in a prescribed modeling accuracy term, such as the root-mean-square error (RMSE), is satisfied (Fig. 3). The IVS algorithm stops when any additional input variable would not improve the RMSE value beyond a pre-specified threshold value, ε. As a demonstrating example, the general regression neural network (GRNN) is used as the systems model, as shown in Fig. 3.

5. The general regression neural network (GRNN)

The general regression neural network (GRNN) is a fast feed-forward artificial neural network (ANN) first developed by Specht [16]. It has a fixed structure and thus avoids the subjective selection of a network structure, runs very efficiently, and can capture linear and non-linear relationships [2]. It is commonly used as a universal approximator for smooth functions and can approximate any linear or non-linear relationship between a set of inputs and an output variable given enough data [16,2]. For more details about the derivation of the GRNN algorithm, readers are referred to Specht [16]. The only required input to simulate the GRNN is a value of the smoothing factor, σ, which is not known a priori and is commonly established with a trial-and-error process. When σ is small, only a few nearby neighbors play a role; when σ is large, distant neighbors also affect the estimate at X, yielding a smoother estimate. At the extreme, when σ is zero, Y becomes dependent solely on the closest X_i value; as σ approaches infinity, Y becomes the mean value of all Y_i. Hence, the choice of the smoothing parameter is important.
Fig. 4. Schematic of the assumption of equivalence between mutual information and GRNN (e.g., I ≈ RMSE⁻¹).
Here, we adopt a trial-and-error procedure to select a σ value that balances accuracy and smoothness, following the work of Bowden et al. [2]. In this paper, we use the GRNN as a measure of the true relationship between a set of input variables and an output variable; input variables with a lower RMSE are considered to possess a stronger relationship to the output variable. The assumption here is that the inverse of the RMSE can be taken as a surrogate of the mutual information measure (see Fig. 4). This assumption was previously used by Strobl and Forte [18]. They employed an ANN as a means to determine the most relevant factors in two drainage network derivation case studies, where the ANN gauged the relationship strength between a set of factors and a particular variable. In this paper, we replace the ANN with a GRNN since it is more efficient and does not require the specification of the neural network structure. Tomandl and Schober [20] summarize some of the deficiencies of traditional back-propagation neural networks which are overcome with the GRNN algorithm. They further established a modified version of GRNN for cases when datasets are not of equal length, but this is not a concern in this study.
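The following sketch shows a GRNN-style predictor in the spirit of Specht [16] – a kernel-weighted average of the training targets controlled by a single smoothing factor σ – together with the RMSE used as the (inverse) surrogate of mutual information. It is a simplified illustration under those assumptions, not the configuration used in the paper, and σ would be chosen by trial and error as described above.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """Predict each query point as a Gaussian-weighted mean of the training targets."""
    X_train = np.atleast_2d(X_train)
    preds = []
    for x in np.atleast_2d(X_query):
        d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distances to training points
        w = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights
        preds.append(np.dot(w, y_train) / (w.sum() + 1e-12))
    return np.array(preds)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```

A small σ reproduces nearest-neighbour behaviour, while a very large σ drives every prediction toward the mean of y_train, matching the limiting cases noted above.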
Table 5. Input selection results of the five test problems using the MRMR algorithm; the stopping criterion was based on a minimum percent improvement in RMSE (ε = 3%) and simulating the GRNN with a constant smoothing factor (σ) of 1; selected input variables are shown in gray.
6. Synthetic test problems

To illustrate the competence of MRMR in identifying the most relevant set of inputs, we apply it to the same test problems used by Sharma [15] and Bowden et al. [2] to evaluate the competence of the PMI algorithm as an input selection method. The test problems have known dependence attributes and are based on synthetic data. Sharma [15] used five autoregressive models; the first three are simple autoregressive models with varying orders of time and dependence, while the last two are non-linear threshold autoregressive models. Bowden et al. [2] tested against three of Sharma's test problems. Here we use all five test problems, namely:
(1) AR1:
x_t = 0.9 x_{t-1} + 0.866 e_t    (16)

(2) AR4:
x_t = 0.6 x_{t-1} - 0.4 x_{t-4} + e_t    (17)

(3) AR9:
x_t = 0.3 x_{t-1} - 0.6 x_{t-4} - 0.5 x_{t-9} + e_t    (18)

(4) TAR1 – threshold autoregressive order 1:
x_t = -0.9 x_{t-3} + 0.1 e_t if x_{t-3} ≤ 0; \quad x_t = 0.4 x_{t-3} + 0.1 e_t if x_{t-3} > 0    (19)

(5) TAR2 – threshold autoregressive order 2:
x_t = -0.5 x_{t-6} + 0.5 x_{t-10} + 0.1 e_t if x_{t-6} ≤ 0; \quad x_t = 0.8 x_{t-10} + 0.1 e_t if x_{t-6} > 0    (20)
In all test models, e_t is a Gaussian random variate with zero mean and unit standard deviation. For each of the models, 520 data points were generated and the first 20 points were discarded to reduce the effect of initialization. The first 15 lags of the series (i.e., x_{t-1}, x_{t-2}, ..., x_{t-15}) were taken as the potential candidate inputs. The MRMR algorithm was applied to each of the models, and the results are shown in Table 5. The results in Table 5 are based on σ = 1 and ε = 3%. The input variables were correctly identified for all five models (Eqs. (16)–(20)). The results are insensitive to the value of σ, but the number of inputs selected depends on the percent improvement threshold value of RMSE (ε). By trial and error, a value of 3% was found appropriate for these test problems. Hence, one should test several ε values to decide on the appropriate number of input variables.
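As an illustration of how such a test problem is set up, the sketch below generates a TAR2 series (Eq. (20)) and assembles the 15 lagged candidate inputs; the random seed and burn-in handling are assumptions, with the first 20 of 520 points discarded as described in the text.

```python
import numpy as np

def generate_tar2(n=520, burn_in=20, seed=1):
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)                     # Gaussian noise, zero mean, unit std
    x = np.zeros(n)
    for t in range(10, n):
        if x[t - 6] <= 0:
            x[t] = -0.5 * x[t - 6] + 0.5 * x[t - 10] + 0.1 * e[t]
        else:
            x[t] = 0.8 * x[t - 10] + 0.1 * e[t]
    return x[burn_in:]                         # discard initialization effects

x = generate_tar2()
max_lag = 15
inputs = np.column_stack([x[max_lag - k:-k] for k in range(1, max_lag + 1)])  # x_{t-1}..x_{t-15}
output = x[max_lag:]                                                          # x_t
```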
7. Case study

Twenty-two reservoirs in California with daily release, storage, and inflow data are used in this study. The data span January 1, 2004 to December 31, 2006 for all reservoirs. The reservoir selection process relies on the availability of observed release, storage, and inflow time series at the daily scale and on the continuity of the datasets. The output variable here is the release on a particular day (t). The input variables are a combination of past inflows, releases, and storages, and future inflows. Two categories of potential input variables are defined: a long set and a short set. The former is used to further illustrate the competence of the MRMR algorithm, while the latter is used to further investigate the nature of the α and β coefficients in the α-MRMR and β-MRMR algorithms (Eqs. (10) and (11)). The two input variable sets are selected with respect to their potential roles in reservoir operations, following the work of Hejazi et al. [8]. The long set consists of 121 input variables to explain the uncertainty in the current release.
The variables are: past releases (R_{t-1}, R_{t-2}, ..., R_{t-30}), storages (S_{t-1}, S_{t-2}, ..., S_{t-30}), and inflows (Q_{t-1}, Q_{t-2}, ..., Q_{t-30}), plus the current inflow (Q_t) and future inflows (Q_{t+1}, Q_{t+2}, ..., Q_{t+30}). The short set consists of nine input variables: past 1-day and 2-day releases (R_{t-1} and R_{t-2}), storages (S_{t-1} and S_{t-2}), and inflows (Q_{t-1} and Q_{t-2}), plus the current inflow (Q_t) and future 1-day and 2-day inflows (Q_{t+1} and Q_{t+2}). Note, inflow and release are defined as averaged values, while storage is defined as the end-of-period storage. Also, future inflows are taken as inputs under the assumption of a perfect forecast.

To confirm the quality of MRMR and the improved quality due to the use of the α-MRMR and β-MRMR algorithms, we apply all three algorithms to the long data set for all 22 reservoirs introduced above. The objective here is to investigate whether there exists a strong correlation between RMSE and the approximators of mutual information (i.e., Φ, Φ_α, and Φ_β). For each reservoir, we first randomly select a prescribed number of input variables (e.g., 1, 2, ..., 5) and then compute the corresponding RMSE and Φ, Φ_α, and Φ_β values. With 121 potential inputs, repeating the process for all possible combinations is unrealistic beyond the two-input scenario. Thus, we run the process for 1000 combinations of the inputs for each specified number of variables and for each reservoir. Based on each scenario of the 1000 simulations, Fig. 5 shows a strong negative correlation between RMSE and Φ, Φ_α, and Φ_β, respectively; data are shown for five reservoirs only, but the rest exhibit similar trends. There clearly exists a strong inverse relationship between RMSE and each of Φ, Φ_α, and Φ_β, making the approximate methods appropriate surrogates for the former. In other words, variables that yield high values of Φ, Φ_α, or Φ_β tend to induce low RMSE values. In the case of one input variable, Φ, Φ_α, and Φ_β all converge to the exact value of mutual information; thus, the correlation with RMSE is strongest. With more than one variable, Φ, Φ_α, and Φ_β are only approximations of the true dependence, leading to lower correlations. When introducing a correction factor to the MRMR equation (e.g., α-MRMR, Eq. (10); β-MRMR, Eq. (11)), a gain is attained in the correlation strength between Φ_α (or Φ_β) and RMSE. Fig. 5 shows that α-MRMR and β-MRMR outperform MRMR by yielding stronger correlations with RMSE. Furthermore, although α-MRMR and β-MRMR have different definitions of the correction factor, they produce the same gain in competence. Hence, whether we optimize the value of α or β, we attain the same improvement in correlation, e.g., Correl(RMSE, Φ_α) = Correl(RMSE, Φ_β). Further tests should be conducted with other cases in order to justify the advantage of the two formulations.

To further illustrate the advantage of adopting α-MRMR or β-MRMR, we show the results from an individual reservoir, Trinity Lake Reservoir. Fig. 6 shows the improvement in correlation with RMSE through the narrowed spread of the data. Selected inputs that exhibit high values of Φ_α and Φ_β are likely to yield low RMSE values. The milder slope of the relation between RMSE and Φ_β may reflect greater robustness of Φ_β over Φ_α. A small discrepancy in the value of Φ_β would yield a smaller increase in RMSE than a small discrepancy in Φ_α. Thus, Φ_α and Φ_β reflect different levels of sensitivity to model accuracy (RMSE).

Optimal values of α and β that yield the strongest correlation with RMSE for each of the 22 reservoirs with the long set are summarized in Table 6A. The value of α increases, and β decreases, with the increase of the number of input variables. However, since we only test 1000 randomly selected combinations out of a much larger pool of possible combinations, the optimal values of α and β may be influenced by the randomness of the selected combinations. To further confirm the sensitivity of α and β to the increase in the number of input variables, we determine the values of α and β based on the previously described short set of input variables (Table 6B).

Fig. 5. Comparison of the correlation measure between RMSE and Φ (MRMR), Φ_α (α-MRMR), and Φ_β (β-MRMR) averaged over all 22 reservoirs; data based on the long set and 1000 simulations.

Fig. 6. Comparison of the relationship between RMSE and Φ (MRMR), Φ_α (α-MRMR), and Φ_β (β-MRMR); data are based on Trinity Lake Reservoir results and 1000 simulations.
Table 6A. Summary of optimal values of α that yield the strongest correlation with RMSE for each of the 22 reservoirs, for the long and short sets (columns give the number of input variables, 1–6).

Reservoir | Long set: 1, 2, 3, 4, 5, 6 | Short set: 1, 2, 3, 4, 5, 6
1  | 0, 0.065, 0.120, 0.153, 0.190, 0.240 | 0, 0.065, 0.116, 0.145, 0.163, 0.163
2  | 0, 0.171, 0.271, 0.314, 0.368, 0.382 | 0, 0.172, 0.310, 0.427, 0.521, 0.585
3  | 0, 0.170, 0.317, 0.395, 0.509, 0.546 | 0, 0.144, 0.312, 0.510, 0.777, 1.000
4  | 0, 0.213, 0.345, 0.415, 0.501, 0.500 | 0, 0.198, 0.356, 0.490, 0.606, 0.701
5  | 0, 0.125, 0.234, 0.332, 0.314, 0.399 | 0, 0.101, 0.154, 0.212, 0.261, 0.283
6  | 0, 0.066, 0.093, 0.118, 0.127, 0.144 | 0, 0.926, 0.508, 0.354, 0.253, 0.171
7  | 0, 0.256, 0.376, 0.436, 0.517, 0.558 | 0, 0.201, 0.347, 0.527, 0.746, 1.000
8  | 0, 0.840, 1.000, 1.000, 1.000, 1.000 | 0, 0.216, 0.262, 0.284, 0.281, 0.264
9  | 0, 0.252, 0.480, 0.602, 0.718, 0.748 | 0, 0.182, 0.312, 0.395, 0.455, 0.509
10 | 0, 0.272, 0.455, 0.533, 0.624, 0.623 | 0, 0.292, 0.613, 0.901, 1.000, 0.952
11 | 0, 0.319, 0.376, 0.393, 0.430, 0.438 | 0, 0.238, 0.215, 0.300, 0.407, 0.525
12 | 0, 0.281, 0.789, 0.781, 0.748, 0.619 | 0, 0.151, 0.235, 0.200, 0.191, 0.198
13 | 0, 0.147, 0.248, 0.320, 0.422, 0.422 | 0, 0.153, 0.271, 0.374, 0.455, 0.504
14 | 0, 0.169, 0.311, 0.348, 0.347, 0.350 | 0, 0.083, 0.137, 0.165, 0.188, 0.213
15 | 0, 0.210, 0.327, 0.426, 0.480, 0.580 | 0, 0.318, 0.253, 0.320, 0.436, 0.581
16 | 0, 0.212, 0.357, 0.437, 0.543, 0.564 | 0, 0.234, 0.418, 0.564, 0.677, 0.755
17 | 0, 0.155, 0.269, 0.348, 0.382, 0.400 | 0, 0.141, 0.314, 0.593, 1.000, 1.000
18 | 0, 0.161, 0.368, 0.578, 0.752, 0.953 | 0, 0.128, 0.276, 0.443, 0.647, 0.939
19 | 0, 0.237, 0.496, 0.655, 0.814, 0.938 | 0, 0.186, 0.389, 0.717, 1.000, 1.000
20 | 0, 0.250, 0.398, 0.495, 0.549, 0.585 | 0, 0.387, 0.571, 0.766, 0.936, 1.000
21 | 0, 0.136, 0.235, 0.300, 0.337, 0.385 | 0, 0.204, 0.286, 0.380, 0.472, 0.557
22 | 0, 0.047, 0.086, 0.123, 0.155, 0.173 | 0, 0.068, 0.130, 0.176, 0.218, 0.261
Table 6B. Summary of optimal values of β that yield the strongest correlation with RMSE for each of the 22 reservoirs, for the long and short sets (columns give the number of input variables, 1–6).

Reservoir | Long set: 1, 2, 3, 4, 5, 6 | Short set: 1, 2, 3, 4, 5, 6
1  | 1, 0.065, 0.060, 0.051, 0.048, 0.048 | 1, 0.065, 0.058, 0.048, 0.041, 0.033
2  | 1, 0.171, 0.136, 0.105, 0.092, 0.076 | 1, 0.172, 0.155, 0.142, 0.130, 0.117
3  | 1, 0.170, 0.159, 0.132, 0.127, 0.109 | 1, 0.144, 0.156, 0.170, 0.194, 0.244
4  | 1, 0.213, 0.172, 0.138, 0.125, 0.100 | 1, 0.198, 0.178, 0.163, 0.151, 0.140
5  | 1, 0.125, 0.117, 0.111, 0.078, 0.080 | 1, 0.101, 0.077, 0.071, 0.065, 0.057
6  | 1, 0.066, 0.047, 0.039, 0.032, 0.029 | 1, 0.926, 0.254, 0.118, 0.063, 0.034
7  | 1, 0.256, 0.188, 0.145, 0.129, 0.112 | 1, 0.201, 0.174, 0.176, 0.187, 0.206
8  | 1, 0.840, 1.000, 1.000, 0.495, 0.480 | 1, 0.216, 0.131, 0.095, 0.070, 0.053
9  | 1, 0.252, 0.240, 0.201, 0.180, 0.150 | 1, 0.182, 0.156, 0.132, 0.114, 0.102
10 | 1, 0.272, 0.228, 0.178, 0.156, 0.125 | 1, 0.292, 0.307, 0.300, 0.251, 0.190
11 | 1, 0.319, 0.188, 0.131, 0.107, 0.088 | 1, 0.238, 0.107, 0.100, 0.102, 0.105
12 | 1, 0.281, 0.394, 0.260, 0.187, 0.124 | 1, 0.151, 0.118, 0.067, 0.048, 0.040
13 | 1, 0.147, 0.124, 0.107, 0.106, 0.084 | 1, 0.153, 0.136, 0.125, 0.114, 0.101
14 | 1, 0.169, 0.156, 0.116, 0.087, 0.070 | 1, 0.083, 0.069, 0.055, 0.047, 0.043
15 | 1, 0.210, 0.164, 0.142, 0.120, 0.116 | 1, 0.318, 0.126, 0.107, 0.109, 0.116
16 | 1, 0.212, 0.178, 0.146, 0.136, 0.113 | 1, 0.234, 0.209, 0.188, 0.169, 0.151
17 | 1, 0.155, 0.134, 0.116, 0.095, 0.080 | 1, 0.141, 0.157, 0.198, 0.279, 0.497
18 | 1, 0.161, 0.184, 0.193, 0.188, 0.191 | 1, 0.128, 0.138, 0.148, 0.162, 0.188
19 | 1, 0.237, 0.248, 0.218, 0.204, 0.188 | 1, 0.186, 0.194, 0.239, 0.336, 0.615
20 | 1, 0.250, 0.199, 0.165, 0.137, 0.117 | 1, 0.387, 0.286, 0.255, 0.234, 0.212
21 | 1, 0.136, 0.117, 0.100, 0.084, 0.077 | 1, 0.204, 0.143, 0.127, 0.118, 0.111
22 | 1, 0.047, 0.043, 0.041, 0.039, 0.035 | 1, 0.068, 0.065, 0.059, 0.055, 0.052
In the case of the short set, enumerating all possible combinations is feasible, although this is impossible with the long list of 121 input variables. Note, α and β behave differently as the number of variables increases; the value of α generally increases, and β generally decreases, with the increase of the number of variables for each individual reservoir (Tables 6A and 6B). When averaged over all 22 reservoirs, the mean optimal values of α and β that yield the strongest correlation with RMSE increase and decrease, respectively, with the increase of the number of variables (Fig. 7). Recall, α is a measure of the relative importance of the redundancy term in the MRMR equation (Eq. (10)), while β is a measure of the relevant redundancy as a fraction of the total redundancy (Eq. (11)). Hence, when dealing with the single-variable case, α approaches zero and β approaches one. As the number of variables increases, the importance of the redundancy term in α-MRMR increases, as depicted by higher α values. Similarly, as the number of variables increases, the fraction of relevant redundancy to total redundancy decreases, as depicted by the lower β values. This is clearly shown in Fig. 8. A fitting curve is imposed on top of each of the long and short sets, as shown in Fig. 8. Note, although both α and β follow a power-law form, they exhibit opposite trends with the increase of input variables. Table 7 summarizes the fitting parameters associated with each of the two sets and each of the two correction factors, α and β. Hence, each of the two fitting models of α and β requires a single fitting parameter, a (see Table 7).

To further illustrate the capability of MRMR and its variants as input selection algorithms, we apply the algorithms to a single reservoir (Trinity Lake Reservoir) and utilize the fitting models in Table 7. We assume that the stopping criterion is based on a minimum improvement in RMSE of 1% (i.e., ε = 1%), and we also assume that the systems model is a regular back-propagation ANN with 50% of the data used in training and the remaining 50% in testing to avoid overfitting. Starting with the 121 potential inputs coupled with an ANN model with a single hidden layer and a single hidden node, we employ the five scenarios described in Table 8 to evaluate the performance of the input selection algorithms in identifying input variables.
Fig. 7. Mean values of α and β averaged over all 22 reservoirs; long set: 121 potential input variables and 1000 simulations; short set: nine mixed variables; with the increase of input variables, α increases while β decreases.
Fig. 9 compares the ANN model performance under each of the five scenarios. The baseline scenario is used as a reference point with a fixed number of inputs set a priori. The MRMR and α-MRMR scenarios identify fewer input variables but induce a worse RMSE than the baseline scenario. The β-MRMR and the 'optimal' scenarios (Scenarios 4 and 5, respectively) find combinations of four and three input variables which yield lower RMSE values than the baseline scenario. The β-MRMR identifies a combination of inputs that yields a better model performance with fewer inputs than the baseline scenario. This is important, because it supports the claim that input selection algorithms can reduce unnecessary model complexity and improve competence. Table 9 lists the set of inputs identified by each of the five scenarios. Note, MRMR, α-MRMR, and β-MRMR share the same three input variables (R_{t-1}, R_{t-2}, R_{t-3}) and only differ in one variable (Table 9). These three scenarios are approximation algorithms of mutual information; hence, they would not necessarily end up with identical solutions.
Fig. 8. Fitted models to the α and β values attained from the long and short sets.
Table 7. Summary of model fitting parameters and statistics for the long and short set cases; x is the number of input variables considered at iteration i.

Algorithm | Model            | Dataset | a      | R²
α-MRMR    | α = 1 − x_i^(−a) | Long    | 0.4074 | 0.9956
α-MRMR    | α = 1 − x_i^(−a) | Short   | 0.4251 | 0.9577
β-MRMR    | β = x_i^(−a)     | Long    | 1.5742 | 0.9639
β-MRMR    | β = x_i^(−a)     | Short   | 1.6627 | 0.9638
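Because both fitted models in Table 7 are power laws in the number of input variables, the single parameter a can be recovered by a log–log least-squares fit. The sketch below does this for β = x^(−a) using illustrative mean values, not the paper's data; α can be handled the same way by regressing log(1 − α) on log(x).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])                           # number of input variables
beta_mean = np.array([1.0, 0.26, 0.17, 0.13, 0.11, 0.09])  # hypothetical averaged beta values
slope, intercept = np.polyfit(np.log(x), np.log(beta_mean), 1)
a_beta = -slope                                            # compare with the a values in Table 7
print(a_beta)
```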
8. Conclusions

Input selection can be a challenging step in modeling, and correctly identifying the most relevant inputs to describe a water resources system can lead to improved model accuracy without inducing unnecessary model complexity. In this paper, we propose algorithms that approximate mutual information to pinpoint the set of inputs that contains the greatest amount of information about the uncertainty of a system. Specifically, we employ the MRMR algorithm and further introduce two variants of MRMR (α-MRMR and β-MRMR) as IVS algorithms.

α-MRMR and β-MRMR both outperform the MRMR algorithm in yielding stronger correlations with the RMSE obtained from simulating a GRNN model. α increases while β decreases, approximately in a power-law form, as the number of input variables increases. When dealing with a single input variable, both α-MRMR and β-MRMR become equivalent to the mutual information (maximum dependence) between two random variables, and α becomes zero while β becomes one. The optimal α and β values have been tested for 22 reservoirs and two sets of inputs (long and short); further research is needed to identify any other factors that may influence the rate of change of α and β with the increase in the number of input variables. Thus, further research may be directed to other data sets to justify whether the detected functional forms of the two correction factors are universal.

Fig. 9. Comparison among the various input selection algorithms (MRMR, α-MRMR, and β-MRMR) in identifying the most important set of inputs for an ANN model of the daily release of Trinity Lake Reservoir; inputs are selected from the long set of inputs (121); the values of α and β are determined from Table 4; the baseline scenario is based on a pre-selected set of five input variables; the 'optimal' scenario is based on simulating all possible combinations incrementally to find the global best set of inputs; here, ε = 1%.
Table 8. Summary of scenarios in selecting the most relevant inputs to predict the daily release of the Trinity Lake Reservoir.

Scenario 1 – Baseline: Selecting five inputs (the one with the highest mutual information from each of the five categories of input variables, i.e., past release, past storage, and past, current, and future inflow).
Scenario 2 – MRMR: Inputs selected incrementally with the MRMR algorithm and ε = 1%.
Scenario 3 – α-MRMR: Inputs selected incrementally with the α-MRMR algorithm and ε = 1%.
Scenario 4 – β-MRMR: Inputs selected incrementally with the β-MRMR algorithm and ε = 1%.
Scenario 5 – 'Optimal': Inputs selected by testing all combinations of inputs with the ANN to find the global optimal set of inputs; stopping with ε = 1%.
Table 9. Summary of the selected input variables to predict the daily release of the Trinity Lake Reservoir.

Scenario 1 – Baseline: 5 inputs: Q_t, R_{t-1}, S_{t-24}, Q_{t-24}, Q_{t+30}
Scenario 2 – MRMR: 4 inputs: R_{t-1}, Q_{t-1}, R_{t-3}, R_{t-2}
Scenario 3 – α-MRMR: 3 inputs: R_{t-1}, R_{t-3}, R_{t-2}
Scenario 4 – β-MRMR: 4 inputs: R_{t-1}, S_{t-24}, R_{t-3}, R_{t-2}
Scenario 5 – 'Optimal': 3 inputs: R_{t-1}, S_{t-30}, R_{t-3}
Also, since the fitted models for α and β require just a single fitting parameter (a), one may simply re-perform the modeling exercise with various values of a until satisfactory model accuracy is achieved.

The stopping criterion adopted in this study is based on whether a small threshold ε in RMSE improvement is met. The RMSE value for model predictions is obtained using a GRNN model. The GRNN, however, can be replaced by any other model. Although the percent improvement threshold stopping criterion suffices in this study, additional research on constructing a more objective stopping criterion may be beneficial.

In this study, an important assumption is that the GRNN-based measure is qualitatively equivalent to the mutual information between a set of variables and another variable (Fig. 4). The GRNN may pose some limitations, as it requires the selection of a smoothing factor and the availability of sufficient data to accurately represent the 'true' relationship among the variables. To avoid the necessity of this assumption, further research may be directed at using synthetic data with known mutual information.

The two proposed IVS algorithms (α-MRMR and β-MRMR) may be employed in any modeling exercise. Their premise lies in their ability to identify the most relevant input variables to a modeled system at a very efficient rate. In doing so, modelers can construct models that are as complex as necessary, leaving out any unnecessary inputs.

Acknowledgements

The first author wishes to extend his thanks to Dr. Momcilo Markus for his insightful discussions at the early stage of this work. The authors are very grateful to Benjamin Ruddell for his constructive comments and proofreading of the manuscript. The authors also wish to thank Dr. Hayri Önal, Yonas Demissie, and Jiing-Yun You for reviewing an earlier version of the manuscript. Partial financial support for this research was provided by US National Science Foundation (NSF) grant CBET 0747276.
References

[1] Ataie-Ashtiani B, Hassanizadeh SM, Celia MA. Effects of heterogeneities on capillary pressure–saturation–relative permeability relationships. J Contam Hydrol 2002;56:175–92.
[2] Bowden GV, Dandy GC, Maier HR. Input determination for neural network models in water resources applications. Part 1 – Background and methodology. J Hydrol 2005;301:75–92.
[3] Carlson TN, Arthur ST. The impact of land use–land cover changes due to urbanization on surface microclimate and hydrology: a satellite perspective. Glob Planet Change 2000;25:49–65.
[4] Cover TM, Thomas JA. Elements of information theory. New York, USA: John Wiley and Sons Inc.; 1991.
[5] Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the second IEEE computational systems bioinformatics conference; 2003. p. 523–8.
[6] Francios D, Rossi F, Wertz V, Verleysen M. Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 2007;70:1276–88.
[7] Harnold TI, Sharma A, Sheather S. Selection of kernel bandwidth for measuring dependence in hydrologic time series using the mutual information criteria. Stoch Environ Res Risk Assess 2001;15:310–24.
[8] Hejazi MI, Cai X, Ruddell BL. The role of hydrologic information in reservoir operation – learning from historical releases. Adv Water Resour 2008;31:1636–50.
[9] Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and minimum redundancy. IEEE Trans Pattern Anal Mach Intell 2005;27(8):1226–38.
[10] Poveda G, Mesa OJ. Feedbacks between hydrological processes in tropical South America and large-scale ocean–atmospheric phenomena. J Climate 1997;10:2690–702.
[11] Rossi F, Lendasse A, Francios D, Wertz V, Verleysen M. Mutual information for the selection of relevant variables in spectrometric nonlinear modeling. Chemomet Intell Lab Syst 2006;80:215–26.
[12] Ruddell BL, Kumar P. Ecohydrologic process networks. Part 1 – Identification. Water Resour Res, doi:10.1029/2008WR007279.
[13] Sankarasubramanian A, Vogel RM. Hydroclimatology of the continental United States. Geophys Res Lett 2003;30(7):1363.
[14] Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948;27:379–423, 623–56.
[15] Sharma A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management. Part 1 – A strategy for system predictor identification. J Hydrol 2000;239:232–9.
[16] Specht DF. A general regression neural network. IEEE Trans Neural Network 1991;2(6):568–76.
[17] Stokelj T, Paravan D, Golob R. Enhanced artificial neural network inflow forecasting algorithm for run-of-river hydropower plants. J Water Res Plan Manage 2002;128(6):415–23.
[18] Strobl RO, Forte F. Artificial neural network exploration of the influential factors in drainage network derivation. Hydrol Process 2007;21:2965–78.
[19] Tana CO, Bekliogluc M. Modeling complex nonlinear responses of shallow lakes to fish and hydrology using artificial neural networks. Ecol Model 2006;196:183–94.
[20] Tomandl D, Schober A. A modified general regression neural network (MGRNN) with new, efficient training algorithms as a robust 'black box' tool for data analysis. Neural Network 2001;14:1023–34.
[21] Venn J. On the diagrammatic and mechanical representation of propositions and reasonings. Philos Mag J Sci 1880;9(59).
[22] Wallender WW, Grismer ME. Irrigation hydrology: crossing scales. J Irrig Drain Eng 2002;128(4):203–11.
[23] Wood EF. Effects of soil moisture aggregation on surface evaporative fluxes. J Hydrol 1997;190:397–412.