Predicting gastrointestinal infection morbidity based on environmental pollutants: Deep learning versus traditional models


Ecological Indicators 82 (2017) 76–81

Contents lists available at ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

Qin Song (a), Mei-Rong Zhao (a), Xiao-Han Zhou (b), Yu Xue (c, d), Yu-Jun Zheng (b)

(a) College of Environmental Engineering, Zhejiang University of Technology, Hangzhou 310014, China
(b) College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023, China
(c) School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
(d) School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6140, New Zealand

ARTICLE INFO

Keywords: Environmental pollutants; Gastrointestinal infections; Deep neural network (DNN); Prediction

ABSTRACT

Accurate morbidity prediction can contribute greatly to the efficiency of medical services. Gastrointestinal infectious diseases are largely influenced by environmental pollutants, but predicting their morbidity based on pollution indicators is quite difficult because of the complex relationship between the pollutants and the infections. This study presents a deep neural network (DNN) model for estimating the morbidity of gastrointestinal infections based on 129 types of pollutants contained in soil and water. The DNN uses a deep Boltzmann machine (DBM) to model the unknown probabilistic relationship between the pollutants, and employs a Gaussian mixture model (GMM) to output the estimated morbidity. We also propose an evolutionary algorithm for efficiently training the DNN. An experiment on a data set from four counties in central China shows that the proposed model can estimate the morbidity much more accurately than traditional neural network and linear regression models.

1. Introduction

In health management practice, if we can accurately estimate the morbidity of a disease, we can use available medical resources much more effectively, and thus increase the cure rate, decrease mortality, and reduce cost (Hu and Root, 2005). However, accurate morbidity prediction can be very difficult, not only because of the huge number of influence indicators and the challenges in measuring and monitoring them, but also because of the unknown probabilistic relationship between the indicators and the diseases (Song et al., 2017). Consequently, most traditional prediction models, such as decision trees and linear regression models, are incapable of tackling such complex prediction problems.

In recent decades, artificial neural networks (ANNs) have been shown to be powerful tools for predictive modeling, especially when the underlying data relationships are unknown (Lek and Guégan, 1999). Inspired by biological neural networks, an ANN organizes a set of artificial neurons, where each neuron of the input layer receives the value of an influence indicator, and every other neuron computes a function of the weighted sum of all its inputs. ANNs are trained by adjusting the weights of the neurons to minimize the errors between the actual output values and the target output values of given input data sets. After training, they can be used to predict the output of new



Corresponding author. E-mail address: [email protected] (Y.-J. Zheng).

http://dx.doi.org/10.1016/j.ecolind.2017.06.037 Received 14 April 2017; Received in revised form 5 June 2017; Accepted 19 June 2017 1470-160X/ © 2017 Elsevier Ltd. All rights reserved.

independent input data. ANNs have been widely used in the fields of ecological and environmental modeling (Tan et al., 2012; Zhang et al., 2013; Dai et al., 2014; Chang et al., 2015; Lee et al., 2016; Granata et al., 2017; Mukherjee et al., 2017; Świetlicka et al., 2017). In particular, a number of studies have been devoted to the use of ANNs to predict the morbidity of respiratory diseases. Bibi et al. (2002) use an ANN to model the effect of atmospheric changes, including pollutants, on emergency department visits for respiratory symptoms. Wang et al. (2008) use an ANN to predict the death rate of respiratory diseases from the main air pollutants from 2005 to 2008 in Beijing. The applications of ANNs for evaluating the effect of air pollution on respiratory diseases have also been reported by Moseholm et al. (1993) and Junk et al. (2009). However, the number of input indicators of the above ANN models is relatively small, typically ranging from 5 to 10.

In this paper we are interested in predicting the morbidity of gastrointestinal infectious diseases including diarrhea, dysentery, acute gastroenteritis, typhoid fever and paratyphoid fever, and food poisoning. Soil and water pollutants are known to be major causes of such diseases (Payment et al., 1991; Mazumder et al., 1992; Reid et al., 2000; Türkdoğan et al., 2003; Ashbolt, 2004; Boyce et al., 2007; Chernih and Solodoukhina, 2008). From soil and water, pollutants can be transported through farm products and aquatic products, domestic water, livestock, food and drink, etc., to the human digestive tract. However, the


number of potential pollutants is very large, and little is known about the exact causal links between pollutants and infections (Briggs, 2003). For such high-dimensional and highly complex prediction problems, classical ANNs with a single hidden layer often suffer from overfitting, poor generalization, and premature convergence (Muttil and Chau, 2007; Gabrys, 2002; Deng et al., 2015; Zheng et al., 2015a; Gu et al., 2016). Recently, deep neural networks (DNNs) have been proposed to overcome the shortcomings of traditional ANNs by training multiple hidden layers to discover hierarchical abstractions and thus model complex probabilistic distributions over a huge number of potential influence factors (Hinton and Salakhutdinov, 2006; Hinton et al., 2006; Salakhutdinov et al., 2013). Since the seminal work of Hinton and Salakhutdinov (2006), DNNs have achieved great success in many applications such as speech recognition and image classification (Hinton et al., 2012; Zeng et al., 2016; Zheng et al., 2017). However, research on DNNs in ecological and environmental modeling is still very limited. Song et al. (2017) study the effect of food contamination on gastrointestinal diseases, but the influence paths from food contaminants are much simpler than those from environmental pollutants.

This paper proposes a DNN model for predicting the morbidity of gastrointestinal infections based on 129 types of pollutants contained in soil and water, including inorganic pollutants, organic pollutants, and pathogenic organisms. It uses a deep Boltzmann machine (DBM) to model the complex probabilistic distributions over the concentrations of different pollutants and the morbidity, and employs a Gaussian mixture model (GMM) to output the estimated morbidity. To improve the DNN's feature learning ability, we also propose an evolutionary training algorithm. An experiment on a data set from four counties in central China shows that the proposed model can estimate the morbidity much more accurately than traditional shallow ANN and linear regression models.

2. Materials and methods

2.1. Materials

The aim of the study is to propose a model for predicting the morbidity of gastrointestinal infectious diseases in a given region. The predictive indicators include:

• The density and the average age of the population of the region.
• The concentrations of 129 types of pollutants, including 16 inorganic pollutants, 88 organic pollutants, and 25 pathogenic organisms, as described in Table 1. Note that the 68 environmental endocrine disruptors (Keith, 1997) are all used as separate factors because they often have significant effects on endocrine functions, although some of them belong to other organic pollutants (e.g., benzopyrene is a special type of PAH).

For each pollutant, we measure its concentrations in soil and in water separately, and use the observed mean and standard deviation of the concentrations as model inputs. Thus the input dimension of the problem is (2 + 129 × 2 × 2) = 518.

It should be noted that there are many other factors, such as the dietary habits, preventive health services, and food hygiene standards of the investigated region, which also influence the infection morbidity. However, such factors are difficult to quantify. Thus our model is used for morbidity prediction in a given region, and the trained model instance can be applied to other regions with similar dietary habits and level of health services. For a new region that does not meet these criteria, we need to construct and train a new model instance; nevertheless, the model architecture is the same.

The problem output is the morbidity of gastrointestinal infections of the investigated region, calculated as:

$$ \hat{z} = \frac{N_c}{N} \times 100\% \tag{1} $$

where $N_c$ is the number of gastrointestinal infection cases among the inhabitants, and $N$ is the total number of inhabitants of the region. The mobile population is not taken into account, because these people are much more likely to travel outside the county and to be infected there.

We collect data from four counties in central China, including three counties in Jiangxi Province and one county in Hunan Province, all with similar dietary habits and health services. For each county, we set 12 sampling sites, 6 in farmland and 6 in natural water. At each site, we collect samples three times a month from May 2015 to October 2016, measure the concentration of each pollutant, and calculate the monthly mean and standard deviation of the concentration. The numbers of infection cases, including diarrhea, dysentery, typhoid fever and paratyphoid fever, and food poisoning, are from the departments of gastroenterology of the hospitals at the primary grade and above in each county from August 2015 to January 2017. That is, the output of each data tuple is the monthly morbidity with a time lag empirically set to three months after pollutant sampling, which takes the growing periods of crops and livestock into consideration.

2.2. The DNN model

Our DNN is based on the restricted Boltzmann machine (RBM) (Smolensky, 1986) shown in Fig. 1, an energy-based probabilistic model that defines a joint probability distribution over $v \in \{0, 1\}^D$ and $h \in \{0, 1\}^P$ as

$$ P(v, h, \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h, \theta)) \tag{2} $$

where θ = [b, c, w] is the parameter vector representing visible-to-hidden and hidden-to-hidden interaction terms, E(v, h, θ) is the energy function defined as

$$ E(v, h, \theta) = -v^T b - h^T c - v^T w h \tag{3} $$

and Z(θ) is the partition function defined as

$$ Z(\theta) = \sum_{v} \sum_{h} \exp(-E(v, h, \theta)) \tag{4} $$
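To make Eqs. (2)–(4) concrete, the following minimal Python sketch evaluates the RBM energy, the partition function, and the joint probability by brute-force enumeration over all binary states. It is an illustration only, not the authors' MATLAB implementation: the function names and the toy parameter values are assumptions introduced here, and θ = [b, c, w] is read as visible biases b, hidden biases c, and a D × P weight matrix w.

```python
import itertools
import math


def energy(v, h, b, c, w):
    """Energy of a joint state, E = -v.b - h.c - v^T w h (Eq. 3)."""
    visible_bias = sum(vi * bi for vi, bi in zip(v, b))
    hidden_bias = sum(hj * cj for hj, cj in zip(h, c))
    interaction = sum(
        v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_bias - hidden_bias - interaction


def partition_function(b, c, w):
    """Z = sum over all binary (v, h) of exp(-E) (Eq. 4); cost grows as 2^(D+P)."""
    total = 0.0
    for v in itertools.product((0, 1), repeat=len(b)):
        for h in itertools.product((0, 1), repeat=len(c)):
            total += math.exp(-energy(v, h, b, c, w))
    return total


def joint_probability(v, h, b, c, w):
    """P(v, h) = exp(-E(v, h)) / Z (Eq. 2)."""
    return math.exp(-energy(v, h, b, c, w)) / partition_function(b, c, w)


# Toy parameters (illustrative values only, not taken from the paper).
b = [0.1, -0.2, 0.3]                        # visible biases, D = 3
c = [0.0, 0.5]                              # hidden biases, P = 2
w = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]  # D x P weight matrix

p = joint_probability((1, 0, 1), (0, 1), b, c, w)
total = sum(
    joint_probability(v, h, b, c, w)
    for v in itertools.product((0, 1), repeat=3)
    for h in itertools.product((0, 1), repeat=2)
)
print(f"P(v, h) = {p:.4f}, sum over all states = {total:.4f}")
```

Exhaustive enumeration is only feasible for toy sizes such as this one. With an input dimension of 518 the partition function cannot be computed this way, which is why training in practice relies on approximate procedures.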

Table 1
A summary of the pollutants used for prediction.

Category | Pollutants
Inorganic | Pb, Cd, Hg, Cu, Ni, As, Be, Bi, Sb, Tl, Cr, Mo, Ni, Zn, F, V, cyanide, nitrate, nitrite, sulfate, carbonate
Organic | BTEX, PAHs, HCFCs, PCBs, Azo, QACs, TPH, TCE, OCP, alcohol, ether, phenols, phthalate, chloride, dioxin, organophosphorus pesticide, organochlorine pesticide, chlorinated herbicides, chlorinated solvents, nitrogenous ingredients, endocrine disruptors (68 chemicals) (Keith, 1997)
Pathogenic organism | Salmonella, Shigella, dysentery bacillus, typhoid bacillus, plague bacillus, tubercle bacillus, diphtheria bacillus, Francisella tularensis, Brucella, Vibrio cholerae, Vibrio parahaemolyticus, Vibrio mimicus, Vibrio fluvialis, Clostridium tetani, Clostridium botulinum, Clostridium perfringens, Staphylococcus aureus, Bacillus anthracis, Escherichia coli, Yersinia, Helicobacter pylori, Campylobacter jejuni, Aeromonas hydrophila, roundworm eggs, hookworm eggs

A deep Boltzmann machine (DBM) is an extension of the RBM that has multiple layers of hidden units, where each layer captures complicated, higher-order correlations between the activities of hidden features in the layer below (Salakhutdinov and Hinton, 2009). Considering the two-layer DBM shown in Fig. 2, the energy

function of a state {v, h1, h2} is defined as

$$ E(v, h_1, h_2, \theta) = -v^T w_1 h_1 - h_1^T w_2 h_2 \tag{5} $$

where θ = [w1, w2] is the parameter vector. The probability that the DBM model assigns to the visible vector v is

$$ P(v, \theta) = \frac{1}{Z(\theta)} \sum_{h_1} \sum_{h_2} \exp(-E(v, h_1, h_2, \theta)) \tag{6} $$

By extension, for a DBM with a single visible layer and L hidden layers parameterized by weights $w_l$ between the lth layer and the (l + 1)th layer (1 ≤ l ≤ L), the energy function of a state {v, h1, …, hL} is defined as

$$ E(v, h_1, \ldots, h_L, \theta) = -v^T w_1 h_1 - \sum_{l=2}^{L} h_{l-1}^T w_l h_l \tag{7} $$

where θ = [w1, …, wL]. The corresponding probability is

$$ P(v, \theta) = \frac{1}{Z(\theta)} \sum_{h_1} \cdots \sum_{h_L} \exp(-E(v, h_1, \ldots, h_L, \theta)) \tag{8} $$

The objective of DBM training is to maximize the likelihood as follows (where D is the training dataset):

$$ \max_{\theta} L(\theta, D) = \sum_{v \in D} \log P(v, \theta) \tag{9} $$

The basic RBM and DBM learn distributions over binary vectors. For real-valued inputs, we transform them into binary ones by using a Gaussian–Bernoulli RBM (GRBM) that replaces binary-valued visible units with Gaussian ones (Hinton and Salakhutdinov, 2006). The energy function of the GRBM is defined as

$$ E(v, h, \theta) = \sum_{i=1}^{D} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i=1}^{D} \sum_{j=1}^{P} w_{ij} h_j \frac{v_i}{\sigma_i^2} - \sum_{j=1}^{P} c_j h_j \tag{10} $$

where $\sigma_i$ is the standard deviation associated with Gaussian visible neuron $v_i$ (1 ≤ i ≤ D).

The DBM is used for learning influence features, and on top of the DBM we add a Gaussian mixture model (GMM) (Gauvain and Lee, 1994; Cardinaux et al., 2003) to calculate the output morbidity z from the (implicit) feature vector y of the topmost layer of the DBM:

$$ z = \frac{1}{P} \sum_{i=1}^{P} \log\left( \sum_{j=1}^{N_G} w_j N(y_i; \mu_j, \Sigma_j) \right) \tag{11} $$

where P is the dimension of vector y, $N(y_i; \mu_j, \Sigma_j)$ is a high-dimensional Gaussian function with mean $\mu_j$ and diagonal covariance matrix $\Sigma_j$, $N_G$ is the number of Gaussians, and $w_j$ is the weight for Gaussian j subject to $\sum_{j=1}^{N_G} w_j = 1$. As mentioned above, the input dimension of the DNN is 518. Empirically, we use a DBM with two hidden layers in the DNN, and set the numbers of neurons of the hidden layers to 64 and 16, respectively.

Fig. 1. Restricted Boltzmann machine (RBM).

Fig. 2. Deep Boltzmann machine (DBM).

2.3. The evolutionary learning method

The classical greedy layer-wise method for training the DBM (Bengio et al., 2007) is easily trapped in local optima. To tackle this issue, we propose an evolutionary learning algorithm. The algorithm first initializes a population of solutions (each of which represents a DNN instance with a different weight setting), and then evolves the solutions by continually migrating components from probably high-quality solutions to low-quality ones based on the biogeography-based optimization (BBO) metaheuristic (Simon, 2008). At each generation, each component of a solution w has a probability inversely proportional to f(w) (the fitness of w) of being migrated and, if so, the component is modified by the migration operation (Zheng et al., 2014c) as follows:

$$ w_{ij} = \begin{cases} w'_{ij} + \alpha (w''_{ij} - w_{ij}), & f(w') \ge f(w'') \\ w''_{ij} + \alpha (w'_{ij} - w_{ij}), & f(w') < f(w'') \end{cases} \tag{12} $$

where w′ is a solution connected with w in the population and w″ is a solution not connected with w (Zheng et al., 2014b), both of which are selected with probabilities proportional to their fitness. After migration, if the modified solution w is better than the original one, the algorithm performs a local search around w by generating $k_N$ neighboring solutions, each of which is obtained by setting a randomly selected component $w_{ij}$ as:

$$ w_{ij} = w_{ij} + \beta \cdot \mathrm{rand}(-1, 1) \tag{13} $$

where rand(−1, 1) generates a random number uniformly distributed in [−1, 1]. The best neighboring solution, if better than w, will replace w in the population. Here the control parameters $k_N$ and β are adjusted with the iteration number t as follows:

$$ k_N = 6 + 18 \frac{t}{t_{max}} \tag{14} $$

$$ \beta = 0.01 - 0.009 \frac{t}{t_{max}} \tag{15} $$

where $t_{max}$ is the maximum number of iterations of the algorithm. The linear increase of $k_N$ and decrease of β enable local search to be more extensive in later iterations of the algorithm (Zheng, 2015; Xue et al., 2017). Algorithm 1 presents the evolutionary learning algorithm for the DNN (where fmax and fmin are the maximum and minimum fitness values in the population, respectively).

Algorithm 1. The evolutionary learning algorithm for the DNN.

Randomly generate a population of n solutions;
Randomly set a topology by connecting each solution to three other ones;
For each solution w in the population do
    For each component wij do
        If rand(0, 1) < (fmax − f(w))/(fmax − fmin) then
            Modify wij according to Eq. (12);
        End If.
    End For.
    If w is better than the original one then
        Perform a local search around w;
    End If.
End For.
Return the best solution found so far.

2.4. Comparative models and metrics

For comparison, we also implement the following four models for the morbidity prediction problem.

• A multiple linear regression (MLR) (Aiken et al., 2003) model represented by

$$ z = a_0 + \sum_{j=1}^{n} a_j x_j \tag{16} $$

where z is the output and $a_j$ are the regression coefficients.

• A three-layer, feed-forward ANN trained by the back-propagation algorithm. The number of neurons of the hidden layer is empirically set to 50. The output of each hidden neuron is calculated using a sigmoid activation function:

$$ f(u_i) = \frac{1}{1 + e^{-u_i}} \tag{17} $$

• A DNN model using the same architecture as our approach but trained by the classical greedy layer-wise method (Bengio et al., 2007).
• A DNN model using the same architecture as our approach but trained by the standard genetic algorithm (GA) (David and Greental, 2014).

The above four models are denoted by MLR, ANN, DNN, and DNN-GA, respectively. Our approach is denoted by DNN-BBO. The MLR and ANN training programs are taken from the library of MATLAB 2016a, and the greedy and evolutionary training programs are also implemented in MATLAB (the code is available at http://www.compintell.cn/en).

The data set consists of 72 tuples (four counties and 18 months). We use 6-fold cross-validation to evaluate the five models; that is, the data set is partitioned into 6 equal-sized pieces (12 tuples per piece), the validation is run 6 times, and each time five pieces are used as the training set while the remaining piece is used as the test set. For each tuple i in the test set D, let $z_i$ be the observed morbidity and $\hat{z}_i$ be the morbidity predicted by a model; the error rate of the model on the tuple is:

$$ \text{Error Rate} = \frac{|\hat{z}_i - z_i|}{z_i} \times 100\% \tag{18} $$

We consider a prediction with an error rate of less than 15% as a “success case”, because it can provide great help in improving medical readiness and thus reducing the rate of exacerbations and mortality (Reynolds, 2001). Let $N_s$ be the number of success cases of a model on the test set D; the success rate of the model is calculated as:

$$ \text{Success Rate} = \frac{N_s}{|D|} \times 100\% \tag{19} $$

Let $\bar{z}$ be the mean of all $z_i$; the R-square ($R^2$) measuring the model's goodness of fit is calculated as:

$$ R^2 = 1 - \frac{\sum_{i \in D} (\hat{z}_i - z_i)^2}{\sum_{i \in D} (\bar{z} - z_i)^2} \tag{20} $$

3. Results

Fig. 3 compares the observed morbidity and predicted morbidities of MLR, ANN, DNN, DNN-GA, and DNN-BBO, and Fig. 4 presents the error rates of the five models. As the results show, the predicted morbidities of MLR are higher than the observed morbidities in most cases, while for ANN and DNN the numbers of overestimation and underestimation cases are roughly equal. Among the five models, the prediction results of DNN fit the samples best. From Fig. 4, we can see that the average error rate of MLR is more than 40%, which is of little use in medical management decision making; ANN achieves an error rate of 37.8%, which is not significantly lower than that of MLR although its complexity is much higher, demonstrating the inability of ANN to model such a high-dimensional and complex prediction problem. The error rates of the three DNN models are less than 17%, which can be very useful in improving the effectiveness of medical resource preparation, allocation and utilization. Among the three DNN models, DNN-BBO achieves the lowest error rate of 13.3%, which demonstrates the advantage of our evolutionary learning algorithm over the gradient-based algorithm and GA.

Fig. 3. The observed and predicted morbidities by the five models on the test data set.

Fig. 4. The prediction error rates of the five models on the test data set.

Table 2 gives the success rates, R2, the average training time per tuple of the training set, and the average prediction time per tuple of the test set of the five models (on a computer with an Intel i7-6500M processor and 8 GB DDR3 memory). As we can see, the success rates of MLR and ANN are around 16%, which can be misleading for medical decision making in most cases. The basic DNN has a success rate of more than 58%, and the two DNN models trained by evolutionary algorithms further improve the rate significantly. DNN-BBO achieves the highest success rate of 68%, i.e., on about two thirds of the test tuples the error rate of DNN is

less than 15%, which can be very useful in improving medical readiness and controlling infections. Moreover, the R2 values show that only about 30% of the morbidity variation can be explained by MLR and ANN, while the percentage increases to more than 70% for the basic DNN and more than 80% for the two DNNs with evolutionary learning, which indicates that the consistency between the observed and predicted morbidities of the DNNs is much higher than that of the two traditional models.

The training time of ANN is about twice that of MLR, the training time of the basic DNN is about three times that of the ANN, and the training time of the DNN models with evolutionary learning is about four times that of the basic DNN. Nevertheless, the differences among the prediction times of the models are not significant. Generally speaking, the computational cost of DNN is acceptable in most cases (e.g., the time for training on a set of 10,000 samples is about 30 min for the basic DNN and about 2 h for DNN with evolutionary learning), and its high accuracy is worth the cost. Once the model has been trained, the time for using it to predict the morbidity of a future period is trivial.

Table 2
The success rates, R2, and the average training time and prediction time (in s) per data tuple of the five models.

Metric | MLR | ANN | DNN | DNN-GA | DNN-BBO
Success rate | 15.28% | 16.67% | 58.33% | 61.11% | 68.06%
R2 | 0.3086 | 0.3098 | 0.7261 | 0.8086 | 0.8017
Training time | 0.033 | 0.067 | 0.199 | 0.793 | 0.822
Prediction time | 0.018 | 0.031 | 0.037 | 0.037 | 0.037

4. Discussion

The high error rate of MLR indicates that the relationship between the morbidity and the concentrations of pollutants is highly nonlinear. The ANN model, which has shown good ability in approximating many nonlinear functions, also has a high error rate and a low success rate on this prediction problem, because the single hidden layer and the back-propagation algorithm easily lead to overfitting and poor generalization when training on high-dimensional data. In comparison, the proposed DNN with both unsupervised pretraining and supervised learning is capable of modeling the complex probabilistic relationship much more accurately, even though the number of training tuples is not large. Moreover, in comparison with the gradient-based training algorithm, the evolutionary learning algorithms can effectively suppress premature convergence because they maintain a population of candidate solutions that simultaneously explore multiple areas of the search space (Zheng et al., 2014a, 2015b). In particular, the selection and crossover operations of GA can efficiently evolve solutions toward the promising areas of the search space, and the mutation operation can effectively enhance solution diversity and thus help the algorithm jump out of local optima. However, the exploitation ability of GA is relatively low, which is why we propose the hybrid BBO training algorithm that combines the original migration operation (similar to GA crossover) with the local search operation, and thus balances exploration and exploitation much better than GA.

Inevitably, the data set used in this study contains much noise. For example, because the counties are adjacent, some patients may visit hospitals in a county other than the one where they are infected, some infection cases may be caused by bad living habits rather than the pollutants, and some cases may be caused by pollutants contained in soil and water less or more than three months before the infections. In this high-dimensional inference problem (in particular where the dimension is larger than the sample size), noisy data can significantly degrade the performance of MLR and ANN (Basalyga and Salinas, 2006; Raskutti et al., 2011). However, the energy-based probabilistic learning mechanism can provide good robustness to the DBM model by extracting meaningful features from the noisy inputs, and thus achieves high accuracy and consistency on the problem.

Moreover, the set of indicators selected for morbidity prediction in this study is by no means complete. There are many other indicators that may have strong or weak relationships with gastrointestinal infections but are not taken into consideration due to the difficulty of data collection and processing. Nevertheless, the deep learning model provides us with a powerful tool for establishing prediction models with incomplete information, which is another advantage over the traditional MLR and shallow ANN models.

5. Conclusion

The paper proposes a DNN model for predicting the morbidity of gastrointestinal infections based on 129 environmental pollutants, and proposes a new evolutionary algorithm for training the model. By using a central DBM to model the complex probabilistic relationship between the indicators, the model exhibits much higher prediction accuracy than the widely used MLR and ANN models, and the evolutionary learning algorithm combining migration and local search operations shows higher search ability than the gradient-based training algorithm. The effectiveness of the proposed DNN model and its learning algorithm has been demonstrated on the data set from four counties in China. Other than gastrointestinal diseases, the collected data samples of soil pollutants can also be used for predicting the morbidities of many other diseases, and we are now applying and extending the DBM for modeling the effects of pollutants on hepatopathies and nephropathies. In future work, we will collect more samples and determine more pollutants, as well as perform a more detailed sensitivity analysis to filter the pollutants that have a strong relationship with the diseases.

Acknowledgments

This work was supported by National Natural Science Foundation (Grant Nos. 21671330 and 61473263) of China. We would like to thank the Institute of Yichun Agricultural Science, China for the help in data processing. References Świetlicka, I., Sujak, A., Muszyński, S., Świetlicki, M., 2017. The application of artificial neural networks to the problem of reservoir classification and land use determination on the basis of water sediment composition. Ecol. Indic. 72, 759–765. Aiken, L.S., West, S.G., Pitts, S.C., 2003. Multiple Linear Regression. John Wiley & Sons. Ashbolt, N.J., 2004. Microbial contamination of drinking water and disease outcomes in developing regions. Toxicology 198 (1–3), 229–238. Basalyga, G., Salinas, E., 2006. When response variability increases neural network robustness to synaptic noise. Neural Comput. 18 (6), 1349–1379. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., 2007. Greedy layer-wise training of deep networks. In: In: Bernhard Schölkopf, J.P., Hoffman, T. (Eds.), Advances in Neural Information Processing Systems (NIPS‘06), vol. 19. MIT Press, pp. 153–160. Bibi, H., Nutman, A., Shoseyov, D., Shalom, M., Peled, R., Kivity, S., Nutman, J., 2002. Prediction of emergency department visits for respiratory symptoms using an artificial neural network. Chest 122 (5), 1627–1632. Boyce, J.M., Havill, N.L., Otter, J.A., Adams, N.M.T., 2007. Widespread environmental contamination associated with patients with diarrhea and methicillin-resistant Staphylococcus aureus colonization of the gastrointestinal tract. Infect. Control Hosp. Epidemiol. 28, 1142–1147. Briggs, D., 2003. Environmental pollution and the global burden of disease. Br. Med. Bull. 68 (1), 1–24. Cardinaux, F., Sanderson, C., Marcel, S., 2003. Comparison of MLP and GMM classifiers for face verification on XM2VTS. In: Kittler, J., Nixon, M.S. (Eds.), Audio- and VideoBased Biometric Person Authentication. Springer Berlin Heidelberg, pp. 911–920. 
Chang, N.-B., Mohiuddin, G., Crawford, A.J., Bai, K., Jin, K.-R., 2015. Diagnosis of the artificial intelligence-based predictions of flow regime in a constructed wetland for stormwater pollution control. Ecol. Inf. 28 (1), 42–60. Chernih, A., Solodoukhina, D., 2008. The Problem of Soil Pollution in Russia and Associated Health Problems. Springer Netherlands, Dordrecht, pp. 161–170. Dai, F., Zhou, Q., Lv, Z., Wang, X., Liu, G., 2014. Spatial prediction of soil organic matter content integrating artificial neural network and ordinary kriging in Tibetan Plateau. Ecol. Indic. 45, 184–194. David, O.E., Greental, I., 2014. Genetic algorithms for evolving deep neural networks. Proc. GECCO. ACM, Vancouver, Canada, pp. 1451–1452.

80

Ecological Indicators 82 (2017) 76–81

Q. Song et al.

