Ecological Indicators 101 (2019) 212–220
Contents lists available at ScienceDirect
Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind
Modeling and predicting fecal coliform bacteria levels in oyster harvest waters along Louisiana Gulf coast
Jiao Wang a,⁎, Zhiqiang Deng b
a National Institute of Environmental Health, Chinese Center of Disease Control and Prevention, Beijing, China
b Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, Louisiana, United States
ARTICLE INFO
Keywords: Fecal coliform; Oyster; Artificial neural network (ANN); Forecasting; Seasonality
ABSTRACT
Fecal coliform bacteria are important indicator microorganisms for the quality of oyster harvest waters and the end product, but they are commonly monitored only monthly. This makes the protection of public health challenging, because oysters may be harvested daily and fecal coliform levels in oyster harvest waters may also change daily. This paper presents an artificial intelligence-based neural network modeling approach to predict fecal coliform bacteria levels in oyster harvest areas (waters) daily. The new approach was demonstrated by developing an artificial neural network (ANN) model for daily prediction of fecal coliform levels in seven oyster harvest areas along the Northern Gulf of Mexico coast. The model input variables were selected by using the stepwise regression analysis method. It was found that the prevalence of fecal coliform bacteria in oyster growing waters was controlled by six independent environmental predictors, including wind, salinity, tide, water temperature, rainfall, and solar radiation, which were utilized as the model input variables. It was also found that the prevalence of fecal coliform bacteria in oyster growing waters is affected by not only current conditions of the six independent environmental variables but also antecedent conditions of the variables (particularly average solar radiation and cumulative rainfall over the past two days). Model prediction results indicated that the ANN model was capable of predicting not only daily variations in fecal coliform levels but also seasonal fluctuations in observed fecal coliform levels, characterized by high bacteria levels in the cold season and low bacteria levels in the warm season. The performance of the ANN model was demonstrated by a linear correlation coefficient (LCC) of 0.7421 and a root mean square error (RMSE) of 0.3844 for the model development phase and an LCC of 0.6312 and an RMSE of 0.2835 for the independent validation phase.
The ANN model makes it possible to reduce the harvest and consumption of fecally contaminated oysters and thereby greatly reduce the health risk to the general public and particularly oyster consumers. Although the predictive ANN model was specifically developed for oyster harvest areas along the Louisiana Gulf coast, the methods used in this paper are generally applicable to other oyster harvest areas and coastal waters.
⁎ Corresponding author. E-mail address: [email protected] (J. Wang).
https://doi.org/10.1016/j.ecolind.2019.01.013 Received 30 August 2018; Received in revised form 3 January 2019; Accepted 5 January 2019 1470-160X/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction
Nonpoint source pollution from pathogens is a leading cause of water quality impairments in the rivers and streams of the United States (EPA, 2002). Because they are relatively easy and inexpensive to measure, bacterial indicator organisms are usually used to assess water quality and establish water quality criteria in the United States (Zhang et al., 2015b). The presence of fecal coliform bacteria in a water body is generally considered an indication that the water body is contaminated with sewage. The National Health and Medical Research Council of Australia and the European Environmental Agency (EEA) both target fecal coliform as one of the microbial indicators for drinking water quality. The Chinese water quality standard targets fecal coliform as one of the general bacterial indicators. The Canadian Council of Ministers of the Environment (CCME) includes fecal coliform levels in its data sets for routine water quality monitoring. In order to protect human health, the U.S. National Shellfish Sanitation Program (NSSP), a cooperative program between the U.S. Food and Drug Administration (FDA), state regulatory agencies, the Interstate Shellfish Sanitation Conference, and the shellfish industry, requires shellfish-producing states to monitor fecal coliform levels in shellfish harvest waters to determine that the products are safe before harvesting is permitted.

Fecal coliform bacteria are able to reproduce rapidly under optimum environmental conditions for growth. With adequate food, they can even multiply and grow into visible colonies in dark, warm, and moist environments. By growing and counting colonies of fecal coliform bacteria from a sample of water, we can determine approximately how many bacteria were originally present. Fecal coliform concentrations vary frequently with rainfall, temperature, turbidity, etc. (Kim et al., 2016; Chen and Liu, 2017; Karabas et al., 2018; Leight et al., 2016). Climate may also affect the growth and distribution of fecal coliforms. Seasonality of such environmental factors may affect aquatic pollutant loads and thus the prevalence of the bacteria in water bodies. As a result, there is a need to develop effective tools that could be used to determine the concentration of pathogens, or at least pathogen indicator organisms, in waters. Development of such a model would require adequate observed data and parameter estimation for model calibration. A data-driven model could be beneficial and effective in providing reasonably accurate predictions of the presence of fecal coliform in waters based on commonly measured constituents (Tufail et al., 2008).

The United States EPA recommends specific procedures to collect water samples and test for fecal coliform bacteria (APHA, 1992). Field sampling is the traditional method for determining fecal coliform levels. The advantage of molecular methods is accurate identification of fecal coliform strains, but both external substances and the low efficiency of DNA extraction may interfere with molecular detection results (Atherholt et al., 2017). Water samples taken from oyster harvest areas are commonly collected and delivered to a laboratory for molecular analysis, so knowledge of the microbiological characteristics of samples always lags behind sampling (Rahman et al., 2017; Krapf et al., 2016). It would be ideal to use real-time or near real-time sensors to detect fecal coliform levels, such as sensors carried by satellites. Algorithms are developed to establish the relationship between radiation and reflectance detected by satellite sensors and in-situ environmental data. In recent years, satellite images have been utilized for mapping fecal coliform bacteria levels by developing empirical algorithms for environmental variables (Zhang et al., 2012).

Fecal coliform bacteria levels are associated with environmental variables, such as rainfall, temperature, solar radiation, humidity, and salinity (Hathaway et al., 2014; Tong et al., 2016; Wolanin et al., 2016). Regression models can also be applied to study how one environmental variable affects fecal coliform levels when other variables remain fixed and to predict fecal coliform levels in oyster harvest areas. However, such models have had difficulty identifying the explanatory variables needed to characterize a large percentage of the variability observed in fecal coliform data, and relatively low R2 values have been reported (Mas and Ahlfeld, 2007). Since the relationship between environmental variables and fecal coliform levels is complicated and non-linear, non-linear models must be developed to predict fecal coliform levels (Fan et al., 2015).

Artificial neural networks (ANNs) are an alternative class of predictive models. ANNs simulate human brain functioning to predict fecal coliform levels. ANNs have shown promise in modeling and predicting the prevalence of fecal indicator bacteria and outbreaks of viruses in coastal waters (Chenar and Deng, 2018; Wang and Deng, 2016; Zhang et al., 2012). A normalization process is carried out before ANN models are developed to provide an equivalent numerical basis for input parameter determination. With environmental parameters such as river flow discharge, depth, and salinity provided as input data, the performance of ANN models is generally better than that of other models, especially in coastal areas (Lin et al., 2008; Zhang et al., 2015a,b). Compared with conventional statistical models, such as multiple linear regression, binary logistic regression, and partial least squares regression models, the ANN model demonstrates better performance (Thoe et al., 2014).

With the advent of regular water quality monitoring and public notification programs at oyster harvesting areas in Louisiana, there has been increased interest in developing models to predict fecal coliform levels. The primary objective of this study was to utilize an artificial intelligence-based neural network modeling approach to predict fecal coliform bacteria levels in oyster harvest waters. To that end, an artificial neural network (ANN) model was established for predicting fecal coliform levels in seven oyster harvest areas along the Northern Gulf of Mexico coast.

Fig. 1. Oyster harvesting areas along Louisiana Gulf coast shoreline.
2. Materials and methods
2.1. Study area
Louisiana is the top producer among the U.S. oyster production areas. The oyster harvest areas in Louisiana are primarily concentrated on the two sides of the Mississippi River along the Gulf of Mexico coast. The Breton Sound and Pontchartrain oyster growing basins are two of the most productive oyster growing basins in the middle of the Gulf of Mexico coast, so the monitoring of the concentration of the microbial indicator, fecal coliform, has drawn wide attention. The Louisiana Gulf coast is located in the north of the Gulf of Mexico along the southeast Louisiana shoreline. The environmental data were collected from the study area in the Breton Sound Basin and part of the Pontchartrain Basin (oyster harvest Area 1 through Area 7 in Fig. 1), which is located on the southeastern side of the progradational delta lobe of the Mississippi River. Information on these areas is shown in Table 1. The study area extends from 88°54′ W to 89°48′ W and from 29°16′ N to 30°14′ N. Both short-term and long-term changes in the aquatic environment measured here were taken to represent the water quality of the entire Gulf of Mexico oyster harvest areas. Different areas provide a unique environment for the growth of fecal coliform bacteria due to different hydrologic characteristics. Red polygons and areas in Fig. 1 are impaired water bodies listed under Clean Water Act 303(d) (http://water.epa.gov/lawsregs/guidance/303.cfm). Among the impaired water bodies, Lake Borgne (Area 1) and Bayou Terrace are mainly impaired by pathogens. Lake Pontchartrain is impaired by copper, total coliform, and fecal coliform. Area 2 is adjacent to Area 1 and channeled to Lake Pontchartrain, which is also included in the 303(d) list but not shown in Fig. 1. Thus, Area 2 is strongly affected by inflow from Area 1 and Lake Pontchartrain. Among all 7 areas, Area 3 is the only one that is connected to a number of lakes and bays. These lakes and bays may lead to an increase in wind speed and ultimately affect fecal coliform levels within this area. Area 4 and Area 5 are divided by a small branch of the Mississippi River. The inflow of Area 5 is also affected by Bayou Terrace, so the effects of runoff on Area 4 and Area 5 may be different. A part of the boundary of Area 7 coincides with the Mississippi River, which significantly strengthens the effects of runoff on this area. Although each area in this study has its own unique characteristics, all 7 areas share some similarities.

Table 1 Information of 7 oyster harvesting areas (Area 1 through Area 7).
Area Number    Area (km2)
1              683.74
2              309.66
3              3121.48
4              810.84
5              674.52
6              507.94
7              1095.13

2.2. Gathering and processing of environmental data
Datasets used in this study included data for the 6 most relevant environmental variables, namely wind, salinity, tide, water temperature, rainfall, and solar radiation, observed in each area (Table 2). Tide and wind data are categorical, and the rest of the variables are numerical. For wind data, wind direction and wind speed were obtained. Wind speed was coded following the Louisiana BEACH Grant Report Grant Year 2001, with codes ranging from 0 to 7. Calm wind (0 mph) was coded as 0; light wind (0–5 mph) was coded as 1; moderate-light wind (5–10 mph) was coded as 2; moderate wind (10–15 mph) was coded as 3; moderate-strong wind (15–20 mph) was coded as 4; strong wind (20–35 mph) was coded as 5; gale (35 mph and more) was coded as 6; and not reported wind was coded as 7. In addition to the coded wind speed, an average wind speed was estimated from the midrange value of each category. Tide type 1 was also defined by the Louisiana BEACH Grant Report Grant Year 2001: extreme high tide was coded as 9, extreme low tide as 8, low falling tide as 7, high rising tide as 6, normal tide as 5, low rising tide as 4, low tide as 3, high falling tide as 2, and high tide as 1. Tide type 2 was defined in this study to order the tide categories by sea surface height: extreme high tide was recoded as 9, high rising tide as 8, high tide as 7, high falling tide as 6, normal tide as 5, low rising tide as 4, low tide as 3, low falling tide as 2, and extreme low tide as 1. Rainfall data for the current day, the previous day, and two days before were obtained. Based on the daily rainfall data, cumulative two-day rainfall and cumulative three-day rainfall were calculated. Similarly, daily solar radiation for the current day and for each of the previous three days, as well as cumulative two-day and cumulative three-day solar radiation, were calculated from hourly solar radiation.

Table 2 Description of environmental data from 2001 to 2011.
Environmental parameters    Number    Maximum    Minimum    Mean     Standard error
Wind direction              71,440    16.00      0          6.68     4.24
Wind, mph                   71,363    7.00       0          2.28     0.98
Salinity, ppt               71,440    35.70      0          10.56    6.86
Tide                        71,426    9.00       1.00       3.80     2.15
Water temperature, ℃        71,440    35.00      −2.80      22.30    6.82
Rainfall, mm                67,665    81.00      0          0.12     0.32
Solar radiation, kWh/m2     57,411    4.47       0          15.59    11.34

Water samples were collected from different sites in Areas 1 through 7 approximately every month from 2001 to 2011 by the Louisiana Department of Health and Hospitals Office of Public Health Molluscan Shellfish Program (LDHH-PHMSP). Water temperature was measured by LDHH-PHMSP one meter beneath the sea surface. Fecal coliform bacteria levels in water samples were analyzed using US EPA standard methods. Other monthly environmental data for the period January 2001 to November 2011, such as tide, wind direction, wind speed, salinity, and rainfall, were collected from LDHH-PHMSP. Solar radiation data were available from August 2001 to June 2011 from the LSU Agriculture Center (http://weather.lsuagcenter.com) at Station Port Sulphur. In long-term observations, fecal coliform bacteria levels fluctuate much more sharply than the environmental variables. Thus, a log transformation of the fecal coliform bacteria level was taken.
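The preprocessing steps described in Section 2.2 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: all names are hypothetical, code 2 is taken as 5 to 10 mph (the gap between the neighboring categories), the value for the open-ended gale category is a placeholder, the base-10 logarithm is assumed because 14 MPN per 100 ml corresponds to Log (FC) = 1.15, and "cumulative two-day" is read as the current day plus the previous day.

```python
import math

# Midrange wind speed (mph) per wind code. Code 2 is assumed to span 5-10 mph.
# Code 6 (gale, 35 mph and more) has no upper bound, so 35.0 is a placeholder
# assumption; code 7 (not reported) has no speed at all.
WIND_MIDRANGE_MPH = {0: 0.0, 1: 2.5, 2: 7.5, 3: 12.5, 4: 17.5, 5: 27.5, 6: 35.0, 7: None}

# Tide type 1 (report coding) -> tide type 2 (ordered by sea surface height).
TIDE1_TO_TIDE2 = {9: 9, 6: 8, 1: 7, 2: 6, 5: 5, 4: 4, 3: 3, 7: 2, 8: 1}


def cumulative(daily_values, window):
    """Sum of the current day and the previous (window - 1) days.

    Returns None for days that do not yet have enough antecedent data.
    """
    out = []
    for i in range(len(daily_values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(daily_values[i - window + 1 : i + 1]))
    return out


def log_fc(fc_mpn_per_100ml):
    """Base-10 log transform of a positive fecal coliform count."""
    return math.log10(fc_mpn_per_100ml)
```

For example, `cumulative(rain_mm, 2)` would give the two-day rainfall series from a list of daily totals, and `WIND_MIDRANGE_MPH[code]` converts a wind code to the midrange speed.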
2.3. Stepwise regression analysis
In order to select relevant model parameters, the stepwise regression analysis method was used. This study used SAS (Version 9.2) to conduct the stepwise regression analysis. Forward selection was applied: it starts with no variables in the model, tests the addition of each variable using a chosen model fit criterion, adds the variable (if any) whose inclusion gives the most statistically significant improvement of the fit, and repeats this process until no remaining variable improves the model to a statistically significant extent. The partial R-square was used to estimate the relevance between the independent environmental variables and fecal coliform levels. The partial R-square (or coefficient of partial determination) measures the marginal contribution of one explanatory variable when all others are already included in the model. Variables with P-value < 0.05 in the stepwise regression analysis were finally selected as model input variables.
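The forward selection loop described above can be illustrated with a small pure-Python sketch. It is not the SAS procedure used in the study: the function names are hypothetical, ordinary least squares is solved through the normal equations, and an F-to-enter threshold of 3.84 (the 0.05 critical value for one added term in a large sample) stands in for the P < 0.05 test. The sketch reports the gain in model R-square at each step, which matches how the partial values in Table 3 add up to the overall R-square.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Residual sum of squares of an OLS fit (with intercept) on the columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, design[i]))) ** 2
               for i in range(n))


def forward_select(predictors, y, f_to_enter=3.84):
    """Forward selection. predictors maps name -> list of values.

    Returns a list of (name, model_r2, r2_gain) in order of entry.
    """
    n = len(y)
    sst = sse([], y)          # intercept-only model: total sum of squares
    current = sst
    chosen, steps = [], []
    while len(chosen) < len(predictors):
        # Try each remaining variable and keep the one with the lowest SSE.
        best_name, best_sse = None, None
        for name in predictors:
            if name in chosen:
                continue
            trial = sse([predictors[c] for c in chosen] + [predictors[name]], y)
            if best_sse is None or trial < best_sse:
                best_name, best_sse = name, trial
        df_resid = n - (len(chosen) + 2)   # intercept + chosen + candidate
        f_stat = (current - best_sse) / (best_sse / df_resid)
        if f_stat < f_to_enter:
            break                          # no significant improvement left
        chosen.append(best_name)
        steps.append((best_name, 1.0 - best_sse / sst, (current - best_sse) / sst))
        current = best_sse
    return steps
```

The sketch assumes the candidate columns are not perfectly collinear and that no model fits the data exactly; a production implementation would guard against both cases.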
Table 3 R-square and Partial R-square of environmental variables (P < 0.05) in areas 1 through 7 from stepwise regression analysis (R2 = 0.386).
2.4. Development of artificial neural networks (ANN) model
Artificial intelligence-based neural network (ANN) models are generally designed to function like the human brain and are trained on historical datasets so that they learn to reproduce the presence of fecal coliforms in oyster growing waters (Chenar and Deng, 2018). A key difference among various types of ANN models is the training algorithm used for determining the weights for neurons between different layers (input layer-hidden layer and hidden layer-output layer). The most commonly used training algorithm is back propagation, in which a multilayer feed-forward network is trained by the error back-propagation algorithm to minimize the difference between the model output and the measured target (Zhang et al., 2015a). The back-propagation ANN model utilized in this study consists of three layers: an input layer, a hidden layer, and an output layer. The structure of the ANN model developed in this study is shown in Fig. 2, which is characterized by one input layer consisting of seven input variables, one hidden layer consisting of 12 hidden neurons, and one output layer. The model output denotes log-transformed fecal coliform levels (log (FC)) in oyster harvest areas. This study used the Neural Network Toolbox in MATLAB (Version 2010a) to train, validate, and test the ANN model in the model development procedure. The trained ANN model was further validated through an independent validation procedure. All data collected in the seven oyster harvest areas from September 12th, 2001 to June 15th, 2011 were split into two groups: a model development group consisting of the data from September 12th, 2001 to April 20th, 2008 and an independent validation group consisting of the data from April 21st, 2008 to June 15th, 2011. This study arranged all sampling data in time sequence and chose the data year by year. In the development group, input data were randomly split into three subgroups for training (Subgroup-1: 60% of development group data), validation (Subgroup-2: 20% of development group data), and testing (Subgroup-3: 20% of development group data). The ANN model was trained repeatedly until the overall performance of the model was acceptable. The model performance was measured using two statistical metrics: the linear correlation coefficient (LCC) (Eq. (1)) and the root mean square error (RMSE) (Eq. (2)).
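To make the model structure and the evaluation metrics concrete, the following pure-Python sketch builds a 7-12-1 feed-forward network, trains it with plain stochastic gradient descent back-propagation, splits data 60/20/20, and computes LCC and RMSE. It is only an illustration under stated assumptions: the study used the MATLAB Neural Network Toolbox, whereas the tanh hidden activation, linear output, weight initialization, learning rate, and all function names here are hypothetical choices.

```python
import math
import random


def init_network(n_in=7, n_hidden=12, seed=0):
    """Random small weights for one hidden layer and a single linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, predicted log(FC)) for one input vector."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    return h, sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"]


def train_step(net, x, target, lr=0.01):
    """One back-propagation update on the squared error of a single sample."""
    h, y = forward(net, x)
    err = y - target
    for j, hj in enumerate(h):
        grad_h = err * net["w2"][j] * (1.0 - hj * hj)  # uses w2 before its update
        net["w2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * grad_h * xi
        net["b1"][j] -= lr * grad_h
    net["b2"] -= lr * err


def split_indices(n, seed=0):
    """Random 60/20/20 split into training, validation, and testing indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = n * 6 // 10, n * 2 // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def lcc(x, y):
    """Linear correlation coefficient, cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

In use, the training indices would drive repeated calls to `train_step`, the validation subset would decide when to stop, and `lcc` and `rmse` would be reported on each subset. Inputs would normally be normalized to a common scale before training.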
$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (1)$$

where cov denotes covariance, σX denotes the standard deviation of X, and σY denotes the standard deviation of Y.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (2)$$

where yi denotes the observed value and ŷi denotes the predicted value.

Fig. 2. Artificial Neural Network (ANN) model structure (w denotes weight of input neurons and b denotes bias).

3. Results
3.1. Selection of model input variables
The model input variables were selected based on the results of the stepwise regression analysis. The entered variables and associated partial R-squares are shown in Table 3 for the ANN model. Water temperature was shown to be an important parameter in the study area and was thus selected as an important environmental predictor for fecal coliform bacteria prediction. Solar radiation was also included in models of other studies because it was determined to be one of the most significant variables for predicting fecal coliform levels (Cho et al., 2012). The cumulative two-day average solar radiation was found to be significantly related to the fecal coliform level (P < 0.05) for the ANN model. Other studies also confirmed that strong wind intensified the transport of fecal coliform bacteria (Sigua et al., 2010). The midrange wind speed (WS2) yielded a better partial R-square than the coded wind speed (WS1). Thus, we included WS2 in our ANN model. The cumulative two-day rainfall (RF2) was included in the ANN model rather than current rainfall or cumulative three-day rainfall because it was associated with a higher partial R-square. Wind direction (Wd) and fecal coliform levels were correlated because wind direction can induce more energetic conditions, which enhance the presence of high fecal coliform levels (Ufnar et al., 2006). Tide was another variable that affected the prevalence of fecal coliform bacteria due to water level variation.

Table 3
Variables    R-Square    Partial R-Square
T            0.2150      0.2150
SR2          0.2840      0.0690
S            0.3430      0.0590
WS2          0.3630      0.0200
RF2          0.3800      0.0170
Wd           0.3850      0.0050
T2           0.3860      0.0010
*T = water temperature; SR2 = cumulative previous two-day average solar radiation; S = salinity; WS2 = average wind speed; RF2 = cumulative two-day rainfall; Wd = wind direction type; T2 = tide type 2.

3.2. Selected ANN model
Fig. 3 shows the performance of the finally selected ANN model based on annual data sets and the corresponding LCC and RMSE. The LCC and RMSE values shown in Fig. 3 denote the averages of the corresponding values for the training, validation, and testing phases of the selected ANN model. Fig. 3 compares the log-transformed fecal coliform levels predicted using the ANN model with the observed ones in all seven oyster harvest areas from September 12th, 2001 to April 20th, 2008 (sampling event numbers 1–47 denote data from 2001, 48–200 from 2002, 201–352 from 2003, 353–509 from 2004, 510–625 from 2005, 626–733 from 2007, and 734–785 from early 2008). The observed fecal coliform levels clearly showed seasonal variations characterized by decreasing fecal coliform levels in the summer season and increasing fecal coliform levels in the winter season. Most violations of the water quality criterion occurred during the low temperature season from November to April, when solar radiation and temperature were relatively low. The predicted fecal coliform levels showed a similar variation trend, with an LCC value of 0.7421 for the model development phase.

Fig. 3. Comparison between the observed and model predicted fecal coliform levels in the model development period (September 12th, 2001 to April 20th, 2008).

Fig. 4 demonstrates the performance of the ANN model with the independent validation data. The overall variation pattern shown in Fig. 4 is basically similar to that in Fig. 3 and is characterized by high fecal coliform levels in the winter season and low fecal coliform levels in the summer season. The correlation coefficient of the model decreased slightly to 0.6312 when independent data were used to test the model performance. The independent data consisted of observed fecal coliform level data collected from April 21st, 2008 to June 15th, 2011, in which data points 1–115 denote data from late 2008, data points 116–290 denote data from 2009, data points 291–440 denote data from 2010, and data points 441–523 denote data from 2011. In both the model development phase and the independent validation phase, the predictive ANN model performed consistently well in the sense that the variation trend in the predicted fecal coliform levels agreed with that in the observations in the study area. According to the NSSP standard, the geometric mean of fecal coliform counts in water samples should not exceed 14 MPN per 100 ml (Log (FC) = 1.15). Based on this standard, there were 144 and 61 water quality standard violation cases in the model development period and the independent validation period, respectively. The selected ANN model correctly predicted approximately 45.14% and 47.54% of these violations in the model development phase and the independent validation phase, respectively. Compared with the predictions in the model development period, the model predictions in the independent validation period showed a smaller LCC, though the model successfully reproduced a slightly higher proportion of water quality violations.
3.3. Seasonality of fecal coliform levels The monthly average of fecal coliform level in multiple years demonstrates an obvious U shape in both model development period and independent validation period (Fig. 5). Fig. 5 clearly illustrates the comparison between observed and predicted monthly averaged fecal coliform level in multiple years. The lowest fecal coliform level appeared in June, July, and August, and the highest fecal coliform levels appeared in January, February, November, and December. The prediction of fecal coliform level accurately simulated this seasonal variation with LCC of 0.9765 and 0.9093 in model development period and independent validation period, respectively. The performance of developed model in model development phase was better than that in independent validation phase. The model error was larger in independent validation period with RMSE of 0.0798 than that in model development period (RMSE = 0.0440). The discrepancy between observation and prediction was relatively large in November and December in both model development period and independent validation period. In addition, the predicted fecal coliform level was obviously higher than observed values in January, February, and March in independent validation period.
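The monthly averaging behind this seasonal comparison can be sketched as below. This is an assumed reading of the procedure rather than the authors' code: samples from all years are pooled by calendar month, and the function name and argument layout are hypothetical. The same function would be applied to observed and predicted log (FC) values before comparing the two monthly series.

```python
from collections import defaultdict


def monthly_means(months, log_fc_values):
    """Average log(FC) per calendar month (1-12), pooling all years."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, value in zip(months, log_fc_values):
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}
```

Months with no samples are simply absent from the result, so the observed and predicted series should be compared only over the months present in both.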
Fig. 4. Comparison between the observed and model predicted fecal coliform levels in the independent validation period (April 21st, 2008 to June 15th, 2011).
Fig. 5. Seasonality of monthly variation of observed and predicted fecal coliform level in model development period (a) and independent validation period (b).
4. Discussion
4.1. Performance of the ANN model
The model accuracy and stability depend on the strength of the correlation between the model variables and the target fecal coliform levels. The predictive model developed in this study showed reasonable performance in fecal coliform level estimation and accurately simulated the seasonal variation of fecal coliform levels in Louisiana oyster harvest waters. The true positive rate (TPR), true negative rate (TNR), false positive rate (FPR), and false negative rate (FNR) in this study are 47.54%, 96.13%, 3.87%, and 52.46%, respectively. Motamarri and Boccelli (2012) utilized a multiple linear regression (MLR) model and an ANN model to predict fecal coliform levels and then classified whether the concentration satisfied the water quality standard for different water uses, such as swimming and boating. The TPR, TNR, FPR, and FNR of the ANN model in their study were 56%–68%, 92%–99%, 1%–8%, and 32%–44%, respectively. The TPR in our study was slightly lower, the FNR was slightly higher, and the remaining indices were close to those in their study. Thoe et al. (2014) also established an ANN model in coastal waters using past fecal indicator bacteria (FIB) concentration, rainfall, tide, wave, storm drain condition, temperature, upwelling index, solar radiation, cloud cover, air pressure, and onshore/offshore wind speed. Their model was applied to predict water quality at Santa Monica Beach in the United States, and the correlation coefficient and RMSE in the independent validation phase for fecal coliform level prediction were 0.38 and 0.41, respectively. The ANN model performance has been improved to some extent in our study (LCC = 0.63; RMSE = 0.28) via parameter simplification and the inclusion of additional relevant environmental parameters, such as wind direction, solar radiation, and salinity.
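The four classification rates can be computed from paired observed and predicted log (FC) values as in the sketch below. It assumes that a violation means a value above log10(14), the NSSP criterion quoted in the text, and the function name is hypothetical. As a consistency check on the reported figures, 29 correctly predicted violations out of 61 give 29/61 = 47.54% and 32 missed violations give 32/61 = 52.46%.

```python
import math

# NSSP criterion of 14 MPN per 100 ml on the log10 scale (about 1.15).
THRESHOLD_LOG_FC = math.log10(14)


def classification_rates(observed, predicted, threshold=THRESHOLD_LOG_FC):
    """TPR, TNR, FPR, and FNR for violation events (value above threshold).

    Assumes the observed data contain at least one violation and one
    non-violation; otherwise a rate would divide by zero.
    """
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_violation = obs > threshold
        pred_violation = pred > threshold
        if obs_violation and pred_violation:
            tp += 1
        elif obs_violation:
            fn += 1
        elif pred_violation:
            fp += 1
        else:
            tn += 1
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }
```

By construction TPR + FNR = 1 and TNR + FPR = 1, which is also true of the percentages reported for this study.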
Fig. 6. Variation of average value of environmental variables when water quality violation events occur in model development phase (to the left of the vertical line) and independent validation phase (to the right of the vertical line).
217
Ecological Indicators 101 (2019) 212–220
J. Wang, Z. Deng
could be classified as strong wind. The wind direction of ESE and strong wind speed contributed to transfer of fecal coliform from sources outside the study area into this region. In water quality violation events, the value of salinity data was significantly lower than the historical maximum, which was distributed on two sides of 8.02 ppt with a slight trend of seasonal variation. This negative correlation between salinity and coliform levels was consistent with the inverse relationship between salinity and fecal coliform levels with R2 of 0.74 observed in Mississippi Sound which is quite close to our study area (P < 0.001) (Chigbu et al., 2004). Similarly, the water temperature in water quality violation events also demonstrates seasonality to some extent, which fluctuated around 17.35 °C. The maximum and minimum water temperature in violation events were close to historical upper and lower limit, which implying that although low temperature was claimed to be responsible for the decrease of fecal coliform levels in some studies since the growth of bacteria is inhibited, violation of the water quality criterion does not necessarily correlate with low temperature in this study. Leight et al. (2016) pointed out that the average air temperature was only significantly related to fecal coliform levels in the summer, which also explained the data distribution pattern in Fig. 6 that the fecal coliform level could not be determined by the average temperature of a single event; instead, the seasonality of temperature variation should be taken into consideration for more accurate predictions. The average tide data was close to category 5, which denotes normal tide. Since tide data used in this study was categorical, it could not be explained as that normal tide is the reason of water quality violation, it should be interpreted as that the distribution of tide type in all categories was relatively even in all water quality violation events. 
Many studies have shown that higher fecal coliform levels were observed during high tides due to possible contamination of the aquifer near the shoreline or the presence of high concentrations of fecal coliform bacteria in the beach sand (Wright et al., 2011; Zhang et al., 2013). The relationship between low tide and high fecal coliform levels was also observed in some studies, while the mechanism remains unclear (Davino et al., 2015; Rosenfeld et al., 2006). Most rainfall data was significantly lower than the historical maximum (5.06 mm), and was concentrated at 0.44 mm level. Heavy rainfall might be an important cause of failure of wastewater treatment plants (WWTPs). Once the inflow exceeds the capacity of WWTPs, a large amount of storm water pollutants would be discharged directly to coastal areas, such as pathogens from untreated combined sewage, waterfowl feces, wildlife feces, and domestic pet wastes, which have be collected from parking lots, streets, driveways, and other impervious surfaces. This leads to the increase of fecal coliform level in water as a consequence. Leight and Hood (2018) found that antecedent rainfall corresponds to the densities of fecal coliforms, but the response of fecal coliform level to rainfall differs between watersheds, and is related to characteristics such as the percent of open water, the percent and types of wetlands, and the percent of soils with moderate to high runoff potential. In their study, the level of rainfall predictive of a 50% or greater chance of fecal coliform exceeding the criterion is typically more than 25.4 mm and they also mentioned that this threshold value might vary by watershed. In this study, the value of rainfall was not as high as expected when water quality criterion was violated, indicating that rainfall might be an essential factor of fecal coliform level increase, but low rainfall or even no rainfall could not be seen as low risk of water quality violation. 
Unlike wind direction, wind speed, tide, and water temperature, solar radiation data in water quality violation events showed a regular fluctuation around 11.47 kWh/m² per hour. From September 2001 to June 2011, the maximum value of solar radiation was 41 kWh/m² per hour, whereas no solar radiation over 30 kWh/m² per hour was observed when fecal coliform levels exceeded the water quality criterion. Solar radiation significantly increases the mortality rate of fecal coliform in marine waters (Boehm et al., 2009). In the summer season, both water
Fig. 6. (continued)
4.2. Data distribution pattern of environmental variables in water quality violation events
Fig. 6 shows the variation of environmental variables when water quality violations occurred in the model development and independent validation phases. In the figure, the red horizontal line indicates the historical maximum value, the green horizontal line indicates the historical minimum value, and the yellow dashed line indicates the average value in violation events only. The average values of environmental variables shown in Fig. 6 reveal the distribution pattern of the data; it should be pointed out that a variable reaching its mean value does not necessarily lead to a water quality violation event. When the water quality criterion was violated, observed solar radiation and salinity data were mainly distributed in the low-value zone, whereas the other variables did not show a definite distribution pattern. The historical upper and lower limits of wind direction are 16 and 0, respectively. The average wind direction in water quality violation events was close to 6, which denotes ESE (east-southeast). For wind speed, except for 4 cases at the historical maximum (27.5 mph), most wind speed data were distributed around 11.41 mph, which
independent environmental variables including wind, salinity, tide, water temperature, rainfall, and solar radiation. Specifically, the prevalence of fecal coliform bacteria in oyster growing waters is affected not only by current conditions of the six independent environmental variables but also by antecedent conditions of the variables (particularly average solar radiation and cumulative rainfall over the past two days).
2) Model prediction results indicated that the ANN model was capable of predicting not only daily variations in fecal coliform levels but also seasonal fluctuations in observed fecal coliform levels, characterized by high bacteria levels in the cold season and low bacteria levels in the warm season due to solar inactivation. The performance of the ANN model was demonstrated by a linear correlation coefficient of 0.7421 and an RMSE of 0.3844 in the model development phase and a linear correlation coefficient of 0.6312 and an RMSE of 0.2835 in the independent validation phase.
3) The ANN model makes it possible to reduce the harvest and consumption of fecally contaminated oysters and thereby greatly reduce the health risk to the general public and particularly to oyster consumers.
4) While the predictive ANN model was specifically developed for oyster harvest areas along the Louisiana Gulf coast, the methods presented in this paper are generally applicable to other oyster harvest areas and coastal waters as well.
temperature and solar radiation are higher, but the inhibitory effect of high solar radiation on the growth of fecal coliform was stronger than the promoting effect of water temperature. Due to the joint effect of solar radiation and water temperature, fecal coliform levels were observed to be lower in summer and higher in winter in this study.
4.3. Effects of environmental variables on prediction of fecal coliform levels
Periodic fluctuations of predicted fecal coliform levels in Figs. 3 and 4 imply that the developed model also reproduced the seasonality of fecal coliform levels, showing low concentrations in summer and high concentrations in winter. This could be explained by the fact that some of the model input variables, such as solar radiation and water temperature, fluctuated on a seasonal basis. In contrast, Cho et al. (2016) observed the opposite result using their modified SWAT model, with high concentrations in summer and low concentrations in winter at four different sites in Korea. They explained the seasonal variability of fecal coliform from the perspective of bacterial growth, claiming that fecal coliform regrowth is dominant in the summer season while bacteria are inactivated or dead in the winter season in surface waters. The summer season is characterized by high temperature and high solar radiation. Because the latitude of our study area is lower than that in Cho et al.’s study, the increase in solar radiation might be more significant, which inhibits the regrowth of fecal coliform. Moreover, the correlations between different environmental parameters and fecal coliform levels are not uniform across seasons: it has been claimed that the most influential variables were related to rainfall and river stage height in the wet season and to wind and tidal stage in the dry season (Zimmer-Fraust et al., 2018). In addition to seasonality, the combined effect of environmental variables also plays a key role in the prediction of fecal coliform levels.
In this study, a violation could not be explained simply by the increase or decrease of a single factor. Most scholars agree that temperature, salinity, tide level, solar radiation, and rainfall are all important factors affecting fecal coliform levels (Leight et al., 2016; Thoe et al., 2014; Zhang et al., 2015b). Although not included in this study, large-scale sea-level pressure patterns (Leight et al., 2016), sanitation upgrading (Tong et al., 2016), fecal coliform reduction projects such as the construction of wastewater treatment plants, on-site wetland treatment, and wastewater interception (Liu et al., 2015), suspended sediment concentration (Chen and Liu, 2017), and impervious surface (Crim et al., 2012; Uejio et al., 2012) have also been shown to be related to fecal coliform levels. Furthermore, a lack of monitoring data could decrease the LCC and increase the RMSE. This was found in the study by Choi and Bae (2017), who compared model performance with and without rainfall data; excluding rainfall data from the predictive model reduced the LCC and increased the RMSE. Therefore, in most cases, the effects of environmental factors on fecal coliform levels are multifaceted and complicated, and model input variables should be carefully selected. To further improve the ANN model performance, other relevant environmental parameters and measures to reduce missing data should be considered for more accurate predictions of fecal coliform levels in oyster harvest waters in different seasons.
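The two performance measures referred to above, LCC and RMSE, can be written out explicitly. The sketch below is a minimal Python illustration, assuming that the LCC is the Pearson product-moment correlation between observed and predicted values and that the RMSE is the root of the mean squared difference between them. The function names and sample values are hypothetical, and the scale on which the values are compared (for example raw or log-transformed counts) is left to the user.

```python
import math
from typing import Sequence


def linear_correlation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical values: predictions offset from observations by a constant 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(round(linear_correlation(observed, predicted), 4))  # perfectly correlated
print(round(rmse(observed, predicted), 4))                # constant offset of 0.5
```

The example shows why the two measures are reported together: a constant bias leaves the correlation at its maximum while still producing a nonzero RMSE.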
Acknowledgements
The authors would like to thank the Molluscan Shellfish Program of Louisiana Department of Health and Hospitals for sharing the bacteriological and environmental data.
References
APHA, 1992. Compendium Methods for the Microbiological Examination of Foods, 16th Edition. American Public Health Association, Washington DC.
Atherholt, T.B., Procopio, N.A., Goodrow, S.M., 2017. Seasonality of coliform bacteria detection rates in New Jersey domestic wells. Groundwater 55 (3), 346–361.
Boehm, A.B., Yamahara, K.M., Love, D.C., Peterson, B.M., McNeill, K., Nelson, K.L., 2009. Covariation and photoinactivation of traditional and novel indicator organisms and human viruses at a sewage-impacted marine beach. Environ. Sci. Technol. 43 (21), 8046–8052.
Chen, W., Liu, W., 2017. Investigating the fate and transport of fecal coliform contamination in a tidal estuarine system using a three-dimensional model. Mar. Pollut. Bull. 116 (1), 365–384.
Chenar, S.S., Deng, Z., 2018. Development of artificial intelligence approach to forecasting oyster norovirus outbreaks along Gulf of Mexico coast. Environ. Int. 111, 212–223.
Chigbu, P., Gordon, S., Strange, T., 2004. Influence of inter-annual variations in climatic factors on fecal coliform levels in Mississippi Sound. Water Res. 38 (20), 4341–4352.
Cho, K.H., Pachepsky, Y.A., Kim, J.H., Kim, J.-W., Park, M.-H., 2012. The modified SWAT model for predicting fecal coliforms in the Wachusett Reservoir Watershed, USA. Water Res. 46, 4750–4760.
Cho, K.H., Pachepsky, Y.A., Kim, M., Pyo, J., Park, M.-H., Kim, Y.M., Kim, J.-W., Kim, J.H., 2016. Modeling seasonal variability of fecal coliform in natural surface waters using the modified SWAT. J. Hydrol. 535, 377–385.
Choi, S.-W., Bae, H.-K., 2017. Daily prediction of total coliform concentrations using artificial neural networks. KSCE J. Civ. Eng. 22, 467–474.
Crim, J.F., Schoonover, J.E., Lockaby, B.G., 2012.
Assessment of fecal coliform and Escherichia coli across a land cover gradient in West Georgia Streams. Water Qual. Exposure Health 4 (3), 143–158.
Davino, A., Melo, M., Filho, R., 2015. Assessing the sources of high fecal coliform levels at an urban tropical beach. Braz. J. Microbiol. 46 (4), 1019–1026.
EPA, 2002. National water quality inventory: 2000 Rep, in: (4503F), O.o.W. (Ed.), Washington, D.C.
Fan, J., Ming, H., Li, L., Su, J., 2015. Evaluating spatial-temporal variations and correlation between fecal indicator bacteria (FIB) in marine bathing beaches. J. Water Health 13 (4), 1029–1038.
Hathaway, J.M., Krometis, L.H., Hunt, W.F., 2014. Exploring seasonality in Escherichia coli and Fecal Coliform ratios in urban watersheds. J. Irrig. Drain. Eng. 140 (4), 1.
Karabas, N., Tas, S., Erguven, G.O., Bayhan, H., 2018. A research on coliform bacteria in the Golden Horn Estuary (Sea of Marmara, Turkey). Desalin. Water Treat. 115, 199–206.
Kim, K., Whelan, G., Molina, M., Purucker, S.T., Pachepsky, Y., Guber, A., Cyterski, M.J., Franklin, D.H., Blaustein, R.A., 2016. Rainfall-induced release of microbes from manure: model development, parameter estimation, and uncertainty evaluation on
5. Conclusions
This paper presents an artificial intelligence-based neural network modeling approach to predict fecal coliform bacteria levels in oyster harvest waters on a daily basis. The new approach was demonstrated by developing an ANN model for daily prediction of fecal coliform levels in seven oyster harvest areas along the Northern Gulf of Mexico coast. Major findings from the development and independent validation of the ANN model can be summarized as follows:
1) Fecal coliform levels in oyster growing waters are affected by six
Tong, Y., Yao, R., He, W., Zhou, F., Chen, C., Liu, X., Lu, Y., Zhang, W., Wang, X., Lin, Y., Zhou, M., 2016. Impacts of sanitation upgrading to the decrease of fecal coliforms entering into the environment in China. Environ. Res. 149, 57–65.
Tufail, M., Ormsbee, L., Teegavarapu, R., 2008. Artificial intelligence-based inductive models for prediction and classification of fecal coliform in surface waters. J. Environ. Eng. 134, 789–799.
Uejio, C.K., Peters, T.W., Patz, J.A., 2012. Inland lake indicator bacteria: Long-term impervious surface and weather influences and a predictive Bayesian model. Lake Reservoir Manage. 28 (3), 232–244.
Ufnar, D., Ufnar, J.A., Ellender, R.D., Rebarchik, D., Stone, G., 2006. Influence of coastal processes on high fecal coliform counts in the Mississippi Sound. J. Coastal Res. 22, 1515.
Wang, J., Deng, Z., 2016. Modeling and prediction of oyster norovirus outbreaks along Gulf of Mexico. Environ. Health Perspect. 124 (5), 627–633.
Wolanin, A., Jelonkiewicz, L., Zelazny, M., Lenart-Boron, A., 2016. Water Air Soil Pollut. 227 (9), 302.
Wright, M.E., Abdelzaher, A.M., Solo-Gabriele, H.M., Elmir, S., Fleming, L.E., 2011. The inter-tidal zone is the pathway of input of Enterococci to a subtropical recreational marine beach. Water Sci. Technol. 63, 542–549.
Zhang, Z., Deng, Z., Rusch, K.A., 2012. Development of predictive models for determining enterococci levels at Gulf Coast beaches. Water Res. 46, 465–474.
Zhang, Z., Deng, Z., Rusch, K.A., 2015b. Modeling fecal coliform bacteria levels at Gulf Coast Beaches. Water Qual. Exposure Health 7 (3), 255–263.
Zhang, Y., Huang, J.J., Chen, L., Qi, L., 2015a. Eutrophication forecasting and management by artificial neural network: a case study at Yuqiao Reservoir in North China. J. Hydroinf. 17, 679–695.
Zhang, W., Wang, J., Fan, J., Gao, D., Ju, H., 2013. Effects of rainfall on microbial water quality on Qingdao No. 1 Bathing Beach, China. Mar. Pollut. Bull. 66 (1-2), 185–190.
Zimmer-Fraust, A.G., Brown, C.A., Manderson, A., 2018. Statistical models of fecal coliform levels in Pacific Northwest estuaries for improved shellfish harvest area closure decision making. Mar. Pollut. Bull. 137, 360–369.
small plots. J. Water Health 14 (3), 443–459.
Krapf, T., Kuhn, R.M., Kauf, P., Gantenbein-Demarchi, C.H., Fieseler, L., 2016. Quantitative real-time PCR does not reliably detect single fecal indicator bacteria in drinking water. Water Sci. Technol. Water Supply 16 (6), 1674–1682.
Leight, A.K., Hood, R.R., 2018. Precipitation thresholds for fecal bacterial indicators in the Chesapeake Bay. Water Res. 139, 252–262.
Leight, A.K., Hood, R., Wood, R., Brohawn, K., 2016. Climate relationships to fecal bacterial densities in Maryland shellfish harvest waters. Water Res. 89, 270–281.
Lin, B., Syed, M., Falconer, R.A., 2008. Predicting faecal indicator levels in estuarine receiving waters - an integrated hydrodynamic and ANN modelling approach. Environ. Modell. Software 23, 729–740.
Liu, W., Chan, W., Young, C., 2015. Modeling fecal coliform contamination in a tidal Danshuei River estuarine system. Sci. Total Environ. 502, 632–640.
Mas, D.M.L., Ahlfeld, D.P., 2007. Comparing artificial neural networks and regression models for predicting faecal coliform concentrations. Hydrol. Sci. J. 52, 713–731.
Motamarri, S., Boccelli, L., 2012. Development of a neural-based forecasting tool to classify recreational water quality using fecal indicator organisms. Water Res. 46, 4508–4520.
Rahman, M.M., Yoon, K.B., Lim, S.J., Jeon, M.G., Kim, H.J., Kim, H.Y., Cho, J.Y., Chae, H.M., Park, Y.C., 2017. Molecular detection by analysis of the 16S rRNA gene of fecal coliform bacteria from the two Korean Apodemus species (Apodemus agrarius and A. peninsulae). Genet. Mol. Res. 16 (2).
Rosenfeld, L.K., McGee, C.D., Robertson, G.L., Noble, M.A., Jones, B.H., 2006. Temporal and spatial variability of fecal indicator bacteria in the surf zone off Huntington Beach, CA. Mar. Environ. Res. 61 (5), 471–493.
Sigua, G.C., Pascale Palhares, J.C., Kich, J.D., Mulinari, M.R., Mattei, R.M., Klein, J.B., Muller, S., Plieske, G., 2010.
Microbiological quality assessment of watershed associated with animal-based agriculture in Santa Catarina, Brazil. Water Air Soil Pollut. 210, 307–316.
Thoe, W., Gold, M., Griesbach, A., Grimmer, M., Taggart, M.L., Boehm, A.B., 2014. Predicting water quality at Santa Monica Beach: evaluation of five different models for public notification of unsafe swimming conditions. Water Res. 56, 105–117.