Predicting groundwater arsenic contamination: Regions at risk in highest populated state of India


Water Research 159 (2019) 65–76


Water Research journal homepage: www.elsevier.com/locate/watres

Sonal Bindal, Chander Kumar Singh*
Analytical and Geochemistry Laboratory, Dept. of Energy and Environment, TERI School of Advanced Studies, New Delhi, India

Article info

Article history: Received 29 November 2018; received in revised form 20 April 2019; accepted 28 April 2019; available online 1 May 2019

Keywords: Arsenic; India; Prediction; Regression; Hybrid random forest model

Abstract

Arsenic (As) contamination of groundwater is a public health concern, impacting the lives of approximately 100 million people in India. Chronic exposure to As significantly increases mortality due to the occurrence of several types of cancer and of respiratory and cardiac diseases. Uttar Pradesh is part of the middle Indo-Gangetic plains and has been found to be severely affected by As contamination of groundwater, as established by several small-scale studies. The current study uses a hybrid method based on a random forest ensemble algorithm and univariate feature selection, with 1473 data points, to predict As in the region. Twenty direct/proxy predictor variables were considered to describe the geochemical environment, aquifer conditions and topography that are responsible for As enrichment in groundwater. The map of As predicted through the hybrid random forest ensemble model shows an overall accuracy of 84.67%. The hybrid random forest model performs better than the univariate, logistic, fuzzy, adaptive fuzzy and adaptive neuro fuzzy inference systems, which have been widely used for As prediction. The projected rural population at risk due to high As exposure is 23.48 million people, or 12% of the total population of the region. The predictive map indicates the regions where future testing campaigns and mitigation interventions should be prioritized by policymakers. © 2019 Elsevier Ltd. All rights reserved.

* Corresponding author. Analytical and Geochemistry Laboratory, TERI School of Advanced Studies, New Delhi, India. E-mail addresses: [email protected], [email protected] (C.K. Singh).
https://doi.org/10.1016/j.watres.2019.04.054
0043-1354/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Elevated arsenic (As) concentrations above the World Health Organization (WHO) permissible limit of 10 µg/L in groundwater pose a health threat to approximately 100 million people in India (Chakraborti et al., 2016, 2018; Bhowmick et al., 2018). Several countries, including Bangladesh (Yang et al., 2014), India (Mukherjee et al., 2009), China (Guo et al., 2014), Nepal (Pokhrel et al., 2009), Cambodia (Polya et al., 2010), Vietnam (Winkel et al., 2011), Myanmar (van Geen et al., 2014), Laos (Cho et al., 2011), Indonesia (Winkel et al., 2008a,b), and the USA (Gong et al., 2014), are severely affected by high As in groundwater. Chronic As exposure significantly increases mortality due to cardiovascular diseases (Argos et al., 2010), and prolonged As exposure can result in skin, liver, bladder, and lung cancer (Chen et al., 2011). Arsenic poisoning has also been linked to infant mortality, impaired intellect and motor dysfunction in children (Bhowmick et al., 2018; Parvez et al., 2011; Rahman et al., 2010; Wasserman et al., 2004).

Arsenic in groundwater is related not only to As-containing host minerals but also to host mineral solubility, redox conditions and pH. Studies have revealed that As-rich minerals are linked with the Quaternary deposits of alluvial sediments belonging to the Holocene age (Mukherjee et al., 2009; Shah, 2010). Arsenic contamination can occur due to the reductive dissolution of As-bearing minerals (Postma et al., 2016). These As-rich sediments are transported by rivers originating from the Himalayas and are deposited into downstream basins and deltaic areas. The organic matter buried along with the sediment is utilized by microbes for metabolic activities. The microbial reduction of iron (Fe) from Fe³⁺ to Fe²⁺ due to the consumption of oxygen bound to As-bearing Fe-oxyhydroxides results in the subsequent release of As (Drahota et al., 2013; Verma et al., 2016). Arsenic contamination can take place in reducing aquifer environments, in oxidizing environments with high pH (Nickson et al., 2005), with oxidative weathering of sulfide minerals and with geothermal activity. Soil texture also plays a significant role in providing the appropriate environment for As release into the groundwater (Hoque et al., 2009). However, there is


still a lack of understanding of the extent of the problem and a need for a detailed investigation of the sources and exact geological conditions that lead to As contamination. Testing individual wells for As contamination is cumbersome and requires manpower, time and quality control, and detailed maps that could help identify high-risk regions are still missing. Predictive maps could therefore help decision makers focus their mitigation efforts on the areas that are most affected.

Several studies use modeling to predict occurrences of As at global to regional scales (Berg et al., 2001, 2016; Bretzler et al., 2017; Winkel et al., 2008a,b; Yang et al., 2014). The modeling approach requires an understanding of the geochemical mechanisms that drive the release of As, along with the geological, topographical and environmental factors, so that these factors can be used as proxies to predict As-affected regions. Modeling-based prediction helps in identifying the areas for future testing so that interventions to provide safe drinking water can be prioritized. A few studies have used methods such as Thiessen polygons, inverse distance weighting (IDW) (Gong et al., 2014), global polynomial interpolation (Bhunia et al., 2016), kriging (Sovann and Polya, 2014) and cokriging (Gong et al., 2014) to predict the spatial variation of contaminants in groundwater at the local/regional scale. Although these methods are simple and easy to use, they require a substantial number of data points, and they do not account for spatial dependency of the data or consider the factors or processes that might influence the occurrence of the contaminants. Because data are scarce at larger scales, these models do not perform well there. Predictive models that consider the factors responsible for contamination are therefore used to overcome the limitations of interpolation-based prediction.
Logistic regression models (LRM) have been commonly employed to predict the spatial distributions of As worldwide (Dummer et al., 2014). Several studies have used logistic regression (Lee et al., 2009; Zhang et al., 2012) to assess the likelihood of As contamination greater than the predefined limit of 10 µg/L by using limited As data points along with auxiliary independent variables, such as geology, topography, and soil properties. Rodriguez-Lado et al. (2013) used proxies such as Holocene sediments, soil salinity, fine subsoil texture, topographic wetness index (TWI), density of rivers, slope, distance to rivers, and gravity anomaly, of which Holocene sediments, soil salinity, subsoil texture, and TWI were found to be the most significant, highlighting their relative importance in predicting the occurrence of groundwater As. A few studies used linear regression (LR) (Zhang et al., 2013), principal component regression (PCR) (Luo et al., 2012), Bayesian modeling (Cha et al., 2016) and artificial neural network (ANN)-based regression (Bonelli et al., 2017; Cho et al., 2011) for As prediction. Such models (Cha et al., 2016; Cho et al., 2011) have shown accuracies varying between 60 and 70%. For understanding such complex relationships and the magnitude of the problem, machine learning models have been found to be much more accurate. Machine learning models (e.g., random forests and neural networks) show higher prediction accuracy than LRM due to their strength in modeling complex relationships between response and predictor variables (Tesoriero et al., 2017). Machine learning algorithms develop sophisticated model subunits for capturing relationships that are otherwise too complex to specify in parametric models. However, no such model has been developed at either the regional or national scale in India, even though new regions have been added to existing high-As risk areas, affecting millions of people in India.
A novel hybrid random forest model has been used to predict the regions in Uttar Pradesh at risk due to As contamination; in this model, the outcomes from stepwise regression have been incorporated in the random forest model. The predictive abilities of widely used models such as univariate, LRM, fuzzy, adaptive fuzzy regression (AFR) and adaptive neuro fuzzy inference system (ANFIS) were compared with that of the hybrid random forest model. The population at risk was estimated using a linear relationship between predicted As and the population density of Uttar Pradesh.

2. Materials and methods

2.1. Groundwater As data

A total of 1680 As data points were collected for this study: 728 from randomly tested household handpumps and the remaining 952 from the literature (CGWB, 2014; Chauhan et al., 2009; Mehrotra et al., 2014; Raju, 2011; Shah, 2008; Shah, 2010; Shah, 2013; Shah, 2017). The geotagged groundwater As data were collected from the 728 domestic and community handpumps, with well depths of 3–122 m, using ITS Econo Quick test kits. The kit results were validated through laboratory measurements, which were found to be comparable (George et al., 2012a,b; van Geen et al., 2014; van Geen et al., 2018), with slight overestimation at high As concentrations. The kit correctly categorized ~90% of wells (George et al., 2012a,b; van Geen et al., 2014) as per the WHO guideline of 10 µg/L for As, which is also the limit of the Bureau of Indian Standards, recently lowered from 50 µg/L (BIS, 2012). The data points from the literature have limit of detection (LOD) values ranging from 0.0001 to 0.0003 mg/L, whereas the LOD value of the kit is 0.01 mg/L. The As data required for the modeling were classified as ≤10 µg/L and >10 µg/L for model training, testing and validation (Rodriguez-Lado et al., 2013; Ayotte et al., 2016; Bretzler et al., 2017; Podgorski et al., 2017). The As concentration is spatially variable but relatively stable over time (van Geen et al., 2013).
Randomly distributed data do not impair the regression; rather, a randomly distributed independent variable is an advantage when modeling the prediction probability, because the variables used in the prediction are selected on the basis of an initial regression. The prediction studies performed by Ayotte et al. (2016), Winkel et al. (2008a,b), Rodriguez-Lado et al. (2013), Podgorski et al. (2017) and Bretzler et al. (2017) all used randomly collected As data points. The As data points were placed on a grid with a cell size of 100 m × 100 m, comparable to the highest spatial resolution among the proxy variables used in the study (i.e., geology). Further, data points were aggregated by using the geometric mean to a resolution of one point per grid cell (Bretzler et al., 2017), thus reducing the total number of data points from 1680 to 1473. Of these 1473 data points, 804 (54.5%) were equal to (n = 469) or greater than (n = 335) 10 µg/L and 669 (45.41%) were less than 10 µg/L (Fig. 1). The dataset was split randomly into training (80%) and testing (20%) sets comprising 1178 and 295 data points, respectively. The aggregated data points were binary coded such that there are only two possibilities, either contaminated (As ≥ 10 µg/L, i.e., 1) or uncontaminated (As < 10 µg/L, i.e., 0).

2.2. Independent predictor variables

Based on the findings of previous investigations (Berg et al., 2016; Podgorski et al., 2017; Rodríguez-Lado et al., 2013; Winkel et al., 2008a,b), twenty independent proxy variables were considered for use in the model (Table 1). The datasets available for extraction of these independent variables were collected for this study from several sources, as shown in Table 2. The proxy variables used in the modeling are directly or indirectly associated with the occurrence and enrichment of As in groundwater. Some of the independent variables, such as the redox


Fig. 1. As concentrations (n = 1473) measured in the groundwater of Uttar Pradesh shown by blue (As ≤10 µg/L), green (>10–50 µg/L) and red (>50 µg/L) circles overlaid on a topographical map depicting the Ganga River and its major tributaries. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
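The data preparation described in Section 2.1 (gridding at 100 m, geometric-mean aggregation to one value per cell, binary coding at 10 µg/L and a random 80/20 split) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sample coordinates and the random seed are hypothetical, and projected coordinates in metres are assumed.

```python
import math
import random
from collections import defaultdict

CELL_SIZE_M = 100.0      # grid cell size stated in the text
THRESHOLD_UG_L = 10.0    # guideline value used for binary coding


def aggregate_to_grid(points, cell_size=CELL_SIZE_M):
    """Collapse points that share a grid cell to one geometric-mean value.

    points: iterable of (easting_m, northing_m, as_ug_per_l) with As > 0.
    Returns {(column, row): geometric mean of As in that cell}.
    """
    cells = defaultdict(list)
    for x, y, conc in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(math.log(conc))
    # Geometric mean = exp(mean of logs).
    return {k: math.exp(sum(v) / len(v)) for k, v in cells.items()}


def binary_code(conc, threshold=THRESHOLD_UG_L):
    """1 = contaminated (As >= threshold), 0 = uncontaminated."""
    return 1 if conc >= threshold else 0


def train_test_split(items, train_fraction=0.8, seed=0):
    """Random split; the seed is an illustrative assumption."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical samples: (easting m, northing m, As in µg/L).
samples = [
    (500010.0, 2900020.0, 4.0),
    (500060.0, 2900070.0, 36.0),   # same 100 m cell as the first point
    (500250.0, 2900020.0, 8.0),
]
grid = aggregate_to_grid(samples)
labels = {cell: binary_code(value) for cell, value in grid.items()}
train, test = train_test_split(range(1473))
```

With 1473 aggregated points, truncating 80% to an integer gives the 1178/295 split reported in the text.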

Table 1. List of twenty independent variables used in the calibration and testing of the prediction models.

Variables | Unit | Resolution | Raster format
Geology | Geological successions | 1:50000 | Classified
Slope | Decimal degrees | 30 arc seconds | Continuous
Topsoil clay content | % | 5 arc minutes | Continuous
Topsoil silt content | % | 5 arc minutes | Continuous
Topsoil sand content | % | 5 arc minutes | Continuous
Subsoil clay content | % | 5 arc minutes | Continuous
Subsoil silt content | % | 5 arc minutes | Continuous
Subsoil sand content | % | 5 arc minutes | Continuous
Topsoil pH | - | 5 arc minutes | Continuous
Subsoil pH | - | 5 arc minutes | Continuous
Topsoil organic content | % | 5 arc minutes | Continuous
Subsoil organic carbon | % | 5 arc minutes | Continuous
Fluvisols | % | 30 arc seconds | Continuous
Evapotranspiration (ET)/Precipitation (P) | mm year⁻¹ | 500 m | Continuous
LULC | - | 30 arc seconds | Classified
Drainage | Meters | 1:500000 | Continuous
Topographic Wetness Index (TWI) | Meters | 30 arc seconds | Continuous
Distance from River | Meters | 1:500000 | Continuous
Groundwater level data | Meters | ground level data | Continuous
Grace data | kg/m³ | 50 km | Continuous


Table 2. Sources of the database related to variables used in the process of model delineation.

Database | Organization | Website
Geology | Geological Survey of India (GSI) | http://bhukosh.gsi.gov.in/Bhukosh/Public
Soil data | Soil grids | https://soilgrids.org/
Groundwater level data | CGWB reports and India WRIS portal | http://www.india-wris.nrsc.gov.in/
Grace data | NASA website | https://grace.jpl.nasa.gov/data/get-data/
Digital Elevation Model (DEM) | United States Geological Survey (USGS) | https://earthexplorer.usgs.gov/
Land use Land cover (LULC) | Landsat data (30 m) | https://earthexplorer.usgs.gov/
Evapotranspiration | WorldClim | http://www.worldclim.org/
Precipitation | WorldClim | http://www.worldclim.org/
Temperature | WorldClim | http://www.worldclim.org/
Population data | Census of India | http://census2011.gov/

state of the aquifer and well depth, could not be directly considered due to the unavailability of the data; therefore, soil organic carbon and the geological sequence of Holocene and Pleistocene sediments were used as proxies in the model. These independent predictor variables can be classified as geological, soil, climatic and anthropogenic factors that are associated with the occurrence and enrichment of As in groundwater (Table 1). The variables were extracted either by digitizing scanned datasheets (Quaternary geological maps) or by acquiring them from available online sources, as listed in Table 2. The climatic variables, such as evapotranspiration, precipitation and temperature, were downloaded in the form of grids from the WorldClim database. The groundwater-level data were derived using an algorithm on the Gravity Recovery and Climate Experiment (GRACE) satellite data (Table 2). Due to differences in resolution, data format and projection, the variables were converted to raster format at 100-m spatial resolution to maintain uniformity among datasets. Furthermore, all the datasets were reprojected to the Universal Transverse Mercator (UTM) coordinate system (WGS datum, northern hemisphere zone 44) to maintain the consistency and spatial integrity of the data. All maps were prepared using ArcGIS 10.3.1 software.

Prior to conducting this study, the literature related to As contamination was extensively reviewed. Subsequently, twenty independent (direct and proxy) variables were gathered to understand the processes leading to As contamination. The significant variables among these twenty (Table 3) were then narrowed down using univariate feature selection. Univariate feature selection builds a predictive model for the response variable using each individual variable and measures the performance of each model.
In fact, it is similar to using Pearson's correlation coefficient, since that coefficient is equivalent to the standardized regression coefficient used for prediction in linear regression. If the relationship between a dependent and an independent variable is nonlinear, then random forests are used to avoid overfitting. Following this, independent variables can be either added to or removed from the set of explanatory variables based on the criterion set by the sequential output of the "f-test" or "t-test". Selecting variables by univariate feature selection can introduce bias in the parameter estimates, pushing them further from 0, particularly in complex prediction models (Harrell, 2001, 2015). To limit this, we retained only the few variables that were important during the initial univariate feature selection. Based on the findings from the model, a total of eight variables were found to be significant at a confidence level of 95% (p < 0.05). Consequently, these eight independent variables were used to construct the hybrid random forest ensemble model (Table 3).

(a) Geology. Geologically, the Gangetic plains have been classified into three major classes (Fig. 2a): (i) Upper Siwalik (Upper Pliocene to Lower Pleistocene), which includes lower, middle and upper groups in the northern part of the Siwaliks (Middle Miocene to Early Pleistocene); (ii) Older Alluvium (Middle to Upper Pleistocene); and (iii) Newer Alluvium (Upper Pleistocene to Recent Holocene), in order of succession (Shah, 2008). The study area comprises thick layers of Quaternary sediments containing multiple layers of aquifers. The shallow groundwater resides in unconsolidated alluvial sediments in the zone of saturation at 0–100 m depth, which are rich in minerals including As (Hoque et al., 2011; Shah, 2010). Geological quadrangle maps at a scale of 1:50,000 were obtained from the Geological Survey of India (GSI), digitized (Table 1) and reclassified into eight classes, Archean-Proterozoic, Carboniferous, Holocene, Meso-Proterozoic, Mid Miocene to Pliocene, Neo-Proterozoic, Paleo-Proterozoic and Pleistocene, based on geological succession (Fig. 2a).

(b) Soil. Subsoil clay, silt and sand fractions and organic content were extracted from a digital soil grids database (Fig. 2b–e). The data were further downscaled by using a resampling mean method from 250-m to 100-m resolution to avoid loss of data. The data were extracted for the subsoil at a depth of 30–200 cm for the percentage of sand, silt, clay and organic content of the soil (Table 1).

(c) Groundwater level data (GWL). The groundwater level data at 1350 locations for the years 1996–2016 were acquired from the Central Ground Water Board (Fig. 2f; Table 1). The

Table 3. List of significant independent variables.

Variables | Regression coefficients (λ) | Standard error | Wald-odds value | p-value
Geology | 4.71 | 0.00471 | 7.23 | 0.021
Subsoil Clay fraction | 2.23 | 0.05129 | 3.78 | 0.005
Groundwater level (GWL) | 2.12 | 0.05088 | 1.24 | 0.034
Subsoil Organic content | 1.41 | 0.01410 | 10.41 | 0.002
Fluvisols | 1.12 | 0.01456 | 2.12 | 0.012
Land use/Land cover (LULC) | 0.74 | 0.00296 | 5.14 | 0.027
Subsoil Silt fraction | 1.75 | 0.02451 | 0.85 | 0.024
Subsoil Sand fraction | 2.34 | 0.05616 | 0.45 | 0.014
Constant | 0.47 | 0.01363 | 0.57 | 0.548
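Table 3 reports, for each retained variable, a coefficient, a standard error, a Wald-type statistic and a p-value. How such quantities arise from a single-predictor logistic regression can be sketched as below. This is an illustrative sketch, not the authors' procedure: the fitting method (Newton-Raphson), the function names and the toy data are assumptions, and the paper does not state exactly how its "Wald-odds value" was computed.

```python
import math


def _score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def univariate_logistic(x, y, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x and test the slope with a Wald z-test."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * score, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Re-evaluate at the final estimate for the standard error.
    g0, g1, h00, h01, h11 = _score_and_information(x, y, b0, b1)
    det = h00 * h11 - h01 * h01
    std_error = math.sqrt(h00 / det)       # sqrt of Var(b1) from I^{-1}
    wald_z = b1 / std_error
    p_value = math.erfc(abs(wald_z) / math.sqrt(2.0))  # two-sided
    return {"intercept": b0, "slope": b1, "std_error": std_error,
            "wald_z": wald_z, "p_value": p_value, "score": (g0, g1)}


# Toy data (hypothetical): one predictor and a 0/1 contamination label.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
label = [0, 0, 1, 0, 1, 0, 1, 1]
result = univariate_logistic(predictor, label)
significant = result["p_value"] < 0.05   # the 95% criterion used in the text
```

Repeating this fit once per candidate variable and keeping those with p < 0.05 mirrors the screening step described in Section 2.2.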


Fig. 2. Independent variables used for As prediction in the groundwater of Uttar Pradesh, (a) Geology (b) Subsoil Clay Fraction (c) Subsoil Silt Fraction (d) Subsoil Sand Fraction (e) Subsoil Organic Content (f) Groundwater Level (g) Fluvisols (h) LULC.


groundwater level is recorded quarterly in India during the months of January, April, August and November. The annual average was calculated, and a continuous raster surface was generated using the inverse distance weighting (IDW) interpolation method.

(d) Fluvisols. Fluvisols are genetically young soils in alluvial deposits that are rich in iron-bearing minerals (FAO, 1999). These soils are found to be associated with co-occurrences of As (Stuckey et al., 2016). They are thought to provide an environment similar to that of Holocene sediments (Podgorski et al., 2017). The dataset for the fluvisol soils (Table 1) was downscaled to 100 m (Fig. 2g).

(e) LULC. Land use/land cover (LULC) data were extracted from Landsat 8 satellite imagery for the year 2017 (Table 1) by applying an ISODATA unsupervised classification algorithm in Erdas Imagine 14. The output consisted of 8 major LULC classes present in the region (Fig. 2h).
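The groundwater-level surface described in item (c) combines an annual average of quarterly readings with inverse distance weighting. A minimal sketch of both steps is given below. The power parameter of 2, the function names and the well coordinates are assumptions for illustration; the text does not state the IDW settings used in ArcGIS.

```python
import math


def annual_mean(quarterly_levels):
    """Average of the quarterly groundwater-level readings for one year."""
    return sum(quarterly_levels) / len(quarterly_levels)


def idw(known, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    known: list of (xi, yi, value). Each weight is 1 / distance**power.
    """
    numerator = denominator = 0.0
    for xi, yi, value in known:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value          # exactly on a measured location
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical wells: (x, y, annual mean groundwater level in m).
wells = [
    (0.0, 0.0, annual_mean([9.0, 11.0, 8.0, 12.0])),   # mean = 10.0
    (2.0, 0.0, annual_mean([19.0, 21.0, 18.0, 22.0])), # mean = 20.0
]
midpoint = idw(wells, 1.0, 0.0)   # equal distances give the plain average
near_first = idw(wells, 0.5, 0.0)
```

Evaluating `idw` at the centre of every 100-m cell would produce the continuous raster surface; points nearer a well are pulled more strongly toward that well's value.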

2.3. Development of hybrid random forest model for As prediction

Random forest is a machine learning algorithm based on the generation of many decision trees and their aggregation to produce a final output (Breiman, 2001). Each decision tree depends on the values of a random vector that is sampled independently and with the same distribution for all decision trees in the forest. The predictors considered for the best split at each node are a randomly chosen subset of all the predictors. Among ensemble classifiers, random forest handles input predictors of different types efficiently. Random forest is also insensitive to noise, outliers and overtraining, which reduces the chances of bias in the data. It is computationally less intensive than other boosting-based ensemble and simple bagging methods (Breiman, 2001). The model is used where complex processes act as controlling factors for the occurrence or nonoccurrence of a dependent variable, and thus helps in identifying the controlling factors, sources and processes. To implement the model, an ensemble of decision trees was generated. The classification model repeatedly split the dependent variable (As) over independent variables, resulting in the highest variance of the dependent variable (Breiman, 2001). The number of bins was determined using Sturges' formula, and the bin width was varied so that each bin contained the same number of members. The data were consolidated to focus on the fundamental criterion of whether the concentration of As poses a health hazard. In addition, a univariate logistic regression was run with each variable, and the significance of the coefficient was assessed through its p value, as shown in Table 3. The number of bins used to categorize a continuous numerical variable was found to be 10.
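The binning step above (a bin count from Sturges' formula, with bin widths varied so that every bin holds the same number of members) can be sketched as follows. Rounding conventions for Sturges' formula differ between implementations, and the text does not state which convention or sample size produced the reported 10 bins, so the example passes the bin count explicitly. The function names and the data are illustrative assumptions.

```python
import math


def sturges_bins(n):
    """Sturges' formula, here rounded up: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def equal_frequency_edges(values, k):
    """Return k + 1 bin edges so each bin holds about len(values) / k members."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[0]]
    for j in range(1, k):
        edges.append(ordered[(j * n) // k])
    edges.append(ordered[-1])
    return edges


def assign_bin(value, edges):
    """Index of the bin containing value (the last bin includes its upper edge)."""
    for j in range(len(edges) - 2):
        if value < edges[j + 1]:
            return j
    return len(edges) - 2


# Hypothetical continuous predictor with 100 values.
data = list(range(100))
k_from_formula = sturges_bins(len(data))
edges = equal_frequency_edges(data, 10)   # 10 bins, as reported in the text
counts = [0] * 10
for v in data:
    counts[assign_bin(v, edges)] += 1
```

Because the edges are taken at quantiles rather than at equal spacing, skewed predictors still yield bins of equal size, which is the property the text describes.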
Since classification accuracy is more sensitive to the independent variables, the number of decision trees was fixed at a default value of 1500, and the number of independent variables was tested at 10 values. This parameter-tuning process was iterated 4 times and repeated for a 20-fold cross-validation process (Breiman, 2001) to reach the best-performing configuration of the model. The modeling was performed with the training and then the testing datasets. Random forest was applied using the raster and caret packages within the R (ver. 3.5.1) open-source statistical software (R Core Team, 2016).

2.4. Comparison with other predictive models

In this study, we evaluated the performance of the hybrid random forest model against univariate, logistic regression model (LRM), fuzzy, adaptive fuzzy regression (AFR) and adaptive neuro fuzzy inference system (ANFIS) models. The comparison indicates the best-fit model for predictive modeling in the region. The univariate model is used to understand the simple relationship between dependent and independent variables. However, it takes only direct association or disassociation into account, which makes it less comprehensive. The LRM involves categorization of the dependent variable, which can be either above or below a threshold value; however, it has a significant disadvantage when used with continuous variables (Dummer et al., 2014; Zhang et al., 2012). Furthermore, it can be difficult to decipher from LRM results the relationships among the different independent variables. Fuzzy regression models can only address uncertainty in the rules individually (Rodriguez-Lado et al., 2013) and lack the capability for learning and adapting to a new set of input and output datasets. The AFR therefore takes into account both uncertainty in the rules and the capability for learning and adapting. ANFIS was found to work better than the previous methods, as it is a powerful means of solving complex and nonlinear relations using either binary or categorical variables (Tesoriero et al., 2017). To resolve the complexity among the dependent and independent datasets, the hybrid random forest model was used in this study. The hybrid random forest model takes account of these relationships and can directly incorporate potentially significant binary, categorical or continuous factors (Breiman, 2001). All other modeling procedures, including the retaining or discarding of models, are based on the Hosmer-Lemeshow goodness-of-fit test and the weighting and averaging of coefficients.
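The resampling scheme used for parameter tuning in Section 2.3 can be sketched as below, reading "iterated 4 times ... 20-fold cross validation" as 4 repeats of 20-fold cross-validation on the training set. That reading is an assumption, as are the function name and the seed; the authors ran their tuning through the caret package in R, which is not reproduced here.

```python
import random


def repeated_kfold(n_samples, n_folds=20, n_repeats=4, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                 # new random partition per repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]    # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


# 1178 is the training-set size reported in Section 2.1.
splits = list(repeated_kfold(1178))
first_repeat_test = [i for _, test in splits[:20] for i in test]
```

Each candidate parameter value would be scored on all 80 held-out folds and the value with the best average score retained; within one repeat, every training point is held out exactly once.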

2.5. Accuracy assessment and sensitivity analysis

The diagnostic statistics used for assessing the accuracy of the interpolation were root mean square error (RMSE) and standardized root mean square (RMS), given in eq. i and ii,

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_{i,\mathrm{est}} - z_{i,\mathrm{act}}\right)^{2}} \tag{i} \]

\[ \text{Standardized RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\sigma^{2}}\left(z_{i,\mathrm{act}} - z_{i,\mathrm{est}}\right)^{2}} = \frac{\mathrm{RMSE}}{\sigma} \tag{ii} \]

where z_{i,act} is the actual values for As, z_{i,est} is the estimated or predicted values for As, σ is the standard deviation and σ² is the variance among the datasets for n number of data points. The method yielding smaller RMSE and a standardized RMS value close to 1 is an optimal method (Winkel et al., 2008a,b). The validation process is different from cross-validation since it depends on the subset of the dataset. The dataset was divided into two groups: one for model development as a training dataset and the other for testing the accuracy using a testing dataset. The model has been validated using training and testing datasets (Bretzler et al., 2017).
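Equations (i) and (ii) translate directly into a few lines of code. The sketch below is illustrative only: the function names and the numbers are hypothetical, and the standard deviation is passed in as an argument because the text does not specify which dataset it is computed from.

```python
import math


def rmse(actual, estimated):
    """Root mean square error, eq. (i)."""
    n = len(actual)
    return math.sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimated)) / n)


def standardized_rms(actual, estimated, sigma):
    """Standardized RMS, eq. (ii): RMSE divided by the standard deviation."""
    return rmse(actual, estimated) / sigma


# Hypothetical actual and estimated As values (µg/L).
actual = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 8.0]
error = rmse(actual, estimated)                      # sqrt(16 / 4) = 2.0
scaled = standardized_rms(actual, estimated, 2.0)    # 2.0 / 2.0 = 1.0
```

A standardized value near 1 means the typical error is about as large as the spread σ used for scaling, which is the criterion quoted in the text.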

2.6. Sensitivity analysis

Sensitivity analysis was performed to test the performance of the models. In the forward-selection stepwise technique, the incremental change in the R² values is an indication of the significance of sensitivity of the output to each newly introduced predictor variable. The performance of a prediction model was determined by its true positive rate (sensitivity) and true negative rate (specificity).

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{iii} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \times 100 \tag{iv} \]

\[ \text{Overall accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \tag{v} \]
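Equations (iii) to (v) can be computed from paired actual and predicted binary labels as sketched below, with 1 denoting the contaminated class. The function names and the example labels are hypothetical and serve only to illustrate the arithmetic.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """True positive rate in percent, eq. (iii)."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate in percent, eq. (iv)."""
    return 100.0 * tn / (tn + fp)


def overall_accuracy(tp, fp, fn, tn):
    """Share of correct predictions in percent, eq. (v)."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


# Hypothetical test-set labels.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)   # 3, 2, 1, 4
sens = sensitivity(tp, fn)                  # 75.0
spec = specificity(tn, fp)                  # about 66.67
acc = overall_accuracy(tp, fp, fn, tn)      # 70.0
```

Changing the probability cutoff that turns model scores into 0/1 predictions shifts counts between these four cells, which is why sensitivity and specificity trade off against each other.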

Sensitivity measures the model's ability to correctly classify groundwater samples with As ≥ 10 µg/L (true positive, TP), whereas specificity measures the model's ability to correctly classify samples with As < 10 µg/L (true negative, TN) (eq. iii and iv). In general, a true positive (TP) is an outcome in which the model correctly predicts the positive class, and a true negative (TN) is an outcome in which the model correctly predicts the negative class. A false positive (FP) is an outcome in which the model incorrectly predicts the positive class, and a false negative (FN) is an outcome in which the model incorrectly predicts the negative class. The classification results of the As prediction model under different probability cutoff values mostly show that sensitivity is inversely related to specificity (Berg et al., 2001). A plot of sensitivity against 1 − specificity for all model cutoff values between 0 and 1 gives a receiver operating characteristic (ROC) curve. The area under the curve (AUC) ranges from 0.5 (no predictive capability) to 1 (best predictive capability) (Tesoriero et al., 2017).

3. Results and discussion

3.1. Selection of model proxies

The highest positive weighting coefficient (Table 3) for geology (λ = 4.71) indicates that geology plays a major role in As enrichment in aquifers. Shallow aquifers with Holocene deposits in regions with low slope gradients are found to have high As (Shamsudduha et al., 2008). The Holocene sedimentary deposits (λ = 5.12) provide a chemically reducing environment in the floodplains of rivers, thus enriching aquifers with As (McArthur et al., 2004, 2008). The Holocene organic-rich alluvial deposits have been found to be statistically significant in other studies (Lado et al., 2008; Winkel et al., 2011).
Furthermore, geochemical analytical studies have found that the sediments belonging to newer alluvium are high in concentrations of both As and Fe (Kirchner et al., 1998; Saha and Sahu, 2016). Moreover, X-ray diffraction (XRD) of newer and older alluvium sediments reveals mineral assemblages of quartz, chlorite, muscovite, montmorillonite, kaolinite and feldspar (Shah, 2010). However, newer alluvium also contains amphibole and goethite, minerals that are rich in Fe (Shah, 2010). Therefore, handpumps installed over newer alluvial deposits are more prone to As contamination. Minerals rich in Fe are found to adsorb As onto their surfaces and release it into groundwater under different geochemical environments (Naseem et al., 2018). The fluvisols are soils that are genetically rich in iron minerals and indicate an alluvial setting. Under the influence of reducing conditions, the As adsorbed on the iron oxyhydroxide (FeOOH) surfaces is released into groundwater. The fluvisols show a high weighting coefficient of λ = 1.12 (Table 3), and studies in the middle Gangetic plains show a strong correlation between As and Fe (Berg et al., 2001; McArthur et al., 2008). The reducing conditions in the groundwater are initiated by the microbially mediated decay of dissolved organic carbon (DOC) (Harvey et al., 2002) and the mobilization of As by humic and fulvic acids, which plays an important role in mineral degradation


and metal mobilization of FeOOH. Experimental studies by Sharma et al. (2010) verified that both fulvic acids and humic acids adhere strongly to metal oxides and clay minerals, which displaces As from metal oxide and mineral surfaces. Therefore, DOC, which comprises the organic matter fraction of subsoil with λ = 1.41 (Table 3), is embedded in the sediments, and past drainage in the region plays a vital role in As release by generating favorable redox conditions (Mazumder et al., 2016). Further, experimental evidence (McArthur et al., 2001) in the Bengal basin suggests that aquifers are separated from overlying silt (λ = 1.75) soils. These aquifers are separated by underlying paleosol formations (Hoque et al., 2012; McArthur et al., 2011), which are rich in clay content (λ = 2.23) and a source of organic carbon. These conditions initiate reducing conditions in the aquifer, which are favorable for As enrichment in groundwater. This As-rich groundwater interacts with gray sands (λ = −2.34), which have already undergone reduction (Table 3). The As-containing groundwater supplies organic carbon through the clay layer. The downward movement of this As-rich water leads to further contamination of deep aquifers (McArthur et al., 2001). Silt was found to be inversely associated with As contamination (λ = 1.75). A higher percentage of silt represents the transport and deposition of fresh materials in the river basin (Ahmed et al., 2004). Silt can also be produced by mechanical weathering of soils and is highly reactive. It has active silicate sites to which As species readily adsorb, thereby decreasing the As concentration in groundwater (Chakraborti et al., 2002). The results also show that the areas closer to the rivers are more susceptible to contamination with As.
Thus, the subsoil composition, with different fractions of sand, silt and clay, plays a vital role in the enrichment of As in aquifers. Studies suggest that increased flooded irrigation leads to As contamination (Bhattacharya et al., 2006; Harvey et al., 2002; Nickson et al., 2005). Groundwater use for irrigation increased from 54.31% to 72.16% between 2000 and 2009 (GWP, 2013). Rice, wheat, bajra, barley, and maize crops in the state of Uttar Pradesh are predominantly irrigated. Roychowdhury et al. (2005) found that surface soils have lower As levels than subsurface soils. LULC, with a weighting factor of λ = 0.74, was found to be a significant proxy variable. Within the LULC subclasses, cropland (λ = 0.67) was found to be the most significant contributing factor. These results agree with most of the variables found to be significant in other prediction models (Berg et al., 2016). Groundwater from private and government wells is used to irrigate 101.61 lakh hectares (10.161 million ha), which comprises 73.58% of the total cultivated area of the state (Roy and Ahmad, 2015). This area is dominantly planted with wheat (37.90%), rice (22.83%), millet (3.57%), maize (2.68%), sorghum (0.69%), and barley (0.64%) (Roy and Ahmad, 2015). Excessive cultivation of rice under flooded irrigation, a dominant practice in the region, acts as a barrier to the inflow of oxygen into the subsurface aquifer and provides a favorable reducing environment, resulting in As release (van Geen et al., 2013). The over-extraction of groundwater for irrigation purposes has disturbed the hydrological balance of the aquifers, resulting in fluctuating groundwater levels (Rasool et al., 2016). The groundwater level in the region (λ = 2.12) suggests similar exploitation trends, with significant p values (<0.034; Table 3). The groundwater fluctuation results in subsequent changes in the geochemical environment of the aquifer.
The groundwater level initiates the reduction of sulfide-rich minerals mediated by sulfate-reducing bacteria (Harvey et al., 2005). Depending on this biogeochemical behavior and changes in the redox conditions of aquifers, the solubility of As is affected, resulting in As contamination of the groundwater (Mukherjee et al., 2006). Irrigation might result in higher evapotranspiration and slow infiltration through young alluvial sediments, thus increasing the groundwater As concentration (Podgorski et al., 2017).

3.2. Hybrid random forest model

Eight significant (p < 0.05) independent variables (Table 3) were used in the hybrid random forest model to predict As contamination in the groundwater of Uttar Pradesh (Fig. 3a). Fig. 3b depicts the binary map of As prediction, which shows areas at low (<10 µg/L) and high risk (>10 µg/L). The hybrid random forest model prediction is in agreement with the existing spatial distribution of groundwater As contamination. Of the 1473 binary-recoded As concentration data points, 85% are correctly predicted by the hybrid random forest model (Fig. 3b). More precisely, 73% of the groundwater samples with As concentrations greater than or equal to 10 µg/L are correctly predicted (sensitivity), whereas 97% of the groundwater samples with As concentrations less than 10 µg/L are correctly predicted (specificity). The performance of the hybrid random forest model was evaluated using ROC and AUC. AUC is a measure of the model performance, with values typically varying between 0.5 and 1 (Naghibi et al., 2016). The AUC value for the hybrid random forest model is 0.755 (Fig. 4a), with an overall accuracy of 84.6% along with RMS and RMSE values of 0.014 and 0.006 (Table 4). A cutoff value of 0.61 was used based on the intersection of specificity and sensitivity so that predictive accuracy is not biased towards either high or low As values (Fig. 4b). Similar cutoff values are obtained for the testing and overall datasets. The AUC value for the test dataset is 0.74, with an overall accuracy of 86.29% (Table 5). The specificity of the test dataset (95.6%) is similar to that of the training dataset (95.7%). The true negative rate indicates the ability of the model to correctly predict areas with little contamination and reflects the robustness of the model.
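As a minimal sketch of how sensitivity, specificity, overall accuracy and a cutoff at the sensitivity/specificity intersection could be computed from binary-recoded observations and predicted probabilities, consider the following. The function names, the 101-step cutoff scan and the rule that a probability at or above the cutoff counts as "unsafe" are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 means As >= 10 ug/L (unsafe)."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        # Assumed decision rule: probability at or above the cutoff is "unsafe".
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, overall accuracy)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def balanced_cutoff(y_true: Sequence[int], y_prob: Sequence[float],
                    steps: int = 100) -> float:
    """Scan cutoffs 0, 1/steps, ..., 1 and return the first one that
    minimizes |sensitivity - specificity|."""
    best_cutoff, best_gap = 0.0, float("inf")
    for i in range(steps + 1):
        cutoff = i / steps
        sens, spec, _ = rates(*confusion_counts(y_true, y_prob, cutoff))
        gap = abs(sens - spec)
        if gap < best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff
```

Choosing the cutoff where the two rates meet, rather than a fixed 0.5, keeps the classifier from favoring either the safe or the unsafe class when the two are unevenly represented.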
The high true positive rate indicates that the model is more sensitive in delineating the unsafe areas. The overall accuracy of the hybrid random forest model for the total data points (n = 1473) together with the training (80%) and testing (20%) datasets (Fig. 4a) is 84.67%, which is calculated using eq. (v). The findings are also supported by the results of Ravenscroft et al. (2009). Moreover, the AUC values for the univariate, LRM, fuzzy, AFR and ANFIS models are found to be 0.49, 0.54, 0.45, 0.59, and 0.71, respectively, with an overall prediction accuracy of 40.6%–67.3%, as shown in Table 5. The fuzzy model performed worst (40%), as fuzzy alone is very unstable and very sensitive to resampling (Maiti and Tiwari, 2014). The performance of LRM was found to be 53.9%. The input dilutes the variability of independent variables. ANFIS (67.3%) is preferred over AFR (56.9%) due to its greater robustness and flexibility, which caters to the complexity of hydrogeochemical conditions. Thus, the hybrid random forest model with an accuracy of 84.67% performs much better than the models used in other prediction studies (Cao et al., 2018; Rodríguez-Lado et al., 2013; Winkel et al., 2008a,b; Zhang et al., 2012). The ensemble hybrid random forest model involves an unlimited number of decision tree formations along with the flexibility to incorporate expert knowledge of the aquifer environments and generates robust output with high accuracy.

3.3. Prediction map of As

The hybrid random forest ensemble model identifies seven districts of Uttar Pradesh, namely, Ballia (Chauhan et al., 2009), Gorakhpur (Singh et al., 2018), Ghazipur (Kumar et al., 2010), Gonda (CGWB, 2014), Faizabad, Barabanki and Lakhimpur Kheri, as high-risk regions with prediction probabilities of 0.8–1.0 (Fig. 5a). The majority of these areas lie in the floodplains of the Ganga, Rapti and Ghaghra Rivers (Fig. 1). Studies involving geochemical analysis have found that groundwater in these areas is contaminated with high As. These soils are rich in organic content that seeps into groundwater, creating reducing conditions and thus impacting the solubility of As-bearing minerals. The organic content that seeps down is used in microbial reduction, which leads to the reduction of As-bearing iron oxyhydroxides, resulting in the subsequent release of As. These findings are in accordance with previous geochemical studies that confirmed the presence of high As in groundwater in these districts (Ahamed et al., 2006; Chauhan et al., 2009; Singh et al., 2018).
Nevertheless, this result suggests that the districts of Barabanki and Gonda are high-risk regions and require blanket testing to confirm As contamination of groundwater. The districts of Shahjahanpur, Unnao (Chauhan et al., 2012), Chandauli, Varanasi (CGWB, 2014), Pratapgarh, Kushinagar, Mau, Balrampur, Deoria and Siddharth Nagar were found to be under moderate risk of As contamination, with prediction probabilities of 0.6–0.8 (Fig. 5a); however, no exhaustive study has been performed in these moderate-risk areas except Unnao and Varanasi.

Fig. 3. (a) As prediction map for the state of Uttar Pradesh in 10 classes, overlaid with As data points (n = 1473) classified as below or above the WHO guideline of 10 µg/L. (b) Binary prediction map differentiating the safe and unsafe regions in the study area.


Fig. 4. (a) Statistics of the prediction strength of the hybrid random forest results using the threshold of 10 µg/L applied to the test (n = 295), training (n = 1178) and total (n = 1473) As datasets, with AUC values of 0.74, 0.78 and 0.76, respectively, indicating the discriminative power of the model. (b) Performance of the model at 0–100% probability, along with sensitivity and specificity.
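The AUC values quoted for Fig. 4a can be read as the probability that a randomly chosen contaminated sample receives a higher predicted probability than a randomly chosen safe sample. The sketch below computes AUC through that pairwise (rank-based) definition; it is an illustrative stand-in using a hypothetical function name, not the software the authors used, and it assumes labels coded as 1 (As at or above the guideline) and 0 (below).

```python
from typing import Sequence


def auc_from_scores(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to a model that ranks samples no better than chance, and 1.0 to perfect separation. The double loop is quadratic in the number of samples, which is acceptable for a dataset of about 1500 points.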

Table 4
Results of the model showing the efficiency, RMS and RMSE values.

Model name             Data            Efficiency R²   RMSE   RMS
Univariate             100 m × 100 m   0.49            0.02   0.07
Logistic               100 m × 100 m   0.51            0.15   0.15
Fuzzy                  100 m × 100 m   0.48            0.10   0.47
AFR                    100 m × 100 m   0.55            0.02   0.23
ANFIS                  100 m × 100 m   0.65            0.01   0.02
Hybrid Random Forest   100 m × 100 m   0.71            0.01   0.01

Fig. 5a also highlights nine districts, Baghpat, Meerut, Ghaziabad, Mathura, Agra, Firozabad, Jhansi, Fatehpur, and Banda, with probabilities of 0.4–0.6 for predicted As. Furthermore, 28 districts with low risk are also demarcated, with probabilities lying within 0.1–0.4. The areas underlain by Pleistocene deposits were found to be safe, with low risk probabilities of 0–0.1. The Pleistocene older alluvium is characterized by sediments of yellow to brown color having fewer reduced iron Fe(II) concretions that do not provide As adsorption sites, which results in As-safe groundwater. The remaining 16 districts are at no risk of As contamination.
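The grouping of predicted probabilities into hazard classes can be sketched as a simple threshold lookup. The class limits below follow the hazard-map grouping given in Section 3.4 (80–100% high, 60–80% moderate, 10–60% low, <10% no risk); assigning a value that falls exactly on a limit to the higher class is an assumption, and the function names and example probabilities are hypothetical.

```python
from collections import Counter
from typing import Dict


def hazard_class(probability: float) -> str:
    """Map a predicted As probability (0 to 1) to a hazard class."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= 0.8:
        return "high"
    if probability >= 0.6:
        return "moderate"
    if probability >= 0.1:
        return "low"
    return "no risk"


def tally_classes(district_probabilities: Dict[str, float]) -> Counter:
    """Count how many districts fall into each hazard class."""
    return Counter(hazard_class(p) for p in district_probabilities.values())


# Hypothetical probabilities, for illustration only (not values from the study).
example = {"District A": 0.85, "District B": 0.70, "District C": 0.45, "District D": 0.05}
```

Calling `tally_classes(example)` on the illustrative dictionary gives one district in each of the four classes.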

3.4. Population affected in high-risk regions

The probability map of Fig. 5a shows the severity of As contamination in the groundwater of Uttar Pradesh. The probabilities are grouped to prepare an arsenic hazard map with high, moderate, low and no risk zones, indicating a probability of As of 80–100%, 60–80%, 10–60%, and <10%, respectively. In addition, there are regions of predicted high As concentrations that are below the statistically determined probability cutoff of 61% and have been used to calculate high-risk regions, which would estimate population density more accurately for densely populated areas. Uttar Pradesh is the most populous state of India, with a total population of 199 million (Census of India, 2011). Out of the total population of Uttar Pradesh, 155 million people (77.73%) live in rural areas (MDWS Report, 2011) without access to piped water and rely mostly on groundwater for their drinking and domestic needs. The population density of Uttar Pradesh varies from 242 to 3917 people/km² (Census of India, 2011). Accordingly, 23.48 million people in rural areas are exposed to high As since they are dependent on groundwater for drinking, cooking and irrigation (Fig. 5b). This number has been calculated using the following equation (vi), which signifies a linear function relationship between predicted As probabilities and population density to identify the actual population exposed to high As:

Population exposed = Rural population of area × predicted probability of As    (vi)

The risk map indicates the need for widespread testing of wells

Table 5
AUC, sensitivity and specificity values for the univariate, LRM, fuzzy, AFR, ANFIS and hybrid random forest models.

Model name                              Spatial resolution   AUC value   Sensitivity   Specificity   Overall prediction accuracy   Standard error
Univariate                              100 m × 100 m        0.501       0.468         0.342         0.412                         0.030
Logistic                                100 m × 100 m        0.542       0.503         0.572         0.542                         0.010
Fuzzy                                   100 m × 100 m        0.453       0.423         0.388         0.414                         0.050
AFR                                     100 m × 100 m        0.592       0.541         0.603         0.571                         0.004
ANFIS                                   100 m × 100 m        0.714       0.578         0.771         0.672                         0.003
Random Forest Hybrid Model              100 m × 100 m        0.762       0.732         0.972         0.845                         0.001
Random Forest Hybrid Model (Test)       100 m × 100 m        0.744       0.685         0.962         0.824                         0.002
Random Forest Hybrid Model (Training)   100 m × 100 m        0.783       0.765         0.964         0.863                         0.001


Fig. 5. (a) Prediction probabilities of As overlaid with district to create the hazard class as high, moderate, low and no risk classes in the State of Uttar Pradesh. (b) Population exposed to As predicted using hybrid random forest model for the state of Uttar Pradesh using population density data of the year 2011.
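The exposure estimate mapped in Fig. 5b rests on equation (vi), in which the rural population of an area is multiplied by its predicted probability of As. A minimal sketch of that arithmetic follows, summed over several areas. The function names and the idea of summing over a list of (rural population, probability) pairs are assumptions for illustration; the spatial units and inputs actually used in the study are not reproduced here.

```python
from typing import Iterable, Tuple


def population_exposed(rural_population: float, probability: float) -> float:
    """Equation (vi): rural population of an area times predicted As probability."""
    if rural_population < 0:
        raise ValueError("rural population cannot be negative")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return rural_population * probability


def total_population_exposed(units: Iterable[Tuple[float, float]]) -> float:
    """Sum equation (vi) over (rural_population, probability) pairs."""
    return sum(population_exposed(pop, prob) for pop, prob in units)
```

For two hypothetical areas with 1000 rural residents at probability 0.8 and 2000 at probability 0.1, the estimate is 800 + 200 = 1000 people exposed.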

in the regions of Uttar Pradesh to help reduce the long-term exposure of the people residing in populated clusters with 180–2271 people/km². Within these identified regions, districts such as Ballia, Varanasi, Ghazipur, Gorakhpur, Faizabad, and Deoria are evidently experiencing a public health crisis due to As exposure. The map shown in Fig. 5b also indicates that out of a total of 72 districts, 40 are exposed to high As in groundwater, especially districts located in the northeastern parts of the state. The present findings confirm the widespread As contamination in Uttar Pradesh.

4. Policy implications: well switching

The results of the study provide a detailed demarcation of high and low As risk areas in the State of Uttar Pradesh. The outputs from the hybrid random forest model highlight the high-risk areas where As testing can be prioritized by the government or nongovernmental organizations. Thus, the prediction maps can be used as a guide for government and policymakers to narrow down their sites of action and provide interventions in the affected regions. Blanket testing in the affected areas using As field kits, which are reliable for demarcating safe and unsafe wells, would be the first step to reduce exposure (Nickson et al., 2007). On a long-term basis, the use of filters, other arsenic removal techniques and piped water supply are the most viable solutions. However, the costs of treating and supplying water are considerably higher and have often been prohibitive because of logistics, operation and maintenance issues. Well switching has been found to be a significant short-term mitigation option in As-affected areas. However, the viability of well switching depends on blanket testing of wells in these regions. In Bihar, Barnwal et al. (2017) tried to address this problem by demonstrating the feasibility of safe well sharing among neighbors. This is an example of engaging handpump owners in a social network that leads to safe drinking water. Regular interventions after a baseline survey resulted in 30.5% higher switching to a safer well. A similar pattern of switching, 26–41%, was obtained in Bangladesh (George et al., 2012a,b), stressing community participation. The approach is reasonable and economical; however, the neighbor with a safe well needs to be compensated in some economic or logistical way for sharing the well.

5. Conclusion

The study highlights the threat of arsenic in groundwater of the most populous state of India in the Indo-Gangetic basin.

• The As hazard map can be used as a baseline to identify the regions where targeted blanket testing of handpumps and mitigation measures are urgently required. Blanket testing of wells in high-risk regions would inform households about safe and unsafe wells/handpumps. Once safe and unsafe wells are identified, people can be informed to switch to safe wells for drinking and cooking purposes. However, if higher percentages of wells are unsafe, then options for centralized treatment such as reverse osmosis systems or deep well installation can be explored.
• Spatial variation of arsenic is very high even within a single village because of highly heterogeneous aquifers in the region. Because this prediction cannot account for such small-scale variations, specific regions could be targeted for testing to reduce the poisoning of the population.
• Similar approaches for As prediction modeling can be emulated at the national scale to identify the regions at risk with higher accuracy. A hazard map at the national scale in India can be considered to create awareness in high-risk regions and accordingly plan mitigation strategies.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to acknowledge the fellowship provided by the University Grants Commission to conduct this research. We also acknowledge the discussion held with Dr. Neeti on statistical analysis.


References

Ahamed, S., Sengupta, M.K., Mukherjee, A., Hossain, M.A., Das, B., Nayak, B., Pal, A., Mukherjee, S.C., Pati, S., Dutta, R.N., Chatterjee, G., 2006. Arsenic groundwater contamination and its health effects in the state of Uttar Pradesh (UP) in upper and middle Ganga plain, India: a severe danger. Sci. Total Environ. 370 (2–3), 310–322. https://doi.org/10.1016/j.scitotenv.2006.06.015.
Ahmed, K.M., Bhattacharya, P., Hasan, M.A., Akhter, S.H., Alam, S.M.M., Bhuyian, M.A.H., Imam, M.B., Khan, A.A., Sracek, O., 2004. Arsenic enrichment in groundwater of the alluvial aquifers in Bangladesh: an overview. Appl. Geochem. 19 (2), 181–200. https://doi.org/10.1016/j.apgeochem.2003.09.006.
Argos, M., Kalra, T., Rathouz, P.J., Chen, Y., Pierce, B., Parvez, F., Islam, T., Ahmed, A., 2010. Arsenic exposure from drinking water, and all-cause and chronic-disease mortalities in Bangladesh (HEALS): a prospective cohort study. Lancet 376 (9737), 252–258. https://doi.org/10.1016/S0140-6736(10)60481-3.
Ayotte, J.D., Nolan, B.T., Gronberg, J.A., 2016. Predicting arsenic in drinking water wells of the Central Valley, California. Environ. Sci. Technol. 50 (14), 7555–7563. https://doi.org/10.1021/acs.est.6b01914.
Barnwal, P., van Geen, A., van der Goltz, J., Singh, C.K., 2017. Demand for environmental quality information and household response: evidence from well-water arsenic testing. J. Environ. Econ. Manag. 86, 160–192. https://doi.org/10.1016/j.jeem.2017.08.002.
Berg, M., Tran, H.C., Nguyen, T.C., Pham, H.V., Schertenleib, R., Giger, W., 2001. Arsenic contamination of groundwater and drinking water in Vietnam: a human health threat. Environ. Sci. Technol. 35 (13), 2621–2626. https://doi.org/10.1021/es010027y. Supporting Information.
Berg, M., Winkel, L.H.E., Amini, M., Rodriguez-Lado, L., Hug, S.J., Podgorski, J., Bretzler, A., de Meyer, C., Trang, P.T.K., Lan, V.M., Viet, P.H., et al., 2016. Regional to sub-continental prediction modeling of groundwater arsenic contamination. In: Arsenic Research and Global Sustainability: Proceedings of the Sixth International Congress on Arsenic in the Environment (As 2016). CRC Press, Stockholm, Sweden, p. 21. June.
Bhattacharya, P., Claesson, M., Bundschuh, J., Sracek, O., Fagerberg, J., Jacks, G., Martin, R.A., Storniolo, A.D.R., Thir, J.M., 2006. Distribution and mobility of arsenic in the río dulce alluvial aquifers in santiago del estero province, Argentina. Sci. Total Environ. 358 (1–3), 97–120. https://doi.org/10.1016/j.scitotenv.2005.04.048.
Bhowmick, S., Pramanik, S., Singh, P., Mondal, P., Chatterjee, D., Nriagu, J., 2018. Arsenic in groundwater of West Bengal, India: a review of human health risks and assessment of possible intervention options. Sci. Total Environ. 612, 148–169. https://doi.org/10.1016/j.scitotenv.2017.08.216.
Bhunia, G.S., Shit, P.K., Maiti, R., 2016. Comparison of GIS-based interpolation methods for spatial distribution of soil organic carbon (SOC). J. Saudi Soc. Agric. Sci. 8. https://doi.org/10.1016/j.jssas.2016.02.001.
BIS, 2012. Indian Standard Drinking Water Specification (Second Revision), vol. 10500. Bureau of Indian Standards, pp. 1–11. ISCUS, May.
Bonelli, M.G., Ferrini, M., Manni, A., 2017. Artificial neural networks to evaluate organic and inorganic contamination in agricultural soils. Chemosphere 186, 124–131. https://doi.org/10.1016/j.chemosphere.2017.07.116.
Breiman, L., 2001. Random Forests 1–33.
Bretzler, A., Lalanne, F., Nikiema, J., Podgorski, J., Pfenninger, N., Berg, M., Schirmer, M., 2017. Groundwater arsenic contamination in Burkina Faso, west africa: predicting and verifying regions at risk. Sci. Total Environ. 584, 958–970. https://doi.org/10.1016/j.scitotenv.2017.01.147.
Cao, H., Xie, X., Wang, Y., Pi, K., Li, J., Zhan, H., Liu, P., 2018. Predicting the risk of groundwater arsenic contamination in drinking water wells. J. Hydrol. 560, 318–325. https://doi.org/10.1016/j.jhydrol.2018.03.007.
Census of India, 2011. Uttar Pradesh. India.
Central Ground Water Board, 2013-14. Ground Water Year Book, vols. 1–81, 2014.
Cha, Y.K., Kim, Y.M., Choi, J.W., Sthiannopkao, S., Cho, K.H., 2016. Bayesian modeling approach for characterizing groundwater arsenic contamination in the mekong river basin. Chemosphere 143, 50–56. https://doi.org/10.1016/j.chemosphere.2015.02.045.
Chakraborti, D., Rahman, M.M., Paul, K., Chowdhury, U.K., Sengupta, M.K., Lodh, D., Chanda, C.R., Saha, K.C., Mukherjee, S.C., 2002. Arsenic calamity in the Indian subcontinent: what lessons have been learned? Talanta 58 (1), 3–22. https://doi.org/10.1016/S0039-9140(02)00270-9.
Chakraborti, D., Rahman, M.M., Ahamed, S., Dutta, R.N., Pati, S., Mukherjee, S.C., 2016. Arsenic contamination of groundwater and its induced health effects in Shahpur block, Bhojpur district, Bihar state, India: risk evaluation. Environ. Sci. Pollut. Res. 23 (10), 9492–9504. https://doi.org/10.1007/s11356-016-6149-8.
Chakraborti, D., Singh, S.K., Rashid, M.H., Rahman, M.M., 2018. Arsenic: occurrence in groundwater. In: Earth Systems and Environmental Sciences, second ed. Elsevier Inc.
Chauhan, V.S., Nickson, R.T., Chauhan, D., Iyengar, L., Sankararamakrishnan, N., 2009. Ground water geochemistry of Ballia district, Uttar Pradesh, India and mechanism of arsenic release. Chemosphere 75 (1), 83–91. https://doi.org/10.1016/j.chemosphere.2008.11.065.
Chauhan, V.S., Yunus, M., Sankararamakrishnan, N., 2012. Geochemistry and mobilization of arsenic in shuklaganj area of kanpur-unnao district, Uttar Pradesh, India. Environ. Monit. Assess. 184 (8), 4889–4901. https://doi.org/10.1007/s10661-011-2310-5.
Chen, Y., Graziano, J.H., Parvez, F., Liu, M., Slavkovich, V., Kalra, T., Argos, M., Islam, T., Ahmed, A., Rakibuz-Zaman, M., et al., 2011. Arsenic exposure from drinking water and mortality from cardiovascular disease in Bangladesh: prospective cohort study. BMJ 342 (7806), 1–11. https://doi.org/10.1136/bmj.d2431.
Cho, K.H., Sthiannopkao, S., Pachepsky, Y.A., Kim, K.W., Kim, J.H., 2011. Prediction of contamination potential of groundwater arsenic in Cambodia, Laos, and Thailand using artificial neural network. Water Res. 45 (17), 5535–5544. https://doi.org/10.1016/j.watres.2011.08.010.
Drahota, P., Falteisek, L., Redlich, A., Rohovec, J., Matousek, T., Čepička, I., 2013. Microbial effects on the release and attenuation of arsenic in the shallow subsurface of a natural geochemical anomaly. Environ. Pollut. 180, 84–91. https://doi.org/10.1016/j.envpol.2013.05.010.
Dummer, T.J.B., Yu, Z.M., Nauta, L., Murimboh, J.D., Parker, L., 2014. Geostatistical modelling of arsenic in drinking water wells and related toenail arsenic concentrations across nova scotia, Canada. Sci. Total Environ. 505, 1248–1258. https://doi.org/10.1016/j.scitotenv.2014.02.055.
George, C.M., Zheng, Y., Graziano, J.H., Rasul, S. Bin, Hossain, Z., Mey, J.L., van Geen, A., 2012a. Evaluation of an arsenic test kit for rapid well screening in Bangladesh. Environ. Sci. Technol. 46 (20), 11213–11219. https://doi.org/10.1021/es300253p.
George, C.M., van Geen, A., Slavkovich, V., Singha, A., Levy, D., Islam, T., Ahmed, K.M., Moon-Howard, J., Tarozzi, A., Liu, X., Factor-Litvak, P., 2012b. A cluster-based randomized controlled trial promoting community participation in arsenic mitigation efforts in Bangladesh. Environ. Health. 11, 41. https://doi.org/10.1186/1476-069X-11-41.
Gong, G., Mattevada, S., O'Bryant, S.E., 2014. Comparison of the accuracy of kriging and IDW interpolations in estimating groundwater arsenic concentrations in Texas. Environ. Res. 130, 59–69. https://doi.org/10.1016/j.envres.2013.12.005.
Ground Water Policy (GWP), 2013. Ground Water Dept, Policy for Sustainable Ground Water Management in Uttar Pradesh.
Guo, H., Wen, D., Liu, Z., Jia, Y., Guo, Q., 2014. A review of high arsenic groundwater in mainland and taiwan, China: distribution, characteristics and geochemical processes. Appl. Geochem. 41, 196–217. https://doi.org/10.1016/j.apgeochem.2013.12.016.
Harrell, F.E., 2001. Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression and Survival Analysis. Springer, New York, p. 568.
Harrell Jr., F.E., 2015. Regression Modeling Strategies: with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer.
Harvey, C.F., Swartz, C.H., Badruzzaman, A.B.M.M., Keon-Blute, N., Yu, W., Ali, M.A., Jay, J., Beckie, R., Niedan, V., Brabander, D., et al., 2002. Arsenic mobility and groundwater extraction in Bangladesh. Science 298 (5598), 1602–1606. https://doi.org/10.1126/science.1076978.
Harvey, C.F., Swartz, C.H., Badruzzaman, A.B.M., Keon-Blute, N., Yu, W., Ali, M.A., Jay, J., Beckie, R., Niedan, V., Brabander, D., et al., 2005. Groundwater arsenic contamination on the ganges delta: biogeochemistry, hydrology, human perturbations, and human suffering on a large scale. Compt. Rendus Geosci. 337 (1–2), 285–296. https://doi.org/10.1016/j.crte.2004.10.015.
Hoque, M.A., Burgess, W.G., Shamsudduha, M., Ahmed, K.M., 2011. Delineating low-arsenic groundwater environments in the bengal aquifer system, Bangladesh. Appl. Geochem. 26 (4), 614–623. https://doi.org/10.1016/j.apgeochem.2011.01.018.
Hoque, M.A., Khan, A.A., Shamsudduha, M., Hossain, M.S., Islam, T., Chowdhury, S.H., 2009. Near surface lithology and spatial variation of arsenic in the shallow groundwater: southeastern Bangladesh. Environ. Geol. 56 (8), 1687–1695.
Hoque, M.A., McArthur, J.M., Sikdar, P.K., 2012. The palaeosol model of arsenic pollution of groundwater tested along a 32 Km traverse across West Bengal, India. Sci. Total Environ. 431, 157–165. https://doi.org/10.1016/j.scitotenv.2012.05.038.
Kirchner, J.W., Weil, A., Nickson, R., McArthur, J., Burgess, W., Ahmed, K.M., 1998. Arsenic poisoning of Bangladesh groundwater. Nature 395, 1998. https://doi.org/10.1038/26387.
Kumar, M., Kumar, P., Ramanathan, A.L., Bhattacharya, P., Thunvik, R., Singh, U.K., Tsujimura, M., Sracek, O., 2010. Arsenic enrichment in groundwater in the middle Gangetic plain of Ghazipur district in Uttar Pradesh, India. J. Geochem. Explor. 105 (3), 83–94. https://doi.org/10.1016/j.gexplo.2010.04.008.
Lado, L.R., Polya, D., Winkel, L., Berg, M., Hegan, A., 2008. Modelling arsenic hazard in Cambodia: a geostatistical approach using ancillary data. Appl. Geochem. 23 (11), 3010–3018. https://doi.org/10.1016/j.apgeochem.2008.06.028.
Lee, J.J., Jang, C.S., Liu, C.W., Liang, C.P., Wang, S.W., 2009. Determining the probability of arsenic in groundwater using a parsimonious model. Environ. Sci. Technol. 43 (17), 6662–6668. https://doi.org/10.1021/es900540s.
Luo, T., Hu, S., Cui, J., Tian, H., Jing, C., 2012. Comparison of arsenic geochemical evolution in the datong basin (shanxi) and hetao basin (inner Mongolia), China. Appl. Geochem. 27 (12), 2315–2323. https://doi.org/10.1016/j.apgeochem.2012.08.012.
Maiti, S., Tiwari, R.K., 2014. A comparative study of artificial neural networks, Bayesian neural networks and adaptive neuro-fuzzy inference system in groundwater level prediction. Environ. Earth Sci. 71 (7), 3147–3160.
Mazumder, D.G., Saha, A., Ghosh, N., Majumder, K.K., 2016. Effect of drinking arsenic safe water for ten years in an arsenic exposed population: study in West Bengal, India. In: Arsenic Research and Global Sustainability: Proceedings of the Sixth International Congress on Arsenic in the Environment (As2016). CRC Press, Stockholm, Sweden, p. 365. June 19-23.
McArthur, J.M., Banerjee, D.M., Hudson-Edwards, K.A., Mishra, R., Purohit, R., Ravenscroft, P., Cronin, A., Howarth, R.J., Chatterjee, A., Talukder, T., Lowry, D., 2004. Natural organic matter in sedimentary basins and its relation to arsenic in anoxic Ground water: the example of West Bengal and its worldwide implications. Appl. Geochem. 19 (8), 1255–1293. https://doi.org/10.1016/j.apgeochem.2004.02.001.
McArthur, J.M., Nath, B., Banerjee, D.M., Purohit, R., Grassineau, N., 2011. Palaeosol control on groundwater flow and pollutant distribution: the example of arsenic. Environ. Sci. Technol. 45 (4), 1376–1383. https://doi.org/10.1021/es1032376.
McArthur, J.M., Ravenscroft, P., Banerjee, D.M., Milsom, J., Hudson-Edwards, K.A., Sengupta, S., Bristow, C., Sarkar, A., Tonkin, S., Purohit, R., 2008. How paleosols influence groundwater flow and arsenic pollution: a model from the Bengal basin and its worldwide implication. Water Resour. Res. 44 (11), 1–30. https://doi.org/10.1029/2007WR006552.
McArthur, J.M., Ravenscroft, P., Safiulla, S., Thirlwall, M.F., 2001. Arsenic in groundwater: testing pollution mechanisms for sedimentary aquifer in Bangladesh. Water Resour. Res. 37 (1), 109–117. https://doi.org/10.1029/2000WR900270.
MDWS Report, 2011. Central Team on Arsenic Mitigation in Rural Drinking Water Sources in Ballia District. Uttar Pradesh State Ministry of Drinking Water and Sanitation Government of India, New Delhi.
Mehrotra, A., Mishra, A., Tripathi, R.M., Shukla, N., 2014. Mapping of arsenic contamination severity in bahraich district of ghagra basin, Uttar Pradesh, India. Geomatics, Nat. Hazards Risk 0 (0), 1–12. https://doi.org/10.1080/19475705.2013.871354.
Mukherjee, A., Fryar, A.E., Thomas, W.A., 2009. Geologic, geomorphic and hydrologic framework and evolution of the Bengal basin, India and Bangladesh. J. Asian Earth Sci. 34 (3), 227–244. https://doi.org/10.1016/j.jseaes.2008.05.011.
Mukherjee, A., Sengupta, M.K., Hossain, M.A., Ahamed, S., Das, B., Nayak, B., Lodh, D., Rahman, M.M., Chakraborti, D., 2006. Arsenic contamination in groundwater: a global perspective with emphasis on the Asian scenario. J. Health Popul. Nutr. 24 (2), 142–163.
Naghibi, S.A., Pourghasemi, H.R., Dixon, B., 2016. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 188 (1), 1–27.
Naseem, S., McArthur, J.M., 2018. Arsenic and other water-quality issues affecting groundwater, indus alluvial plain, Pakistan. Hydrol. Process. 32 (9), 1235–1253. https://doi.org/10.1002/hyp.11489.
Nickson, R.T., McArthur, J.M., Shrestha, B., Kyaw-Myint, T.O., Lowry, D., 2005. Arsenic and other drinking water quality issues, muzaffargarh district, Pakistan. Appl. Geochem. 20 (1), 55–68. https://doi.org/10.1016/j.apgeochem.2004.06.004.
Nickson, R., Sengupta, C., Mitra, P., Dave, S.N., Banerjee, A.K., Bhattacharya, A., Basu, S., Kakoti, N., Moorthy, N.S., Wasuja, M., Kumar, M., 2007. Current knowledge on the distribution of arsenic in groundwater in five states of India. J. Environ. Sci. Health Part A 42 (12), 1707–1718. https://doi.org/10.1080/10934520701564194.
Parvez, F., Wasserman, G.A., Factor-Litvak, P., Liu, X., Slavkovich, V., Siddique, A.B., Sultana, R., Sultana, R., Islam, T., Levy, D., et al., 2011. Arsenic exposure and motor function among children in Bangladesh. Environ. Health Perspect. 119 (11), 1665–1670. https://doi.org/10.1289/ehp.1103548.
Podgorski, J.E., Eqani, S.A.M.A.S., Khanam, T., Ullah, R., Shen, H., Berg, M., 2017. Extensive arsenic contamination in high-pH unconfined aquifers in the indus valley. Sci. Adv. 3 (8). https://doi.org/10.1126/sciadv.1700935.
Pokhrel, D., Bhandari, B.S., Viraraghavan, T., 2009. Arsenic contamination of groundwater in the terai region of Nepal: an overview of health concerns and treatment options. Environ. Int. 35 (1), 157–161. https://doi.org/10.1016/j.envint.2008.06.003.
Polya, D.A., Polizzotto, M.L., Fendorf, S., Rodriguez-Lado, L., Hegan, A., Lawson, M., Rowland, H.A.L., Giri, A.K., Mondal, D., Sovann, C., et al., 2010. Arsenic in groundwaters of Cambodia. Water Resour. Dev. Southeast Asia 31–56. January.
Postma, D., Pham, T.K.T., Sø, H.U., Hoang, V.H., Vi, M.L., Nguyen, T.T., Larsen, F., Pham, H.V., Jakobsen, R., 2016. A model for the evolution in water chemistry of an arsenic contaminated aquifer over the last 6000 Years, red river floodplain, Vietnam. Geochim. Cosmochim. Acta 195, 277–292. https://doi.org/10.1016/j.gca.2016.09.014.
Rahman, A., Persson, L.Å., Nermell, B., El Arifeen, S., Ekström, E.-C., Smith, A.H., Vahter, M., 2010. Arsenic exposure and risk of spontaneous abortion, stillbirth, and infant mortality. Epidemiology 21 (6), 797–804. https://doi.org/10.1097/EDE.0b013e3181f56a0d.
Raju, N.J., 2011. Evaluation of hydrogeochemical processes in the Pleistocene aquifers of middle Ganga plain, Uttar Pradesh, India. Environ. Earth Sci. 65 (4), 1291–1308. https://doi.org/10.1007/s12665-011-1377-1.
Rasool, A., Farooqi, A., Masood, S., Hussain, K., 2016. Arsenic in groundwater and its health risk assessment in drinking water of mailsi, Punjab, Pakistan. Human. Ecol. Risk Assess. 22 (1), 187–202. https://doi.org/10.1080/10807039.2015.1056295.
Ravenscroft, P., Brammer, H., Richards, K., 2009. Arsenic Pollution: a Global Synthesis, vol. 28. John Wiley & Sons.
Rodríguez-Lado, L., Sun, G., Berg, M., Zhang, Q., Xue, H., Zheng, Q., Johnson, C.A., 2013. Groundwater arsenic contamination throughout China. Science 341 (6148), 866–868. https://doi.org/10.1126/science.1237484.
Roy, R., Ahmad, H., 2015. State Agricultural Profile of Uttar Pradesh. Agro-Economic Research Centre, University of Allahabad.
Roychowdhury, T., Tokunaga, H., Uchino, T., Ando, M., 2005. Effect of arsenic-contaminated irrigation water on agricultural land soil and plants in West Bengal, India. Chemosphere 58 (6), 799–810.
Saha, D., Sahu, S., 2016. A decade of investigations on groundwater arsenic contamination in middle Ganga plain, India. Environ. Geochem. Health 38 (2), 315–337. https://doi.org/10.1007/s10653-015-9730-z.
Shah, B.A., 2013. Arsenic in groundwater, quaternary sediments, and suspended river sediments from the middle Gangetic plain, India: distribution, field relations, and geomorphological setting. Arab. J. Geosci. 7 (9), 3525–3536. https://doi.org/10.1007/s12517-013-1012-4.
Shah, B.A., 2010. Arsenic-contaminated groundwater in Holocene sediments from parts of middle Ganga plain, Uttar Pradesh, India. Curr. Sci. 98 (10), 1359–1365.
Shah, B.A., 2017. Groundwater arsenic contamination from parts of the ghaghara basin, India: influence of fluvial geomorphology and quaternary morphostratigraphy. Appl. Water Sci. 7 (5), 2587–2595. https://doi.org/10.1007/s13201-016-0459-3.
Shah, B.A., 2008. Role of quaternary stratigraphy on arsenic-contaminated groundwater from parts of middle Ganga plain, UP-Bihar, India. Environ. Geol. 53 (7), 1553–1561. https://doi.org/10.1007/s00254-007-0766-y.
Shamsudduha, M., Uddin, A., Saunders, J.A., Lee, M.K., 2008. Quaternary stratigraphy, sediment characteristics and geochemistry of arsenic-contaminated alluvial aquifers in the ganges-brahmaputra floodplain in Central Bangladesh. J. Contam. Hydrol. 99 (1–4), 112–136. https://doi.org/10.1016/j.jconhyd.2008.03.010.
Sharma, P., Rolle, M., Kocar, B.D., Fendorf, S., Kappler, A., 2010. Influence of natural organic matter on As transport and retention. Environ. Sci. Technol. 45 (2), 546–553. https://doi.org/10.1021/es1026008.
Singh, C.K., Kumar, A., Bindal, S., 2018. Arsenic contamination in Rapti river basin, terai region of India. J. Geochem. Explor. 192, 120–131. https://doi.org/10.1016/j.gexplo.2018.06.010.
Sovann, C., Polya, D.A., 2014. Improved groundwater geogenic arsenic hazard map for Cambodia. Environ. Chem. 11 (5), 595–607. https://doi.org/10.1071/EN14006.
Stuckey, J.W., Sparks, D.L., Fendorf, S., 2016. In: Delineating the Convergence of Biogeochemical Factors Responsible for Arsenic Release to Groundwater in South and Southeast Asia. Adv. Agron., vol. 140. Academic Press, pp. 43–74. https://doi.org/10.1016/bs.agron.2016.06.002.
Tesoriero, A.J., Gronberg, J.A., Juckem, P.F., Miller, M.P., Austin, B.P., 2017. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification. Water Resour. Res. 53 (8), 7316–7331. https://doi.org/10.1002/2016WR020197.
van Geen, A., Bostick, B.C., Thi Kim Trang, P., Lan, V.M., Mai, N.-N., Manh, P.D., Viet, P.H., Radloff, K., Aziz, Z., Mey, J.L., et al., 2013. Retardation of arsenic transport through a Pleistocene aquifer. Nature 501 (7466), 204–207. https://doi.org/10.1038/nature12444.
van Geen, A., Win, K.H., Zaw, T., Naing, W., Mey, J.L., Mailloux, B., 2014. Confirmation of elevated arsenic levels in groundwater of Myanmar. Sci. Total Environ. 478, 21–24. https://doi.org/10.1016/j.scitotenv.2014.01.073.
van Geen, A., Farooqi, A., Kumar, A., Khattak, J.A., Mushtaq, N., Hussain, I., Ellis, T., Singh, C.K., 2018. Field testing of over 30,000 wells for arsenic across 400 villages of the Punjab plains of Pakistan and India: Implications for prioritizing mitigation. Sci. Total Environ. 654, 1358–1363, 2019. https://doi.org/10.1016/j.scitotenv.2018.11.201.
Verma, S., Mukherjee, A., Mahanta, C., Choudhury, R., Mitra, K., 2016. Influence of geology on groundwater–sediment interactions in arsenic enriched tectonomorphic aquifers of the himalayan brahmaputra river basin. J. Hydrol. 540, 176–195. https://doi.org/10.1016/j.jhydrol.2016.05.041.
Wasserman, G.A., Liu, X., Parvez, F., Ahsan, H., Litvak, P.F., van Geen, A., Slavkovich, V., LoIacono, N.J., Cheng, Z., Hussain, I., Momotaj, H., Graziano, J.H., 2004. Water arsenic exposure and children's intellectual function in araihazar, Bangladesh. Environ. Health Perspect. 112 (13), 1329–1333.
https://doi.org/10.1289/ehp. 6964. Winkel, L.H.E., Trang, P.T.K., Lan, V.M., Stengel, C., Amini, M., Ha, N.T., Viet, P.H., Berg, M., 2011. Arsenic pollution of groundwater in Vietnam exacerbated by deep aquifer exploitation for more than a century. Proc. Natl. Acad. Sci. Unit. States Am. 108 (4), 1246e1251. https://doi.org/10.1073/pnas.1011915108. Winkel, L., Berg, M., Amini, M., Hug, S.J., Johnson, C.A., 2008a. Predicting groundwater arsenic contamination in southeast asia from surface parameters. Nat. Geosci. 1 (8). https://doi.org/10.1038/ngeo254. Winkel, L., Berg, M., Stengel, C., Rosenberg, T., 2008b. Hydrogeological survey assessing arsenic and other groundwater contaminants in the lowlands of sumatra, Indonesia. Appl. Geochem. 23 (11), 3019e3028. https://doi.org/10. 1016/j.apgeochem.2008.06.021. Yang, N., Winkel, L.H.E., Johannesson, K.H., 2014. Predicting geogenic arsenic contamination in shallow groundwater of south Louisiana, United States. Environ. Sci. Technol. 48 (10), 5660e5666. https://doi.org/10.1021/es405670g. Zhang, Q., Rodríguez-Lado, L., Johnson, C.A., Xue, H., Shi, J., Zheng, Q., Sun, G., 2012. Predicting the risk of arsenic contaminated groundwater in shanxi province, northern China. Environ. Pollut. 165, 118e123. https://doi.org/10.1016/j.envpol. 2012.02.020. Zhang, Q., Rodriguez-Lado, L., Liu, J., Johnson, C.A., Zheng, Q., Sun, G., 2013. Coupling predicted model of arsenic in groundwater with endemic arsenism occurrence in shanxi province, northern China. J. Hazard Mater. 262, 1147e1153. https:// doi.org/10.1016/j.jhazmat.2013.02.017.