Geoderma 216 (2014) 1–9
Geoderma journal homepage: www.elsevier.com/locate/geoderma
Prediction of low heavy metal concentrations in agricultural soils using visible and near-infrared reflectance spectroscopy
Junjie Wang a, Lijuan Cui b, Wenxiu Gao c,⁎, Tiezhu Shi a, Yiyun Chen a, Yin Gao a
a School of Resource and Environmental Science & Key Laboratory of Geographic Information System of the Ministry of Education, Wuhan University, Wuhan 430079, China
b Institute of Wetland Research, Chinese Academy of Forestry, Beijing 100091, China
c State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
Article info
Article history: Received 16 April 2012; Received in revised form 28 October 2012; Accepted 28 October 2013; Available online 21 November 2013
Keywords: Soil heavy metal; VNIR reflectance spectroscopy; GA-PLSR; Land-use type; Predictive mechanism
Abstract
To monitor the accumulation of heavy metals effectively and to protect the health of agricultural soils, a promising approach is to predict low concentrations of heavy metals in soils using visible and near-infrared (VNIR) reflectance spectroscopy coupled with calibration techniques. This study aimed to (i) compare the performance of partial least squares regression combined with a genetic algorithm (GA-PLSR) against general PLSR for predicting low concentrations of four heavy metals (As, Pb, Zn and Cu) in agricultural soils; (ii) explore the transferability of GA-PLSR models defined on one subset of land-use types to other types; and (iii) investigate the predictive mechanism behind the prediction of the metals. One hundred soil samples were collected in the field at Yixing, China, and their VNIR reflectance spectra (350–2500 nm) were measured in a laboratory. Using all of the soil samples, GA-PLSR and PLSR models were calibrated for the four heavy metals with a leave-one-out cross-validation procedure. The GA-PLSR models achieved better cross-validated accuracies than the PLSR models. To test the transferability of the GA-PLSR models, the soil samples were divided into three pairs of training and test sets drawn from different land-use types. Three GA-PLSR models defined on the training sets transferred well to the test sets, but nine did not. As for the predictive mechanism, in addition to the widely used correlation analysis between organic matter (OM) and the metals, the relationship between the OM content and the prediction accuracy of the metals was investigated, and the important wavelengths for OM and the metals were compared. The three methods indicated that OM was significantly related to the predictions of the spectrally featureless metals (Pb, Zn and Cu) from VNIR reflectance. We conclude that GA-PLSR modeling predicts low heavy metal concentrations from VNIR reflectance better than general PLSR, that it has potential for transfer between different land-use types, and that its accuracy is fundamentally influenced by OM.
© 2013 Elsevier B.V. All rights reserved.
⁎ Corresponding author. Tel.: +86 27 68778524; fax: +86 27 68778969. E-mail address: [email protected] (W. Gao).
0016-7061/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.geoderma.2013.10.024

1. Introduction
In China, human activities such as mining, transportation, sewage disposal and fertilization have posed an ongoing threat to soil health over the last two decades (Wei and Yang, 2010). Moreover, the consumption of metal-polluted crops (e.g., rice, corn and soybean) grown in agricultural soils greatly raises the risks to food security and human health (Zhuang et al., 2009). Therefore, determining the heavy metal concentrations of agricultural soils is necessary to monitor soil health and to take preventative measures against soil contamination.
Conventionally, the spatial distribution of heavy metal concentrations in soil is investigated through extensive soil sampling and laboratory analysis, which is time-consuming, expensive and inefficient (Jarmer et al., 2008; Kemper and Sommer, 2002; Kooistra et al., 2003; Mouazen et al., 2007; Ren et al., 2009). Because it is rapid, inexpensive and non-destructive, visible and near-infrared (VNIR) reflectance spectroscopy coupled with calibration techniques has been developed to predict various soil properties, such as moisture (Gill et al., 2006), organic matter (Kooistra et al., 2001), clay (Gomez et al., 2008a), Fe (Ren et al., 2009) and heavy metals (Wu et al., 2005). Among the calibration techniques, partial least squares regression (PLSR) is considered a common standard tool (Viscarra Rossel et al., 2006), and principal component regression (PCR) (Chang et al., 2001), stepwise multiple linear regression (SMLR) (Kemper and Sommer, 2002), back propagation neural networks (BPNN) (Mouazen et al., 2010) and support vector regression (SVR) (Gill et al., 2006) have also been employed in many studies. To simplify the calibration models and improve the prediction accuracy, PLSR has been combined with wavelength selection methods, such as the genetic algorithm (GA-PLSR) for predicting soil organic carbon (Vohland et al., 2011) and the interval partial least squares algorithm (iPLSR) for predicting nitrogen content in rice (Inoue et al., 2012). To our knowledge, however, such approaches have not yet been studied for the prediction of soil heavy metal concentrations.
J. Wang et al. / Geoderma 216 (2014) 1–9
Hitherto, the prediction of heavy metals has focused on soils polluted by medium and high concentrations of heavy metals, for example, soils collected from sediment (Choe et al., 2008; Malley and Williams, 1997; Moros et al., 2009), alluvial (Clevers et al., 2004; Kooistra et al., 2001, 2004; Vohland et al., 2009; Wu et al., 2007), mining (Kemper and Sommer, 2002; Ren et al., 2009; Siebielec et al., 2004), urban (Pandit et al., 2010) and suburban (Wu et al., 2005) areas. Little attention has been devoted to predicting low heavy metal concentrations in agricultural soils. Such prediction, however, is indispensable for monitoring soil health, since a minor increase in heavy metal concentration may lead to high pollution levels (Pandit et al., 2010). In addition, most previous studies were carried out with soil samples from a single homogeneous land-use type, such as grassland (Vohland et al., 2009) or paddy field (Wu et al., 2007), with little consideration of soil samples from different land-use types.
Heavy metals in soil are deemed spectrally featureless, and most of them cannot be detected with VNIR reflectance spectroscopy at concentrations ≤1000 mg kg−1 (Wu et al., 2007). Therefore, it is not straightforward to derive heavy metal concentrations from the VNIR reflectance of soils. Some studies have reported significant correlations between heavy metals and spectrally active soil properties, such as organic matter (Kooistra et al., 2001; Malley and Williams, 1997; Moros et al., 2009; Vohland et al., 2009), clay (Kooistra et al., 2001) and Fe (Ren et al., 2009; Wu et al., 2005, 2007). Such properties might therefore act as a bridge in the mechanism by which soil heavy metal concentrations are predicted from VNIR reflectance. Various studies have explored the predictive mechanism and reported that it varies with soil conditions and properties (Malley and Williams, 1997; Wu et al., 2005).
Using VNIR reflectance acquired in the laboratory, this study (i) applied GA-PLSR modeling to predict low concentrations of heavy metals (As, Pb, Zn and Cu) in agricultural soils and compared its performance against the general PLSR approach; (ii) explored the transferability of GA-PLSR models between different land-use types without extra calibration; and (iii) investigated the predictive mechanism behind the assessment of the metals in soils using three methods from different perspectives.

2. Materials and methods
2.1. Study area
The study area is located at Yixing (Fig. 1), in the south of Jiangsu Province, China. It has distinct seasons, with a mean annual temperature of 15.7 °C and a mean annual precipitation of 1177 mm (Zhang et al., 2005). According to the USDA Soil Taxonomy, Inceptisols dominate the study area. Yixing is a typical agricultural region: cropland lies in the low-lying areas, woodland and tea plantations spread over the southern hills and low mountains, cereals dominate the northern and western areas, and vegetables (e.g., gingeli and soybean) are grown in the eastern area.
2.2. Field sampling and pre-processing
A total of 30 sampling sites (30 × 30 m) (Fig. 1) were established during August 11–14, 2010 in agricultural areas, covering nine land-use types: gingeli (Sesamum indicum L.) cropland, corn (Zea mays L.) field, soybean (Glycine max Merr.) cropland, paddy (Oryza sativa L.) field, tea (Camellia sinensis L.) plantation, shrubland (Ilex cornuta Lindl.), arbor (Cinnamomum camphora Presl.) woodland, grassland (Setaria viridis Beauv.) and bare land. At each site, surface soil samples (0–10 cm) were collected at three or four sampling points. The location of each sampling point was recorded using a Global Positioning System receiver (Garmin Map 60cs) with an accuracy of about ±5 m. In total, 100 soil samples were collected, and each sample was kept in a labeled sample bag.
All the soil samples were air-dried at room temperature for three days to standardize the moisture level, and small stones and plant residues were removed. The 100 soil samples were ground with an agate mortar and passed through a 20-mesh sieve (0.84 mm) to minimize the impact of particle size on soil spectral reflectance (Chang et al., 2001; Kooistra et al., 2001). Afterwards, each soil sample was split into two sub-samples, one for laboratory spectral measurement and the other, after passing through a 100-mesh sieve (0.15 mm), for chemical analysis of soil properties.

2.3. Chemical analysis of soil properties
The pH, the content of organic matter (OM), and the concentrations of the heavy metals arsenic (As), lead (Pb), zinc (Zn) and copper (Cu) were measured on about 50 g of soil. The pH was measured with the potentiometric method, and the OM content was determined using the potassium dichromate volumetric method (Jackson, 1960). Before the heavy metal concentrations were measured, the soil samples were digested with acid (HCl–HNO3–HClO4) on an electric heating plate. The Pb, Zn and Cu concentrations were measured with an atomic absorption flame spectrometer (AAFS), and the As concentration was measured using atomic fluorescence spectrophotometry. To minimize experimental errors, national geochemical standard soil samples were used in the chemical analysis of the heavy metal concentrations (Wu et al., 2008).

2.4. Spectral measurement of soil samples
The reflectance spectra of the soil samples were measured in a laboratory using an ASD FieldSpec® 3 portable spectroradiometer (Analytical Spectral Devices, Inc., USA), which covers a spectral range of 350–2500 nm and collects data at 10 scans per second. The measurements were made in a dark room to control the illumination conditions and reduce the effects of stray light. The following five steps were carried out to ensure measurement accuracy: (i) for each sample, about 50 g of soil was homogenized and smoothed in a glass dish (2 cm in height and 10 cm in diameter) to maximize diffuse reflection and increase the signal-to-noise ratio (Mouazen et al., 2010); (ii) a 50 W halogen lamp (Analytical Spectral Devices, Inc., USA) pointing at an angle of 15° to the vertical was mounted on a tripod about 40 cm above the nadir to ensure a uniform distribution of incident light; (iii) the fiber optic, with a 25° field of view, was held in a pistol grip and mounted on a tripod 15 cm above the center of the observed sample to reduce the impact of background scattering on the soil reflectance spectra (Ren et al., 2009); (iv) a calibrated white Spectralon panel was systematically measured under the same conditions before the spectral measurement; and (v) 10 successive measurements were made for each sample and averaged to give the final spectrum.

2.5. Spectral pre-processing
Because of the large instrument noise, the spectral bands at 350–399 nm and 2451–2500 nm were first removed from the initial reflectance spectra of the soil samples to improve the signal-to-noise ratio. Afterwards, the remaining reflectance spectra were transformed into absorbance (log10(1/R), where R is reflectance) to reduce non-linearities and scattering effects (Gomez et al., 2008a; Kemper and Sommer, 2002). The spectra were then smoothed with the Savitzky–Golay method, using a second-order polynomial fit and a window size of 7 data points. This method reduces the impact of random noise on the robustness of calibration models (Gomez et al., 2008a; Wu et al., 2005). All the transformations were implemented in MATLAB version 7.11.0 (The MathWorks, Inc., USA).
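The pre-processing chain of Section 2.5 can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the study. The 1 nm band spacing, the function names, the synthetic spectrum and the edge handling of the smoother (the first and last three points are left unsmoothed) are all assumptions. The weights (−2, 3, 6, 7, 6, 3, −2)/21 are the standard tabulated Savitzky–Golay coefficients for a 7-point window and a second-order polynomial.

```python
import math

# Standard Savitzky-Golay smoothing weights for a 7-point window and a
# second-order polynomial; the weights sum to 21.
SG7_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)
SG7_NORM = 21


def trim_bands(wavelengths, reflectance, lo=400, hi=2450):
    """Keep only bands inside [lo, hi] nm (drops 350-399 and 2451-2500 nm)."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance) if lo <= w <= hi]
    return [w for w, _ in kept], [r for _, r in kept]


def to_absorbance(reflectance):
    """Apparent absorbance log10(1/R); every R must be positive."""
    return [math.log10(1.0 / r) for r in reflectance]


def savgol7(values):
    """Smooth with the 7-point quadratic Savitzky-Golay filter.

    Assumption: the first and last three points are returned unsmoothed.
    """
    half = len(SG7_WEIGHTS) // 2
    out = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG7_WEIGHTS, window)) / SG7_NORM
    return out


def preprocess(wavelengths, reflectance):
    """Trim noisy bands, convert to absorbance, then smooth."""
    wl, refl = trim_bands(wavelengths, reflectance)
    return wl, savgol7(to_absorbance(refl))


# Synthetic spectrum at an assumed 1 nm spacing (hypothetical values).
wavelengths = list(range(350, 2501))
reflectance = [0.2 + 0.0001 * (w - 350) for w in wavelengths]
wl, absorbance = preprocess(wavelengths, reflectance)
```

With 1 nm spacing, trimming leaves the bands from 400 to 2450 nm. A library smoother would normally treat the spectrum edges differently from this sketch.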
Fig. 1. Spatial distribution of 100 soil samples collected at Yixing, Jiangsu Province, China.
2.6. Model calibration and validation
2.6.1. GA-PLSR modeling
This study applied a genetic algorithm (GA) to select an array of contiguous wavelengths over the spectral range, using the root mean square error of cross-validation (RMSECV) as the fitness criterion. The GA can simplify the calibration models and improve the prediction accuracy by substantially reducing the number of redundant wavelengths (Goicoechea and Olivieri, 2003). Details of the GA are given by Jarvis and Goodacre (2005). With the selected wavelengths and all of the soil samples, PLSR models were established to predict each heavy metal and OM, since PLSR is especially suitable when the number of highly collinear predictor variables exceeds the number of soil samples (Summers et al., 2011; Viscarra Rossel et al., 2006). The optimum number of latent variables (LVs), or PLS factors, was determined using a leave-one-out cross-validation procedure. To assess the performance of GA-PLSR modeling, general PLSR models were also defined for the heavy metals with the cross-validation method on all of the soil samples. The GA-PLSR and PLSR modeling were carried out using PLS-Toolbox version 6.7.1 (Eigenvector Research, Inc.). For each GA-PLSR model, the GA parameters were set on the basis of preliminary tests: population size (64), window width (15 nm), maximum generations (100), mutation rate (0.5%) and replicate runs (5).
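The leave-one-out RMSECV that serves as the GA fitness criterion can be illustrated with a small sketch. The sketch swaps in a one-predictor least-squares line for the PLSR model so that it stays self-contained. It therefore shows only the cross-validation loop, not PLS-Toolbox or the study's actual models. The function names and toy data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for a PLSR fit)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_rmsecv(xs, ys):
    """Leave-one-out RMSECV: each sample is predicted by a model fitted
    on the remaining n - 1 samples."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predicted = intercept + slope * xs[i]
        squared_errors.append((predicted - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: one absorbance value and one concentration per sample.
absorbance = [0.10, 0.15, 0.22, 0.30, 0.41, 0.48]
concentration = [8.1, 9.0, 10.2, 11.9, 13.8, 15.1]
rmsecv = loo_rmsecv(absorbance, concentration)
```

In a GA-PLSR setting, this quantity would be recomputed for every candidate wavelength subset, and subsets with a lower RMSECV would be favored.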
2.6.2. Transferability verification
To verify the transferability of GA-PLSR models defined on one subset of land-use types to other types without further calibration, the samples were organized into three datasets according to land-use type. Each dataset consisted of a training set and a test set (listed below in that order):
Dataset #1: dry land (including gingeli cropland, soybean cropland and corn field) and paddy field.
Dataset #2: woodland (including tea plantation, shrubland and arbor woodland) and grassland.
Dataset #3: farmland (including gingeli cropland, soybean cropland, corn field and paddy field) and non-farmland (including tea plantation, shrubland, arbor woodland and grassland).
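As a cross-check of the three datasets, the sizes of the training and test sets can be derived from the sample counts per land-use type in Table 2. The sketch below is illustrative; the dictionary and function names are hypothetical.

```python
# Number of soil samples per land-use type, taken from Table 2.
SAMPLES_PER_LAND_USE = {
    "gingeli cropland": 12, "paddy field": 14, "soybean cropland": 10,
    "grassland": 13, "tea plantation": 10, "arbor woodland": 10,
    "shrubland": 11, "corn field": 12, "bare land": 4,
}

DRY_LAND = ["gingeli cropland", "soybean cropland", "corn field"]
WOODLAND = ["tea plantation", "shrubland", "arbor woodland"]

# Each dataset maps to (training land-use types, test land-use types).
DATASETS = {
    1: (DRY_LAND, ["paddy field"]),
    2: (WOODLAND, ["grassland"]),
    3: (DRY_LAND + ["paddy field"], WOODLAND + ["grassland"]),
}


def set_sizes(dataset_id):
    """Return (training set size, test set size) for one dataset."""
    train_types, test_types = DATASETS[dataset_id]
    n_train = sum(SAMPLES_PER_LAND_USE[t] for t in train_types)
    n_test = sum(SAMPLES_PER_LAND_USE[t] for t in test_types)
    return n_train, n_test


sizes = {dataset_id: set_sizes(dataset_id) for dataset_id in DATASETS}
# sizes == {1: (34, 14), 2: (31, 13), 3: (48, 44)}
```

These sizes agree with the N values given for the training and test sets in Table 4. The four bare-land samples belong to none of the three datasets.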
For each heavy metal, one GA-PLSR model was defined on the training set of each dataset with a cross-validation approach and then validated with the test set of the same dataset to check the transferability of the model.

2.6.3. Evaluation of the calibration models
The coefficients of determination for cross-validation and validation (r2CV and r2Pre), the model bias and the residual predictive deviation (RPD) were used to evaluate the performance of the above calibration models. The r2CV and RPD indicate the quality and the predictive power of the calibration models, respectively. We adopted the five-level interpretations of r2 given by Williams (2004) and of RPD given by Saeys et al. (2005): excellent predictions (r2 > 0.90, RPD > 3.0); good predictions (r2 0.82 to 0.90, RPD 2.5 to 3.0); approximate quantitative predictions (r2 0.66 to 0.81, RPD 2.0 to 2.5); possibility to distinguish between high and low values (r2 0.50 to 0.65, RPD 1.5 to 2.0); and unsuccessful predictions (r2 < 0.50, RPD < 1.5).

2.7. Exploring the predictive mechanism
To explain how heavy metal concentrations can be assessed from VNIR reflectance, previous studies focused on the correlations between spectrally active soil properties (e.g., OM, clay and Fe) and heavy metal concentrations (Ren et al., 2009; Vohland et al., 2009; Wu et al., 2007). Besides the correlation analysis, we also investigated the relationship between the OM content and the prediction accuracy of the heavy metals, using the GA-PLSR and general PLSR models defined on all of the samples. All the soil samples were sorted by measured OM content in ascending order. Each sample had a measured and a predicted OM content, and a measured and a predicted concentration of each heavy metal. The samples were then graded into several groups according to the measured OM content with an equal-interval classification.
For each sample, the absolute relative error between the measured and predicted values was calculated for OM and the four metals, and the mean absolute relative error (MARE) was then calculated for each group. A higher MARE means a lower prediction accuracy for a group. If the prediction accuracy of the heavy metals is clearly related to the OM content, this suggests that OM strongly influences the prediction of the heavy metals from VNIR reflectance.
In addition, we compared the important wavelengths of OM and the heavy metals. If their important wavelengths are similarly distributed, this suggests that OM plays a role in the prediction of the spectrally featureless heavy metals. The important wavelengths were identified jointly from the Variable Importance in the Projection (VIP) and the PLS regression coefficient, called the b-coefficient (Gomez et al., 2008a). A wavelength is considered important if its VIP value is larger than 1 and its b-coefficient value is larger than the standard deviation (Gomez et al., 2008a).
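The joint VIP and b-coefficient rule described above can be sketched as a simple filter. Two points are assumptions rather than statements from the text: the b-coefficient is compared by absolute value, and the threshold is the population standard deviation of all b-coefficients of the model. The function name and the example values are hypothetical.

```python
import statistics


def important_wavelengths(wavelengths, vip, b_coef):
    """Return wavelengths with VIP > 1 and |b| above the SD of the b-coefficients.

    Assumptions: absolute b-coefficients are compared, and the threshold is
    the population standard deviation over all bands.
    """
    threshold = statistics.pstdev(b_coef)
    return [
        w for w, v, b in zip(wavelengths, vip, b_coef)
        if v > 1.0 and abs(b) > threshold
    ]


# Hypothetical VIP scores and b-coefficients for six bands (nm).
wavelengths = [400, 450, 500, 1900, 2200, 2400]
vip = [1.4, 0.8, 1.1, 1.6, 0.9, 1.3]
b_coef = [0.30, -0.25, 0.02, -0.40, 0.35, 0.05]
selected = important_wavelengths(wavelengths, vip, b_coef)
```

In this toy example only the bands at 400 and 1900 nm satisfy both conditions. A band with a high VIP but a near-zero coefficient (500 nm), or with a large coefficient but a low VIP (2200 nm), is rejected.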
3. Results
3.1. Soil properties
Four samples were removed from the 100 soil samples because of abnormal reflectance spectra and evident chemical measurement errors. Table 1 shows the descriptive statistics of the soil properties of the remaining 96 soil samples. The pH values ranged from 4.17 to 8.05, i.e., from strongly acidic to mildly alkaline. The mean pH indicates that the soil in the study area is acidic. The OM content varied widely, from 1.35 to 52.98 g kg−1, and was approximately normally distributed, with coefficients of skewness and kurtosis close to 0. The As and Zn concentrations had more skewed and irregular distributions. In addition, compared with As, Pb and Cu, the Zn concentrations had a higher standard deviation (SD = 19.99 mg kg−1) and a higher coefficient of variation (CV = 36.93%). According to the Soil Environment Quality Standards (GB15618-1995) of China, the mean concentrations of the four heavy metals were at the natural background level and varied within a relatively narrow range. This means that the agricultural soils in the study area were essentially not metal-polluted. Only nine samples were moderately contaminated by As, Pb and Zn. The contamination may result mostly from sewage irrigation, parent materials or vehicle exhaust (Wei and Yang, 2010). Table 2 lists the mean values of these soil properties for the different land-use types covered by the soil samples in this study. According to the pH values, soil acidity decreased from gingeli cropland to bare land, and the heavy metal concentrations and OM content also decreased.

Table 1
Statistical description of the soil properties (N = 96).a

| Statistic | pH | OM (g kg−1) | As (mg kg−1) | Pb (mg kg−1) | Zn (mg kg−1) | Cu (mg kg−1) |
|---|---|---|---|---|---|---|
| Maximum | 8.05 | 52.98 | 21.90 | 37.60 | 117.94 | 26.38 |
| Minimum | 4.17 | 1.35 | 1.91 | 9.01 | 29.32 | 8.30 |
| Mean | 5.93 | 23.35 | 9.24 | 21.74 | 54.11 | 16.03 |
| SD | 0.87 | 9.43 | 2.77 | 6.20 | 19.99 | 4.06 |
| Skewness | −0.25 | −0.01 | 0.99 | 0.38 | 1.14 | 0.20 |
| Kurtosis | −0.42 | 0.06 | 3.89 | −0.14 | 0.94 | −0.49 |
| CV | 14.68 | 40.36 | 29.96 | 28.50 | 36.93 | 25.30 |
| NBC | – | – | 15 | 35 | 100 | 35 |

a SD, standard deviation; CV, coefficient of variation in %; NBC, natural background concentrations of soil heavy metals from Soil Environment Quality Standards (GB15618-1995) in China.

Table 2
The mean values of the soil properties for the nine land-use types.a

| Land-use type | N | pH | OM (g kg−1) | As (mg kg−1) | Pb (mg kg−1) | Zn (mg kg−1) | Cu (mg kg−1) |
|---|---|---|---|---|---|---|---|
| Gingeli cropland | 12 | 5.71 | 25.78 | 9.34 | 23.24 | 57.65 | 16.71 |
| Paddy field | 14 | 5.93 | 24.20 | 9.28 | 22.05 | 55.16 | 16.21 |
| Soybean cropland | 10 | 5.94 | 23.72 | 9.32 | 22.06 | 53.93 | 15.98 |
| Grassland | 13 | 5.94 | 23.89 | 9.38 | 22.20 | 54.17 | 16.03 |
| Tea plantation | 10 | 5.82 | 24.39 | 9.65 | 22.87 | 54.78 | 16.20 |
| Arbor woodland | 10 | 5.97 | 23.61 | 9.62 | 21.87 | 53.61 | 15.89 |
| Shrubland | 11 | 6.01 | 23.49 | 9.55 | 21.84 | 53.28 | 15.75 |
| Corn field | 12 | 6.09 | 23.21 | 9.21 | 21.46 | 52.93 | 15.47 |
| Bare land | 4 | 6.47 | 13.13 | 8.87 | 17.25 | 36.94 | 11.96 |

a N, the number of soil samples for each land-use type.

3.2. GA-PLSR models and general PLSR models
Table 3 shows the cross-validation results of the GA-PLSR and general PLSR models defined on all of the soil samples. Compared with the general PLSR models, which used all wavelengths (400–2450 nm), the GA-PLSR models used fewer wavelengths, ranging from 330 for Cu to 551 for Zn, and the r2CV values of the GA-PLSR models for As, Pb, Zn, Cu and OM increased by 12.50%, 22.45%, 3.45%, 18.97% and 15.15%, respectively. The GA-PLSR models for As, Pb and Zn generated r2CV values of 0.60–0.63 with biases ranging from −0.15 to 0.01 mg kg−1, while the GA-PLSR models provided an r2CV value of 0.69 with a bias of −0.01 mg kg−1 for Cu and an r2CV value of 0.76 with a bias of −0.04 g kg−1 for OM. This indicates that the GA-PLSR models not only improved the cross-validated accuracy but also discriminated well between high and low values of the metals and OM.

3.3. Transferability of the GA-PLSR models
Since one GA-PLSR model was defined for each heavy metal with each sample dataset described in Section 2.6.2, a total of 12 models
Table 3
Cross-validation results of GA-PLSR and PLSR models for As, Pb, Zn, Cu and OM with the entire soil samples (N = 96).a

| Property | Number of bands | GA-PLSR LVs | GA-PLSR r2CV | GA-PLSR RMSECV | GA-PLSR bias | GA-PLSR slope | PLSR LVs | PLSR r2CV | PLSR RMSECV | PLSR bias | PLSR slope |
|---|---|---|---|---|---|---|---|---|---|---|---|
| As | 405 | 8 | 0.63 | 1.68 | −0.02 | 0.65 | 9 | 0.56 | 1.84 | −0.01 | 0.61 |
| Pb | 495 | 8 | 0.60 | 3.89 | 0.01 | 0.65 | 8 | 0.49 | 4.42 | −0.03 | 0.55 |
| Zn | 551 | 9 | 0.60 | 12.81 | −0.15 | 0.69 | 10 | 0.58 | 13.17 | −0.07 | 0.70 |
| Cu | 330 | 8 | 0.69 | 2.26 | −0.01 | 0.75 | 7 | 0.58 | 2.63 | −0.02 | 0.64 |
| OM | 480 | 9 | 0.76 | 4.62 | −0.04 | 0.82 | 8 | 0.66 | 5.51 | 0.02 | 0.71 |

a Number of bands, the number of wavelengths selected by the GA; LVs, the optimum number of latent variables; r2CV, determination coefficient for cross-validation; RMSECV, root mean square error of cross-validation; bias, the mean error between the predicted and measured values; slope, the slope of the best-fit line for the measured against predicted values. The unit of RMSECV and bias is mg kg−1 for As, Pb, Zn and Cu, and g kg−1 for OM.
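The relative gains in r2CV quoted in Section 3.2 follow directly from the r2CV columns of Table 3. A short sketch of the arithmetic, with hypothetical variable names:

```python
# r2CV values from Table 3 as (GA-PLSR, PLSR) pairs.
R2CV = {
    "As": (0.63, 0.56),
    "Pb": (0.60, 0.49),
    "Zn": (0.60, 0.58),
    "Cu": (0.69, 0.58),
    "OM": (0.76, 0.66),
}


def relative_increase(ga_plsr, plsr):
    """Percentage increase of the GA-PLSR r2CV over the PLSR r2CV."""
    return (ga_plsr - plsr) / plsr * 100.0


increase = {
    name: round(relative_increase(ga_plsr, plsr), 2)
    for name, (ga_plsr, plsr) in R2CV.items()
}
```

Rounded to two decimals, this gives 12.50%, 22.45%, 3.45%, 18.97% and 15.15% for As, Pb, Zn, Cu and OM, which matches the values reported in the text.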
were developed with the three datasets to predict the four heavy metals (Table 4). Fig. 2 shows scatter plots of the measured against predicted concentrations for the GA-PLSR models. The number of selected wavelengths for the 12 GA-PLSR models ranged from 165 to 461, depending on the dataset and the heavy metal. The cross-validated accuracies of the GA-PLSR models were moderate (r2CV = 0.53–0.77, bias = −0.12 to 0.14 mg kg−1), except for the As model with dataset #2 (r2CV = 0.23, bias = 0.16 mg kg−1).
According to the validations, the GA-PLSR model for Cu (r2Pre = 0.73, bias = −0.03 mg kg−1, RPD = 1.96) defined on dry land in dataset #1 could be transferred to paddy field, and the GA-PLSR models for Zn (r2Pre = 0.75, bias = −1.05 mg kg−1, RPD = 2.02) and Cu (r2Pre = 0.89, bias = −0.21 mg kg−1, RPD = 2.98) with dataset #2 could also be transferred from woodland to grassland. This good performance is also visible in Fig. 2c-2, d-1 and d-2, where most of the points lie close to the 1:1 line. The other nine GA-PLSR models did not transfer well (RPD = 0.63–1.46, bias = −4.40 to 5.06 mg kg−1). For example, the model for Zn defined on dry land in dataset #1 had a well-fitted prediction line (y = 0.94x − 1.17) (Fig. 2c-1) with an r2Pre value of 0.68, indicating a relatively strong correlation between the measured and predicted values for Zn. However, the prediction bias was −4.40 mg kg−1 and the RPD was 1.46, so the model could not convincingly predict the Zn concentration of paddy field. In addition, the GA-PLSR models with dataset #3 also transferred poorly from farmland to non-farmland. According to Fig. 2a-3, b-3, c-3 and d-3, the samples from shrubland and grassland (both belonging to non-farmland) had relatively large biases. Therefore, we removed the soil samples from these two land-use types and validated the models again. The prediction accuracy improved considerably for As (r2Pre = 0.63, bias = −0.24 mg kg−1, RPD = 1.68), Pb (r2Pre = 0.58, bias = −0.16 mg kg−1, RPD = 1.47) and Cu (r2Pre = 0.53, bias = 1.17 mg kg−1, RPD = 1.41), but not for Zn, possibly because the cross-validated accuracy of the model for Zn was relatively low (r2CV = 0.53).

3.4. The function of OM in the assessment of the heavy metal concentrations
Table 5 lists the correlations between the soil properties (N = 96). OM was moderately correlated with Pb (r = 0.59), Zn (r = 0.44) and Cu (r = 0.49) at a significance level of 0.01, while no significant correlation was found between OM and As (r = −0.13). We also calculated the correlations between the mean values of OM and the four metals for the nine land-use types (Table 2). Pb, Zn and Cu were very strongly correlated with OM at a significance level of 0.001, with r = 0.99, 1.00 and 1.00, respectively, whereas the correlation between As and OM was relatively weak (r = 0.74).
After sorting by the measured OM content, the soil samples (N = 96) were graded into four groups (0–10, 10–20, 20–30 and over 30 g kg−1). Table 6 gives the MAREs between the measured and predicted values of OM and the heavy metals for each group. In general, the MAREs of the metals except As were negatively related to the OM content, which indicates that the prediction accuracies of the GA-PLSR and PLSR models were positively related to the OM content. Moreover, the performances of the GA-PLSR models were better than
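Table 4
Cross-validation and validation results of GA-PLSR models for As, Pb, Zn and Cu with the three sample datasets described in Section 2.6.2.a

Dataset #1: training set (dry land, N = 34); test set (paddy field, N = 14)

| Metal | Number of bands | LVs | r2CV | RMSECV | Bias (cross-validation) | Slope (cross-validation) | r2Pre | RMSEP | Bias (validation) | Slope (validation) | RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| As | 461 | 5 | 0.64 | 1.29 | 0.04 | 0.70 | 0.35 | 1.64 | 0.44 | 0.60 | 1.09 |
| Pb | 375 | 5 | 0.62 | 4.06 | 0.04 | 0.70 | 0.46 | 6.05 | −3.89 | 0.67 | 1.00 |
| Zn | 360 | 6 | 0.64 | 13.42 | 0.14 | 0.78 | 0.68 | 10.90 | −4.40 | 0.94 | 1.46 |
| Cu | 371 | 6 | 0.58 | 2.77 | −0.05 | 0.68 | 0.73 | 1.70 | −0.03 | 0.82 | 1.96 |

Dataset #2: training set (woodland, N = 31); test set (grassland, N = 13)

| Metal | Number of bands | LVs | r2CV | RMSECV | Bias (cross-validation) | Slope (cross-validation) | r2Pre | RMSEP | Bias (validation) | Slope (validation) | RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| As | 446 | 5 | 0.23 | 2.95 | 0.16 | 0.39 | 0.03 | 2.57 | 1.28 | 0.21 | 0.63 |
| Pb | 165 | 5 | 0.67 | 3.34 | −0.04 | 0.67 | 0.36 | 3.03 | 1.54 | 0.81 | 0.82 |
| Zn | 285 | 6 | 0.77 | 4.87 | 0.00 | 0.77 | 0.75 | 7.33 | −1.05 | 0.68 | 2.02 |
| Cu | 345 | 5 | 0.59 | 2.09 | −0.12 | 0.59 | 0.89 | 1.38 | −0.21 | 0.79 | 2.98 |

Dataset #3: training set (farmland, N = 48); test set (non-farmland, N = 44)

| Metal | Number of bands | LVs | r2CV | RMSECV | Bias (cross-validation) | Slope (cross-validation) | r2Pre | RMSEP | Bias (validation) | Slope (validation) | RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| As | 416 | 5 | 0.56 | 1.37 | 0.02 | 0.64 | 0.11 | 3.54 | 0.73 | 0.36 | 0.84 |
| Pb | 390 | 4 | 0.58 | 4.12 | −0.06 | 0.58 | 0.13 | 5.98 | 0.72 | 0.36 | 0.90 |
| Zn | 405 | 4 | 0.53 | 13.92 | 0.05 | 0.58 | 0.10 | 21.71 | 5.06 | 0.40 | 0.74 |
| Cu | 360 | 4 | 0.65 | 2.33 | 0.03 | 0.69 | 0.36 | 1.26 | 1.26 | 0.41 | 1.15 |

a r2Pre, determination coefficient for prediction with the test set; RMSEP, root mean square error of prediction; RPD, the ratio of standard deviation of the test set to RMSEP; N, the number of soil samples for the training set and test set.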
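The RPD defined in the footnote of Table 4 and its five-level interpretation from Section 2.6.3 can be sketched as below. How the shared class boundaries (2.5, 2.0 and 1.5) are assigned is an assumption: here each goes to the higher class. The function names and labels are hypothetical, and the RPD values are copied from Table 4.

```python
def rpd(sd_test, rmsep):
    """Residual predictive deviation: SD of the test set divided by RMSEP."""
    return sd_test / rmsep


def interpret_rpd(value):
    """Five-level reading of RPD following the classes in Section 2.6.3.

    Assumption: the shared boundary values 2.5, 2.0 and 1.5 are assigned
    to the higher class.
    """
    if value > 3.0:
        return "excellent"
    if value >= 2.5:
        return "good"
    if value >= 2.0:
        return "approximate quantitative"
    if value >= 1.5:
        return "distinguishes high from low"
    return "unsuccessful"


# RPD values of four validation results, copied from Table 4.
TABLE4_RPD = {
    ("#1", "Zn"): 1.46,
    ("#1", "Cu"): 1.96,
    ("#2", "Zn"): 2.02,
    ("#2", "Cu"): 2.98,
}
labels = {key: interpret_rpd(value) for key, value in TABLE4_RPD.items()}
```

Under this reading, the dataset #1 Zn model (RPD = 1.46) falls just below the 1.5 threshold, while the dataset #2 Cu model (RPD = 2.98) sits at the upper end of the "good" class.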
Fig. 2. Scatter plots of the measured against predicted concentrations of (a) As, (b) Pb, (c) Zn and (d) Cu for the three test sets, using the GA-PLSR models given in Table 4.
J. Wang et al. / Geoderma 216 (2014) 1–9 Table 5 Pearson's correlation coefficients between the soil properties (N = 96). Property pH OM As Pb Zn Cu
pH 1
OM −0.15 1
As
Pb
Zn
Cu
−0.04 −0.13 1
−0.29⁎⁎ 0.59⁎⁎
0.06 0.44⁎⁎ −0.30⁎⁎ 0.50⁎⁎
0.07 0.49⁎⁎ −0.01 0.42⁎⁎ 0.58⁎⁎
−0.12 1
1
7
et al., 2008b; Malley and Williams, 1997). The third reason is that the soil samples were collected from various land-use types, which is an important cause for a moderate prediction accuracy (Malley and Williams, 1997). In addition, the GA-PLSR models were defined with the linear modeling method. It is still an open question that the calibration of heavy metal concentrations in soil is linear or non-linear processes with VNIR reflectance spectroscopy. Therefore, the non-linear calibration techniques (e.g., support vector regression, SVR) or the combination of linear and nonlinear methods (e.g., PLS-SVR and PC-SVR) will be investigated to improve the performance of the calibration models in the future.
1
⁎⁎ Correlation is significant at the p = 0.01 level (2-tailed).
the PLSR models since the MAREs of each group were lower for GA-PLSR models. Fig. 3 illustrates the important wavelengths based on the Variable Importance Projection (VIP) and b-coefficient of over the entire wavelength range (400–2450 nm). The important wavelengths of OM and the four metals were overlapped in the range of 400–500 nm and 1900–2450 nm. It indicates that OM, as a spectrally-active element, might play an essential function in the predictions of the heavy metal concentrations from VNIR reflectance.
4.2. Transferability analysis of the GA-PLSR models As described in Section 3.3, only three GA-PLSR models defined on one subset of land-use types had good transferability to some other land-use types. The prediction accuracies of the other 9 GA-PLSR models had a wide difference from the cross-validation accuracies (Table 4). The basic reason is that the training and test sets from different landuse types had disproportioned distributions (Davey et al., 2009). Moreover, due to the limited size of soil samples, the GA-PLSR models were defined and validated on the soil samples from a group of similar land-use types instead of completely homogeneous land-use type. It may also influences the prediction accuracy as discussed in Section 4.1. Further research will investigate deeply the transferability of GA-PLSR models between two or more homogeneous land-use types with the support of adequate soil samples. Moreover, it is also encouraging to verify the transferability of GA-PLSR models for predicting soil heavy metal concentrations from laboratory spectra to field spectra and even to hyperspectral remotely-sensed imagery.
4. Discussion 4.1. Accuracy analysis of the GA-PLSR models In this study, we combined GA and PLSR to define the calibration model to predict the low concentrations of heavy metals from VNIR reflectance. It proved that the GA-PLSR models achieved better crossvalidated accuracies of the soil properties than the general PLSR models (Table 3). It is because the GA simplified the calibration models by selecting an array of meaningful wavelengths that represented the lowest RMSECV, while the general PLSR method employed the entire wavelengths containing a great deal of redundant information. The previous literatures focused on the general PLSR method and reported r2CV values of 0.42–0.72 for As, 0.45–0.81 for Pb, 0.50–0.93 for Zn and 0.41–0.91 for Cu (Kooistra et al., 2001; Malley and Williams, 1997; Ren et al., 2009; Siebielec et al., 2004; Vohland et al., 2009; Wu et al., 2005, 2007). The GA-PLSR models in this study got r2CV values of 0.56–0.64 for As, 0.58–0.67 for Pb, 0.53–0.77 for Zn and 0.58–0.69 for Cu (Tables 3 and 4). The only exceptional case was the model defined on woodland for As (r2CV = 0.23). Thus, the GAPLSR modeling is a promising method to predict the low heavy metal concentrations from VNIR reflectance. However, according to the interpretation of r2 given by Williams (2004), the cross-validated accuracies of the GA-PLSR models were still moderate since the r2CV values were smaller than 0.82. The essential reason is that the heavy metals in soil are spectrally featureless, and thus the spectral information of the metals is rather limited in the VNIR reflectance spectra (Wu et al., 2007). Another reason is that the statistical distributions of the measured heavy metal concentrations were skewed and the coefficients of variation were relatively low (Table 1). 
As previous studies have reported, a normal distribution and high variability are influential factors for good prediction accuracies of soil properties such as Fe and organic carbon contents (Gomez
4.3. Predictive mechanism analysis
According to the results in Section 3.3, the relationships between the OM content and the concentrations of Pb, Zn and Cu were demonstrated from three different perspectives: (i) significant correlations based on the measured values (Table 5); (ii) positive relations based on the predicted values (Table 6); and (iii) similar important wavelengths (Fig. 3). These relationships are consistent with the nature of soil, in which phenolic hydroxyl and carboxylic groups in OM are considered to bind strongly with metals such as Pb, Zn and Cu (Logan et al., 1997). Moreover, OM interacts preferentially with heavy metals compared with other soil properties (Dupuy and Douay, 2001). As an exception, the As concentration had no significant correlation with the OM content at a significance level of 0.01 (Table 5), and the prediction accuracy for As was not strongly related to the OM content (Table 6). As therefore behaved differently from the other metals (Pb, Zn and Cu). Wu et al. (2005) also found this anomaly for As and attributed it to its geochemical characteristics. Thus, OM had no clear effect on the prediction of the As concentration.
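The first line of evidence, correlation between measured OM and metal concentrations, can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation, since the type of correlation is not stated in this section. The function names and example values are hypothetical. The t statistic would be compared with a tabulated two-sided critical value at the 0.01 level for n − 2 degrees of freedom, which is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical OM contents and Pb concentrations for seven samples.
om = [5.2, 12.1, 18.4, 22.0, 27.5, 31.3, 35.8]
pb = [21.0, 24.5, 26.1, 29.8, 30.2, 33.9, 36.4]

r = pearson_r(om, pb)
print(round(r, 3), round(t_statistic(r, len(om)), 2))
```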
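Table 6
The mean absolute relative error (MARE) of As, Pb, Zn, Cu and OM based on the sorting of the measured OM contents (N = 96), using the GA-PLSR and PLSR models given in Table 3.ᵃ

| OM (g mg−1) | Mean (g mg−1) | n | GA-PLSR MARE (%): As | GA-PLSR MARE (%): Pb | GA-PLSR MARE (%): Zn | GA-PLSR MARE (%): Cu | GA-PLSR MARE (%): OM | PLSR MARE (%): As | PLSR MARE (%): Pb | PLSR MARE (%): Zn | PLSR MARE (%): Cu | PLSR MARE (%): OM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0–10 | 5.50 | 6 | 17.92 | 20.36 | 44.13 | 13.75 | 125.44 | 24.51 | 40.32 | 38.85 | 15.71 | 139.94 |
| 10–20 | 14.33 | 27 | 12.68 | 14.81 | 17.13 | 13.28 | 20.96 | 11.18 | 16.01 | 18.36 | 13.43 | 21.43 |
| 20–30 | 25.26 | 37 | 14.58 | 17.03 | 19.62 | 11.74 | 13.17 | 17.78 | 17.36 | 23.14 | 13.82 | 16.29 |
| Over 30 | 34.12 | 26 | 12.78 | 13.30 | 11.07 | 8.90 | 10.02 | 15.80 | 15.14 | 11.75 | 10.80 | 11.68 |

ᵃ $\mathrm{MARE} = \frac{1}{n}\sum \frac{\left|\text{cross-validation predicted value} - \text{measured value}\right|}{\text{measured value}}$, where n is the number of soil samples in each group.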
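The MARE defined in the footnote of Table 6 can be computed per OM group as sketched below. This is an illustrative sketch only. The function names, the example values and the handling of boundary values (a sample with OM exactly at a boundary is assigned to the higher group) are assumptions, and the result is multiplied by 100 to express it as a percentage, as in the table.

```python
def mare_percent(measured, predicted):
    """Mean absolute relative error, expressed as a percentage."""
    n = len(measured)
    return 100.0 * sum(abs(p - m) / m for m, p in zip(measured, predicted)) / n

def mare_by_om_group(om, measured, predicted, edges=(10, 20, 30)):
    """Group samples by OM content and return MARE (%) for each non-empty group."""
    labels = ["0-10", "10-20", "20-30", "over 30"]
    groups = {label: ([], []) for label in labels}
    for o, m, p in zip(om, measured, predicted):
        # Count how many boundaries the OM value reaches (assumed rule).
        idx = sum(o >= e for e in edges)
        groups[labels[idx]][0].append(m)
        groups[labels[idx]][1].append(p)
    return {label: mare_percent(ms, ps)
            for label, (ms, ps) in groups.items() if ms}

# Hypothetical OM contents with measured and cross-validated predicted values.
om = [5.0, 15.0, 25.0, 35.0]
measured = [10.0, 10.0, 10.0, 10.0]
predicted = [11.0, 12.0, 13.0, 14.0]
print(mare_by_om_group(om, measured, predicted))
```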
Fig. 3. Plots of important wavelengths based on Variable Importance Projection (VIP) and b-coefficient over the entire wavelength range (400–2450 nm), using the PLSR models given in Table 3.
Because of limitations in data collection, we considered only OM and the four metals and did not account for the roles of other soil properties. In fact, other soil properties (clay and Fe) also interact closely with heavy metals (Kooistra et al., 2001; Ren et al., 2009; Wu et al., 2005, 2007). Thus, further research is needed to ascertain whether OM alone, or the combined effect of OM and other spectrally active properties, significantly influences the assessment of low heavy metal concentrations.

5. Conclusions
This study defined GA-PLSR models to predict low heavy metal concentrations in agricultural soils from VNIR reflectance. With the entire set of soil samples, the GA-PLSR models gave r²CV values of 0.60–0.69 for the four metals, while the general PLSR models gave r²CV values of 0.49–0.58. Across the three different datasets, except for the GA-PLSR model defined on woodland for As (r²CV < 0.50), the models reported r²CV values of 0.53–0.77, and three GA-PLSR models had RPD values > 1.5 with r²Pre values of 0.73–0.89. Therefore, compared with the general PLSR method, GA-PLSR modeling is a promising approach for predicting low heavy metal concentrations from VNIR reflectance, and it has the potential to transfer from one subset of land-use types to some other types. The three lines of evidence indicate that OM plays an important role in the prediction of the concentrations of Pb, Zn and Cu. In addition, despite the negative results reported in this study, we are optimistic about the potential of GA-PLSR models to transfer from laboratory-measured data to field-measured data, and even to hyperspectral remote sensing imagery.

Acknowledgments
This study was supported by the Special Foundation of Ministry of Finance of China for Nonprofit Research of Forestry Industry (Grant No. 200904001) and the National Natural Science Foundation of China (Grant No. 41171290 and No. 41023001).
References
Chang, C.W., Laird, D.A., Mausbach, M.J., Hurburgh Jr., C.R., 2001. Near-infrared reflectance spectroscopy-principal components regression analyses of soil properties. Soil Sci. Soc. Am. J. 65 (2), 480–490.
Choe, E., van der Meer, F., van Ruitenbeek, F., van der Werff, H., de Smeth, B., Kim, K.W., 2008. Mapping of heavy metal pollution in stream sediments using combined geochemistry, field spectroscopy, and hyperspectral remote sensing: a case study of the Rodalquilar mining area, SE Spain. Remote Sens. Environ. 112 (7), 3222–3233.
Clevers, J., Kooistra, L., Salas, E., 2004. Study of heavy metal contamination in river floodplains using the red-edge position in spectroscopic data. Int. J. Remote Sens. 25 (19), 3883–3895.
Davey, M.W., Saeys, W., Hof, E., Ramon, H., Swennen, R.L., Keulemans, J., 2009. Application of visible and near-infrared reflectance spectroscopy (Vis/NIRS) to determine carotenoid contents in banana (Musa spp.) fruit pulp. J. Agric. Food Chem. 57 (5), 1742–1751.
Dupuy, N., Douay, F., 2001. Infrared and chemometrics study of the interaction between heavy metals and organic matter in soils. Spectrochim. Acta A Mol. Biomol. Spectrosc. 57 (5), 1037–1047.
Gill, M.K., Asefa, T., Kemblowski, M.W., McKee, M., 2006. Soil moisture prediction using support vector machines. J. Am. Water Res. Assoc. 42 (4), 1033–1046.
Goicoechea, H.C., Olivieri, A.C., 2003. A new family of genetic algorithms for wavelength interval selection in multivariate analytical spectroscopy. J. Chemom. 17 (6), 338–345.
Gomez, C., Lagacherie, P., Coulouma, G., 2008a. Continuum removal versus PLSR method for clay and calcium carbonate content estimation from laboratory and airborne hyperspectral measurements. Geoderma 148 (2), 141–148.
Gomez, C., Viscarra Rossel, R.A., McBratney, A.B., 2008b. Soil organic carbon prediction by hyperspectral remote sensing and field vis-NIR spectroscopy: an Australian case study. Geoderma 146 (3–4), 403–411.
Inoue, Y., Sakaiya, E., Zhu, Y., Takahashi, W., 2012. Diagnostic mapping of canopy nitrogen content in rice based on hyperspectral measurements. Remote Sens. Environ. 126, 210–221.
Jackson, M.L., 1960. Soil Chemical Analysis. Prentice-Hall.
Jarmer, T., Vohland, M., Lilienthal, H., Schnug, E., 2008. Estimation of some chemical properties of an agricultural soil by spectroradiometric measurements. Pedosphere 18 (2), 163–170.
Jarvis, R.M., Goodacre, R., 2005. Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data. Bioinformatics 21 (7), 860–868.
Kemper, T., Sommer, S., 2002. Estimate of heavy metal contamination in soils after a mining accident using reflectance spectroscopy. Environ. Sci. Technol. 36 (12), 2742–2747.
Kooistra, L., Wehrens, R., Leuven, R., Buydens, L., 2001. Possibilities of visible-near-infrared spectroscopy for the assessment of soil contamination in river floodplains. Anal. Chim. Acta. 446 (1–2), 97–105.
Kooistra, L., Wanders, J., Epema, G., Leuven, R., Wehrens, R., Buydens, L., 2003. The potential of field spectroscopy for the assessment of sediment properties in river floodplains. Anal. Chim. Acta. 484 (2), 189–200.
Kooistra, L., Salas, E., Clevers, J., Wehrens, R., Leuven, R., Nienhuis, P., Buydens, L., 2004. Exploring field vegetation reflectance as an indicator of soil contamination in river floodplains. Environ. Pollut. 127 (2), 281–290.
Logan, E., Pulford, I., Cook, G., Mackenzie, A.B., 1997. Complexation of Cu2+ and Pb2+ by peat and humic acid. Eur. J. Soil Sci. 48 (4), 685–696.
Malley, D., Williams, P., 1997. Use of near-infrared reflectance spectroscopy in prediction of heavy metals in freshwater sediment by their association with organic matter. Environ. Sci. Technol. 31 (12), 3461–3467.
Moros, J., Vallejuelo, S.F.O., Gredilla, A., Diego, A., Madariaga, J.M., Garrigues, S., Guardia, M., 2009. Use of reflectance infrared spectroscopy for monitoring the metal content of the estuarine sediments of the Nerbioi–Ibaizabal River (Metropolitan Bilbao, Bay of Biscay, Basque Country). Environ. Sci. Technol. 43 (24), 9314–9320.
Mouazen, A., Maleki, M., De Baerdemaeker, J., Ramon, H., 2007. On-line measurement of some selected soil properties using a VIS-NIR sensor. Soil Tillage Res. 93 (1), 13–27.
Mouazen, A., Kuang, B., De Baerdemaeker, J., Ramon, H., 2010. Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy. Geoderma 158 (1–2), 23–31.
Pandit, C.M., Filippelli, G.M., Li, L., 2010. Estimation of heavy-metal contamination in soil using reflectance spectroscopy and partial least-squares regression. Int. J. Remote Sens. 31 (15), 4111–4123.
Ren, H.Y., Zhuang, D.F., Singh, A.N., Pan, J.J., Qiu, D.S., Shi, R.H., 2009. Estimation of As and Cu contamination in agricultural soils around a mining area by reflectance spectroscopy: a case study. Pedosphere 19 (6), 719–726.
Saeys, W., Mouazen, A.M., Ramon, H., 2005. Potential for onsite and online analysis of pig manure using visible and near infrared reflectance spectroscopy. Biosyst. Eng. 91 (4), 393–402.
Siebielec, G., McCarty, G.W., Stuczynski, T.I., Reeves III, J.B., 2004. Near- and mid-infrared diffuse reflectance spectroscopy for measuring soil metal content. J. Environ. Qual. 33 (6), 2056–2069.
Summers, D., Lewis, M., Ostendorf, B., Chittleborough, D., 2011. Visible near-infrared reflectance spectroscopy as a predictive indicator of soil properties. Ecol. Indic. 11 (1), 123–131.
Viscarra Rossel, R., Walvoort, D., McBratney, A., Janik, L., Skjemstad, J., 2006. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131 (1–2), 59–75.
Vohland, M., Bossung, C., Fründ, H.C., 2009. A spectroscopic approach to assess trace-heavy metal contents in contaminated floodplain soils via spectrally active soil components. J. Plant Nutr. Soil Sci. 172 (2), 201–209.
Vohland, M., Besold, J., Hill, J., Fründ, H.C., 2011. Comparing different multivariate calibration methods for the determination of soil organic carbon pools with visible to near infrared spectroscopy. Geoderma 166 (1), 198–205.
Wei, B., Yang, L., 2010. A review of heavy metal contaminations in urban soils, urban road dusts and agricultural soils from China. Microchem. J. 94 (2), 99–107.
Williams, P., 2004. Near-infrared Technology: Getting the Best Out of Light: A Short Course in the Practical Implementation of Near-infrared Spectroscopy for the User. PDK Projects, Incorporated.
Wu, Y., Chen, J., Wu, X., Tian, Q., Ji, J., Qin, Z., 2005. Possibilities of reflectance spectroscopy for the assessment of contaminant elements in suburban soils. Appl. Geochem. 20 (6), 1051–1059.
Wu, Y.C., Ji, J., Gong, J., Liao, P., Tian, Q., Qingjiu Ma, H., 2007. A mechanism study of reflectance spectroscopy for investigating heavy metals in soils. Soil Sci. Soc. Am. J. 71 (3), 918.
Wu, S.H., Zhou, S.L., Yang, D.Z., Liao, F.Q., Zhang, H.F., Ren, K., 2008. Spatial distribution and sources of soil heavy metals in the outskirts of Yixing City, Jiangsu Province, China. Chin. Sci. Bull. 53, 188–198.
Zhang, R., Yao, Q., Ji, Y., Wang, P., 2005. A study on law of non-point source pollutants losses in a typical small watershed of Taihu Basin: a case study at Meilin watershed in Yixing City of Jiangsu Province. Resour. Environ. Yangtze Basin 14 (1), 98–102 (in Chinese).
Zhuang, P., McBride, M.B., Xia, H., Li, N., Li, Z., 2009. Health risk from heavy metals via consumption of food crops in the vicinity of Dabaoshan mine, South China. Sci. Total Environ. 407 (5), 1551–1561.