Geoderma 363 (2020) 114163
Rapid estimation of soil cation exchange capacity through sensor data fusion of portable XRF spectrometry and Vis-NIR spectroscopy
Mengxue Wan a,b,g, Wenyou Hu a, Mingkai Qu a, Weidong Li c, Chuanrong Zhang c, Junfeng Kang e, Yongsheng Hong d, Yong Chen f, Biao Huang a,⁎
a Key Laboratory of Soil Environment and Pollution Remediation, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
b University of Chinese Academy of Sciences, Beijing 100049, China
c Department of Geography, University of Connecticut, Storrs CT 06269, USA
d School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China
e School of Architectural and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
f Department of Ecosystem Science and Management, Texas A&M University, College Station, TX 77843, USA
g Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education, Kaifeng 475004, China
ARTICLE INFO
Handling Editor: Morgan Cristine
Keywords: Partial least-squares regression; Support vector machine regression; Proximal sensing technique; Fused sensor data
ABSTRACT
Soil cation exchange capacity (CEC) is a critical property for soil fertility. Conventionally, it is measured using laboratory chemical methods, which involve complex sample preparation and are time-consuming and expensive. Previous studies have investigated nondestructive and rapid methods for determining soil CEC using proximal soil sensors individually, including portable X-ray fluorescence (PXRF) spectrometry and visible near-infrared reflectance (Vis-NIR) spectroscopy. In this study, we examined the potential of fusing data from PXRF and Vis-NIR to predict soil CEC for 572 soil samples from Yunnan Province, China. The CEC of the samples ranged from 5.42 to 50.25 cmol kg−1. Both partial least-squares regression (PLSR) and support vector machine regression (SVMR) were applied to predict soil CEC with individual sensor datasets and a fused sensor dataset for comparison. The root mean squared error (RMSE), coefficient of determination (R2), and ratio of performance to interquartile range (RPIQ) were used to evaluate the performance of the models. Results showed that: (1) SVMR performed better than PLSR on the single sensor datasets and the fused sensor dataset in terms of RMSE, R2, and RPIQ; and (2) both PLSR and SVMR based on the fused sensor dataset had better predictive power (RMSE = 4.02, R2 = 0.72, and RPIQ = 2.23 for the PLSR model; RMSE = 3.02, R2 = 0.82, and RPIQ = 2.31 for the SVMR model) than those based on any single sensor dataset. In summary, the fused sensor data and SVMR showed great potential for estimating soil CEC efficiently.
1. Introduction
Soil cation exchange capacity (CEC) is a measure of the total amount of cations, such as Ca, Mg, K, and Na, that can be adsorbed by soil colloids. Its value also depends on the type of colloids, soil texture, pH, and organic matter (Kvalheim, 2010). Generally, CEC is higher for soil clay minerals with higher SiO2/(Al2O3 + Fe2O3) ratios, and it is lower in an acidic environment than in an alkaline environment. It is also related to the form of humic and fulvic acids within soil organic matter (SOM) (Buol and Kamprath, 1998). As such, soil CEC plays a key role in describing nutrient retention and supply or the fate of heavy metals and pesticides (Leinweber et al., 1993), and is one of the most commonly examined soil chemical properties in soil science and a key indicator of soil quality and productivity (Ross and Ketterings, 1995; Charman and Murphy, 2007). Soil CEC is conventionally measured using laboratory chemical methods, such as the ammonium-acetate method or summation of cations. These methods are costly, tedious, and time-consuming due to the complex procedures involved in sample preparation and chemical analysis (Zhang and Gong, 2012). Moreover, these methods are easily influenced by the saturation or extraction processes, the salt solution and pH, and leaching methods. Therefore, to
Abbreviations: LVs, latent variables; RMSE, root mean squared error; MLR, multiple linear regression; MSC, multiplicative scatter correction; PLSR, partial least-squares regression; PTFs, pedotransfer functions; PXRF, portable X-ray fluorescence; RPIQ, ratio of performance to interquartile range; SVMR, support vector machine regression; SG, Savitzky-Golay; VIP, variable importance in projection; Vis-NIR, visible near-infrared reflectance
⁎ Corresponding author. E-mail address: [email protected] (B. Huang).
https://doi.org/10.1016/j.geoderma.2019.114163
Received 27 June 2019; Received in revised form 7 December 2019; Accepted 25 December 2019
0016-7061/ © 2019 Elsevier B.V. All rights reserved.
reduce the experimental influence on soil CEC measurements, many studies have estimated CEC values of various soils by developing pedotransfer functions (PTFs), which establish quantitative relationships between CEC and other soil properties, such as soil pH, texture, and SOM, using statistical tools (Emamgolizadeh et al., 2015; Olorunfemi et al., 2016; Sulieman et al., 2018). Although it is easier to measure other soil properties (e.g., soil pH, texture, and SOM) than to measure CEC in the laboratory, doing so is still labor-intensive. Thus, there is a need for faster and cheaper methods for estimating soil CEC with comparable accuracy. Previous studies have attempted to characterize soil CEC using single sensor data, mainly using either visible near-infrared reflectance (Vis-NIR) spectroscopy with partial least-squares regression (PLSR) or portable X-ray fluorescence (PXRF) spectrometry with multiple linear regression (MLR); these are rapid and cost-effective alternatives (Sharma et al., 2015; Shepherd and Walsh, 2002). However, studies comparing the performance of the two instruments on soil CEC estimation and exploring the potential of fused sensor data for soil CEC estimation are rare (Horta et al., 2015; Mahmood et al., 2012). Therefore, using fused sensor data (i.e., PXRF and Vis-NIR) to improve soil CEC estimation would be highly meaningful. Multivariate statistics have been applied to calculate the relationships between soil CEC and other soil attributes including physicochemical properties, spectral absorptions, and elemental concentrations (Olorunfemi et al., 2016; Khaledian et al., 2017; Sharma et al., 2015; Sulieman et al., 2018; Ulusoy et al., 2016). PLSR is one of the most widely used linear regression techniques; it handles multicollinear data well and estimates soil attributes based on linear relationships between soil properties and spectral data (O'Rourke et al., 2016; Vasques et al., 2008; Wold et al., 2001).
However, nonlinear regressions such as support vector machine regression (SVMR), artificial neural networks, random forests, regression trees, and multivariate adaptive regression splines have attracted attention in recent years because the relationships between soil properties and spectral data or elemental concentrations are rarely linear in nature (Chang and Lin, 2011; Mouazen et al., 2010; Rizzo et al., 2016; Rossel and Behrens, 2010; Wang et al., 2015; Xu et al., 2018). A few studies have proved that nonlinear models outperformed linear models, especially for soil properties with high variability (Araújo et al., 2014; Lucà et al., 2017). Moreover, to the best of our knowledge, there was essentially no study estimating soil CEC by SVMR through either single sensor data or fused sensor data. The specific objectives of this study are to: (1) explore the potential of the fused sensor data (here PXRF and Vis-NIR) for rapid characterization of the soil CEC, (2) compare the performance of SVMR and PLSR models established with different sensor data to select the best model for soil CEC prediction, and (3) prove our hypothesis that fused sensor data could improve the accuracy of soil CEC prediction. The ultimate goal of this study is to find an improved method for more accurate and rapid estimation of CEC in soils.
2. Materials and methods

2.1. Study area and soil sampling
In this study, 572 soil samples were collected from 142 soil profiles with multiple horizons from across Yunnan Province, China (longitude 97–105° E, latitude 20–28° N) (Fig. 1). Spatially, the soil samples were randomly selected from agricultural and forest fields with various parent materials in different climatic environments. The altitude of the sampling sites ranged from 92 m to 5396 m. We deliberately selected sampling sites with widely varying conditions to represent a wide variety of soil physicochemical properties. The soil profiles were sampled down to the rock or to a depth of 1.2 m. Gravel and plant debris were removed by hand, and the soil samples were air-dried at room temperature. The soil samples were then crushed separately to different sizes (passing through sieves of < 2 mm for pH analysis, < 0.25 mm for CEC analysis, and < 0.150 mm for Vis-NIR and PXRF scans) and stored in paper bags at room temperature for further analysis. Proximal sensing techniques might be influenced by measurement conditions, such as soil particle size, moisture, and geographical distribution of samples (Chang et al., 2005; Ge et al., 2011; Wang et al., 2013). Therefore, the air-dried and sieved soil samples were used for PXRF and Vis-NIR scanning in the controlled laboratory environment in this study to minimize the potential influences of particle size and moisture (Bendor et al., 1999; Silva et al., 2018).

2.2. Measurement of soil cation exchange capacity
Soil CEC was measured by the ammonium acetate method (pH 7.0 for acidic and neutral soil and pH 8.5 for calcareous soil) (Zhang and Gong, 2012).

2.3. PXRF scanning
Portable X-ray fluorescence (PXRF) spectrometry (NITON XLt 960, UK) was used to obtain elemental concentrations following the manufacturer’s instructions and the recommendations of Method 6200 (USEPA, 2007). The instrument has a 40 kV X-ray tube with an Ag anode target excitation source and a silicon PIN-diode with a Peltier cooled detector. Prior to soil analysis, the instrument was calibrated using the factory settings. The surface of each soil sample was scanned directly by PXRF in Geochem mode, and the result for each sample was calculated by averaging three measurements (Hu et al., 2014). Geochem mode consisted of three beams operating sequentially, and each beam was set to scan for 30 s. Only elements whose contents were higher than the corresponding detection limits were retained, to avoid blank data in the modeling. The remaining 18 quantified elements (i.e., Ca, Fe, Mn, Cr, Ni, Cu, Zn, Pb, As, K, Ti, V, Rb, Sr, Zr, Nb, Si, and Al) were then mean-centered and standard-deviation-scaled for modeling.

2.4. Vis-NIR scanning and spectral pretreatments
Prior to spectra collection, soil samples were oven-dried at 45 °C for 24 h. Reflectance spectra were measured in the visible and near-infrared (Vis-NIR) range of 350–2,500 nm using a Cary 5000 (Agilent Technologies, CA) under controlled laboratory conditions. It took about 3.5 min to scan one sample, and for each subset of 50 samples, two samples were randomly chosen for replicate measurements. The variation between replicates was < 0.3%. Spectra were sampled at 1-nm intervals with a resolution narrower than 0.048 nm in the visible range (350–700 nm) and narrower than 0.2 nm in the NIR range (700–2,500 nm) to obtain 2151 wavelengths. Each soil spectrum was obtained as the mean of three replicate scans. Detailed information about the Cary 5000 spectrometer and protocols of spectra measurements can be found in Zeng et al. (2016). Quantitative spectroscopy analyses were implemented using the Unscrambler 9.3 (Camo Software AS, Oslo, Norway). To eliminate the noise at both edges of the spectrum, Vis-NIR spectra were reduced to 500–2,450 nm. Then, spectral reflectance (R) was transformed to absorbance (A) through the equation A = log10(1/R) to reduce the nonlinearities and scattering effects (Cheng et al., 2019; Gomez et al., 2008; Kemper and Sommer, 2002). Data redundancy was reduced by averaging every ten sequential bands. The remaining 196 wavebands were smoothed through a 9-point Savitzky-Golay (SG) filter with the first derivative using a window size of 11 wavelengths and polynomial order of 2 to reduce the baseline variation and improve the spectral features (Savitzky and Golay, 1964), and multiplicative scatter correction (MSC), which could remove the light scattering variation in the reflectance spectroscopy, was applied (Martens and Naes, 1989). The preprocessed data were then mean-centered and standard-deviation-scaled for both PLSR and SVMR models.
Fig. 1. Spatial distribution of 142 sampled soil profiles in Yunnan Province, China.
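The Vis-NIR pretreatment chain of Section 2.4 (trimming to 500–2,450 nm, absorbance transform, ten-band averaging, Savitzky-Golay first derivative, and MSC) can be illustrated with a minimal pure-Python sketch. It is not the Unscrambler workflow used in the study. All function names are hypothetical, the 11-point window with a quadratic polynomial is an assumption taken from the stated window size, bands without a full derivative window are simply dropped, and the MSC reference is assumed to be the mean spectrum.

```python
import math


def to_absorbance(reflectance):
    """Convert reflectance R in (0, 1] to absorbance A = log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def average_bands(values, width=10):
    """Average every `width` sequential bands (the last group may be shorter)."""
    groups = [values[i:i + width] for i in range(0, len(values), width)]
    return [sum(g) / len(g) for g in groups]


def sg_first_derivative(values, half_window=5):
    """Savitzky-Golay first derivative (quadratic fit, window = 2*half_window + 1).

    For a symmetric window and polynomial order 2, the filter weights reduce
    to i / sum(i**2). Bands without a full window are dropped (assumption).
    """
    offsets = range(-half_window, half_window + 1)
    norm = sum(i * i for i in offsets)
    return [
        sum(i * values[k + i] for i in offsets) / norm
        for k in range(half_window, len(values) - half_window)
    ]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_bands = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(reference) / n_bands
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit s = intercept + slope * reference, then invert it.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def pretreat(reflectance_spectra, wavelengths, low=500, high=2450):
    """Apply the steps in the order given in the text: trim, absorbance, average, SG, MSC."""
    derivatives = []
    for spectrum in reflectance_spectra:
        kept = [r for w, r in zip(wavelengths, spectrum) if low <= w <= high]
        derivatives.append(sg_first_derivative(average_bands(to_absorbance(kept))))
    return msc(derivatives)
```

With 1-nm sampling, the 500–2,450 nm range holds 1951 bands, so ten-band averaging yields 196 groups (the last containing a single band), which agrees with the 196 wavebands reported above. Under the edge-dropping assumption of this sketch, 186 derivative values remain per spectrum.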
2.5. Fused sensor dataset
The mean-centered and standard-deviation-scaled PXRF and Vis-NIR data were concatenated directly in a single table as a variable matrix and defined as the fused sensor dataset in this study.

2.6. Regression modeling
The single sensor datasets and fused sensor dataset were each divided into two subsets for calibration and validation by the Rank-KS algorithm (Kennard and Stone, 1969; Xu et al., 2019). To make sure that the CEC values in the two subsets had the same distribution, the CEC values of the 142 surface soils of the profiles were sorted in increasing order, the data were divided into six blocks, and the Kennard-Stone algorithm was implemented in each block. Two-thirds of the samples were regarded as the calibration dataset (94 soil profiles, 378 soil samples in total) and the remainder were chosen as the validation dataset (48 soil profiles, 194 soil samples in total). All samples from the same profile were allocated together into either the calibration set or the validation set. PLSR handles data multicollinearity by reducing the spectral variables to orthogonal factors, namely latent variables (LVs), and these LVs are then used to optimize the covariance between soil property and spectral data in PLSR modeling by leave-one-profile-out cross-validation. The optimal number of LVs was determined after the minimum root mean squared error (RMSE) was obtained. Van der Voet’s test (1994) was performed to test models with different numbers of extracted LVs against the model that minimizes the predicted residual sum of squares. For interpretation of PLSR, the significant wavelengths are commonly verified through the variable importance in projection (VIP) scores (Rossel and Behrens, 2010; Wold et al., 2001), which are calculated by:

$$\mathrm{VIP}_k(a) = K \sum_{a} w_{ak}^{2} \left( \frac{SSY_a}{SSY_t} \right)$$

where VIP_k(a) is the importance of the kth predictor variable according to a model with a factors, w_ak is the corresponding loading weight of the kth variable for the ath PLSR factor, SSY_a is the sum of squares of the response variable explained by a PLSR model with a factors, SSY_t is the total sum of squares of the response variable, and K is the total number of predictor variables. If the VIP score of a certain wavelength exceeds one, the wavelength is regarded as an important wavelength (Chong and Jun, 2005). SVMR is a nonlinear modeling technique that has gained increasing popularity in classification and regression owing to its high computing speed and excellent performance (Bao et al., 2017; Vašát et al., 2017). The SVM reduces the original data to support vectors, and a support-vector network maps the input vectors into a high-dimensional feature space by a kernel function. It is then feasible to derive a linear hyperplane as a decision function to construct an optimal hyperplane for nonlinear space through back-transformation (Vapnik, 1999). The epsilon-SVM algorithm and radial basis function (kernel function) were used for modeling in this study. The parameter γ (1e-06, 3.16e-06, 1e-05, 3.16e-05) and cost parameter C (0.001, 0.003, 0.01, 0.03) were fine-tuned by a grid search method with cross-validation, and the optimal parameters were determined after the minimum RMSE was obtained. In the PLSR and SVMR models, Y is the vector of soil CEC measured by conventional laboratory analysis, and X is the sensor data matrix. The two models were performed using the PLS Toolbox version 8.02 (Eigenvector Research, Inc., Wenatchee, WA, USA) that ran under MATLAB version R2016a (The MathWorks, Inc., Natick, MA, USA).
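The calibration/validation partition described at the start of Section 2.6 (sorting profiles by CEC, cutting the ranking into six blocks, and running Kennard-Stone inside each block) can be sketched as follows. This is only an illustration under stated assumptions, not the Rank-KS implementation of Xu et al. (2019): the function names are hypothetical, each profile is assumed to be represented by one feature vector and one CEC value (for example, those of its surface horizon), distances are Euclidean, and the calibration share of each block is obtained by simple rounding.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone rule."""
    n = len(points)
    if n_select <= 0:
        return []
    if n_select >= n:
        return list(range(n))
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start from the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second][:n_select]
    # Repeatedly add the point farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
    return selected


def rank_ks_split(features, target, n_blocks=6, cal_fraction=2 / 3):
    """Split profile indices into calibration and validation sets.

    Profiles are ranked by `target` (CEC), cut into `n_blocks` consecutive
    blocks, and Kennard-Stone picks the calibration share within each block.
    """
    order = sorted(range(len(target)), key=lambda i: target[i])
    calibration = []
    for b in range(n_blocks):
        start = b * len(order) // n_blocks
        stop = (b + 1) * len(order) // n_blocks
        block = order[start:stop]
        n_cal = round(cal_fraction * len(block))
        picked = kennard_stone([features[i] for i in block], n_cal)
        calibration.extend(block[p] for p in picked)
    cal_set = set(calibration)
    validation = [i for i in range(len(target)) if i not in cal_set]
    return sorted(calibration), validation
```

Because the split is made at the profile level, all horizons of a profile can then be assigned to the subset of their profile, as the text requires. With 142 profiles, the rounding used in this sketch gives 94 calibration and 48 validation profiles, the same counts as reported above, although the profiles actually selected in the study may differ.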
2.7. Model evaluation
The performance of the different models for predicting soil CEC was evaluated on the validation dataset using the coefficient of determination (R2), the RMSE, and the ratio of performance to interquartile range (RPIQ). The R2 value describes the proportion of the total variance of the observed data that can be explained by the model. It ranges from 0 to 1, with higher values indicating better agreement. The RPIQ is the ratio of the interquartile distance (IQ = Q3 − Q1) to the RMSE, which represents the spread of the residuals of the population (Bellon-Maurel et al., 2010). The RMSE is calculated by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}$$

where y_i is the observed value, ŷ_i is the predicted value, and N is the number of samples used for validation. Generally, a robust model has a high R2 and RPIQ, and a low RMSE.

3. Results

3.1. Soil properties and spectra
The summary statistics of CEC measured by the laboratory chemical method for the whole, calibration, and validation datasets are presented in Fig. 2. The CEC values of the calibration data varied from 5.42 to 50.25 cmol kg−1 with a mean value of 18.22 cmol kg−1 and a coefficient of variation (CV) of 0.45. The validation data ranged from 5.91 to 45.55 cmol kg−1 with an average of 18.21 cmol kg−1 and a CV of 0.42. The CV of the whole dataset was 0.44, indicating that the CEC values were moderately variable in this study area (Wilding, 1985; Xu et al., 2018). According to Student’s t test and Levene’s test (Table 1), the CEC values of the calibration and validation datasets were not statistically significantly different in terms of means and variances, respectively. Skewness and kurtosis were used to test the normality of the CEC data, and their values (greater than 1) indicated that the CEC data were not normally distributed (Ryu, 2011). For this reason, RPIQ was used to evaluate the models. Eighteen elements were measured by PXRF in all soil samples (Table 2). Different elements had different content ranges and CVs. Most elements had standard deviation values higher than their mean values. That is especially true for Pb, which had the highest CV of 3.2. The skewness and kurtosis values were mostly much greater than 1.0, which suggests that soil formation factors, especially the parent materials, varied considerably across Yunnan Province. Average raw spectra and pretreated spectra at different soil CEC values are shown in Fig. 3. The mean reflectance of the raw spectra showed a decreasing tendency with increasing CEC values. Compared with the raw spectra, the pretreated spectra showed reduced baseline variation and light scattering variation. The spectral features were enhanced and appeared at approximately 1400 nm, 1900 nm, and 2200 nm, which were related to the O–H stretching, H–O–H bending, and the clay lattice Al-OH absorption bands.

Table 1
T-test and F-test for the three datasets (p-value, α = 0.05).
Dataset                     T-test   F-test
Whole – Calibration         0.99     0.62
Whole – Validation          0.99     0.45
Calibration – Validation    0.96     0.28

Table 2
Descriptive statistics of elements determined by portable X-ray fluorescence spectrometry.
Element   Mean (mg kg−1)   SD (mg kg−1)   Skewness   Kurtosis   CV
Ca        7065.0           20419.2        7.9        82.5       2.8
Fe        52291.8          35013.9        0.9        0.7        0.6
Mn        8771.3           25299.1        3.7        13.8       1.2
Cr        306.3            1363.3         21.9       507.2      1.0
Ni        93.2             114.4          6.3        50.5       1.2
Cu        67.9             58.1           3.5        17.6       0.9
Zn        111.6            340.7          21.6       496.3      2.9
Pb        51.8             132.9          14.8       274.7      3.2
As        26.6             39.9           5.2        31.7       1.6
K         12145.7          6854.8         −0.001     −0.3       0.4
Ti        108735.4         275506.8       2.4        3.8        0.7
V         3083.0           37942.1        23.5       559.0      0.6
Rb        92.4             273.1          17.2       358.8      0.5
Sr        66.1             68.7           5.2        37.9       1.0
Zr        317.5            165.0          0.9        2.8        0.4
Nb        57.6             105.3          2.8        6.6        0.5
Al        92672.7          24093.9        0.6        0.2        0.3
Si        247614.5         55687.6        −0.2       −0.2       0.2

3.2. Correlation analysis
Correlation coefficients between soil CEC and elemental data were calculated and are presented in Table 3. Except for As, Zn, and Pb, the PXRF elemental data had significant correlations with soil CEC at the 0.01 or 0.05 level. Soil CEC and Fe had the highest positive correlation with
Fig. 2. Descriptive statistics of soil CEC measured by laboratory chemical method for different datasets. Cal: Calibration; Val: Validation; Min: minimum; Max: maximum; SD: standard deviation; CV: coefficient of variation; n: the number of soil samples.
Fig. 3. Average reflectance spectra of soil samples at different CEC values in Yunnan Province, China: (a) raw spectra; (b) pretreated spectra.

Table 3
Correlation coefficients between soil CEC and elemental data (n = 572).
Element   CEC        Element   CEC        Element   CEC
Ca        0.092*     Zn        0.073      Rb        −0.252**
Fe        0.527**    Pb        0.001      Sr        0.265**
Mn        0.222**    As        0.055      Zr        −0.289**
Cr        0.490**    K         −0.268**   Nb        0.312**
Ni        0.338**    Ti        0.494**    Al        0.104*
Cu        0.420**    V         0.515**    Si        −0.511**
*Correlation is significant at the 0.05 level. **Correlation is significant at the 0.01 level.

Table 4
Calibration and validation statistics of the PLSR models for soil CEC based on different sensor datasets.
                    Calibration dataset (n = 378)   Validation dataset (n = 194)
Sensor dataset      R2cv (a)   RMSEcv (b)           R2pre (c)   RMSEpre (d)   RPIQ (e)
PXRF                0.49       5.36                 0.50        5.30          0.82
Vis-NIR             0.50       5.47                 0.52        5.40          0.87
PXRF + Vis-NIR      0.76       3.96                 0.72        4.02          2.23
a The determination coefficient between the predicted and measured value in cross-validation.
b The root mean squared error of cross-validation (cmol kg−1).
c The determination coefficient between the predicted and measured value in validation.
d The root mean squared error of validation (cmol kg−1).
e The ratio of performance to inter-quartile range.
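The validation indices reported in this table (R2, RMSE, and RPIQ, defined in Section 2.7) can be computed with a short standard-library sketch. The function names are hypothetical, and two choices are assumptions rather than the study's confirmed procedure: R2 is taken as 1 − SSres/SStot (one common definition, which can differ from the squared correlation between predicted and measured values), and the quartiles use the "inclusive" interpolation rule of Python's statistics module.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of the interquartile distance (Q3 - Q1) of the observations to the RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)
```

Because RPIQ scales the error by the interquartile distance rather than the standard deviation, it does not rely on the observations being normally distributed.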
a coefficient of 0.527 at the significance level of 0.01. The correlation coefficients between soil CEC and spectra are plotted in Fig. 4. The correlation between soil CEC and the raw reflectance spectra was negative; however, the curve of the correlation coefficients was smooth, without the expected peaks. For the spectra pretreated with SG smoothing and MSC, there were positive and negative correlation peaks between soil CEC and the spectra. The positive correlation peaks were at 1390 nm, 1870 nm, 2200 nm, and 2360 nm based on the SG + MSC pretreated spectra.

Fig. 4. Correlations between soil CEC and the spectra.

3.3. Soil CEC estimation using multivariate modeling

3.3.1. Soil CEC estimation using PLSR
Soil CEC was predicted by PLSR based on PXRF data, Vis-NIR data, and fused sensor data in this study (Table 4 and Fig. 5). The PLSR models for CEC estimation provided different prediction accuracies depending on the sensor dataset. The validation and calibration generated similar results, which suggested that the models were not over-fitted. The RMSE in validation (RMSEpre) ranged from 4.02 to 5.40, R2pre from 0.50 to 0.72, and RPIQ from 0.82 to 2.23. The optimal model was obtained using the fused sensor dataset, with RMSEpre = 4.02, R2pre = 0.72, and RPIQ = 2.23. To visualize the prediction performance on the independent validation dataset, Fig. 5 presents the scatterplots of the observed versus PLSR-predicted values for CEC for the different sensor datasets. For CEC estimation based on the fused sensor dataset, the scatter points and regression line approximated the 1:1 line more closely than those based on the single sensor datasets.

3.3.2. Important variables for the PLSR model
VIP scores indicated the important variables for interpreting the PLSR models of the relationships between soil CEC and the different sensor datasets (Fig. 6). The VIP scores of Si, Fe, Cr, Ni, Cu, Ti, V, Sr, and Zr exceeded 1.0, indicating that they were important elements for the PLSR model based on the PXRF data (Fig. 6). These soil elements were significantly related to soil CEC (Table 3). For Vis-NIR data, the most important wavelengths for CEC prediction were around 1,400, 1,900, 2,200, 2,300, and 2,400 nm. These wavebands were significantly related to the frequency peaks of the C–H, O–H, H–O–H, and Al-OH absorption bands. The VIP scores of the optimized PLSR prediction based on the fused dataset were different from those based on the single sensor datasets (Fig. 6). According to the VIP score plot, the major contributors for CEC prediction were Al, Si, Fe, Cr, Ni, Cu, Zn, K, Ti, V, Rb, Sr, Zr, and Nb, along with the wavelengths around 1,400, 1,900, 2,200, 2,300, and 2,400 nm. Compared with the contributors of single PXRF or Vis-NIR
Fig. 5. Validation scatter plots of lab-measured CEC vs. predicted CEC by PLSR using different sensor datasets. The dash line is the 1:1 line, and the red solid line is the regression line. The colored regions represent 95% prediction confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
data, the corresponding VIP scores for PXRF increased while those for Vis-NIR decreased in the fused sensor dataset.

Fig. 6. Variable importance in projection (VIP) scores of variables from the PLSR model for predicted CEC with different sensor datasets. The dashed line is the threshold for the VIP score.

3.3.3. Soil CEC estimation using SVMR
The SVMR models based on the three sensor datasets provided different prediction accuracies in terms of RMSEpre (ranging from 3.02 to 4.66), R2pre (from 0.62 to 0.82), and RPIQ (from 1.20 to 2.31) (Table 5). Compared with PLSR, SVMR obtained better results of soil CEC estimation based on both single sensor data and fused sensor data (Table 4 and Table 5). Among the SVMR models for CEC estimation, the validation result for the fused sensor dataset had the lowest RMSEpre (3.02) and the highest R2pre (0.82) and RPIQ (2.31), which meant that the SVMR model based on the fused sensor dataset was more accurate than the other models. The scatterplots of the observed versus SVMR-predicted values for CEC based on the PXRF, Vis-NIR, and fused sensor datasets are presented in Fig. 7. For the fused sensor dataset (Fig. 7c), the regression line for the SVMR model using the independent validation dataset matched well within the 95% prediction confidence intervals, thus indicating a good fit.

Table 5
Calibration and validation statistics of the SVMR models for soil CEC based on different sensor datasets.
                    Calibration dataset (n = 378)   Validation dataset (n = 194)
Sensor dataset      R2cv (a)   RMSEcv (b)           R2pre (c)   RMSEpre (d)   RPIQ (e)
PXRF                0.74       3.96                 0.72        4.06          1.60
Vis-NIR             0.60       4.76                 0.62        4.66          1.20
PXRF + Vis-NIR      0.86       2.90                 0.82        3.02          2.31
a The determination coefficient between the predicted and measured value in cross-validation.
b The root mean squared error of cross-validation (cmol kg−1).
c The determination coefficient between the predicted and measured value in validation.
d The root mean squared error of validation (cmol kg−1).
e The ratio of performance to inter-quartile range.

4. Discussions
Vis-NIR spectroscopy with PLSR and PXRF spectrometry with MLR have been individually used for the rapid estimation of soil CEC in earlier studies (Sharma et al., 2015; Soriano-Disla et al., 2014; Rossel et al., 2006), which mainly focused on linear relationships between some PXRF-measured elements/Vis-NIR-measured spectra and soil CEC. However, PXRF-measured elements/Vis-NIR-measured spectra may not have direct linear relationships with soil CEC. Therefore, in this study, we compared the soil CEC predictions using both linear and nonlinear regressions with Vis-NIR and PXRF data to verify the direct and indirect contributors to the regressions. The results showed that SVMR outperformed PLSR for both single sensor datasets and fused sensor data, suggesting that the relationships between sensor data (spectral data or elemental concentrations) and soil CEC are not always linear in nature. When only single sensor data (Vis-NIR data or PXRF data) were available, SVMR was more effective than PLSR in predicting soil CEC with reasonable accuracy in the study area. Sharma et al. (2015) reported that MLR with PXRF and auxiliary data (clay, pH, organic matter) could improve the prediction of soil CEC compared with using pure PXRF elemental data. However, these auxiliary data also needed conventional laboratory analysis, which is time-consuming and costly. If the auxiliary data (clay, pH, organic matter) could be replaced by data from the other sensor, time and money would be saved while still obtaining a reasonable accuracy. Good correlations with soil Vis-NIR spectral reflectance have been reported for clay content, pH, and SOM in particular (Rossel et al., 2006; Zeng et al., 2016; Hong et al., 2017, 2019). Therefore, we hypothesized that Vis-NIR could provide valuable auxiliary data to obtain a better soil CEC prediction. As indicated by the lower RMSE and higher R2 and RPIQ in this study, PLSR based on the fused PXRF and Vis-NIR datasets can produce more accurate predictions than PLSR based on either single sensor dataset. The reason may be that the fused datasets provide more comprehensive information.
Fig. 7. Validation scatter plots of lab-measured CEC vs. predicted CEC by SVMR using different sensor datasets. The dash line is the 1:1 line, and the red solid line is the regression line. The colored regions represent 95% prediction confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Meanwhile, compared with single sensor data, VIP scores based on fused sensor data were higher in the PXRF region and lower in the VisNIR region in this study, indicating that the PXRF elemental data contributed more to the PLSR model with fused sensor data. According to the VIP scores of this study, the important variables of PXRF elemental data are Ca, Fe, Mn, Cr, Ni, Cu, K, Ti, V, Rb, Sr, Zr, Nb, Si, and Al, which are also significantly related with soil CEC. Given that desiliconization and aluminization play a key role in soil weathering and soil texture, these soil elements may contribute to CEC prediction (Stockmann et al., 2016). Because there is no direct spectral response on the Vis-NIR spectra to soil CEC, the CEC is actually indirectly predicted due to its correlation with other properties (SOM and texture) (O'Rourke et al., 2016; Stenberg et al., 2010). Therefore, the important wavelengths associated with SOM and soil texture estimation might be useful for soil CEC prediction. Based on the VIP scores of this study, the important wavelengths were found to be around 1400, 1900, 2200, 2300, and 2400 nm. These wavelengths are related to the C–H group, O–H stretching, H-O–H bending, and clay lattice Al-OH absorption band (Hong et al., 2019; Rossel and Behrens, 2010), which play important roles in predicting SOM and soil texture. According to the prediction results, the SVMR depending on fused sensor data produced a more compelling soil CEC prediction, with lower RMSE, and higher R2 and RPIQ in the validation. Although SVMR requires tuning parameters, it avoids the step of selecting the specific elements for linear regression and performs better than PLSR. This also means that nonlinear relationships exist in nature between spectral/ elemental data and soil CEC (Araújo et al., 2014; Webster, 2000; Xu et al., 2018). In addition, these multivariate proximal sensor techniques can scan soil samples nondestructively and rapidly. 
This study therefore demonstrated the effectiveness of soil CEC estimation using SVMR and fused sensor data, an approach that is environmentally friendly and saves time and money, making the nondestructive and rapid analysis of larger sample sets feasible. Where proximal sensor instruments are available, the fused sensor data of PXRF and Vis-NIR offer an efficient alternative to the conventional laboratory method for estimating soil CEC. Since it is difficult for Vis-NIR spectroscopy or PXRF spectrometry alone to provide a proper soil characterization, their application in combination with other proximal sensing technologies should be considered in future studies (Brown et al., 2006; Horta et al., 2015), given the potential of multi-source data to increase prediction accuracy and robustness. Vis-NIR is widely applied to measure soil organic and mineral components, while PXRF has the potential to estimate inorganic element concentrations in the soil, and both can scan the same soil sample rapidly. Therefore, using the measured data from both sensors can achieve a more accurate and efficient prediction of soil CEC. Future studies should focus on using multiple remote and proximal sensing techniques to simultaneously predict multiple soil properties for agricultural and environmental applications.

5. Conclusions

In this study, the predictive abilities of the PLSR and SVMR models were compared based on different single sensor datasets and a fused sensor dataset for the rapid estimation of soil CEC. The results were promising: SVMR with the fused sensor dataset obtained the best soil CEC prediction (RMSE = 3.02, R2 = 0.82, and RPIQ = 2.31) compared with PLSR or SVMR with the single sensor datasets as well as PLSR with the fused sensor dataset, as verified by the validation data. The general order of the data and model combinations in terms of predictive ability is Vis-NIR + PXRF (SVMR) > Vis-NIR + PXRF (PLSR) ≥ PXRF (SVMR) > Vis-NIR (SVMR) > Vis-NIR (PLSR) ≈ PXRF (PLSR). In general, using fused sensor data of PXRF and Vis-NIR with SVMR is suitable for estimating soil CEC over a wide range of values. Such a finding may represent an important step forward for rapid soil chemical analysis with reasonable accuracy, and may provide a rapid and accurate complement to the laboratory chemical method, which may be of great value to production agriculture. This study also indicated that single sensor data (i.e., PXRF or Vis-NIR) are insufficient to provide a comprehensive characterization of soil CEC, and that the use of multiple sensing technologies together with appropriate models should be a better choice for soil property estimation. We therefore recommend the use of SVMR and fused sensor data from PXRF and Vis-NIR for rapid soil CEC determination. Although fused multi-sensor data are promising ex situ, as shown in this study, further studies should evaluate the accuracy and efficacy of the fused method in situ for the prediction of soil properties.

Acknowledgments

This work was financially supported by the National Key Research and Development Program (Grant No. 2018YFC1802601), the Key Frontier Project of Institute of Soil Science, Chinese Academy of Sciences (Grant No. ISSASIP1629), the National Science and Technology Basic Special Program (2014FY110200A10), the Open Fund of Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education (Grant No. GTYR201901), and China Scholarship Council. The authors thank Professor Ganlin Zhang and Doctor Rong Zeng for providing the visible near-infrared reflectance (Vis-NIR) spectroscopy data of soil samples in this study area. We gratefully thank the two anonymous reviewers and the editor for their valuable comments and suggestions regarding this paper.
References
Araújo, S.R., Wetterlind, J., Demattê, J.A.M., Stenberg, B., 2014. Improving the prediction performance of a large tropical vis-NIR spectroscopic soil library from Brazil by clustering into smaller subsets or use of data mining calibration techniques. Eur. J. Soil Sci. 65, 718–729.
Bao, N., Wu, L., Ye, B., Yang, K., Zhou, W., 2017. Assessing soil organic matter of reclaimed soil from a large surface coal mine using a field spectroradiometer in laboratory. Geoderma 288, 47–55.
Bellon-Maurel, V., Fernandez-Ahumada, E., Palagos, B., Roger, J.-M., McBratney, A., 2010. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. TrAC Trend. Anal. Chem. 29, 1073–1081.
Bendor, E., Irons, J.R., Epema, G.F., 1999. Soil reflectance. Chapter in scientific book, In: Remote Sensing for the Earth Sciences: Manual of Remote Sensing 3/3 / Rencz, A.N., pp. 111–188.
Brown, D.J., Shepherd, K.D., Walsh, M.G., Mays, M.D., Reinsch, T.G., 2006. Global soil characterization with VNIR diffuse reflectance spectroscopy. Geoderma 132, 273–290.
Buol, S., Kamprath, E.J., 1998. A comparison of the contribution of clay, silt, and organic matter to the effective CEC of soils in Sub-Saharan Africa. Soil Sci. 163, 508.
Chang, C.C., Lin, C.J., 2011. LIBSVM: A library for support vector machines. ACM.
Chang, C.W., Laird, D.A., Hurburgh Jr, C.R., 2005. Influence of soil moisture on near-infrared reflectance spectroscopic measurement of soil properties. Soil Sci. 170, 244–255.
Charman, P.E., Murphy, B.W., 2007. Soils: Their properties and management. Oxford University Press, New York, NY.
Cheng, H., Shen, R., Chen, Y., Wan, Q., Shi, T., Wang, J., Wan, Y., Hong, Y., Li, X., 2019. Estimating heavy metal concentrations in suburban soils with reflectance spectroscopy. Geoderma 336, 59–67.
Chong, I.G., Jun, C.H., 2005. Performance of some variable selection methods when multicollinearity is present. Chemometr. Intell. Lab. 78, 103–112.
Emamgolizadeh, S., Bateni, S.M., Shahsavani, D., Ashrafi, T., Ghorbani, H., 2015. Estimation of soil cation exchange capacity using Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS). J. Hydrol. 529, 1590–1600.
Ge, Y., Morgan, C.L.S., Grunwald, S., Brown, D.J., Sarkhot, D.V., 2011. Comparison of soil reflectance spectra and calibration models obtained using multiple spectrometers. Geoderma 161, 202–211.
Gomez, C., Lagacherie, P., Coulouma, G., 2008. Continuum removal versus PLSR method for clay and calcium carbonate content estimation from laboratory and airborne hyperspectral measurements. Geoderma 148, 141–148.
Hong, Y., Chen, S., Liu, Y., Zhang, Y., Yu, L., Chen, Y., Liu, Y., Cheng, H., Liu, Y., 2019. Combination of fractional order derivative and memory-based learning algorithm to improve the estimation accuracy of soil organic matter by visible and near-infrared spectroscopy. Catena 174, 104–116.
Hong, Y., Yu, L., Chen, Y., Liu, Y., Liu, Y., Liu, Y., Cheng, H., 2017. Prediction of soil organic matter by VIS–NIR spectroscopy using normalized soil moisture index as a proxy of soil moisture. Remote Sens. 10, 28.
Horta, A., Malone, B., Stockmann, U., Minasny, B., Bishop, T.F.A., McBratney, A.B., Pallasser, R., Pozza, L., 2015. Potential of integrated field spectroscopy and spatial analysis for enhanced assessment of soil contamination: a prospective review. Geoderma 241, 180–209.
Hu, W., Huang, B., Weindorf, D.C., Chen, Y., 2014. Metals analysis of agricultural soils via portable X-ray fluorescence spectrometry. B. Environ. Contam. Tox. 92, 420–426.
Kemper, T., Sommer, S., 2002. Estimate of heavy metal contamination in soils after a mining accident using reflectance spectroscopy. Environ. Sci. Technol. 36, 2742–2747.
Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137–148.
Khaledian, Y., Brevik, E.C., Pereira, P., Cerdà, A., Fattah, M.A., Tazikeh, H., 2017. Modeling soil cation exchange capacity in multiple countries. Catena 158, 194–200.
Kvalheim, O.M., 2010. Interpretation of partial least squares regression models by means of target projection and selectivity ratio plots. J. Chemometr. 24, 496–504.
Leinweber, P., Reuter, G., Brozio, K., 1993. Cation exchange capacities of organo-mineral particle-size fractions in soils from long-term experiments. J. Soil Sci. 44, 111–119.
Lucà, F., Conforti, M., Castrignanò, A., Matteucci, G., Buttafuoco, G., 2017. Effect of calibration set size on prediction at local scale of soil carbon by Vis-NIR spectroscopy. Geoderma 288, 175–183.
Mahmood, H.S., Hoogmoed, W.B., van Henten, E.J., 2012. Sensor data fusion to predict multiple soil properties. Precision Agriculture 13, 628–645.
Martens, H., Naes, T., 1989. Multivariate Calibration. John Wiley & Sons Ltd, New York.
Mouazen, A.M., Kuang, B., De Baerdemaeker, J., Ramon, H., 2010. Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy. Geoderma 158, 23–31.
Olorunfemi, I.E., Fasinmirin, J.T., Ojo, A.S., 2016. Modeling cation exchange capacity and soil water holding capacity from basic soil properties. Eurasian J. Soil Sci. 5, 266–274.
O'Rourke, S.M., Stockmann, U., Holden, N.M., McBratney, A.B., Minasny, B., 2016. An assessment of model averaging to improve predictive power of portable vis-NIR and XRF for the determination of agronomic soil properties. Geoderma 279, 31–44.
Rizzo, R., Demattê, J.A.M., Lepsch, I.F., Gallo, B.C., Fongaro, C.T., 2016. Digital soil mapping at local scale using a multi-depth Vis–NIR spectral library and terrain attributes. Geoderma 274, 18–27.
Ross, D.S., Ketterings, Q., 1995. Recommended methods for determining soil cation exchange capacity. In: Sims, J.Y., Wolf, A. (Eds.), Recommended soil testing procedures for the Northeastern United States. Northeastern Regional Bulletin #493. Ag Experiment Station, University of Delaware, Newark, DE, pp. 62–70.
Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54.
Rossel, R.A.V., Walvoort, D.J.J., McBratney, A.B., Janik, L.J., Skjemstad, J.O., 2006. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59–75.
Ryu, E., 2011. Effects of skewness and kurtosis on normal-theory based maximum likelihood test statistic in multilevel structural equation modeling. Behav. Res. Meth. 43, 1066–1074.
Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639.
Sharma, A., Weindorf, D.C., Wang, D.D., Chakraborty, S., 2015. Characterizing soils via portable X-ray fluorescence spectrometer: 4. Cation exchange capacity (CEC). Geoderma 239–240, 130–134.
Shepherd, K.D., Walsh, M.G., 2002. Development of reflectance spectral libraries for characterization of soil properties. Soil Sci. Soc. Am. J. 66, 988–998.
Silva, S.H.G., Silva, E.A., Poggere, G.C., Guilherme, L.R.G., Curi, N., 2018. Tropical soils characterization at low cost and time using portable X-ray fluorescence spectrometer (pXRF): Effects of different sample preparation methods. Ciência e Agrotecnologia 42, 80–92.
Soriano-Disla, J.M., Janik, L.J., Rossel, R.A.V., Macdonald, L.M., McLaughlin, M.J., 2014. The performance of visible, near-, and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical, and biological properties. Appl. Spectrosc. Rev. 49, 139–186.
Stenberg, B., Rossel, R.A.V., Mouazen, A.M., Wetterlind, J., 2010. Visible and near infrared spectroscopy in soil science. Adv. Agron. 107, 163–215.
Stockmann, U., Cattle, S.R., Minasny, B., McBratney, A.B., 2016. Utilizing portable X-ray fluorescence spectrometry for in-field investigation of pedogenesis. Catena 139, 220–231.
Sulieman, M., Saeed, I., Hassaballa, A., Rodrigo-Comino, J., 2018. Modeling cation exchange capacity in multi geochronological-derived alluvium soils: an approach based on soil depth intervals. Catena 167, 327–339.
Ulusoy, Y., Tekin, Y., Tümsavaş, Z., Mouazen, A.M., 2016. Prediction of soil cation exchange capacity using visible and near infrared spectroscopy. Biosystems Eng. 152, 79–93.
USEPA, 2007. Method 6200: field portable x-ray fluorescence spectrometry for the determination of elemental concentrations in soil and sediment. http://www.epa.gov/osw/hazard/testmethods/sw846/pdfs/6200.pdf.
Vapnik, V.N., 1999. An overview of statistical learning theory. IEEE T. Neural Networ. 10, 988–999.
van der Voet, H., 1994. Comparing the predictive accuracy of models using a simple randomization test. Chemometr. Intell. Lab. Syst. 25, 313–323.
Vašát, R., Kodešová, R., Borůvka, L., 2017. Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation. Comput. Geosci. 104, 75–83.
Vasques, G.M., Grunwald, S.J.O.S., Sickman, J.O., 2008. Comparison of multivariate methods for inferential modeling of soil carbon using visible/near-infrared spectra. Geoderma 146, 14–25.
Wang, D., Chakraborty, S., Weindorf, D.C., Li, B., Sharma, A., Paul, S., Ali, M.N., 2015. Synthesized use of VisNIR DRS and PXRF for soil characterization: total carbon and total nitrogen. Geoderma 243, 157–167.
Wang, S., Li, W., Li, J., Liu, X., 2013. Prediction of soil texture using FT-NIR spectroscopy and PXRF spectrometry with data fusion. Soil Sci. 178, 626–638.
Webster, R., 2000. Is soil variation random? Geoderma 97, 149–163.
Wilding, L., 1985. Spatial variability: its documentation, accommodation and implication to soil surveys, Soil spatial variability. Workshop 166–194.
Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. 58, 109–130.
Xu, D., Chen, S., Viscarra Rossel, R.A., Biswas, A., Li, S., Zhou, Y., Shi, Z., 2019. X-ray fluorescence and visible near infrared sensor fusion for predicting soil chromium content. Geoderma 352, 61–69.
Xu, S., Zhao, Y., Wang, M., Shi, X., 2018. Comparison of multivariate methods for estimating selected soil properties from intact soil cores of paddy fields by Vis-NIR spectroscopy. Geoderma 310, 29–43.
Zeng, R., Zhao, Y.G., Li, D.C., Wu, D.W., Wei, C.L., Zhang, G.L., 2016. Selection of “local” models for prediction of soil organic matter using a regional soil vis-nir spectral library. Soil Sci. 181, 13–19.
Zhang, G., Gong, Z., 2012. Soil survey laboratory methods. Science Press.