Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions

Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions

Journal Pre-proofs Research papers Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions Xiao-Lin Sun, Xiao...

752KB Sizes 0 Downloads 91 Views

Journal Pre-proofs Research papers Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions Xiao-Lin Sun, Xiao-Qing Wang, Hui-Li Wang PII: DOI: Reference:

S0022-1694(19)30962-X https://doi.org/10.1016/j.jhydrol.2019.124227 HYDROL 124227

To appear in:

Journal of Hydrology

Received Date: Revised Date: Accepted Date:

10 August 2019 8 October 2019 10 October 2019

Please cite this article as: Sun, X-L., Wang, X-Q., Wang, H-L., Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions, Journal of Hydrology (2019), doi: https://doi.org/10.1016/ j.jhydrol.2019.124227

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Β© 2019 Published by Elsevier B.V.

Comparison of estimated soil bulk density using proximal soil sensing and pedotransfer functions Xiao-Lin Sun a, Xiao-Qing Wang a, Hui-Li Wang b* a

Guangdong Provincial Key Laboratory for Urbanization and Geo-simulation,

School of Geography and Planning, Sun Yat-sen University, Guangzhou 510275, China, b Guangxi Forestry Research Institute, Nanning 530002, China

*

Corresponding author: Hui-Li Wang. E-mail: E-mail: [email protected].

Address: Guangxi Forestry Research Institute, 23 Yongwu Road, Nanning 530002, China.

1

Abstract: Due to disadvantages of the core sampling method for determining soil bulk density (BD), many soil surveys lack data of BD. To meet extensive demands for BD in hydrology, agriculture, environment, etc., an approach based on a pedotransfer function (PTF) is often established using available data of soil properties such as texture, organic matter content, pH and soil depth. Recently, an approach based on proximal soil sensing (PSS) was also proposed to fulfill this purpose, i.e., subtracting volumetric water content (πœƒ, %) estimated using visible and near infrared spectroscopy (vis-NIRS) from BD of wet soils determined using gamma ray attenuation. With 120 samples collected from 20 profiles and 38 topsoils in a hilly subtropical area in China, this study compared accuracy of the two approaches based on leave-one-out cross-validation (LOOCV) and repeated cross-validations with a wide range of randomly split samples. Texture, organic matter content, pH and depth of the samples were measured, and they were used within multiple linear regression to establish PTFs. Results showed that the approach based on PSS was at least 12% more accurate than the other and its estimated BD were close to those measured using the core sampling method. For instance, in LOOCV, estimated BD based on PSS had root mean square error (RMSE) of 0.082 g cmβˆ’3, coefficient of determination (R2) of 0.92 and residual prediction deviation (RPD) of 2.07, while estimated BD based on the best PTF had RMSE of 0.094 g cmβˆ’3, R2 of 0.70 and PRD of 1.82. The results also showed that accuracy of the approach based on PSS was determined by accuracy of estimated πœƒ using vis-NIRS. However, it is limited to improve accuracy of estimated 2

BD using the approach based on PSS further by improving accuracy of estimated πœƒ. This study demonstrated that the approach based on PSS is promising for estimating BD in future soil surveys. Keywords: Gamma ray attenuation; visual-near infrared spectroscopy; pedotransfer function; proximal soil sensing.

3

1. Introduction Bulk density (BD) is a primary property of soils. It is closely related to soil functions in hydrology, agriculture, environment, etc., by impacting tillage, movements of plant nutrients, water, air and energy in soil, plant root growth and so on (Costa et al., 2013; Yi et al., 2016; Lobsey and Viscarra Rossel, 2016; Tian et al., 2018; Qiao et al., 2019). Thus, it is an important parameter in evaluating land drainage, irrigation, nutrient stock, carbon stock, pollutant mass balance, biomass productivity, etc. (Li et al., 2016; Shiri, et al., 2017; Tian et al., 2019). However, the traditional method for determining BD, i.e., core sampling of a known volume of soil followed by drying at 105 ℃ for 24 h, usually is not very convenient to conduct because of stones and plant roots in soil, and it is also laborious, time consuming, expensive, destructive, prone to measurement error (Quraishi and Mouazen, 2013a and b; Al-Asadi and Mouazen, 2014; Tian et al., 2018). As a result, BD information is missing in a lot of routine soil surveys (Martin et al., 2009; Suuster et al., 2011; Reidy et al., 2016). To meet extensive demands for BD information for agricultural activities, ecological and environmental evaluations and modeling, and many other scientific researches, scientists developed a range of methods which are easy to be conducted, free of labor, non-destructive or cost and time effective, e.g., pedotransfer functions (PTFs), penetration resistance (PR), gamma ray attenuation, thermo-time domain reflectometry, visible and near infrared spectroscopy (vis-NIRS). The method of PTFs is the earliest and most frequently used one (Jeffrey, 1970; 4

Adams, 1973; Federer, 1983). A PTF is an empirical function modeled between BD and readily measured soil properties, such as organic carbon and texture, and easily available soil information, such as soil depth and soil type. It can be modeled using many types of modeling methods, varying from basic statistical methods to multivariate spatial statistics, e.g., linear regression, exponential equation and polynomial equation, and to machine learning methods such as classification and regression trees (Tranter et al., 2007), artificial neural networks (ANN) (Yi et al., 2016; Qiao et al., 2019), random forest (Ramcharan et al., 2017) and generalized boosted models (Martin et al., 2009; Jalabert et al., 2010; Chen et al., 2018). So, given commonly measured soil properties such as organic carbon and texture, the method of PTFs is relatively convenient to be conducted and hence has been widely used so far. However, the method of PTFs suffers several drawbacks (Suuster et al., 2011; Yi et al., 2016). The most important drawback is that a PTF is specific to a geomorphic region or soil type, meaning that it cannot be simply extrapolated beyond the geomorphic region or soil type from which it was developed (McBratney, 2002). For example, De Vos et al. (2005) found that predictive qualities of 12 published PTFs in literature for predicting BD were mostly different between topsoil and subsoil. Thus, although a large number of PTFs have been published in literature, it is still necessary to establish a new one for a specific area or to validate published ones for their applicability in a new area. This unavoidably requires for soil data and modeling in a new area. Another drawback is that the accuracy of a PTF is complexly related to 5

factors such as modeling techniques, available soil properties and interested soils. For example, Tranter et al. (2007) presented that a PTF modeled using multiple linear regression (MLR) outperformed another one modeled using regression tree, but later Martin et al. (2009) presented a contrary result. Different combinations of soil properties have been used in literature to develop a PTF and generally they performed differently in different cases (De Vos et al., 2005; Yi et al., 2016). Some studies suggested to divide interested soils into different ranges and developed a PTF for each range specifically (Harrison and Bocock, 1981), but some others found that this did not help to improve estimation accuracy (De Vos et al., 2005; Benites et al., 2007). Proximal soil sensing (PSS) is another convenient method to measure BD, which is also free of labor and cost and time effective. In literature, a lot of approaches had been developed using PSS, including gamma ray attenuation, PR, frequency domain reflectometry (FDR) and vis-NIRS. Particularly, Lobsey and Viscarra Rossel (2016) proposed to combine gamma ray attenuation and vis-NIRS for estimating soil BD. In this way, BD of wet soil is first determined using gamma ray attenuation and then it is corrected by subtracting volumetric water content (ΞΈ, %) estimated using vis-NIRS. Gamma ray attenuation has been proposed much earlier for measuring BD by Luo and Wells (1992) and Wells and Luo (1992), and vis-NIRS is often used to estimate many soil properties including water content, e.g., gravimetric moisture content in Al-Asadi and Mouazen (2014) and Mouazen and Al-Asadi (2018). Moreover, data fusion is believed to be an important tool for improving prediction of soil properties using PSS 6

(Kuang et al., 2012). For example, Al-Asadi and Mouazen (2014) demonstrated an improvement in predicting gravimetric moisture content and ΞΈ using data fusion of vis-NIRS and FDR, compared with single vis-NIRS or single FDR. Lobsey and Viscarra Rossel (2016) already showed that this approach of combining gamma ray attenuation and vis-NIRS estimated BD with RMSE of 0.055 g m-3 and R2 of 0.90, much more accurately than those estimated BD based on gamma ray attenuation only (i.e., without being adjusted using water content estimated using vis-NIRS). Thus, data fusion of gamma ray attenuation and vis-NIRS probably could estimate BD very accurately. Similar to the method of PTFs, the approach of combining gamma ray attenuation and vis-NIRS for estimating soil BD also has some drawbacks. First, Mouazen and Al-Asadi (2018) concerned about the negative influence of the radioactive energy source on human health. However, this can be overcome using a lead container as a shield. Actually, Viscarra Rossel et al. (2017) have already integrated gamma ray attenuation with digital camera and vis-NIRS into their Soil Condition Analysis System for practical use. Second, the accuracy of this approach for estimating BD is also influenced by those factors that can influence accuracy of estimation of πœƒ. To the best of our knowledge, few studies so far have applied this approach for estimating BD, i.e., Lobsey and Viscarra Rossel (2016), Viscarra Rossel et al. (2017) and Wang et al. (2019). The objectives of this study are to: (1) apply the approach of combining gamma 7

ray attenuation and vis-NIRS to estimate BD based on 120 core samples collected in a hilly subtropical area in China; (2) establish PTFs for estimating BD of the same samples with commonly measured soil properties, i.e., organic matter content (OMC), contents of sand (0.05-2 mm), silt (0.002-0.05 mm) and clay (<0.002 mm), pH and soil depth, and (3) compare accuracy of the approach and PTFs. 2. Materials and methods 2.1 Soil sampling Soil samples were collected in a hilly area of 1.6 km2, located in Nanning, Southwest

China,

within

latitude

22Β°58β€²3β€³-22Β°59β€²17β€³N

and

longitude

108Β°20β€²55β€³-108Β°21β€²54β€³E. The climate of this area is subtropical humid monsoon, with an annual average temperature of 21.6 ΒΊC and an annual average rainfall of 1300.6 mm. The elevation of this area ranges between 130 and 300 m, and the average slope of this area is 24.8 degree. In history, this area has been always used for forestry. Since two decades ago it has been under Eucalyptus plantation. Soils of this area are derived based on mud stone, mud shale and sand shale, and they are latosolic red soil, referred to Ustisol in the U.S. Soil Taxonomy. Fifty-eight sites of this area were selected to collect samples in terms of elevation, aspect and slope. Among them, 20 sites were sampled with a soil profile. Each genetic horizon of a profile was sampled by the core method using standard sharpened steel cylinders of 100 cm3 volume (d=53 mm, h=50 mm). On the remaining 38 sites, topsoil equivalent to the genetic horizon A was sampled in the same way. Depths of 8

all the samplings were recorded when sampling. A total of 120 samples were collected. 2.2 Lab analysis After weighting, the above core samples were scanned in laboratory using a vis-NIRS spectrometer (FieldSpec 4, Analytical Spectral Devices-ASD, Boulder, CO, USA). The spectrometer has a spectral range of 350-2500 nm and a spectral resolution of 3 nm at 700 nm and 6 nm at 1400 and 2100 nm. Its sampling interval is 1.4 nm between 350 nm and 1000 nm, and 1.1 nm between 1001 nm and 2500 nm. It has a light source of a halogen bulb, enclosed in an angle of 12Β° with a contact probe of detection fibers. Using the spectrometer, 9 measurements were made at both ends of a core sample in order to account for soil variation, and 10 spectra were recorded for each measurement in order to minimize noises and to maximize the signal-to-noise ratio. The resulting 180 spectra of each core sample were averaged to represent the whole core sample. At last, the range of each averaged spectra was reduced to 400-2400 nm, to eliminate noises at both edges, and then each spectra was reduced by averaging 10 successive wavelengths. After scanned using the vis-NIRS spectrometer, each core sample was sealed tightly with scotch tape, in order to retain moisture for the following gamma ray attenuation measurement. The gamma ray attenuation measurement were made using GEM18180 (Ortec, USA) with a high-purity germanium gamma detector, when the core samples were wet. The radioactive source was

137Cs

with an activity of 185 MBq and a photon 9

energy of 0.662 MeV. For each core sample, the measurements were made axially through the center and three positions evenly offset the center. The four measurements of each core sample were averaged to represent the whole core sample. After the above measurement of gamma ray attenuation, the core samples were dried at 40 ΒΊC for 7 days and then were repeatedly weighted and dried until the weights of the samples were constant. The weights before and after drying were used to calculate πœƒ and BD. The low temperature of 40 ΒΊC was used in order to keep OMC constant for the following laboratory analysis. Finally, the samples were measured for OMC using the potassium dichromate oxidation method, soil pH using a pH meter (1: 2.5

soil

to

water

ratio)

and

contents

of

sand,

silt

and

clay

using

the Malvern Panalytical Mastersizer 2000 (Malvern, UK), a laser diffraction testing instrument. 2.3 Bulk density measured using proximal soil sensing Following Lobsey and Viscarra Rossel (2016), BD was estimated using the equation, 1

πœŒπ‘π›Ύ = π‘₯πœ‡π‘ ln

𝐼0

( )― 𝐼

πœ‡π‘€

πœ‡π‘  πœŒπ‘€πœƒ

(1)

where πœŒπ‘π›Ύ represents BD in gΒ·cmβˆ’3, π‘₯ is the sample thickness in cm, πœ‡π‘  represents the mass attenuation coefficients of dry soil in cm2 gβˆ’1, 𝐼0 is the unattenuated radiation emitted from the source, 𝐼 is the incident radiation at the detector, and πœ‡π‘€ represents the mass attenuation coefficients of soil water in cm2Β·gβˆ’1, πœŒπ‘€ represents the density of soil water in gΒ·cmβˆ’3. πœ‡π‘  is assumed to be consistent for sensing soil BD 10

with gamma ray attenuation. In this study, the πœ‡π‘  of Wang et al. (2019), i.e., 0.0856 cm2 g-1, was used because the soils were from the same study area. The πœ‡π‘€ and πœŒπ‘€ actually were measured using distilled water and they were also the same to those of Wang et al. (2019), 0.0850 cm2 g-1 and 1 g cmβˆ’3. Similarly, the other parameters in equation (1) were also the same to those of Wang et al. (2019), except 𝐼 and πœƒ. 𝐼 is the measured gamma ray attenuations for individual soil samples. πœƒ is estimated based on the vis-NIRS measurement using PLSR, and they were validated in the section 2.5. 2.4 Pedostransfer functions MLR was used to establish a PTF in this study, because it is simple, straightforward for interpretation. The above measured soil properties, i.e., OMC, pH and contents of sand, silt and clay, as well as recorded soil depth, were used in MLR to establish a PTF. These properties were chosen because they are commonly measured in a soil survey, so they are generally available. Correlations between the properties and with BD were first analyzed. Then, all possible combinations of the soil properties that were significantly correlated with BD at a level of 0.05 but having no multicollinearity were used to establish a PTF based on ordinary least square using the lm function of R (R Core Team, 2016). Meanwhile, commonly used variable selection procedures for MLR, i.e., backward, forward and both directions stepwise methods, were also tried based on all of the measured soil properties, in order to search for the best combination of variables for establishing a 11

PTF. Finally, only PTFs that have every variable with a statistical significance at a level of 0.05 and have residuals being normally distributed (simply evaluated by the Q-Q plot) were used to estimate BD. 2.5 Validation Both leave-one-out cross-validation (LOOCV) and cross-validation with randomly split samples from the 120 collected samples were used to validate BD estimated using a PTF and PSS, as well as the above estimated πœƒ based on vis-NIRS. In LOOCV, each sample was selected to validate the estimation made using the remaining 119 samples. A Taylor diagram was plotted to compare accuracies of the estimated BD in LOOCV. In the cross-validation with randomly split samples, n samples were randomly selected to validate estimation made using the remaining samples. For all PTFs and PSS, the same n samples and remaining samples were used. In this study, n ranged from 1 to 40. For each n, the validation was repeated for 100 times and an averaged validation criterion over the 100 times was computed to represent the validation criterion for the n. The following three criteria were computed in the validations, as well as in calibration of PTFs, root mean squared error (RMSE), residual prediction deviation (RPD, also called prediction deviation ratio in literature) and coefficient of determination (R2). They were calculated as, RMSE = 1

2

R =

1 𝑛 βˆ‘ 𝑛 𝑖 = 1(π‘œπ‘π‘ π‘–

― π‘π‘Ÿπ‘’π‘‘π‘–)2

(2)

𝑛

((𝑛 ― 1)βˆ‘π‘– = 1(π‘π‘Ÿπ‘’π‘‘π‘– ― πœ‡π‘π‘Ÿπ‘’π‘‘)(π‘œπ‘π‘ π‘– ― πœ‡π‘œπ‘π‘ ))2

(3)

𝜎2π‘π‘Ÿπ‘’π‘‘πœŽ2π‘œπ‘π‘ 

12

πœŽπ‘œπ‘π‘ 

(4)

RPD = RMSE

where π‘œπ‘π‘ π‘– is the observed value of sample i, π‘π‘Ÿπ‘’π‘‘π‘– is the corresponding predicted value, πœŽπ‘π‘Ÿπ‘’π‘‘ and πœ‡π‘π‘Ÿπ‘’π‘‘ denote standard deviation and mean of predicted values, respectively, and πœŽπ‘œπ‘π‘  and πœ‡π‘œπ‘π‘  denote standard deviation and mean of observed values, respectively. RMSE indicates prediction precision by measuring the variance of prediction errors and is sensitive to an outlier. A large RMSE value indicates a high prediction error. R2 measures the goodness of fit between the predicted and observed values, and it has the optimal value of 1, indicating a perfect fit. RPD measures the ratio of standard deviation of observed values divided by the RMSE, and it is widely used in literature of PSS to compare predictions made by different models. The larger a RPD, the better predictions are. According to Quraishi and Mouazen (2013b), it is generally classified into three classes: a RPD of smaller than 1.4 indicating a poor prediction, a RPD between 1.4 and 1.8 indicating a fair prediction and a RPD of larger than 1.8 indicating a good prediction. 3. Results and discussion 3.1 Basic statistics of the soil properties Table 1 lists basic statistics of the soil properties. Fig. 1 shows histograms of the soil properties. These statistics suggest that the BD values of this study vary largely and cover general BD values in literature. For example, the 300 BD values in Mouazen and Al-Asadi (2018) vary between 0.879 g cm-3 and 1.671 g cm-3 with an average of 1.304 g cm-3 and a standard deviation of 0.130 g cm-3. The 1013 BD values 13

in Al-Asadi and Mouazen (2014) vary between 0.71 g cm-3 and 2.08 g cm-3 with an average of 1.43 g cm-3 and a standard deviation of 0.22 g cm-3. In Table 1, values of skewness and kurtosis for BD indicate that the distribution of the BD values of this study is closely normal, which is verified by Fig. 1(a). The variations of OMC, contents of sand, silt and clay, pH, depths and πœƒ vary in different degrees. The variations of OMC and depths are quite large, as indicated by coefficients of variation of 59.79% and 54.26%. This is attributed to the sampling method of this study. About half of the samples (i.e., 58) were collected in the genetic horizon A. Generally, OMC in the genetic horizon A is much larger than in deeper horizons. Thus, the variation of OMC in this study is large. Depths of the genetic horizon A are smaller than those of deeper horizons, while soil profiles vary in depth across the hilly study area. So, the distribution of depths in Fig. 1(g) is clearly right skewed, and this is also indicated by the skewness for depth in Table 1. The variations of contents of sand, silt and clay are modest with coefficients of variation of 16.29-23.00% in Table 1. The variation of πœƒ was not big either, as indicated by the coefficient of variation of 16.69%. The variation of pH is the smallest, indicated by the coefficient of variation of 10.11%. This is because the samples of this study are collected in a small area with the same parent materials and climate. However, the distribution of pH is a little bit right skewed, as shown in Fig. 1(f). 3.2 Pedotransfer functions Table 2 lists correlations between the soil properties. BD is strongly correlated 14

with OMC, pH and depth at a statistical level of 0.01 and with the content of silt at a statistical level of 0.05. It is common to see significantly strong correlations between BD and these soil properties in literature (Qiao et al., 2019). Table 2 also shows that there are statistically significant correlations between the soil properties. For example, OMC is statistically significantly correlated with pH and depth. So, these correlated soil properties would not be used together to establish a PTF, to avoid multicollinearity. Based on the correlation analysis, OMC, content of silt, pH and depth are chosen for establishing a PTF for estimating BD using MLR. These four soil properties lead to totally 15 combinations for establishing PTFs. However, only six of the 15 result in a PTF that has every variable being statistically significant and has residuals being normally distributed. The six PTFs established based on them are shown in Table 3 and are numbered from 1 to 6 for citing in the following. Four PTFs are established based on a single variable, i.e., each of the four chosen soil properties, and the other two are based on two variables, i.e., OMC and silt, and silt and depth, respectively. The two pairs in the last two PTFs do not correlate with each other, according to Table 2. Based on all of the soil properties, the backward and both directions stepwise procedures all select OMC and silt for establishing a PTF. Although the forward direction stepwise procedure select all of the properties for establishing a PTF, only OMC and silt have statistical significances at a level of 0.05. Thus, the three variable 15

selection procedures suggest OMC and silt for establishing a PTF, i.e., the PTF 5 discussed in the above. The six PTFs explained 4.5-71% of the variation of BD, in terms of R2 (Table 3). PTF 5 based on OMC and the content of silt has the largest R2, i.e., 0.71, which is very closely followed by PTF 1 based on OM only. This suggests that OM plays a major role in explaining variation of BD in this study. In contrast, the content of silt plays a minor role. This minor role plays by the content of silt is verified by the R2 of PTF 2 based on this soil property only, i.e., 0.045, the smallest R2 in Table 3. PTF 6 based on the content of silt and depth has the third largest R2, i.e., 0.40, closely followed by PTF 4 based on depth only with a R2 of 0.35. PTF 3 based on pH only has the second smallest R2, i.e., 0.085, just a little larger than the smallest one. De Vos et al. (2005) summarized R2 of 12 published PTFs and showed that they vary between 0.41 and 0.96. Comparatively, in this study, except the two PTFs based on the content of silt and pH, respectively, the other four PTFs all have R2 close to those summarized in De Vos et al. (2005). However, the PTFs summarized in De Vos et al. (2005) are much more complex than the PTFs of this study, since the former take different transformations of a soil property (e.g., natural logarithm transformation) or different types of linear functions (e.g., exponential linear and power linear). Although it might be possible for R2 of the PTFs in Table 3 to increase if such transformations and functions are adopted, De Vos et al. (2005) found no substantial enhancement in R2 due to including second and third polynomials in modeling a PTF. 16

So, it remains a question for future studies to investigate if different forms of a soil property could help to increase R2 of a PTF. Recently, PTFs established in Quraishi and Mouazen (2013a) using MLR have R2 ranging from 0.04 to 0.51, smaller than R2 of the PTFs in Table 3. This shows that the PTFs of this study are generally good. In Table 3, values of RMSE and RPD of the models are also in the same order of R2. Particularly, values of RPD suggest that PTFs 1 and 5 predict fairly on the calibration data, while the other four PTFs predict poorly. However, these values of RMSE and RPD might be biased because they are derived from the calibration data. 3.3 Volumetric water content estimated using proximal soil sensing Fig. 2 shows coefficients of correlation between vis-NIRS reflectance and πœƒ of soil samples at individual wavelengths ranging from 404.5 nm to 2404.5 nm. Clearly, the correlations are all negative. Except at the wavelengths of 434.5~454.5 nm and 524.5~834.5 nm (shaded in Fig. 2), the correlations are significant at a statistical level of 0.05 or 0.01. This suggests that it is possible to use the vis-NIRS spectra to estimate πœƒ. Fig. 3 presents accuracy criteria for the estimated πœƒ based on vis-NIRS using PLSR. In LOOCV, RMSE is not very big, i.e., 3.0%, accounting for 12% of average πœƒ of the samples in Table 1. However, R2 and RPD from LOOCV are not very high either, i.e., 0.52 and 1.4, respectively. So, these results from LOOCV suggest that accuracy of the estimated πœƒ based on vis-NIRS using PLSR is poor. In the validation using randomly split samples, values of these accuracy criteria 17

for estimated πœƒ vary with the number of samples for modeling, as shown in Fig. 3. RMSE varies between 2.45% and 3.21%, accounting for 9.7% to 12% of average πœƒ of the samples. R2 varies between 0.49 and 1, while RPD varies between 1.31 and 2.24. These results suggest that accuracies of the estimated πœƒ in the validation using randomly split samples are mostly just fair or even poor, while only sometimes they are good. For example, 55% of the RPD values are between 1.4 and 1.8, and 42.5% are smaller than 1.4, while only 2.5% of the RPD values are larger than 1.8. 3.4 Bulk density estimated using pedotransfer functions and proximal soil sensing Fig. 4 presents scatter plots of measured versus estimated BD in LOOCV. Meanwhile, estimated BD using PSS without corrections for water content are also presented, i.e., the black dots in Fig. 4(g), which are called PSS 1 in the following. Estimated BD using PSS with corrections for water content based on vis-NIRS, i.e., the black dots in Fig. 4(h), are called PSS 2 in the following. Besides, estimated BD using PSS with corrections for water content based on oven-dry are also presented, i.e., the red circles in Fig. 4(g) and (h). Taylor diagram for the estimated BD is shown in Fig. 5. Clearly, PSS 2 in Fig. 4(h) are the closest to the measured BD, while PSS 1 in Fig. 4(g) consistently bias from the measured BD. Among the PTFs, PTFs 1 and 5 are the closest to the measured BD whereas PTFs 2 and 3 are the furthest. This order appears again in the Taylor diagram of Fig. 5, except PSS 1 moving very close to PSS 2. The move of PSS 1 is attributed to the fact that the Taylor diagram is based on 18

standard deviation of differences between estimated and observed BD values, ignoring average of the differences. As shown in Fig. 4(g), although PSS 1 is regressed very well with the measured BD, resulting in a high correlation in the Taylor diagram of Fig. 5, they are the most biased away from the 1:1 line. The bias indicates that they are larger than the measured BD. Obviously, the biases are attributed to the uncorrected water contents in the core samples. Fig. 6 shows accuracy criteria of the estimated BD of PTFs 1-6 and PSS 1-2. In LOOCV, PSS 2 has the smallest RMSE, i.e., 0.082 g cmβˆ’3, accounting for 6.03% of average measured BD in Table 1, and it also has the largest R2 and RPD, i.e., 0.92 and 2.07, respectively. Thus, these accuracy criteria suggest that PSS 2 is the most accurate. Among all PTFs, PTF 5 has the smallest RMSE, i.e., 0.094 g cmβˆ’3, and it has the largest R2 and RPD, i.e., 0.70 and 1.82, respectively. So, PSS 2 is much more accurate than the most accurate PTF, i.e., PTF 5. Compared with PTF 5, PSS 2 decreases RMSE by 12%, and increases R2 and RPD by 31% and 14%, respectively. The order of accuracies of the PTFs is the same to that of modeling R2 of the PTFs in Table 3, inferring that accuracy of a PTF is determined by modeling R2 of the PTF. At last, PSS 1 has the largest RMSE and the smallest RPD, suggesting that PSS 1 is the worst prediction. Although R2 of PSS 1 is the second largest, this is due to the good regression relationship between PSS 1 and the measured BD, as shown in Fig. 4(g). It is already explained in the above that PSS 1 biases the most away from the measured BD. 19

Similar to the results from LOOCV, results from the validation using randomly split samples (Fig. 6) also suggest that PSS 2 is more accurate than PTFs 1-6, except for the RPD values from validations based on samples less than 2. Compared with the best PTF, i.e., PTF 5, PSS 2 decreases RMSE by 2.42-17.09% and increases R2 and RPD by 0-34% and 10-22%, respectively. PSS 1 is still the worst. Besides, results from the validation using randomly split samples also suggest that accuracy of a PTF is determined by modeling R2 of the PTF, as the order of accuracy criteria of the PTFs is the same to that of modeling R2 of the PTFs in Table 3. When the number of samples for validation is larger than 20, values of the accuracy criteria are relatively stable, and R2 values from the validation are close to modeling R2 values of the PTFs. The best PTF established in Quraishi and Mouazen (2013a) using MLR with PR, moisture content, OMC and content of clay results in RMSE of 0.13 g cmβˆ’3 and R2 of 0.51 in validation. Qiao et al. (2019) repeatedly validated five published PTFs and a new one established using MLR with OMC, content of clay and/or soil depth (original or natural logarithm transformed) for their own study area, and reported RMSE of 0.079-0.612 g cmβˆ’3 and R2 of 0.003-0.356. Yi et al. (2016) also repeatedly validated 14 published PTFs and newly established ones using MLR and ANN with OMC, content of clay, BD of organic matter and minerals, organic carbon and soil depth for their own study area, and reported RMSE of 0.14-0.54 g cmβˆ’3 and R2 of 0.22-0.72. So, the best PTF of this study, i.e., PTF 5, with RMSE of 0.094 g cmβˆ’3 and R2 of 0.70, is much more accurate than most of these reported in literature. 20

Quraishi and Mouazen (2013b) developed a PSS model for estimating BD, which combing PR and vis-NIRS estimated moisture content, OMC and clay content using ANN. Their model estimates BD with RMSE of 0.08 g cmβˆ’3 and R2 of 0.84 when validating, and when testing RMSE and R2 are 0.04 g cmβˆ’3 and 0.94, respectively. Al-Asadi and Mouazen (2014) developed PSS models for estimating BD, which combining vis-NIRS measured gravimetric moisture content and FDR measured πœƒ. They used both PLSR and ANN for modeling, and the ANN based model presents a better accuracy in validation with RMSE of 0.095 g cmβˆ’3 and R2 of 0.81. In Mouazen and Al-Asadi (2018), a similar PSS model results in RMSE of 0.061-0.086 g cmβˆ’3, R2 of 0.57-0.86 and RPD of 1.51-2.60 for different levels of moisture content in validation. Lobsey and Viscarra Rossel (2016) used the same approach of this study but a different modeling technique for estimating πœƒ (i.e., ANN). Their model results in RMSE of 0.055 g cmβˆ’3 and R2 of 0.897 in validation. So, comparatively, accuracy of PSS 2 of this study is similar to or better than those published. The comparison between PSS 1 and 2 shows that correcting estimated BD using estimated πœƒ is important for estimating BD based on PSS. This can be clearly seen in equation (1). In Fig. 4(h) PSS 2 is much closer to measured BD than PSS 1 in Fig. 4(g), and Fig. 6 clearly shows that PSS 2 is much more accurate than PSS 1. For example, the LOOCV in Fig. 6 suggests that, compared with PSS1, PSS 2 decreases RMSE by 59% and increases R2 by 5.5% and RPD by 145%. 21

However, improving estimation accuracy of πœƒ further could not lead to an equivalent improvement in estimation accuracy of BD. As shown in Fig. 4(g), estimated BD corrected using measured πœƒ also deviate a little from the 1:1 line, and in Fig. 4(h) they are quite close to PSS 2. In fact, estimated BD corrected using measured πœƒ have RMSE of 0.080 g cmβˆ’3, R2 of 0.94 and RPD of 2.12. So, the improvements in accuracy of these BD over PSS 2 are very small, i.e., 2.4% of decrease in RMSE, 3.2% of increase in R2 and 2.5% of increase in RPD. This can be attributed to measurement errors in laboratory analysis on πœƒ. In addition, this is also because πœƒ and its parameter in equation (1) are not large. As shown in Table 1, πœƒ in weight averagely accounts for 18.65% of BD. In equation (1), its parameter is 0.99 g cmβˆ’3. 4. Conclusions This study applied the approach based on PSS, i.e., combining gamma ray attenuation and vis-NIRS, to estimate BD using 120 core samples collected in a hilly subtropical area in China, and compared it with another commonly used, convenient and efficient approach for estimating BD, i.e., PTF. It demonstrates that the approach based on PSS is more accurate than the other. The former leads to estimated BD values close to the measured ones, with RMSE of about 0.082 g cmβˆ’3, R2 larger than 0.90 and RPD larger than 2.00. In contrast, the latter leads to estimated BD values biased much more away from the measured ones, with RMSE larger than 0.094 g cmβˆ’3, R2 smaller than 0.70 and RPD smaller than 1.80. Besides, this study also shows 22

that accuracy of the approach based on PSS is mainly determined by the accuracy of estimated πœƒ using vis-NIRS. Nevertheless, improving accuracy of estimated πœƒ further could only lead to a small improvement in accuracy of the approach based on PSS. This study summarizes that the approach based on PSS is promising for estimating BD in future soil surveys.

Acknowledgements This research is financially supported by the National Natural Science Foundation of China (41771246 and 41541006). Prof. Jun Li at School of Geography and Planning, Sun Yat-sen University, provided the vis-NIRS spectrometer for this study. Mr. Qi Luo at College of Physics and Energy, Shenzhen University, provided GEM18180 for this study. We are grateful to them.

References Adams, W.A., 1973. The effect of organic matter on the bulk and true densities of some uncultivated podzolic soils. Eur. J. Soil Sci. 24, 10–17. Al-Asadi, R.A., Mouazen, A.M., 2014. Combining frequency domain reflectometry and visible and near infrared spectroscopy for assessment of soil bulk density. Soil Tillage Res. 135, 60–70. Benites, V.M., Machado, P.L.O.A., Fidalgo, E.C.C., Coelho, M.R., Madari, B.E., 2007. Pedotransfer functions for estimating soil bulk density from existing soil 23

survey reports in Brazil. Geoderma 139, 90–97. Chen, S., Richer-de-Forges, A.C., Saby,N.P.A., Martina, M.P., Walter, C., Arrouays, D., 2018. Building a pedotransfer function for soil bulk density on regional dataset and testing its validity over a larger area. Geoderma 312, 52–63. Costa, J.C., Borges, J.A.R., Pires, L.F., 2013. Soil bulk density evaluated by gamma-ray attenuation: Analysis of system geometry. Soil Tillage Res. 129, 23–31. De Vos, B., Meirvenne, M.V., Quataert, P., Deckers, J., Muys, B., 2005. Predictive quality of pedotransfer functions for estimating bulk density of forest soils. Soil Sci. Soc. Am. J. 69, 500–510. Federer, C.A., 1983. Nitrogen mineralization and nitrification: depth variation in four New England forest soils. Soil Sci. Soc. Am. J. 47, 1008-1014. Harrison, A.F., Bocock, K.L., 1981. Estimation of soil bulk-density from loss-on-ignition values. J. Appl. Ecol. 18, 919–927. Jalabert, S.S.M., Martin, M.P., Renaud, J.-P., Boulonne, L., Jolivet, C., Montanarella, L., Arrouays, D., 2010. Estimating forest soil bulk density using boosted regression modelling. Soil Use Manage. 26, 516–528. Jeffrey, D.W., 1970. A note on the use of ignition loss as a means for the approximate estimation of soil bulk density. J. Ecol. 58, 297–299. Kuang, B., Mahmood, H.S., Quraishi, Z., Hoogmoed, W.B., Mouazen, A.M., van Henten, E.J., 2012. Sensing soil properties in the laboratory, in situ, and online: a 24

review. In: Sparks, D. (Ed.), Advances in Agronomy, vol. 114. Academic Press, Agron, UK, pp. 155–224. Li, D., Gao, G., Shao, M., Fu, B., 2016. Predicting available water of soil from particle-size distribution and bulk density in an oasis–desert transect in northwestern China. J. Hydrol. 538, 539-550. Lobsey, C.R., Viscarra Rossel, R.A., 2016. Sensing of soil bulk density for more accurate carbon accounting. Eur. J. Soil Sci. 67, 504–513. Luo, X., Wells, L.G., 1992. Evaluation of gamma ray attenuation for measuring soil bulk density Part I. Laboratroy investigation. T. ASAE 35, 17–26. Martin, M.P., Seen, L.D., Boulonne, L., Jolivet, C., Nair, K.M,, Bourgeon, D.G., Arrouays, D., 2009. Optimizing pedotransfer functions for estimating soil bulk density using boosted regression trees. Soil Sci. Soc. Am. J. 73, 485–493. McBratney, A.B., Minasny, B., Cattle, S.R., Vervoort, R.W., 2002. From pedotransfer functions to soil inference systems. Geoderma 109, 41–73. Mouazen, A.M., Al-Asadi, R.A., 2018. Influence of soil moisture content on assessment of bulk density with combined frequency domain reflectometry and visible and near infrared spectroscopy under semi field conditions. Soil Tillage Res. 176, 95–103. Qiao, J., Zhu, Y., Jia, X., Huang, L., Shao, M., 2019. Development of pedotransfer functions for predicting the bulk density in the critical zone on the Loess Plateau, China. J. Soil Sediment 19, 366-372. 25

Quraishi, M.Z., Mouazen, A.M., 2013a. Development of a methodology for in situ assessment of topsoil dry bulk density. Soil Tillage Res. 126, 229–237. Quraishi, M.Z., Mouazen, A.M., 2013b. A prototype sensor for the assessment of soil bulk density. Soil Tillage Res. 134, 97–110. R Core Team, 2016. R: A language and environment for statistical computing. R Foundation

for

Statistical

Computing,

Vienna,

Austria.

URL

https://www.R-project.org/. (Accessed on Jan. 28, 2019) Ramcharan, A., Hengl, T., Beaudette, D., Wills, S., 2017. A soil bulk density pedotransfer function based on machine learning: a case study with the NCSS soil characterization database. Soil Sci. Soc. Am. J. 81:1279–1287. Reidy, B., Simo, I., Sills, P., Creamer, R.E., 2016. Pedotransfer functions for Irish soils – estimation of bulk density (ρb) per horizon type. Soil 2, 25-39. Shiri, J., Keshavarzi, A., Kisi, O., Karimi, S., Iturraran-Viveros, U., 2017. Modeling soil bulk density through a complete data scanning procedure: Heuristic alternatives. J. Hydrol. 549, 592-602. Suuster, E., Ritz, C., Roostalu, H., Reintam, E., KΓ΅lli, R., Astover, A., 2011. Soil bulk density pedotransfer functions of the humus horizon in arable soils. Geoderma 163, 74-82. Tranter, G., Minasny, B., McBratney, A.B., Murphy, B., McKenzie, N.J., Grundy, M., Brough, D., 2007. Building and testing conceptual and empirical models for predicting soil bulk density. Soil Use Manage. 23, 437–443. 26

Tian, Z., Kool, D., Ren, T., Horton, R., Heitman, J.L., 2019. Approaches for estimating unsaturated soil hydraulic conductivities at various bulk densities with the extended Mualem-van Genuchten model. J. Hydrol. 572, 719-731. Tian, Z., Lu, Y., Ren, T., Horton, R., Heitman, J.L., 2018. Improved thermo-time domain reflectometry method for continuous in-situ determination of soil bulk density. Soil Tillage Res. 178, 118–129. Viscarra Rossel, R.A., Lobsey, C.R., Sharman, C., Flick, P., McLachlan, G., 2017. Novel proximal sensing for monitoring soil organic C stocks and condition. Environ. Sci. Technol. 51, 5630-5641. Wang, X., Sun, X., Wang, H., 2019. Determination of soil bulk density with Gamma ray and visible-near infrared spectroscopy. Acta Pedologica Sinica (in Chinese with English abstract) Wells, L.G., Luo, X., 1992. Evaluation of gamma ray attenuation for measuring soil bulk density Part II. Field investigation. T. ASAE 35, 27–32. Yi, X., Li, G., Yin, Y., 2016. Pedotransfer functions for estimating soil bulk density: A case study in the Three-River headwater region of Qinghai province, China. Pedosphere 26, 362-373.

27

Table 1 Statistics of soil properties (n=120). Soil property

Min

Max

Mean

SD a

CV (%) b

Skewness

Kurtosis

BD c (gΒ·cm-3)

0.98

1.78

1.36

0.17

12.56

0.01

-0.59

OMC d (gΒ·kg-1)

2.37

46.87

20.48

12.24

59.79

0.31

-1.21

Sand (0.05-2 mm) (%)

31.18

71.5

46.22

8.94

19.34

0.66

-0.12

Silt (0.002-0.05 mm) (%)

20.29

54.24

38.16

6.22

16.29

-0.65

0.01

Clay (<0.002 mm) (%)

6.6

24.17

15.79

3.63

23.00

-0.15

-0.37

pH

3.84

6.51

4.68

0.47

10.11

1.43

2.83

Depth (cm)

10

60

20.11

10.91

54.26

1.22

1.36

ΞΈ e (%)

15.49

37.85

25.23

4.21

16.69

0.05

0.11

a:

Standard deviation.

b:

Coefficient of variation.

c:

Bulk density.

d:

Organic matter content.

e:

Volumetric water content.

28

Table 2 Correlations between soil properties (n=120). Soil property

BD (gΒ·cm-3)

OMC (gΒ·kg-1)

-0.83**

Sand (%)

0.13

0.022

Silt (%)

-0.21*

0.068

-0.93**

Clay (%)

-0.057

-0.10

-0.87**

0.70**

pH

0.29**

-0.33**

0.095

-0.18

0.007

Depth (cm)

0.59**

-0.74**

-0.11

0.018

0.19*

*:

OMC (gΒ·kg-1)

Sand (%)

Statistical significance at the level of 0.05.

**:

Statistical significance at the level of 0.01.

29

Silt (%)

Clay (%)

pH

0.34**

Table 3 Established PTFs for estimating BD and validation results based on calibration data. PTF

Formula

R2

Adj-R2

RMSE

RPD

PTF 1

1.59-0.012 Γ— OM

0.69

0.68

0.095

1.79

PTF 2

1.58-0.0058 Γ— Silt

0.045

0.037

0.17

1.03

PTF 3

0.87+0.11 Γ— pH

0.085

0.077

0.16

1.05

PTF 4

1.17+0.0093 Γ— Depth

0.35

0.35

0.14

1.25

PTF 5

1.76-0.011 Γ— OM-0.0043 Γ— Silt

0.71

0.71

0.091

1.87

PTF 6

1.40-0.0061 Γ— Silt+0.0094 Γ—

0.40

0.39

0.13

1.30

Depth

30

(a)

(b)

(c)

(d)

31

(e)

(f)

(g)

(h)

Fig. 1 Histograms of bulk density (BD, g cm-3), organic matter content (OMC, g kg-1), sand (%), silt (%), clay (%), pH, depth (cm) and volumetric water content (ΞΈ, %).

32

Fig. 2 Correlations between volumetric water content (ΞΈ, %) and vis-NIR reflectance at individual wavelengths. The red dashed and dotted horizontal lines represent statistical significant correlation values at levels of 0.05 and 0.01, respectively. The shaded indicate wavelengths where correlations are not significant.

33

(a)

(b)

(c) Fig. 3 Accuracy criteria for estimated volumetric water contents (ΞΈ, %) based on vis-NIRS spectra using PLSR. Dashed horizontal lines represent values of criteria computed based on LOOCV.

34

(a)

(b)

(c)

(d)

35

(e)

(f)

(g)

(h)

Fig. 4 Measured versus estimated bulk density (BD, g cm-3) based on LOOCV. The 1:1 lines are plotted for comparison of the data. PSS 1 and 2 in (g) and (h), respectively, represent BD estimated using PSS without and with corrections for water content based on vis-NIRS. BD estimated using PSS with corrections for water content based on oven-dry are also plotted as red circles in (g) and (h).

36

37

Fig. 5. A Taylor diagram for estimated BD using PTFs and PSS.

38

(a)

(b)

(c) Fig. 6 Accuracy criteria for estimated bulk density (BD, g cm-3) using different approaches.

39

Declaration of interests

β˜’ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

β˜’The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

40

Highlights 

Soil BD values estimated using PSS were at least 12% more accurate than using PTFs



PSS estimated BD with RMSE of about 0.082 g cmβˆ’3, R2 > 0.90 and RPD > 2.00



Accuracy of PSS estimated BD depends on accuracy of water content based on vis-NIRS



PSS is promising for estimating BD in future soil surveys

41