Spectral fingerprinting of soil organic matter composition


Organic Geochemistry 46 (2012) 127–136


Organic Geochemistry journal homepage: www.elsevier.com/locate/orggeochem

Lauric Cécillon a,⇑, Giacomo Certini b, Holger Lange c, Claudia Forte d, Line Tau Strand a

a Department of Plant and Environmental Sciences, Norwegian University of Life Sciences, Box 5003, 1432 Ås, Norway
b Dipartimento di Scienze delle Produzioni Vegetali, del Suolo e dell’Ambiente Agroforestale (DiPSA), Università di Firenze, P.le Cascine 28, 50144 Firenze, Italy
c Norwegian Forest and Landscape Institute, Box 115, 1431 Ås, Norway
d Institute of Chemistry of Organometallic Compounds, National Research Council, via G. Moruzzi 1, 56124 Pisa, Italy

Article info

Article history: Received 3 May 2010; received in revised form 15 February 2012; accepted 17 February 2012; available online 27 February 2012

Abstract

Large scale environmental monitoring schemes would benefit from accurate information on the composition of soil organic matter (SOM), but so far routine procedures for describing SOM composition remain a chimera. Here, we present the initial assessment of a two step strategy for expeditious determination of SOM composition that involves: (i) building infrared fingerprints from near and mid infrared spectroscopies, two rapid and cheap yet reliable technologies; and (ii) calibrating such infrared fingerprints with multivariate chemometrics from a molecular mixing model based on the more expensive and time consuming 13C nuclear magnetic resonance technique, which discriminates five biochemical components: carbohydrate, protein, lignin, lipid and black carbon. We show fair to excellent predictive ability of the calibrated infrared fingerprints for four out of these five biochemical components, with cross-validated ratios of performance to inter-quartile distance from 3.2 to 8.3, on a small set of 23 soil samples with a wide range of organic carbon content (12–500 g/kg). Multivariate calibration models were highly selective (<2% of infrared data were used for all models). However, the specificity to one particular biochemical component of the infrared wavebands automatically selected by each model was relatively low, except for lipid. Achieving direct predictions of SOM composition on unknown soil samples with infrared spectroscopy alone will require further independent validation and a larger number of samples. Overall, the implementation of our strategy at a broader scale, based on available 13C nuclear magnetic resonance soil libraries, could provide a cost effective solution for the routine assessment of SOM composition.

© 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Present address: Irstea, UR EMGR Ecosystèmes montagnards, 2 rue de la Papeterie-BP 76, Saint-Martin-d’Hères F-38402, France. Tel.: +33 476 762 787; fax: +33 476 513 803. E-mail address: [email protected] (L. Cécillon).

0146-6380/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.orggeochem.2012.02.006

1. Introduction

In soils, most carbon occurs as organic matter, which is currently described as a complex mixture of biopolymers (Kelleher and Simpson, 2006; Lehmann et al., 2008) with five main biochemical components: carbohydrate, protein, lignin, lipid and black carbon (Schmidt and Noack, 2000; Gleixner et al., 2001). The composition of soil organic matter (SOM) is tightly linked to key environmental issues, such as the global carbon budget, nutrient cycling or bioavailability of toxic chemicals in soils (Koelmans et al., 2006; Olk, 2006; Schmidt et al., 2011). All available methods for quantifying SOM composition share the disadvantage of being time consuming, expensive and sometimes difficult to compare (Kögel-Knabner, 2000; Grandy and Neff, 2008), and are therefore all unsuitable for routine environmental monitoring. Several attempts have been made to find cost effective strategies allowing the determination of most soil biochemical components simultaneously, within a single fingerprint. For instance, analytical pyrolysis provides many molecular features of SOM (Grandy and Neff, 2008; Rumpel et al., 2009). However, this technique has several limitations and does not allow the quantification of black carbon (Saiz-Jimenez, 1994). Designing a fingerprint for the quantification of SOM composition including black carbon is thus challenging, but requirements for such a routine method can be easily set out: rapid, cost effective, quantitative, reliable and universal.

During the last decades, solid state 13C nuclear magnetic resonance spectroscopy (NMR) has emerged as an attractive nondestructive analytical method that provides detailed information on the chemistry of organic carbon (Baldock et al., 1992; Kögel-Knabner, 2000; Forte et al., 2006). The content of the five main biochemical components mentioned above in a soil sample can be quantified by applying a molecular mixing model (MMM) to its 13C NMR spectrum (Hedges et al., 2001; Nelson and Baldock, 2005). However, although this strategy has proven its efficiency in terms of rough quantification (Kaal et al., 2007), it remains inappropriate when designing a routine method as it is neither rapid nor cost effective. Conversely, near (NIR) and mid (MIR) infrared spectroscopies are reliable, rapid and cost effective techniques for a rough characterization of organic matter (Smith, 1999; Workman and Weyer, 2007). When used in conjunction with reference methods and multivariate statistical models, they become robust methods for quantifying many soil properties (Viscarra Rossel et al., 2006; Cécillon et al., 2009), including carbon functional groups identified by 13C NMR (Leifeld, 2006; Terhoeven-Urselmans et al., 2006). Further improvements of infrared technology enabling robust predictions with a more detailed understanding of chemical bonds involved in calibration models are achievable using a data fusion technique called "outer product analysis" (OPA) which combines both NIR and MIR spectra of a sample into a single infrared fingerprint (Barros et al., 1997). However, the suitability of this data fusion procedure has never been tested for soil analysis, and the ability of infrared spectroscopy to simultaneously quantify a suite of soil biochemical components has not been investigated so far.

This paper describes an initial assessment, based on a small number of samples, of a new routine procedure for determining SOM composition (i.e. soil content in carbohydrate, protein, lignin, lipid and black carbon) with a fusion of soil infrared spectroscopic data. We used the two step approach outlined in Fig. 1 to achieve this goal: (i) building infrared fingerprints of soil samples from the outer product of their NIR and MIR spectra; (ii) calibrating such infrared fingerprints for each biochemical component on reference values obtained from NMR–MMM.

2. Materials and methods

2.1. Soil sampling and chemical analysis

Soil samples were collected in a typical montane heathland area of Norway, Storgama (59°02′ N, 8°30′ E, 560 m a.s.l., mean annual temperature = 5 °C, mean annual precipitation = 994 mm), under three dominant vegetation types: heather (Calluna vulgaris L.) on podzols in well drained areas, white moss (Sphagnum spp.) on histosols in poorly drained areas, and moorgrass (Molinia coerulea L.) alternatively on podzols or histosols in intermediate areas (Strand et al., 2008). Three soil pits were dug down to the granite bedrock in each vegetation type (average soil depth = 43 cm, n = 9). Soil pits were sampled at three soil depths: top 5 cm, mid section and basal 5 cm of the soil profile, resulting in a total of 27 soil samples. Total carbon and nitrogen content of all samples were determined by dry combustion using a LECO CHN1000 analyser. The low pH of the investigated soils (pH < 5) guaranteed that all of the measured carbon was organic.

2.2. Spectroscopic measurements

NIR and MIR measurements were performed on 1–2 g of each soil sample using diffuse reflection with an Antaris II FT-NIR analyser (64 scans recorded per sample, spectral range = 10,000–4000 cm−1, spectral resolution = 16 cm−1), and diamond attenuated total reflection (ATR) with a Nicolet 8700 FT-IR analyser (32 scans recorded per sample, spectral range = 4000–700 cm−1, spectral resolution = 4 cm−1), respectively (Thermo Scientific). MIR spectral data in the diamond interference and CO2 region (2450–1900 cm−1) were discarded. Prior to infrared analyses, soil samples were air dried, finely ground by a ball mill and further dried overnight at 55 °C, to limit interferences with water without altering SOM.

Solid state cross polarization-magic angle spinning (CPMAS) 13C NMR spectra were recorded on a Bruker AMX 300-WB spectrometer, equipped with a 4 mm CPMAS probe. The operating frequencies were 300.13 and 75.47 MHz for 1H and 13C, respectively; the π/2 pulse was 3.4 μs on the 1H channel. A contact time of 2 ms and a relaxation delay of 4 s were used. The MAS speed was 8 kHz and the number of scans recorded for each sample ranged between 4800 and 40,000. Prior to NMR analysis, soil samples were treated with 2% HF according to Skjemstad et al. (1994). This preliminary treatment of samples is required to remove paramagnetic iron oxides, which cause broadened resonances and signal loss. Such a necessary treatment, however, may lead to formation of some organic artefacts or losses (Skjemstad et al., 1994; Dai and Johnson, 1999; see Section 3.3). Four low quality spectra (from samples with insufficient carbon content resulting in poor signal/noise ratio) were discarded, resulting in a total of 23 reliable NMR spectra.

2.3. Processing of spectral data

NIR and MIR spectra were preprocessed using second derivatives (Savitzky–Golay filter; polynomial degree = 2; number of points = 21) and multiplicative scatter correction on the full spectrum.
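To make the preprocessing step concrete, the following plain-Python sketch implements a Savitzky–Golay second derivative (local quadratic fit over 21 points) and a multiplicative scatter correction. It is only an illustration and not the authors' script: the function names, the use of the mean spectrum as the scatter-correction reference, the assumption of evenly spaced points and the dropping of the window edges are all assumptions made here.

```python
def savgol_second_derivative(y, window=21):
    """Second derivative from a local quadratic least-squares fit.

    Assumes evenly spaced points; the result is per squared index step
    and the window // 2 points at each edge are dropped (an assumption).
    """
    m = window // 2
    offsets = list(range(-m, m + 1))
    mean_sq = m * (m + 1) / 3.0                   # mean of i**2 over the window
    basis = [i * i - mean_sq for i in offsets]    # quadratic term, orthogonal to 1 and i
    norm = sum(b * b for b in basis)
    out = []
    for k in range(m, len(y) - m):
        a2 = sum(b * y[k + i] for b, i in zip(basis, offsets)) / norm
        out.append(2.0 * a2)                      # d2/dx2 of a0 + a1*x + a2*x**2
    return out


def multiplicative_scatter_correction(spectra):
    """Regress each spectrum on the mean spectrum, then remove offset and slope."""
    n, p = len(spectra), len(spectra[0])
    ref = [sum(s[j] for s in spectra) / n for j in range(p)]
    ref_mean = sum(ref) / p
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / p
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Toy check: a pure parabola has a constant second derivative of 2.
parabola = [float(i * i) for i in range(30)]
second = savgol_second_derivative(parabola, window=21)

# Toy check: two spectra differing only by offset and gain coincide after correction.
base = [0.0, 1.0, 4.0, 2.0, 3.0]
scaled = [2.0 * v + 1.0 for v in base]
msc = multiplicative_scatter_correction([base, scaled])
```

The order in which the two operations are applied to real spectra is not specified here; the two functions are therefore shown independently.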
Fig. 1. General scheme of our routine procedure to determine SOM composition. (i) Two step strategy: building infrared fingerprints from NIR and MIR spectra, and calibrating them using reference values of SOM composition inferred from NMR–MMM using partial least squares regression (thin arrows). (ii) Routine application: routine use of a validated spectral library for the prediction of SOM composition on unknown samples with infrared fingerprints alone (bold dotted arrows). Abbreviations: SOM = soil organic matter; IR = infrared; NIR = near infrared; MIR = mid infrared; NMR = nuclear magnetic resonance; MMM = molecular mixing model; PLSR = partial least squares regression.

Fusion of NIR and MIR spectroscopic data was performed using OPA (Barros et al., 1997; Jaillais et al., 2005; Veselá et al., 2007) of the full rank scores from NIR and MIR principal component transform (PCT; Barros et al., 2008). For each of the 23 soil samples with reliable NMR spectra, infrared fingerprints were computed from the outer product of their PCT preprocessed NIR (23 scores) and MIR (23 scores) spectra, which resulted in 23 data matrices (23 rows by 23 columns). PCT is based on a full eigen decomposition of the NIR and MIR matrices before proceeding to OPA. When used in partial least squares regression (PLSR), PCT dramatically accelerates the cross-validation of the calibration models and is also parsimonious in computer memory requirements (Barros and Rutledge, 2004). Thus using PCT strongly sped up the calculation without altering the results: small sized matrices were obtained, each one summarizing exactly the whole NIR and MIR information of the soil sample (529 outer product variables in the PCT space instead of 1,076,262 in the original space). Further details on the procedure for computing infrared fingerprints are provided in the Appendix (see Section A.1 and Fig. A1), and the R script of PCT-OPA-PLSR is provided as Supplementary material.

The signals from all 13C NMR spectra were divided into seven chemical shift regions accounting for specific carbon functional groups: alkyl (0–45 ppm), N-alkyl or methoxy (45–60 ppm), O-alkyl (60–95 ppm), O2-alkyl (95–110 ppm), aromatic (110–145 ppm), O-aromatic (145–165 ppm), and carbonyl (165–210 ppm). For each soil sample, SOM composition (percentage of each biochemical component) was inferred with a molecular mixing model (MMM; Hedges et al., 2001; Nelson and Baldock, 2005; Kaal et al., 2007) using the percentage of alkyl, O-alkyl, aromatic and carbonyl shift regions, together with the N:C ratio as input variables, and MMM parameters defined by Baldock et al. (2004). Percentages of carbohydrate, protein, lignin, lipid and black carbon calculated by NMR–MMM were further transformed into soil contents (g/kg) by multiplying them by the total organic carbon content of each soil sample (Leifeld, 2006; Table 1). Further details on the implementation of the MMM and an estimation of the error associated with the MMM algorithm are reported in the Appendix (see Section A.2).

The calibration of infrared fingerprints with the NMR–MMM reference values of SOM composition was achieved through PLSR (Tenenhaus, 1998). One PLSR model was built for each soil biochemical component, using a selection of the most important outer product variables as predicted by the variable importance on the projection (VIP) method (Tenenhaus, 1998; Cécillon et al., 2008).
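The data fusion step of this section (outer product of the NIR and MIR scores of one sample, then unfolding into a vector of outer product variables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R script: the function names and the toy score values are hypothetical, and the orientation (one row per MIR score, one column per NIR score) is chosen to match the statement in this section that rows of the fingerprints correspond to MIR wavebands.

```python
def outer_product_fingerprint(nir_scores, mir_scores):
    """Return a matrix with one row per MIR score and one column per NIR score."""
    return [[m * n for n in nir_scores] for m in mir_scores]


def unfold(fingerprint):
    """Flatten the matrix row by row into one vector of outer product variables."""
    return [value for row in fingerprint for value in row]


# Hypothetical scores for one sample (3 NIR and 2 MIR scores for readability).
nir = [1.0, 2.0, 3.0]
mir = [10.0, 20.0]
fingerprint = outer_product_fingerprint(nir, mir)
variables = unfold(fingerprint)

# With 23 NIR and 23 MIR scores per sample, the fingerprint has 23 * 23 = 529 variables.
n_pct_variables = len(unfold(outer_product_fingerprint([0.0] * 23, [0.0] * 23)))
```

Each unfolded vector would then form one row of the predictor matrix passed to the regression step, one row per soil sample.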
The VIP method computed one score for each of the 1,076,262 outer product variables (original space) corresponding to a measure of its importance in one PLSR model. Only influential variables with high VIP scores were kept in each model (see the Appendix, Section A.3; Cécillon et al., 2008). A new PLSR was then performed with the selected outer product variables. For each biochemical component, PLSR-VIP models were also built from MIR and NIR spectra alone, so as to assess the usefulness of implementing infrared fingerprints (NIR and MIR data fusion with OPA) for the determination of SOM composition. The prediction performance of each obtained PLSR model was assessed by a full-model leave-one-out cross-validation (n = 23), which is the most robust validation tool for small data sets in quantitative multivariate modeling (Martens and Dardenne, 1998), with the examination of the following statistics: the root-mean-square error of cross-validation (RMSECV; expressed as a proportion of the mean of reference values), the cross-validated coefficient of determination (Q2), the ratio of performance-to-deviation (RPD; computed as the ratio of the standard deviation of reference values to the RMSECV), the ratio of performance to inter-quartile distance (RPIQ; computed as the ratio of the inter-quartile distance of reference values IQ = Q3 − Q1 to the RMSECV; Bellon-Maurel et al., 2010), the bias (computed as the difference of the mean of the predicted vs. the mean of the reference values; Bellon-Maurel et al., 2010) and the detailed analysis of the outer product variables selected by the VIP method for each PLSR model. The latter analysis consisted in attributing chemical bonds or carbon functional groups from the literature (Smith, 1999) to the main selected rows of infrared fingerprints (corresponding to MIR wavebands) for each PLSR model, so as to assess the agreement with the chemical structure of each investigated biochemical component. Although the NIR region contains meaningful information regarding the chemical structure of organic matter, the examination of selected MIR wavebands was preferred to that of NIR, so as to take advantage of the easier assignment of wavebands to specific chemical functional groups in the MIR than in the NIR region (Smith, 1999; Workman and Weyer, 2007), especially for spectra of complex mixtures such as soils.

Table 1
Summary statistics of reference data (n = 23).a

| Property | Unit | Min | Max | Mean | sd | R2 with Corg |
|---|---|---|---|---|---|---|
| Corg | g/kg | 11.6 | 499.9 | 295.7 | 200.7 | – |
| Ntot | g/kg | 0.3 | 31.6 | 15.7 | 11.2 | 0.75 |
| Carbohydrate | % | 2.3 | 51.5 | 21.3 | 11.7 | 0.42 |
| Protein | % | 10.6 | 23.8 | 18.2 | 3.6 | 0.07 |
| Lignin | % | 0 | 38.3 | 10.2 | 11.1 | 0.06 |
| Lipid | % | 13.1 | 60.1 | 41.1 | 14.7 | 0.16 |
| Black C | % | 0 | 14.1 | 7.7 | 3.8 | 0.19 |
| Carbohydrate | g/kg | 0.9 | 241.7 | 76.6 | 71.1 | 0.79 |
| Protein | g/kg | 1.2 | 111.4 | 55.8 | 39.7 | 0.78 |
| Lignin | g/kg | 0 | 126.5 | 22.5 | 32.8 | 0.02 |
| Lipid | g/kg | 6.4 | 259.7 | 111.1 | 86.7 | 0.55 |
| Black C | g/kg | 0 | 66.2 | 25.8 | 21.3 | 0.68 |

Abbreviations: sd = standard deviation; Corg = organic carbon; Ntot = total nitrogen.
a Reference values for the five biochemical components were calculated from the NMR–MMM approach (n = 23; see Section 2.3 and the Appendix, Section A.2).

3. Results and discussion

3.1. Predictive performance of infrared spectroscopies

Infrared fingerprint determinations vs. NMR–MMM reference values for the five biochemical components investigated in our soil sample set are plotted in Fig. 2. The summary statistics of reference data are reported in Table 1 and the cross-validated statistics of PLSR models for infrared fingerprints (NIR–MIR), NIR and MIR alone are reported in Table 2. A strong predictive ability of infrared fingerprints was found for carbohydrate and protein content.
Their PLSR models used 3 and 2 latent variables (LV) respectively, and showed cross-validated coefficients of determination (Q2) above 0.9, ratios of performance to inter-quartile distance (RPIQ) above 5, and root-mean-square errors of cross-validation (RMSECV) below 30% (Table 2; Fig. 2a and b). Reasonable cross-validated statistics were obtained for the prediction of lipid content with Q2 of 0.77, RPIQ of 3.9, RMSECV of 37% with only 2 LV (Table 2; Fig. 2d). For black carbon, infrared fingerprints showed rather low cross-validated PLSR statistics with Q2 of 0.66, RPIQ of 3.2, RMSECV of 47% with 2 LV (Table 2; Fig. 2e). Conversely, PLSR models built with infrared fingerprints showed no prediction performance for lignin content, with 1 LV, Q2 close to 0, RPIQ below 1.5, and a very high RMSECV (140%; Table 2; Fig. 2c). This can be easily explained by the number of zero values (9 among 23 samples) calculated for lignin by the NMR molecular mixing model (Fig. 2c). Thus, except for lignin content, all prediction models built with infrared fingerprints could be considered as fair (black carbon) to excellent (carbohydrate, protein, lipid) according to Chang et al. (2001), and considering the heterogeneity of our sample set depicted by a wide range of content for all biochemical components (Table 1). However, those cross-validated statistics might be overoptimistic because of (i) the small size of our data set (23 samples) and (ii) the lack of an independent validation set. For the five biochemical components investigated, the predictive performance of PLSR models built from NIR–MIR infrared fingerprints clearly outperformed that of PLSR models built from NIR spectra alone (Table 2). 
Conversely, the predictive performance of PLSR models built from infrared fingerprints and from MIR spectra alone was very similar (Table 2), thereby confirming that MIR is often more accurate than NIR for studying soil carbon on ground and dry soils under laboratory conditions (Reeves III, 2010). However, using MIR spectra alone may be questionable on soil samples with contrasting particle size and moisture content, which justifies the fusion of NIR and MIR data when designing a routine method to study SOM (see e.g. Bellon-Maurel and McBratney, 2011).
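The cross-validated statistics discussed in this section follow the definitions given in Section 2.3, and a short sketch may help make them explicit. The code below is an illustration only: the function names are hypothetical, and the use of the sample standard deviation (n − 1) and of the "inclusive" quartile rule of Python's statistics module are assumptions, since the source does not state which conventions were used.

```python
import math
import statistics


def cross_validation_stats(reference, predicted):
    """Summary statistics for cross-validated predictions of one soil property."""
    n = len(reference)
    mean_ref = statistics.fmean(reference)
    residuals = [p - r for p, r in zip(predicted, reference)]
    sse = sum(e * e for e in residuals)
    rmsecv = math.sqrt(sse / n)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    q1, _, q3 = statistics.quantiles(reference, n=4, method="inclusive")
    return {
        "rmsecv_percent": 100.0 * rmsecv / mean_ref,   # relative to the reference mean
        "q2": 1.0 - sse / ss_tot,
        "rpd": statistics.stdev(reference) / rmsecv,   # sample sd (n - 1), an assumption
        "rpiq": (q3 - q1) / rmsecv,                    # quartile rule is an assumption
        "bias": statistics.fmean(predicted) - mean_ref,
    }


def rpd_from_summary(sd, mean, rmsecv_percent):
    """RPD recomputed from published summary values (sd and mean in g/kg)."""
    return sd / (rmsecv_percent / 100.0 * mean)


# Toy data, not taken from the paper.
stats = cross_validation_stats([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])

# Carbohydrate: sd = 71.1 and mean = 76.6 g/kg (Table 1), RMSECV = 28.5% (Table 2).
rpd_carbohydrate = rpd_from_summary(71.1, 76.6, 28.5)
```

For carbohydrate, the recomputed RPD is about 3.26, in line with the value of 3.3 reported in Table 2.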

Fig. 2. Scatter plots of infrared fingerprint predicted vs. calculated NMR–MMM reference values for biochemical components content, and cross-validated statistics of PLSR models. The dashed lines indicate 1:1. See also Table 2 for additional cross-validated statistics of PLSR models. Axes: reference values and IR PCT-OPA-PLSR predicted values, both in g/kg. Panels: (a) Carbohydrate: Q² = 0.90, RMSECV = 28.5%, RPIQ = 5.2; (b) Protein: Q² = 0.93, RMSECV = 18%, RPIQ = 8.3; (c) Lignin: Q² = 0.03, RMSECV = 140%, RPIQ = 1.4; (d) Lipid: Q² = 0.77, RMSECV = 36.9%, RPIQ = 3.9; (e) Black C: Q² = 0.66, RMSECV = 46.8%, RPIQ = 3.2.
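The predicted values plotted in Fig. 2 come from leave-one-out cross-validation (Section 2.3). The sketch below shows the general mechanics under stated assumptions: the function names are hypothetical, the calibration model is a deliberately simple stand-in (a least-squares line, not PLSR), and refitting every modelling step inside each fold is our reading of "full-model" cross-validation rather than something the source spells out.

```python
def leave_one_out_predictions(X, y, fit, predict):
    """Predict each sample from a model calibrated on all the other samples."""
    predictions = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)   # every modelling step is redone in each fold
        predictions.append(predict(model, X[i]))
    return predictions


def fit_line(X, y):
    """Stand-in calibration model (NOT PLSR): least-squares line on the first predictor."""
    xs = [row[0] for row in X]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(y) / len(y)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y)) / sxx
    return slope, y_mean - slope * x_mean


def predict_line(model, row):
    """Apply the stand-in model to one held-out sample."""
    slope, intercept = model
    return slope * row[0] + intercept


# Toy data lying exactly on y = 2x + 1, so held-out samples are recovered.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
loo = leave_one_out_predictions(X, y, fit_line, predict_line)
```

Passing the model as a pair of callables keeps the loop independent of the regression method, so a PLSR routine with its own variable selection could be substituted without changing the validation code.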

Table 2
Cross-validated statistics of PLSR models built from infrared spectra (n = 23).a

| Property (g/kg) | NIR: Q2 | NIR: LV | NIR: RMSECV | MIR: Q2 | MIR: LV | MIR: RMSECV | NIR–MIR fingerprints: Q2 | NIR–MIR fingerprints: LV | NIR–MIR fingerprints: RMSECV | NIR–MIR fingerprints: Bias | NIR–MIR fingerprints: RPD | NIR–MIR fingerprints: RPIQ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Carbohydrate | 0.76 | 1 | 44.8 | 0.90 | 2 | 28.4 | 0.90 | 3 | 28.5 | 2.3 | 3.3 | 5.2 |
| Protein | 0.79 | 1 | 32.1 | 0.93 | 2 | 17.8 | 0.93 | 2 | 18 | 0.1 | 3.9 | 8.3 |
| Lignin | 0.14 | 1 | 131.8 | 0.06 | 1 | 137.8 | 0.03 | 1 | 140 | 0.0 | 1.0 | 1.4 |
| Lipid | 0.53 | 1 | 52 | 0.78 | 2 | 35.8 | 0.77 | 2 | 36.9 | 0.4 | 2.1 | 3.9 |
| Black C | 0.61 | 1 | 50 | 0.67 | 1 | 46.5 | 0.66 | 2 | 46.8 | 0.1 | 1.8 | 3.2 |

a Abbreviations: Q2 = cross-validated R2; LV = number of latent variables; RMSECV (%) = root-mean-square error of cross-validation (expressed as a proportion of the mean of reference values); RPD = ratio of performance-to-deviation; RPIQ = ratio of performance to inter-quartile distance (see Section 2.3; Bellon-Maurel et al., 2010).
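Tables 1 and 2 can be combined to check whether the fingerprint models capture more than the simple correlation of each component with organic carbon, a point taken up in Section 3.3.2. The short sketch below only rearranges values printed in the two tables (R2 with Corg for the g/kg rows of Table 1, Q2 of the NIR–MIR fingerprint models in Table 2); the variable names are hypothetical.

```python
# R2 with organic carbon (Table 1, g/kg rows).
r2_with_corg = {
    "Carbohydrate": 0.79, "Protein": 0.78, "Lignin": 0.02, "Lipid": 0.55, "Black C": 0.68,
}
# Q2 of the NIR-MIR fingerprint models (Table 2).
q2_fingerprint = {
    "Carbohydrate": 0.90, "Protein": 0.93, "Lignin": 0.03, "Lipid": 0.77, "Black C": 0.66,
}

# Difference between cross-validated Q2 and the correlation with organic carbon.
gain = {name: round(q2_fingerprint[name] - r2_with_corg[name], 2) for name in r2_with_corg}

# Components for which the fingerprint model exceeds the organic carbon correlation.
beats_corg = sorted(name for name, g in gain.items() if g > 0)
```

Black carbon is the only component whose Q2 falls below its R2 with organic carbon; the small positive difference for lignin should not be over-interpreted, since that model was judged unreliable.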

3.2. Outer product variables selected for each PLSR model

Only a fraction of the infrared fingerprints was used within each PLSR model (see Section 2.3). As shown in Fig. 3, all PLSR models were highly selective since most of the 1,076,262 infrared outer product variables were not used by the four reliable PLSR models (98.3%; greyish pixels in Fig. 3). Among the remaining 1.7% of the infrared variables, there were important overlaps between PLSR models (infrared variables used in more than one PLSR model represented 1.2%; white pixels in Fig. 3), showing a relatively low specificity of the selected infrared wavebands to each biochemical component. The PLSR model for lipid was the most specific, with 30% of specific infrared wavebands (among 9756 outer-product variables selected for this model; see below). Conversely, PLSR models for carbohydrate, protein and black carbon were much less specific with 10%, 12% and 2% of specific infrared wavebands, respectively. The PLSR models for carbohydrate, protein, lipid and black carbon used respectively 0.9%, 1.3%, 0.9%, and 0.9% of the total infrared variables (red, blue, yellow and black pixels in Fig. 3, respectively). The PLSR model for lignin used 1.5% of infrared variables, but we did not consider it in this paragraph or represent it in Fig. 3 since it was judged not reliable (see Section 3.1).

Fig. 3. Matrix representing the infrared outer product variables automatically selected for the PLSR model of each biochemical component. Selection of the outer product variables is based on their PLSR-VIP score for each biochemical component (see Section 2.3 and the Appendix, Section A.3). Overlapping variables are variables selected for more than one biochemical component. Spectra on top and on the right hand are illustrations of the second derivatives of the NIR and MIR spectra, respectively, of one soil sample. Assignments of selected MIR wavebands to chemical functional groups are provided in Table 3.

The 1035–1018 cm−1 MIR spectral region was selected for all biochemical components by the PLSR-VIP method and represented at least 35% of the selected outer-product variables (Table 3). This infrared waveband can be assigned to C–O bonds of alcohol, C–O–C bonds of ethers and Si–O–Si bonds of silica (Smith, 1999). Thus, as infrared spectra were measured on bulk soils, this waveband probably reflects SOM (mostly carbohydrates, which comprise numerous alcohol groups and C–O–C bonds), but also soil minerals such as phyllosilicates which may be tightly associated with SOM (Gleixner et al., 2001; von Lützow et al., 2006; Schmidt et al., 2011). Interestingly, although it was widely used by all models, this waveband was most strongly associated with the carbohydrate component (70% of the 10,024 outer-product variables selected for this model; Table 3). No absorbance in this MIR region was expected for black carbon (Bornemann et al., 2008; Cheng et al., 2008), although residual absorbance from the initial material can be found at low charring temperatures (Keiluweit et al., 2010). Therefore, the strong influence of this waveband on the prediction of black carbon content (Table 3) may be due to mineral absorbance, suggesting an intimate association between clay minerals and this stable carbon pool. Regarding black carbon, even if the PLSR model was the least specific (see above), it is notable that the few specific infrared variables selected for this biochemical component were all from the 814–818 cm−1 region, a MIR waveband assigned to aromatic CH which is often used to assess the degree of condensation of aromatic C in black carbon (Fig. 3; Smith, 1999; Keiluweit et al., 2010).

The 1485–1480 cm−1 MIR spectral region was selected for carbohydrate, protein and black carbon components, but represented <15% of the selected outer-product variables for these three models (Table 3). This infrared waveband could be assigned to methoxy group or aliphatic CH3, which can be present in numerous soil organic compounds such as carbohydrates, proteins, lipids and lignins (Smith, 1999; Gleixner et al., 2001). One striking result was that the selected MIR wavebands for the lipid biochemical component correspond mainly to aliphatic CH2 groups (65% of the selected variables for this model; Table 3). Aliphatic CH2 groups are the principal characteristics and specific chemical structures of the most important lipid substances: glycerides (which encompass fatty acids, waxes and related compounds such as cutin and suberin; Gleixner et al., 2001). Therefore, the relatively good specificity of the PLSR model for lipid discussed above is associated with a chemically meaningful selection of infrared wavebands (Table 3; Fig. 3).

Table 3
MIR wavebands selected for the prediction of each biochemical component and assigned chemical functional groups.a

Main MIR wavebands (cm−1) and assigned chemical functional group

| Biochemical component | Waveband (cm−1) | Importance | Assigned chemical functional group |
|---|---|---|---|
| Carbohydrate | 1035–1018 | ca. 70% | Primary alcohol (C–O stretching) and/or ether group (C–O–C) and/or silica (Si–O–Si) |
| Protein | 1035–1018 | ca. 47% | Primary alcohol (C–O stretching) and/or ether group (C–O–C) and/or silica (Si–O–Si) |
| Lipid | 2926–2916 | ca. 45% | Aliphatic CH2 (CH stretching) |
| Lipid | 1030–1022 | ca. 35% | Primary alcohol (C–O stretching) and/or ether group (C–O–C) and/or silica (Si–O–Si) |
| Lipid | 2852–2848 | ca. 20% | Aliphatic CH2 (CH stretching) |
| Black carbon | 1035–1020 | ca. 65% | Primary alcohol (C–O stretching) and/or ether group (C–O–C) and/or silica (Si–O–Si) |

Additional MIR wavebands (cm−1) and assigned chemical functional group

1485–1480 (ca. 8%; ca. 15%; ca. 15%) = Methoxy group (CH3 umbrella mode) and/or aliphatic CH3 (CH3 bending)
2925–2918 (ca. 15%) = Aliphatic CH2 (CH stretching)
818–814 (ca. 15%; ca. 6%) = Aromatic CH (CH bending)
2922 (ca. 5%) = Aliphatic CH2 (CH stretching)

a According to Smith (1999). Percentages (in brackets) indicate the importance of a specific MIR waveband for the prediction of a biochemical component (i.e. proportion of the infrared outer product variables used for a PLSR model).

3.3. Methodology characterization

3.3.1. Time and money

Examining whether our two step strategy fulfills all requirements of a routine quantification method for SOM composition, we find that building infrared fingerprints (first step) is straightforward, and can be considered as rapid and cost effective, since only half a day was necessary for scanning our sample set. Conversely, implementing NMR–MMM to calibrate infrared fingerprints (second step) is a limiting factor; the acquisition of high quality NMR spectra requires time (in most cases 24–48 h per sample), expertise and expensive equipment.

3.3.2. Quantitative reliability

Despite the limited number of samples in our data set, we showed that PLSR models built from infrared fingerprints could be satisfactorily used for four biochemical components (Fig. 2). The weak results for lignin content can be explained by the number of zero values (9 among 23 lignin values; Fig. 2c) calculated by the NMR–MMM, which is detrimental to the quality of the PLSR model. But as mentioned above, it should be noted that the cross-validated PLSR statistics presented in this paper may be overoptimistic and are not a definitive proof of model reliability (see Section 3.1). Indeed, it is very difficult to estimate the real error associated with the computing of PLSR models based on 23 samples only. One other recurrent question regarding the application of PLSR in quantitative infrared spectroscopy concerns the degree of certainty in directly predicting a sample property rather than the correlation of this property with another sample component.
By definition of their calculation (see Section 2.3) and of their chemical nature, most biochemical components are correlated to organic carbon, except for lignin, again because of the number of zero values (Table 1). But the coefficients of determination between organic carbon and biochemical components are lower than the Q2 values obtained by PLSR models using infrared fingerprints, except for black carbon (Tables 1 and 2). Furthermore, the specificity of infrared wavebands automatically selected for the prediction of carbohydrate (mainly assigned to C–O bonds) and lipid (mainly assigned to CH2 functional groups) contents is an important argument for the reliability of our method (Table 3). However, we also observed that the chemical assignment of infrared wavebands could be difficult when working with bulk soils (because of mineral absorbance), and that one non-specific waveband at 1025 cm1 dominated most PLSR models except for the prediction of lipid content. The reliability of our method is also strongly dependent on the quality of the NMR–MMM as a reference method to infer SOM composition. A recent first attempt in comparing NMR–MMM with other techniques such as analytical pyrolysis (Py–GC–MS) was moderately successful: quantitative differences between the two techniques can be sometimes very large, although a generally good relation was found (Kaal et al., 2007). Errors in the NMR–MMM approach may have three main origins which have been recently analyzed by Hockaday et al. (2009): (i) the definition of the MMM algorithm, (ii) the NMR technique used and (iii) the HF treatment of soil samples. First, the MMM algorithm is based on the assumption that SOM is a mixture of a few biochemical components with typical elemental compositions and 13C NMR characteristics, which is a rough classification of SOM, although rather close to recent

experimental results (Baldock et al., 2004; Nelson and Baldock, 2005; Kelleher and Simpson, 2006; Lehmann et al., 2008). Actually, most components are heterogeneous with a variety of chemical structures; for example black carbon encompasses a continuum of aromatic structures with varying O:C and H:C ratios, each having specific behavior in soils (Schmidt and Noack, 2000). But another source of uncertainty is directly linked to the mathematical implementation of the algorithm used to solve the system of simultaneous equations of the MMM. Implementing the generalized reduced gradient algorithm for nonlinear optimization (GRG2; Microsoft Office Excel; the classical implementation of the MMM allowing constraints of positivity for biochemical components; see e.g. Rodríguez-Murillo et al., 2011) yields slightly different results than the approach used in this study (based on the Moore– Penrose generalized inverse matrix; see the Appendix, Section A.2). We estimated the root mean square error of both algorithms and found that the Moore–Penrose generalized inverse matrix approach reached a similar accuracy and could even outperform the classical GRG2 approach (see the Appendix, Section A.2). Even if analysis of such differences is beyond the scope of this paper, identifying the best implementation of the MMM algorithm is an important issue, the GRG2 algorithm being physically optimal, the Moore–Penrose algorithm being mathematically optimal. A second source of uncertainty in the NMR–MMM is the choice of the 13C NMR polarization technique and associated acquisition parameters such as contact time and relaxation delay. Direct polarization NMR (also known as Block decay or single pulse excitation) is intrinsically more quantitative than cross polarization experiments which may lead to significant underestimation of particular species such as aromatic C (Smernik and Oades, 2000; Hockaday et al., 2009). 
Therefore, the use of CPMAS in the NMR–MMM could lead to systematically weaker results for the infrared PLSR prediction of black carbon and lignin, which are characterized by high levels of aromaticity (Gleixner et al., 2001).

Finally, the HF treatment used to remove magnetic species from mineral soils generates a third source of uncertainty for the NMR–MMM. HF treatment typically removes 10–30% of organic carbon in mineral soils (Hockaday et al., 2009). Quantitative changes of carbon distribution are not always observed, although preferential loss of carbohydrate and carboxyl C has been reported (Dai and Johnson, 1999; Rumpel et al., 2006; Knicker, 2011). The possible modification of SOM composition after HF treatment raises an important question regarding the reliability of our approach. Since IR spectra were measured on untreated samples, we cannot guarantee that the samples measured by NIR and MIR (untreated) and by NMR (HF treated) were chemically the same.

3.3.3. Universality

The last criterion for assessing our method is its universality, which is obviously not achieved yet, since all our samples come from a single Norwegian site. However, soils were sampled under three different vegetation types and within various soil horizons (see Section 2.1), resulting in a large range of organic carbon content (Table 1), which allowed us to test our strategy on a widely heterogeneous data set.

3.4. Necessary improvements of the method

Two main limitations remain: the lack of rapidity and cost effectiveness (essentially related to NMR spectra acquisition), and the lack of universality (i.e. a method validated on unknown samples from various locations and soil types, and a method also able to predict lignin). These two limitations could be addressed at the same time by enlarging our spectral database with additional samples representing a wide range of ecosystems. Such a spectral library would be made of NIR, MIR and NMR spectra of


the same soil samples and, once it is large and diverse enough, it will make it possible to predict the SOM composition of unknown samples with infrared fingerprints alone (Fig. 1). Obtaining infrared spectra is straightforward, but obtaining a large NMR spectral database is a huge task and may limit the applicability of our strategy. We thus suggest re-using soil samples already characterized by solid state 13C NMR. An attempt at collecting such materials from colleagues worldwide, and at archiving them in an interactive database of SOM characteristics, has started within an open project of the European MOLTER network (http://www.molter.no/). This open project has recently produced a prototype of an online computational platform designed to foster molecular level studies on SOM, as recently called for by Schmidt et al. (2011). All spectral data and associated metadata of this study can be freely downloaded and processed online from this computational platform (http://molterdb.irstea.fr/; platform kick-off April 2012). Our objective is now to collect new samples and use them to calibrate infrared fingerprints as described in this paper, so as to provide an independent validation of our method.

4. Conclusions

Although the number of samples is too low to deliver a fully validated methodology, we have provided the initial assessment of a novel routine method enabling the prediction of the main biochemical components in soils. The availability of such fast track quantitative methods for determining SOM composition would profoundly enhance the accuracy of environmental monitoring efforts in many fields. Applying our strategy could speed up data transfer from soil monitoring networks to dynamic models and enable the production of detailed maps of soil organic matter composition. However, additional efforts are needed to validate this method on a broader scale.

Acknowledgments

This study was financially supported by the University of Life Sciences (UMB, Ås, Norway). S. De Danieli (Irstea, Grenoble, France) and S. Le Bras (Thermo Fisher Scientific, France) are thanked for their help in collecting NIR and MIR spectra of the soil samples. D. Rutledge and C. Cordella (AgroParisTech, France) provided useful advice on the outer product analysis of infrared spectra. P. Bellamy (Cranfield University, UK) is thanked for improving the English of an earlier draft of this manuscript. E. Ancelet, F. Bray (Irstea, Grenoble, France), D. Rasse (Bioforsk, Ås, Norway) and colleagues from the ESF-MOLTER network (http://www.molter.no/) are thanked for their help and support in designing the MOLTER online computational platform. M.B. Yunker, A.S. Barros and two anonymous reviewers are thanked for their constructive comments which strongly improved this paper.

Appendix A

A.1. Procedure for computing infrared fingerprints

For the n samples of our data set, second derivative NIR (r wavenumbers) and MIR (c wavenumbers) signals are made positive by adding to each signal the absolute value of the minimum of all derivative spectra. NIR and MIR signals are then decomposed using principal component transform (PCT; Barros et al., 2008). The full rank scores from one spectral domain (NIR, n scores) are multiplied by all the full rank scores from the other spectral domain (MIR, n scores) individually (direct or outer product), resulting in a set of n data matrices containing all possible products of the intensities in the two domains (Fig. A1a; Jaillais et al., 2005;

this figure illustrates outer product analysis in the original space). These data matrices are then unfolded to give n row-vectors of length n × n, which are concatenated row-wise to produce the (n, n × n) matrix, called hereafter the outer product matrix in the PCT space (Fig. A1b; Jaillais et al., 2005). The statistical analysis of this outer product matrix, using PLSR and the VIP method (see Section 2.3), produces vectors such as VIP scores, which can be folded back to give a Z matrix of dimensions (n, n) in the PCT space. This matrix can in turn be transformed into a Z matrix of dimensions (r, c) in the original NIR–MIR space using the transformation presented in Barros et al. (2008) (see Supplementary material for the R script of this transformation). This Z matrix in the original space may be used to highlight the outer product variables that were selected for each biochemical component by the VIP method (Fig. 3).

A.2. Implementation of the NMR molecular mixing model (MMM), and estimation of the associated error

In the NMR–MMM approach, the proportion of each biochemical component is calculated by solving Eqs. (A.1)–(A.6) simultaneously (Nelson and Baldock, 2005):

$$a + b + c + d + e + f = 1 \tag{A.1}$$

$$a\,n_a + b\,n_b + c\,n_c + d\,n_d + e\,n_e + f\,n_f = n_{\mathrm{sample}} \tag{A.2}$$

$$a\,\alpha_a + b\,\alpha_b + c\,\alpha_c + d\,\alpha_d + e\,\alpha_e + f\,\alpha_f = \alpha_{\mathrm{sample}} \tag{A.3}$$

$$a\,\beta_a + b\,\beta_b + c\,\beta_c + d\,\beta_d + e\,\beta_e + f\,\beta_f = \beta_{\mathrm{sample}} \tag{A.4}$$

$$a\,\gamma_a + b\,\gamma_b + c\,\gamma_c + d\,\gamma_d + e\,\gamma_e + f\,\gamma_f = \gamma_{\mathrm{sample}} \tag{A.5}$$

$$a\,\delta_a + b\,\delta_b + c\,\delta_c + d\,\delta_d + e\,\delta_e + f\,\delta_f = \delta_{\mathrm{sample}} \tag{A.6}$$

with a, b, c, d, e and f = the proportions of biochemical components. a = carbohydrate (representing cellulose, hemi-cellulose, mucopolysaccharides and smaller molecular weight saccharides); b = protein (representing proteins, peptides and amino acids); c = lignin; d = lipid (representing cutin, suberin and aliphatic membrane components); e = black carbon; f = an additional pure carbonyl component (so as to allow the presence of additional carboxyl carbon in the components a, b, c, d). Neither chitin nor glycoproteins are included as model components, because they have 13C NMR spectra and N:C ratios intermediate between, and indistinguishable from, a mixture of protein and carbohydrate (Nelson and Baldock, 2005). n = N:C ratio. α = proportion of total NMR signal in the 0–45 ppm region. β = proportion of total NMR signal in the 60–95 ppm region. γ = proportion of total NMR signal in the 110–145 ppm region. δ = proportion of total NMR signal in the 165–215 ppm region. Subscripts a to f refer to the corresponding biochemical component, and the subscript "sample" to the value measured on the soil sample.

This linear system of equations can be rewritten as a matrix equation (Eq. (A.7)):

$$
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
n_a & n_b & n_c & n_d & n_e & n_f \\
\alpha_a & \alpha_b & \alpha_c & \alpha_d & \alpha_e & \alpha_f \\
\beta_a & \beta_b & \beta_c & \beta_d & \beta_e & \beta_f \\
\gamma_a & \gamma_b & \gamma_c & \gamma_d & \gamma_e & \gamma_f \\
\delta_a & \delta_b & \delta_c & \delta_d & \delta_e & \delta_f
\end{pmatrix}
\cdot
\begin{pmatrix} a \\ b \\ c \\ d \\ e \\ f \end{pmatrix}
= [A] \cdot [x] = [k] =
\begin{pmatrix} 1 \\ n_{\mathrm{sample}} \\ \alpha_{\mathrm{sample}} \\ \beta_{\mathrm{sample}} \\ \gamma_{\mathrm{sample}} \\ \delta_{\mathrm{sample}} \end{pmatrix}
\tag{A.7}
$$

This matrix equation (Eq. (A.7)) is then solved by computing the Moore–Penrose generalized inverse of the matrix A (Eq. (A.8)):

$$[x] = [A]^{+} \cdot [k] \tag{A.8}$$
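As an illustration of Eq. (A.8), the following is a minimal Python sketch, assuming NumPy is available. It is not the R script used in this study. The entries of the matrix A and the vector x_true are hypothetical placeholders chosen only to make the example run; they are not the end-member values of Nelson and Baldock (2005).

```python
import numpy as np

# Hypothetical end-member matrix [A] (placeholder values, NOT from the paper).
# Rows: sum constraint, N:C ratio, then signal proportions in the
# 0-45, 60-95, 110-145 and 165-215 ppm regions.
# Columns: carbohydrate, protein, lignin, lipid, black carbon, carbonyl.
A = np.array([
    [1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    [0.00, 0.30, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.35, 0.10, 0.75, 0.00, 0.00],
    [0.80, 0.05, 0.05, 0.10, 0.00, 0.00],
    [0.00, 0.10, 0.30, 0.00, 0.75, 0.00],
    [0.00, 0.30, 0.05, 0.05, 0.05, 1.00],
])

# Hypothetical "true" proportions, used only to simulate a measured vector [k].
x_true = np.array([0.40, 0.15, 0.10, 0.20, 0.10, 0.05])
k = A @ x_true

# Eq. (A.8): [x] = [A]^+ . [k], with [A]^+ the Moore-Penrose generalized inverse.
x = np.linalg.pinv(A) @ k

print(np.round(x, 4))
```

Because nothing constrains the solution to be positive, some entries of x can come out slightly negative on real data, which is why an adjustment step is needed afterwards.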


Fig. A1. Detail of the outer product analysis of infrared data. (a) Building infrared fingerprints from NIR (r wavenumbers) and MIR (c wavenumbers) spectra. (b) Calibrating infrared fingerprints with NMR–MMM reference values of SOM composition using multivariate statistics (PLSR). Note: example of an outer product analysis in the original space (r × c); reprinted from Jaillais et al. (2005), with permission from Elsevier.
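The steps of Section A.1 illustrated in Fig. A1 can be sketched in plain Python as follows. This is only a sketch under simplifying assumptions: like the figure, it works in the original space (r × c) and skips the PCT step that the study applies before the outer product. The function names and the toy spectra are hypothetical and are not taken from the R script provided as Supplementary material.

```python
def make_positive(spectra):
    """Add the absolute value of the global minimum to every intensity."""
    shift = abs(min(min(row) for row in spectra))
    return [[value + shift for value in row] for row in spectra]


def outer_product_matrix(nir, mir):
    """Build one unfolded outer product (length r*c) per sample, stacked row-wise."""
    return [
        [a * b for a in nir_row for b in mir_row]
        for nir_row, mir_row in zip(nir, mir)
    ]


def fold_back(vector, r, c):
    """Fold a length r*c vector (e.g. VIP scores) back into an (r, c) matrix."""
    return [vector[i * c:(i + 1) * c] for i in range(r)]


# Toy second derivative spectra for 2 samples (hypothetical values).
nir = make_positive([[-1.0, 2.0], [0.5, -0.5]])            # r = 2 wavenumbers
mir = make_positive([[1.0, -2.0, 0.0], [3.0, 1.0, -1.0]])  # c = 3 wavenumbers

fingerprints = outer_product_matrix(nir, mir)  # shape (2, r * c)
z_first_sample = fold_back(fingerprints[0], 2, 3)
print(z_first_sample)
```

Each row of the unfolded matrix holds all pairwise products of one sample's NIR and MIR intensities, so any per-variable statistic computed on it can be folded back into an (r, c) map.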

The few slightly negative values obtained for the proportions of some biochemical components (a–f) in certain soil samples were set to zero, and the proportions of all biochemical components were subsequently redistributed so as to make their sum match 100%. The computation of the MMM using the Moore–Penrose generalized inverse matrix approach as described above (using Eq. (A.8)) gives an exact solution to the system of simultaneous equations. However, the above mentioned adjustments are a source of significant error that should be estimated. We estimated the error of this approach by (i) recomputing the N:C ratio and the proportions of total NMR signal in the 0–45 ppm, 60–95 ppm, 110–145 ppm and 165–215 ppm regions using Eq. (A.9), with [xpred] being [x] after the above mentioned adjustments; (ii) computing the root mean square error of prediction (RMSEP) for the N:C ratio and the four proportions of total NMR signal (i.e. by comparing [kpred] vs. [k]; see Eq. (A.9)).

$$[k_{\mathrm{pred}}] = [A] \cdot [x_{\mathrm{pred}}] \tag{A.9}$$

The RMSEPs, expressed as a percentage of the mean, were 5.4%, 4.8%, 1.9%, 8.5% and 10.9% for the N:C ratio and the proportions of total NMR signal in the 0–45 ppm, 60–95 ppm, 110–145 ppm and 165–215 ppm regions, respectively.
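The adjustment and error estimation described above can be sketched in plain Python as follows. This is a hedged illustration rather than the code used in this study. The toy matrix A, the raw solutions and all function names are hypothetical, and the sketch assumes that the unadjusted solution of Eq. (A.8) reproduces [k] exactly, so that [k] can be simulated as [A]·[x].

```python
import math


def adjust_proportions(x):
    """Set negative proportions to zero, then rescale so that they sum to 1."""
    clipped = [max(value, 0.0) for value in x]
    total = sum(clipped)
    return [value / total for value in clipped]


def matvec(A, x):
    """Plain matrix-vector product [A].[x], as in Eq. (A.9)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmsep_percent(observed, predicted):
    """RMSEP over samples, expressed as a percentage of the observed mean."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)


# Hypothetical toy system: 3 components, rows = sum constraint and one NMR region.
A = [[1.0, 1.0, 1.0],
     [0.8, 0.1, 0.0]]

# Hypothetical unadjusted solutions of Eq. (A.8) for two samples.
raw_solutions = [[0.55, 0.50, -0.05], [0.20, 0.30, 0.50]]

k_observed = [matvec(A, x) for x in raw_solutions]
k_predicted = [matvec(A, adjust_proportions(x)) for x in raw_solutions]

# RMSEP for the second row of [k] (the NMR region in this toy example).
region_observed = [k[1] for k in k_observed]
region_predicted = [k[1] for k in k_predicted]
print(rmsep_percent(region_observed, region_predicted))
```

A sample whose raw solution is already non-negative and sums to 1 contributes no error, so the RMSEP reflects only the samples that needed adjusting.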

Using the same strategy, we also compared the error of this MMM algorithm, based on the Moore–Penrose generalized inverse matrix, with that of the classical GRG2 generalized reduced gradient algorithm for nonlinear optimization (Microsoft Office Excel) commonly used to compute the MMM (see e.g. Rodríguez-Murillo et al., 2011). Thus, in Eq. (A.9) we simply replaced [xpred] with the proportions predicted by the GRG2 approach (which used constraints of positivity for all biochemical components). The RMSEPs of the GRG2 approach, expressed as a percentage of the mean, were 10.3%, 6.3%, 0.4%, 3.2% and 10.0% for the N:C ratio and the proportions of total NMR signal in the 0–45 ppm, 60–95 ppm, 110–145 ppm and 165–215 ppm regions, respectively. Thus, the RMSEP of the GRG2 approach was higher than that of the Moore–Penrose generalized inverse matrix approach for the N:C ratio, but the RMSEPs of both algorithms were rather similar for the four NMR chemical shift regions. However, it should be noted that the GRG2 algorithm did not reach convergence and produced a warning message in most cases (for 15 samples out of 23).

A.3. Software and statistics

NMR raw data (i.e. free induction decay signals) were processed with the MestReNova software (MestreLab Research). Processing of NMR data included (i) applying a Fourier transform to the free induction decay signals, (ii) apodization, (iii) baseline correction and (iv) spectral offset frequency correction. Integrals for each chemical shift region of NMR spectra, NMR–MMM, PCT-OPA-PLSR, and all statistical treatments were computed with the R software version 2.14 (R Development Core Team, 2011), using the RStudio integrated development environment (http://rstudio.org/), the hyperSpec package from C. Beleites, the StreamMetabolism package from S.A. Sefick Jr., the pls package (Mevik and Wehrens, 2007), and a modification of the VIP algorithm of Chong and Jun (2005) first implemented in R by B.H. Mevik. The R script for PCT-OPA-PLSR is provided as Supplementary material. VIP threshold values for selecting variables in infrared fingerprints (outer product of NIR and MIR spectra) were tuned so as to optimize the prediction performance of each PLSR model. VIP threshold values of 3.35, 3.35, 2.5, 3.5 and 3.3 for the PLSR models of carbohydrate, protein, lignin, lipid and black carbon, respectively, were arbitrarily chosen from the examination of false-color maps of VIP scores (original space).

Appendix B. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.orggeochem.2012.02.006.

Associate Editor—Mark Yunker

References

Baldock, J.A., Oades, J.M., Waters, A.G., Peng, X., Vassallo, A.M., Wilson, M.A., 1992. Aspects of the chemical structure of soil organic materials as revealed by solid-state 13C NMR spectroscopy. Biogeochemistry 16, 1–42.
Baldock, J.A., Masiello, C.A., Gélinas, Y., Hedges, J.I., 2004. Cycling and composition of organic matter in terrestrial and marine ecosystems. Marine Chemistry 92, 39–64.
Barros, A.S., Rutledge, D.N., 2004. Principal components transform-partial least squares: a novel method to accelerate cross-validation in PLS regression. Chemometrics and Intelligent Laboratory Systems 73, 245–255.
Barros, A.S., Safar, M., Devaux, M.F., Robert, P., Bertrand, D., Rutledge, D.N., 1997. Relations between mid-infrared and near-infrared spectra detected by analysis of variance of an intervariable data matrix. Applied Spectroscopy 51, 1384–1393.
Barros, A.S., Pinto, R., Jouan-Rimbaud Bouveresse, D., Rutledge, D.N., 2008. Principal component transform – outer product analysis in the PCA context. Chemometrics and Intelligent Laboratory Systems 93, 43–48.
Bellon-Maurel, V., McBratney, A., 2011. Near-infrared (NIR) and mid-infrared (MIR) spectroscopic techniques for assessing the amount of carbon stock in soils – critical review and research perspectives. Soil Biology & Biochemistry 43, 1398–1410.
Bellon-Maurel, V., Fernandez-Ahumada, E., Palagos, B., Roger, J.M., McBratney, A., 2010. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. Trends in Analytical Chemistry 29, 1073–1081.
Bornemann, L., Welp, G., Brodowski, S., Rodionov, A., Amelung, W., 2008. Rapid assessment of black carbon in soil organic matter using mid-infrared spectroscopy. Organic Geochemistry 39, 1537–1544.
Cécillon, L., Cassagne, N., Czarnes, S., Gros, R., Brun, J.J., 2008. Variable selection in near infrared spectra for the biological characterization of soil and earthworm casts. Soil Biology & Biochemistry 40, 1975–1979.
Cécillon, L., Barthès, B.G., Gomez, C., Ertlen, D., Genot, V., Hedde, M., Stevens, A., Brun, J.J., 2009. Assessment and monitoring of soil quality using near infrared reflectance spectroscopy (NIRS). European Journal of Soil Science 60, 770–784.
Chang, C.W., Laird, D.A., Mausbach, M.J., Hurburgh Jr., C.R., 2001. Near-infrared reflectance spectroscopy – principal components regression analyses of soil properties. Soil Science Society of America Journal 65, 480–490.
Cheng, C.H., Lehmann, J., Engelhard, M.H., 2008. Natural oxidation of black carbon in soils: changes in molecular form and surface charge along a climosequence. Geochimica et Cosmochimica Acta 72, 1598–1610.
Chong, I.G., Jun, C.H., 2005. Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems 78, 103–112.
Dai, K.H., Johnson, C.E., 1999. Applicability of solid-state 13C CP/MAS NMR analysis in Spodosols: chemical removal of magnetic materials. Geoderma 93, 289–310.
Forte, C., Piazzi, A., Pizzanelli, S., Certini, G., 2006. CP MAS 13C spectral editing and relative quantitation of a soil sample. Solid State Nuclear Magnetic Resonance 30, 81–88.


Gleixner, G., Czimczik, C.J., Kramer, C., Lühker, B., Schmidt, M.W.I., 2001. Plant compounds and their turnover and stabilization as soil organic matter. In: Schulze, E.D., Heimann, M., Harrison, S., Holland, E., Lloyd, J.L., Prentice, C., Schimel, D. (Eds.), Global Biogeochemical Cycles in the Climate System. Academic Press, San Diego, pp. 201–215.
Grandy, A.S., Neff, J.C., 2008. Molecular C dynamics downstream: the biochemical decomposition sequence and its impact on soil organic matter structure and function. Science of the Total Environment 404, 297–307.
Hedges, J.I., Baldock, J.A., Gélinas, Y., Lee, C., Peterson, M., Wakeham, S.G., 2001. Evidence for non-selective preservation of organic matter in sinking marine particles. Nature 409, 801–804.
Hockaday, W.C., Masiello, C.A., Randerson, J.T., Smernik, R.J., Baldock, J.A., Chadwick, O.A., Harden, J.W., 2009. Measurement of soil carbon oxidation state and oxidative ratio by 13C nuclear magnetic resonance. Journal of Geophysical Research 114, G02014.
Jaillais, B., Pinto, R., Barros, A.S., Rutledge, D.N., 2005. Outer-product analysis (OPA) using PCA to study the influence of temperature on NIR spectra of water. Vibrational Spectroscopy 39, 50–58.
Kaal, J., Baldock, J.A., Buurman, P., Nierop, K.G.J., Pontevedra-Pombal, X., Martínez Cortizas, A., 2007. Evaluating pyrolysis–GC/MS and 13C CPMAS NMR in conjunction with a molecular mixing model of the Penido Vello peat deposit, NW Spain. Organic Geochemistry 38, 1097–1111.
Keiluweit, M., Nico, P.S., Johnson, M.G., Kleber, M., 2010. Dynamic molecular structure of plant biomass-derived black carbon (Biochar). Environmental Science and Technology 44, 1247–1253.
Kelleher, B.P., Simpson, A.J., 2006. Humic substances in soils: are they really chemically distinct? Environmental Science and Technology 40, 4605–4611.
Knicker, H., 2011. Solid state CPMAS 13C and 15N NMR spectroscopy in organic geochemistry and how spin dynamics can either aggravate or improve spectra interpretation. Organic Geochemistry 42, 867–890.
Koelmans, A.A., Jonker, M.T.O., Cornelissen, G., Bucheli, T.D., Van Noort, P.C.M., Gustafsson, Ö., 2006. Black carbon: the reverse of its dark side. Chemosphere 63, 365–377.
Kögel-Knabner, I., 2000. Analytical approaches for characterizing soil organic matter. Organic Geochemistry 31, 609–625.
Lehmann, J., Solomon, D., Kinyangi, J., Dathe, L., Wirick, S., Jacobsen, C., 2008. Spatial complexity of soil organic matter forms at nanometre scales. Nature Geoscience 1, 238–242.
Leifeld, J., 2006. Application of diffuse reflectance FT-IR spectroscopy and partial least squares regression to predict NMR properties of soil organic matter. European Journal of Soil Science 57, 846–857.
Lützow, M.v., Kögel-Knabner, I., Ekschmitt, K., Matzner, E., Guggenberger, G., Marschner, B., Flessa, H., 2006. Stabilization of organic matter in temperate soils: mechanisms and their relevance under different soil conditions – a review. European Journal of Soil Science 57, 426–445.
Martens, H.A., Dardenne, P., 1998. Validation and verification of regression in small data sets. Chemometrics and Intelligent Laboratory Systems 44, 99–121.
Mevik, B.H., Wehrens, R., 2007. The pls package: principal component and partial least squares regression in R. Journal of Statistical Software 18, 1–24.
Nelson, P.N., Baldock, J.A., 2005. Estimating the molecular composition of a diverse range of natural organic materials from solid-state 13C NMR and elemental analyses. Biogeochemistry 72, 1–34.
Olk, D.C.A., 2006. A chemical fractionation for structure–function relations of soil organic matter in nutrient cycling. Soil Science Society of America Journal 70, 1013–1022.
R Development Core Team, 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Reeves III, J.B., 2010. Near- versus mid-infrared diffuse reflectance spectroscopy for soil analysis emphasizing carbon and laboratory versus on-site analysis: where are we and what needs to be done? Geoderma 158, 3–14.
Rodríguez-Murillo, J.C., Almendros, G., Knicker, H., 2011. Wetland soil organic matter composition in a Mediterranean semiarid wetland (Las Tablas de Daimiel, Central Spain): insight into different carbon sequestration pathways. Organic Geochemistry 42, 762–773.
Rumpel, C., Rabia, N., Derenne, S., Quenea, K., Eusterhues, K., Kögel-Knabner, I., Mariotti, A., 2006. Alteration of soil organic matter following treatment with hydrofluoric acid (HF). Organic Geochemistry 37, 1437–1451.
Rumpel, C., Chabbi, A., Nunan, N., Dignac, M.F., 2009. Impact of land use change on the molecular composition of soil organic matter. Journal of Analytical and Applied Pyrolysis 85, 431–434.
Saiz-Jimenez, C., 1994. Analytical pyrolysis of humic substances: pitfalls, limitations, and possible solutions. Environmental Science and Technology 28, 1773–1780.
Schmidt, M.W.I., Noack, A.G., 2000. Black carbon in soils and sediments: analysis, distribution, implications and current challenges. Global Biogeochemical Cycles 14, 777–793.
Schmidt, M.W.I., Torn, M.S., Abiven, S., Dittmar, T., Guggenberger, G., Janssens, I.A., Kleber, M., Kögel-Knabner, I., Lehmann, J., Manning, D.A.C., Nannipieri, P., Rasse, D.P., Weiner, S., Trumbore, S.E., 2011. Persistence of soil organic matter as an ecosystem property. Nature 477, 49–56.
Skjemstad, J.O., Clarke, P., Taylor, J.A., Oades, J.M., Newman, R.H., 1994. The removal of magnetic materials from surface soils. A solid state 13C CP/MAS n.m.r. study. Australian Journal of Soil Research 32, 1215–1229.
Smernik, R.J., Oades, J.M., 2000. The use of spin counting for determining quantitation in solid state 13C NMR spectra of natural organic matter 2. HF-treated soil fractions. Geoderma 96, 159–171.


Smith, B., 1999. Infrared Spectral Interpretation: A Systematic Approach. CRC Press, Taylor & Francis Group, New York.
Strand, L.T., Haaland, S., Kaste, Ø., Stuanes, A.O., 2008. Natural variability in soil and runoff from small headwater catchments at Storgama, Norway. Ambio 37, 18–28.
Tenenhaus, M., 1998. La régression PLS. Editions Technip, Paris.
Terhoeven-Urselmans, T., Michel, K., Helfrich, M., Flessa, H., Ludwig, B., 2006. Near-infrared spectroscopy can predict the composition of organic matter in soil and litter. Journal of Plant Nutrition and Soil Science 169, 168–174.
Veselá, A., Barros, A.S., Synytsya, A., Delgadillo, I., Čopíková, J., Coimbra, M.A., 2007. Infrared spectroscopy and outer product analysis for quantification of fat, nitrogen, and moisture of cocoa powder. Analytica Chimica Acta 601, 77–86.
Viscarra Rossel, R.A., Walvoort, D.J.J., McBratney, A.B., Janik, L.J., Skjemstad, J.O., 2006. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59–75.
Workman Jr., J., Weyer, L., 2007. Practical Guide to Interpretive Near-Infrared Spectroscopy. CRC Press, Taylor & Francis Group, New York.