Bioresource Technology 99 (2008) 8445–8452
Enhanced discrimination and calibration of biomass NIR spectral data using non-linear kernel methods
Nicole Labbé a,*, Seung-Hwan Lee b, Hyun-Woo Cho c, Myong K. Jeong c, Nicolas André a
a Forest Products Center, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570, USA
b Biomass Technology Research Center, National Institute of Advanced Industrial Science and Technology, AIST Chugoku, Suehiro 2-2-2, Hiro, Kure, Hiroshima 737-0197, Japan
c Department of Industrial and Information Engineering, University of Tennessee, 311 South and East Stadium, 1425 South Stadium Drive, Knoxville, TN 37996-0700, TN, USA
Article info
Article history: Received 22 March 2007; received in revised form 19 December 2007; accepted 19 February 2008; available online 14 April 2008
Keywords: Near infrared; Biomass; Ash content; Kernel PCA; Kernel PLS
Abstract
Rapid methods for characterizing biomass intended for energy use are essential. In this work, near infrared spectroscopy is used to measure the ash and char content of various types of biomass. Very strong models, independent of the type of biomass, were developed to predict ash and char content by near infrared spectroscopy and multivariate analysis. Several statistical approaches, namely principal component analysis (PCA), orthogonal signal correction (OSC)-treated PCA and partial least squares (PLS), and kernel PCA and kernel PLS, were tested to find the best method for classifying biomass and predicting these characteristics from near infrared data. The model with the highest coefficient of correlation and the lowest RMSEP was obtained with the OSC-treated kernel PLS method. Published by Elsevier Ltd.
* Corresponding author. Tel.: +1 865 946 1126; fax: +1 865 946 1109. E-mail address: [email protected] (N. Labbé).
doi:10.1016/j.biortech.2008.02.052

1. Introduction
The utilization of biomass has gained importance because of the uncertain petroleum supply in the near future and concerns about environmental pollution. Cellulosic plant materials represent a suitable source for the production of valuable products. However, many physico-chemical, structural and compositional factors hinder biomass conversion processes. For example, in pyrolysis and gasification processes, the presence of minerals can be an issue: minerals can cause slagging and corrosion of equipment, and may reduce the rate of combustion. However, they can also have positive effects, such as acting as catalysts in the process (Agblevor and Besler, 1996; Vamvuka et al., 2006). Optimizing biomass processing for energy purposes first requires an understanding of the physical and anatomical structure and the chemical composition of this complex material.

Near infrared (NIR) spectroscopy has been shown to be an effective method to characterize the organic composition of biomass (Sanderson et al., 1996; Moore and Owen, 2001; Kelley et al., 2002; Starks et al., 2006) and to measure its physical (Cogdill et al., 2004; Hofmeyer and Pedersen, 1996) and mechanical properties (Rials et al., 2002; Kelley et al., 2004a; André et al., 2006a). Moreover, NIR spectroscopy coupled with multivariate analysis has been used to classify preservative-treated lumber (So et al., 2004a). The classification was possible because of differences in the inorganic composition of the preservatives. That work demonstrated that NIR spectroscopy can be employed to obtain information on the inorganic composition of wood.

Unlike ultraviolet/visible and mid-infrared spectroscopy, NIR spectroscopy depends almost entirely on statistics and chemometrics to develop models. The majority of peaks seen in the near infrared are overtone and combination peaks of molecular vibrations. They are weaker than the fundamental bands and arise from the O–H, C–H, S–H, and N–H stretching modes. Principal component analysis (PCA) and partial least squares (PLS) are the most common multivariate data analytical techniques employed to extract information from near infrared data of wood and wood composites (André et al., 2006a,b; Curran et al., 1992; Hoffmeyer and Pedersen, 1985; Gindl et al., 2001; Kelley et al., 2002, 2004b,c; Martens and Naes, 1989; Rials et al., 2002; Schimleck et al., 1999, 2001, 2002; So et al., 2004b; Thumm and Meder, 2001).

Prediction of ash and char content in biomass can be stated as a multivariate calibration problem. A main objective of the calibration model is to predict these properties from experimental or historical NIR spectral data before any processing, such as gasification or fermentation. However, the high dimensionality and collinearity of such data make it difficult or impossible in some
cases to construct the calibration model (Qin, 2003). The need to model such data led to the adoption of multivariate calibration models such as partial least squares (PLS). PLS is a dimension reduction technique that seeks a set of latent variables by maximizing the covariance of two variable blocks (i.e., predictor X and response Y). It has proven useful in various calibration problems (Kourti, 2005; Chiang et al., 2000; Rosipal and Trejo, 2001). In particular, PLS has been shown to be a powerful technique for multivariate calibration of noisy, collinear, high-dimensional, and ill-conditioned data (Qin, 2003).

Although PLS is an effective technique for calibration, it is linear in nature, so it may be inappropriate when the data behave non-linearly. To overcome this limitation, the "kernel trick" has been used to develop a non-linear kernel version of PLS, called kernel PLS (KPLS) (Rosipal and Trejo, 2001; Schölkopf et al., 1998). The basic idea of the kernel trick is that input data are first mapped into a kernel feature space by a non-linear mapping function, and the mapped data are then analyzed. The choice between linear and non-linear PLS techniques for calibration depends on the characteristics of the problem of interest. In the linear case, data can be modeled effectively by a linear PLS technique; this is the simplest problem, in which both linear and non-linear techniques are expected to produce good prediction performance. In the non-linear case, however, a linear technique may not model most of the data correctly.

The predictor variable X, in general, contains unwanted variation that is unrelated (or orthogonal) to the response variable Y. Such unwanted variation may degrade the prediction ability of a calibration model. In addition, calibration models are then likely to need a large number of latent components to achieve desirable prediction results (Wold et al., 1998). To overcome this problem, calibration data are often preprocessed or corrected prior to data analysis. Common approaches for preprocessing include multiplicative signal correction (MSC) (Geladi et al., 1985), standard normal variate (SNV) (Barnes et al., 1989), and principal component analysis (PCA) (Sun, 1997). Wold et al. (1998) developed an orthogonal signal correction (OSC) method to remove from X unwanted variation that is unrelated or orthogonal to Y. OSC can selectively remove the largest variation of X having no correlation with Y. This is possible because OSC utilizes the response Y to construct a kind of signal filter for X (Eriksson et al., 2000). The other preprocessing methods can also be considered as different cases of filtering. Compared to the other methods, OSC is a PLS-based solution and is mathematically well-defined. It has been successfully applied to multivariate calibration of NIR spectroscopy data and classification of nuclear magnetic resonance (NMR) spectral data (Eriksson et al., 2000; Wold et al., 1998; Westerhuis et al., 2001).

In this work, several statistical approaches are investigated to classify biomass and to predict its ash and char content from near infrared spectra collected on different types of biomass.
2. Methods
In this section, PCA, KPCA, PLS, KPLS, and OSC are briefly introduced. The relationship among PCA, KPCA, PLS, and KPLS is shown in Fig. 1 (see Frank and Friedman, 1993 for the analytical comparison of PCA and PLS). PCA transforms the original data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PLS seeks the multidimensional direction in the X space that explains the direction of maximum multidimensional variance in the Y space. The main difference between these two techniques is that PLS uses the information of both the input space (X) and the output space (Y), while PCA considers only the input space. KPCA and KPLS are non-linear versions of PCA and PLS, respectively. The detailed introductions for PCA–KPCA and PLS–KPLS are given in Sections 2.1 and 2.2, respectively.

The orthogonal signal correction (OSC) method is a popular preprocessing procedure that removes unwanted variation between samples in the data matrix that is not correlated with the Y-vector (Wold et al., 1998). OSC preprocessing (or other data-preprocessing procedures) can be performed before applying PCA, KPCA, PLS, and KPLS. The detailed procedure for OSC preprocessing is presented in Section 2.3.

Fig. 1. Relationship among PCA, KPCA, PLS, and KPLS.

2.1. PCA and KPCA
PCA defines a lower-dimensional subspace that captures as much variation in the data matrix as possible. Mathematically, PCA relies on an eigenvector decomposition of the covariance matrix of the data matrix (Jackson, 1980). Let us consider a set of M observations in an n-dimensional space, $x_k \in \mathbb{R}^n$, $k = 1, \ldots, M$. Linear principal component analysis seeks eigenvalues $\lambda \ge 0$ and the associated eigenvectors $v$ satisfying

$$\lambda v = C v, \qquad (1)$$

where $C$ is the M-sample estimate of the covariance matrix. Since all solutions for nonzero eigenvalues must lie in the span of the input data, the equivalent equation is given by

$$\lambda \langle x_k, v \rangle = \langle x_k, C v \rangle \quad \text{for all } k = 1, \ldots, M, \qquad (2)$$

where $\langle x, y \rangle$ is the dot product between $x$ and $y$. To derive kernel PCA, the eigenvalue problem $\lambda v = C^F v$ needs to be solved. Here, $C^F$ is the M-sample estimate of the covariance matrix in the feature space $F$,

$$C^F = \frac{1}{M} \sum_{j=1}^{M} \Phi(x_j) \Phi(x_j)^T, \qquad (3)$$

where $\Phi(\cdot)$ is a non-linear function. Similarly to Eqs. (1) and (2), the eigenvalue equation can be written as

$$\lambda \langle \Phi(x_k), v \rangle = \langle \Phi(x_k), C^F v \rangle \quad \text{for all } k = 1, \ldots, M, \qquad (4)$$

and there exist coefficients $\alpha_i$, $i = 1, \ldots, M$, such that

$$v = \sum_{j=1}^{M} \alpha_j \Phi(x_j). \qquad (5)$$

Combining Eqs. (3)–(5) yields (Schölkopf et al., 1998)

$$\lambda \sum_{j=1}^{M} \alpha_j \langle \Phi(x_k), \Phi(x_j) \rangle = \frac{1}{M} \sum_{j=1}^{M} \alpha_j \sum_{i=1}^{M} \langle \Phi(x_k), \Phi(x_i) \rangle \langle \Phi(x_i), \Phi(x_j) \rangle \qquad (6)$$

for all $k = 1, \ldots, M$. Now define an $M \times M$ kernel matrix $K$ by $[K]_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle$. By introducing a kernel function $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, it is necessary neither to know the form of the non-linear mapping function $\Phi(x)$ nor to calculate dot products in $F$. Instead, appropriate kernel functions are used to evaluate them in the input space. Such a kernel trick can be implemented because dot products in $F$ have an equivalent kernel in the input space (Schölkopf et al., 1998). Using the $M \times M$ kernel matrix $K$, Eq. (6) can be expressed as $\lambda M K \alpha = K^2 \alpha$, where $\alpha = [\alpha_1, \ldots, \alpha_M]^T$. Finding solutions of this equation is equivalent to solving the eigenvalue problem $M \lambda \alpha = K \alpha$ for nonzero eigenvalues (Schölkopf et al., 1998). This yields eigenvectors $\alpha^1, \ldots, \alpha^M$ corresponding to eigenvalues $\lambda_1 \ge \cdots \ge \lambda_M$. The dimensionality of the input data is reduced by retaining only the first q eigenvectors. For a test vector $x$, its principal components or t-scores $t_k$, $k = 1, \ldots, q$, are obtained by projecting $\Phi(x)$ onto the eigenvectors $v^k$ in $F$:

$$t_k = \langle v^k, \Phi(x) \rangle = \sum_{j=1}^{M} \alpha_j^k \langle \Phi(x_j), \Phi(x) \rangle = \sum_{j=1}^{M} \alpha_j^k \, k(x_j, x). \qquad (7)$$

The first q ($\le M$) non-linear principal components are used to describe the input data. Note that in KPCA the number of principal components (PC) q may exceed the input dimensionality n. While linear PCA can extract n PCs, the maximum number of PCs in KPCA is M.

2.2. PLS and KPLS
PLS is a multivariate projection method for modeling a relationship between independent variables X and dependent variable(s) Y (Hoskuldsson, 1988). PLS seeks to find a set of latent variables that maximizes the covariance between X ($M \times n$) and Y ($M \times N$). It decomposes X and Y into the form

$$X = T P^T + E, \qquad (8)$$

$$Y = U Q^T + F, \qquad (9)$$

where T and U are ($M \times A$) matrices of the extracted A score vectors, P ($n \times A$) and Q ($N \times A$) are loading matrices, and E ($M \times n$) and F ($M \times N$) are residual matrices. The PLS method searches for weight vectors w and c that maximize the sample covariance between t and u (Wold, 1975). By regressing X (Y) on t (u), a loading vector p (q) can be computed as follows:

$$p = (t^T t)^{-1} X^T t, \qquad (10)$$

$$q = (u^T u)^{-1} Y^T u. \qquad (11)$$

Then, a PLS regression model can be expressed as $Y = XB + G$. Here B represents the regression coefficients, which are given by

$$B = W (P^T W)^{-1} C^T = X^T U (T^T X X^T U)^{-1} T^T Y. \qquad (12)$$

KPLS is the non-linear kernel version of "linear" PLS that can deal with non-linear variations in the data. Non-linear data can be mapped by a non-linear mapping into a higher-dimensional feature space $F$ in which they vary linearly. Similarly to KPCA, the introduction of kernel functions enables us to avoid both performing explicit non-linear mappings and computing dot products in $F$. The KPLS algorithm is derived directly from the PLS algorithm with some modifications. Let $\Phi$ be the ($M \times A$) matrix whose ith row is $\Phi(x_i)$ in an A-dimensional feature space $F$. Similarly to Eq. (12), the regression coefficient matrix B of KPLS has the form

$$B = \Phi^T U (T^T K U)^{-1} T^T Y. \qquad (13)$$

Thus the predictions of KPLS on training data and test data can be made as follows:

$$\hat{Y} = \Phi B = K U (T^T K U)^{-1} T^T Y, \qquad (14)$$

$$\hat{Y}_t = \Phi_t B = K_t U (T^T K U)^{-1} T^T Y, \qquad (15)$$

where $\Phi_t$ and $K_t$ represent $\Phi$ and $K$ corresponding to the test data points, respectively.

2.3. Orthogonal signal correction (OSC)
OSC is a preprocessing technique for the removal of undesirable systematic variation in the data. It was first developed by Wold et al. (1998) to remove systematic variation from the predictor X that is orthogonal (or unrelated) to the response Y. The largest variation of X having zero correlation with Y is selectively removed from X. The first step of OSC is to calculate the first PC score vector t from X. The score vector t is then orthogonalized with respect to Y, producing the following actual correction vector $t^*$:

$$t^* = \left\{ I - Y (Y^T Y)^{-1} Y^T \right\} t. \qquad (16)$$
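As an illustration only, the orthogonalization in Eq. (16) can be sketched in Python with NumPy. The toy matrix `X`, the response `y`, and the helper name `orthogonalize_score` are assumptions made for this sketch; they are not part of the original analysis, which was carried out in Matlab. The projection is computed by least squares rather than by forming $(Y^T Y)^{-1}$ explicitly.

```python
import numpy as np

def orthogonalize_score(t, Y):
    """Return t* = (I - Y (Y^T Y)^{-1} Y^T) t, the part of t orthogonal to Y."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y.reshape(-1, 1)
    # Least-squares fit of t on Y gives (Y^T Y)^{-1} Y^T t without an explicit inverse.
    coef, *_ = np.linalg.lstsq(Y, t, rcond=None)
    return (t - Y @ coef).ravel()

# Hypothetical toy data: 12 "spectra" with 40 variables and one response.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 40))
X -= X.mean(axis=0)                      # column-center X
y = rng.normal(size=12)
y -= y.mean()                            # center y

# First principal component score vector of X, obtained from the SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
t = U[:, 0] * s[0]

t_star = orthogonalize_score(t, y)
print(abs(float(t_star @ y)) < 1e-8)     # t* carries no component along y
```

By construction the corrected vector has a numerically zero inner product with the response, which is the property the remaining OSC steps rely on.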
Then a PLS weight vector w is computed such that $Xw = t^*$, followed by the calculation of a new score vector $t = Xw$. These steps are repeated until t has converged. Finally, a loading vector p is computed and the correction term $t p^T$ is subtracted from X. The next components can be calculated in the same way. After the introduction of OSC by Wold et al. (1998), several OSC algorithms have been reported (Fearn, 2000; Sjoblom et al., 1998; Westerhuis et al., 2001). This work adopted as a preprocessing method the direct orthogonal signal correction (DOSC) algorithm developed by Westerhuis et al. (2001) because it gives more reliable results.

2.4. Materials
Three types of woody biomass, red oak (Quercus rubra), yellow poplar (Liriodendron tulipifera L.), and hickory (Carya spp.), and three herbaceous biomasses, switchgrass (Panicum virgatum L.), corn stover (Zea mays L.), and sugarcane bagasse (Saccharum spp.), were chosen for this comparative study. Three wood samples were collected from three different trees for each wood species. Three switchgrass and corn stover samples were harvested from
different locations in the growing plot, and three samples were taken from a large batch of bagasse. A total of eighteen biomass samples were used for the study. The samples were ground to 1 mm size using a Wiley mill (Thomas Scientific, Swedesboro, NJ).

Ash content measurements were based on the method recommended by the National Renewable Energy Laboratory (NREL), which is similar to ASTM standard method E1755-01. Biomass samples (1.5 g) were oven-dried at 105 °C and placed into tared crucibles. The samples were then burned with an ashing burner until no more smoke or flame appeared. The burned samples were placed in a muffle furnace for 24 ± 6 h at 575 ± 25 °C, then cooled in a desiccator and weighed. The measurements were performed in triplicate for each of the eighteen biomass samples, and the mean and standard deviation were calculated.

Char content was measured by thermogravimetric analysis (TGA) (Perkin–Elmer Pyris 1 TGA). Samples of 6–7 mg were first heated from 50 °C to 105 °C at a rate of 25 °C min⁻¹ and kept at 105 °C for 10 min to remove moisture. The samples were then heated to 750 °C at the same heating rate under a nitrogen atmosphere (flow rate of 20 mL min⁻¹). Char content was calculated from the final weight loss at 750 °C, using the moisture-free initial weight of the sample as the reference. Three measurements per sample were also performed, and the mean and standard deviation were reported. Table 1 summarizes the ash and char content of the different types of biomass used for the study.

2.5. Near infrared spectroscopy
The NIR spectra were recorded using an Analytical Spectral Devices (ASD) Field Spectrometer at 1 nm intervals between 350 and 2500 nm. A fiber optic probe oriented at a right angle to the sample surface was used to collect the reflectance spectra. The ground samples were placed in a 10 cm diameter container. The container was placed on a spinning table.
The samples were illuminated with a 50 W tungsten halogen lamp oriented at 30° above the samples and aligned parallel with the longitudinal axis of the sample. Thirty-two scans were collected and averaged into a single spectrum. A 5 cm diameter spot was sampled and three spectra per sample were recorded. In total, 54 spectra were collected for the whole biomass set. The reflectance spectra were transferred from the ASD spectrometer to an Unscrambler (www.camo.com) file and converted to absorbance spectra. The data set was reduced by averaging the spectra collected at 1 nm intervals to a spectral data set at 4 nm intervals, and the data were saved as a text file and imported into Matlab (www.mathworks.com) for further analysis.

Table 1
Ash and char content of the biomass set

Biomass         Sample      Ash content (%)       Char content (%)
                            Mean      SD          Mean       SD
Red oak         Sample 1    0.151     0.001       11.683     0.375
                Sample 2    0.215     0.022       13.821     0.491
                Sample 3    0.473     0.014       14.519     0.810
Yellow poplar   Sample 1    0.218     0.015       10.416     0.359
                Sample 2    0.410     0.005       13.316     0.433
                Sample 3    0.185     0.014       11.647     0.079
Hickory         Sample 1    0.532     0.045       12.562     0.359
                Sample 2    1.065     0.007       17.196     0.633
                Sample 3    0.652     0.010       14.683     0.273
Switchgrass     Sample 1    2.892     0.051       17.976     0.218
                Sample 2    2.796     0.105       19.114     0.719
                Sample 3    3.110     0.091       19.825     0.347
Corn stover     Sample 1    4.652     0.009       21.788     0.960
                Sample 2    3.361     0.307       18.789     0.243
                Sample 3    3.161     0.105       19.270     0.504
Bagasse         Sample 1    2.273     0.240       16.023     0.910
                Sample 2    3.241     0.099       16.692     0.099
                Sample 3    2.490     0.151       16.123     0.914

3. Results and discussion
Industrial use of biofuels demands on-line discrimination and calibration (prediction) methodology for rapid characterization of their chemical and physical properties. The most important property of biofuels is the calorific value, which is influenced by moisture and ash content as well as by the chemical composition of the dry biomass. In this work, the potential use of an NIR spectroscopic procedure for the real-time assessment of the ash and char contents of different types of biomass is evaluated. This tool can be applied for on-line prediction in processes to standardize biofuels or in biofuel plants for process monitoring.

3.1. Biomass discrimination
Fig. 2 shows NIR spectra of several types of biomass with various ash and char contents. Several bands, present in all samples, can be assigned to different chemical groups. For instance, the bands at 1460 and 1920 nm can be assigned to the second and first overtone of OH stretching vibration, respectively. The band at 2295 nm is assigned to the CH combination of biomass (Shenk et al., 2001). However, no differences between biomass types can be directly observed from the near infrared spectra. PCA score plots can be used to discriminate the six different biomass species based on their NIR spectral features. A linear PCA was first performed on the NIR spectral data. Usually, the first two or three principal components (PC) capture most of the variation present in the data, which facilitates visualization of different clusters of data. Fig. 3 shows a set of PCA score plots for the NIR spectral data. Fig. 3a represents the PC1–PC2 scores plot obtained by applying PCA on the raw NIR spectral data, while Fig.
3b and c are based on the MSC-treated and the OSC-treated NIR spectral data, respectively. To apply MSC to the raw spectral data, each spectrum was regressed against the average spectrum, giving an intercept and a slope for each raw spectrum. Each spectrum was then MSC-corrected by subtracting its intercept from each variable of the spectrum and dividing by its slope. In the case of OSC, the method was modified and used in a PLS-discriminant analysis (PLS–DA) formulation for enhanced discrimination (Barker and Rayens, 2003). As a result, the Y matrix of dependent variables in OSC contains information about the class memberships of the NIR data, which is coded in the following way:

$$Y = \begin{bmatrix} 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n_p} & 0_{n_p} & \cdots & 1_{n_p} \end{bmatrix}, \qquad (17)$$

where $n_p$ denotes the number of samples in the pth cluster (i.e., 9 for all clusters in this work), $1_{n_p}$ is an ($n_p \times 1$) vector of all ones, and $0_{n_p}$ is an ($n_p \times 1$) vector of all zeros.

Fig. 2. Near infrared spectra of woody and herbaceous biomass (bagasse, red oak, switchgrass, corn stover, hickory, yellow poplar; x-axis: wavelength (nm), 1000–2500).

Fig. 3. Score plots: (a) raw data-based PCA, (b) MSC-treated PCA, (c) OSC-treated PCA and (d) OSC-treated KPCA. Axes: PC1 versus PC2 in panels (a)–(c) and PC1 versus PC3 in panel (d).

The PCA model of the raw NIR spectral data used three PCs explaining 99.2% of the total variation of X (i.e., R²X(cum) = 0.992). Similarly, the PCA model of the MSC-treated data used three PCs with R²X(cum) = 0.998, while the OSC-treated data needed four PCs with R²X(cum) = 0.994. Fig. 3 allows the comparison of the four different discrimination models. The PCA models of the raw data and MSC-treated data do not allow the classification of the biomass. Differences between the types of biomass are observed with OSC-treated PCA (Fig. 3c). The woody biomasses (red oak, yellow poplar and hickory) are positive along PC1 relative to the herbaceous biomasses (switchgrass, bagasse and corn stover). Moreover, switchgrass, corn stover and bagasse are clearly distinguished along PC2. The classification of the woody biomasses is not as clear as for the herbaceous materials. Therefore, to separate the woody biomass species, kernel PCA was employed (Fig. 3d). A radial basis kernel function $k(x, y) = \exp(-\lVert x - y \rVert^2 / c)$ was used to represent the NIR spectral data. Consequently, a stronger discrimination was achieved using non-linear PCA of the OSC-treated NIR data. Six distinct clusters are obtained with KPCA, each cluster representing a biomass type. In KPCA, the score plot of PC1–PC3 produced a better separation than that of PC1–PC2. Woody biomasses are separated from herbaceous biomasses by PC1 and among themselves by PC3. One can conclude that OSC-treated KPCA has a higher potential for classifying different types of biomass than the other three methods.

3.2. Calibration results
Table 2 summarizes the results from three PLS models using raw, MSC-treated, and OSC-treated NIR spectral data. These PLS results were obtained using all 54 spectra as training data for building each PLS model. OSC was performed using the raw (y) values of ash and char contents instead of class membership, as was done before for discrimination.

Table 2
Overview of the PLS calibration models for ash content in biomass

PLS ash calibration models   Raw data   MSC data   OSC data
R²X(cum)                     0.998      0.999      0.981
R²Y(cum)                     0.949      0.940      0.999
Q²(cum)                      0.926      0.904      0.999
RMSEC                        0.248      0.310      0.0234

As shown in Table 2, four parameters can be calculated to determine a model's ability to fit the data and its predictive power: R²X(cum), R²Y(cum), Q²(cum), and RMSEC. R²X(cum) and R²Y(cum) represent the cumulative sum of squares of X and Y explained, respectively, and Q²(cum) represents the cumulative fraction of the total variance of Y predicted by the extracted components. In addition, the quality of the models can be evaluated by comparing the root mean square error of calibration (RMSEC) for the training data and the root mean square error of prediction (RMSEP) for the test data (none of which are included in the training data). The PLS model based on raw data, for example, used 99.8% of the variation of X to explain 94.9% of the variation in Y. This model has a predictive power Q²(cum) of 92.6% with RMSEC = 0.248. The PLS model using OSC-treated data has the minimum RMSEC value of 0.0234 with the highest Q²(cum) of 0.999. Compared to the PLS models based on raw and MSC-treated data, the use of OSC preprocessing improved the calibration performance in terms of R²Y(cum), Q²(cum), and RMSEC, while using a smaller fraction of the variation of X (R²X(cum) = 0.981). This improvement in calibration can be seen in Fig. 4, in which predicted values of ash ($\hat{y}$) were plotted against observed values (y) to visualize the prediction performance. In such a plot, the data should fall on the diagonal (target line), i.e., $y = \hat{y}$, when a calibration model predicts the data perfectly. Fig. 4 shows that the PLS model using OSC-treated data has the best performance, with almost every sample on the diagonal.

Fig. 4. Predicted ash content versus measured ash content plots for PLS models using (a) raw data, (b) MSC-treated data and (c) OSC-treated data. Axes: measured ash content (%) versus ash content predicted by NIR.

To evaluate the prediction performance of each calibration model using the test data (RMSEP), a leave-three-out procedure was performed on all 54 spectra. There are several ways of evaluating the performance of a calibration model. Two popular approaches are holdout validation and K-fold cross-validation (Tan et al., 2006). In holdout validation, the data are divided randomly into a training and a test data set. In general, less than a third of the initial samples is used as validation data. The calibration model is built on the training data set and its performance is evaluated using the test data set. However, the holdout method has several drawbacks, especially with small data sets. For example, fewer observations are used for model-building, which may cause a larger prediction error for the test set, and the accuracy of a model can depend strongly on the composition of the training and testing data sets.

One of the popular validation approaches for problems with a small sample size is K-fold cross-validation. In K-fold cross-validation, the original observations are partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for validating the model, and the remaining K − 1 subsamples are used as training data. This procedure is repeated K times (the folds) so that each partition is used for testing exactly once. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate.
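The K-fold procedure described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the function name `k_fold_indices`, the shuffling seed, the toy response values, and the trivial mean predictor are assumptions for the example and are not taken from the study.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so that each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # random partition of the observations
    folds = [idx[i::k] for i in range(k)]      # K subsamples of nearly equal size
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical response values and a stand-in model that predicts the training mean.
y = [0.2, 0.4, 0.5, 1.1, 2.8, 3.1, 3.4, 4.6, 2.3, 3.2]
fold_rmse = []
for train, test in k_fold_indices(len(y), k=5):
    mean_train = sum(y[j] for j in train) / len(train)
    mse = sum((y[j] - mean_train) ** 2 for j in test) / len(test)
    fold_rmse.append(math.sqrt(mse))

cv_rmse = sum(fold_rmse) / len(fold_rmse)      # combine the K fold results
print(round(cv_rmse, 3))
```

Any calibration model could replace the mean predictor; only the index bookkeeping is the point of the sketch.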
The leave-one-out cross-validation is a special case of K-fold cross-validation with K = N, the size of the data set. This procedure involves using a single observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. The advantage of this approach is that it utilizes as much data as possible for model-building. The test sets are mutually exclusive and they effectively cover the entire data set. The drawback of this procedure is that it is computationally expensive to repeat the procedure N times. The data set used for this study was composed of 18 samples, and three measurements per sample were analyzed. The mean and standard deviation for each sample are reported in Table 1. These three measurements are similar to each other compared to other observations of different samples. Thus, one should avoid using some of the measurements for training and the remaining ones for testing because this may produce biased results (i.e., too good accuracy). That is, all three measurements from the same sample are included either in the training or in the test data sets. A leave-three-out procedure was chosen for an unbiased comparison. RMSEP results for ash and char contents are given in Table 3. As an extension of linear PLS calibration models, non-linear Kernel PLS calibration models were also tested and presented. In KPLS, after testing prediction results using various kernel functions, it was found that the radial basis kernel function is appropriate for modeling the NIR data sets.
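To make the kernel step concrete, the sketch below implements a small kernel PLS regression with the radial basis kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / c)$ and the prediction formulas of Eqs. (14) and (15). It is a hedged illustration in Python with NumPy, not the authors' Matlab code: the NIPALS-style loop follows the general kernel PLS scheme of Rosipal and Trejo (2001), while the function names, the kernel width `c = 1.0`, the number of components, and the synthetic one-dimensional data are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Radial basis kernel k(x, y) = exp(-||x - y||^2 / c) between rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def center_kernel(K_train, K=None):
    """Center a kernel matrix in feature space using the training-set mean."""
    K = K_train if K is None else K
    M = K_train.shape[0]
    ones_M = np.full((M, M), 1.0 / M)
    ones_t = np.full((K.shape[0], M), 1.0 / M)
    return K - ones_t @ K_train - K @ ones_M + ones_t @ K_train @ ones_M

def kpls_fit(K, Y, n_components, tol=1e-10, max_iter=500):
    """Return D = U (T^T K U)^{-1} T^T Y for a centered kernel K and centered Y."""
    M = K.shape[0]
    Kd, Yd = K.copy(), Y.copy()                # working copies that get deflated
    T = np.zeros((M, n_components))
    U = np.zeros((M, n_components))
    for a in range(n_components):
        u = Yd[:, [0]] / np.linalg.norm(Yd[:, 0])
        for _ in range(max_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            u_new = Yd @ (Yd.T @ t)
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        T[:, [a]] = t
        U[:, [a]] = u
        P = np.eye(M) - t @ t.T                # remove the new score from K and Y
        Kd = P @ Kd @ P
        Yd = Yd - t @ (t.T @ Yd)
    return U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)

# Synthetic non-linear data (assumed for illustration only).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(60, 1))
y_train = np.sin(X_train)
X_test = np.linspace(-2.5, 2.5, 11).reshape(-1, 1)

c = 1.0
K = rbf_kernel(X_train, X_train, c)
K_t = rbf_kernel(X_test, X_train, c)
y_mean = y_train.mean(axis=0)

D = kpls_fit(center_kernel(K), y_train - y_mean, n_components=5)
y_fit = center_kernel(K) @ D + y_mean          # training predictions, Eq. (14)
y_pred = center_kernel(K, K_t) @ D + y_mean    # test predictions, Eq. (15)
rmse_fit = float(np.sqrt(np.mean((y_fit - y_train) ** 2)))
rmse_test = float(np.sqrt(np.mean((y_pred - np.sin(X_test)) ** 2)))
print(rmse_fit, rmse_test)
```

In practice the kernel width and the number of latent components would be tuned by cross-validation; the values used here are placeholders.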
Table 3
Leave-three-out RMSEP results of PLS and KPLS models

Model                 RMSEP (ash)   RMSEP (char)
PLS using raw data    1.08          3.32
MSC-treated PLS       0.957         2.87
OSC-treated PLS       0.643         1.98
OSC-treated KPLS      0.429         1.06
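A leave-three-out RMSEP of the kind reported in Table 3 can be computed by holding out all replicate spectra of one sample at a time. The sketch below shows the bookkeeping under stated assumptions: the data are synthetic (18 hypothetical samples with 3 replicates each), and a small ridge regression stands in for the PLS and KPLS models, so the printed value has no relation to the values in Table 3. All function names are illustrative.

```python
import numpy as np

def leave_group_out_rmsep(X, y, groups, fit, predict):
    """RMSEP when all rows sharing a group label are held out together."""
    sq_err = []
    for g in np.unique(groups):
        test = groups == g                     # the replicates of one sample
        model = fit(X[~test], y[~test])
        sq_err.extend((y[test] - predict(model, X[test])) ** 2)
    return float(np.sqrt(np.mean(sq_err)))

def ridge_fit(X, y, lam=1e-3):
    """Stand-in calibration model (not PLS): centered ridge regression."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return x_mean, y_mean, w

def ridge_predict(model, X):
    x_mean, y_mean, w = model
    return (X - x_mean) @ w + y_mean

# Synthetic data: 18 samples x 3 nearly identical replicates = 54 rows.
rng = np.random.default_rng(1)
base = rng.normal(size=(18, 5))
X = np.repeat(base, 3, axis=0) + 0.01 * rng.normal(size=(54, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.05 * rng.normal(size=54)
groups = np.repeat(np.arange(18), 3)           # replicate rows share a label

rmsep = leave_group_out_rmsep(X, y, groups, ridge_fit, ridge_predict)
print(f"leave-three-out RMSEP: {rmsep:.3f}")
```

Grouping by sample keeps replicates of the same material out of the training set whenever that material is being predicted, which is the reason given in the text for preferring leave-three-out over leave-one-out here.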
For the prediction results of ash content found in Table 3, OSC data-based calibration models of linear PLS and kernel PLS show higher prediction performance (lower RMSEP) than the others. The linear PLS calibration model using OSC-treated data, for example, was able to predict the ash content with RMSEP = 0.643 whereas RMSEP = 0.957 for MSC data and RMSEP = 1.08 for raw data. In addition, compared to linear PLS using OSC data, kernel PLS using OSC data shows a better performance with a lower RMSEP value of 0.429. Similar prediction results patterns were obtained for the char content (Table 3). Overall, the kernel PLS using OSC-treated data showed the best prediction performance in that it yielded the lowest RMSEP values for ash (i.e., 0.429) and char (i.e., 1.06). From the PLS analysis, it is also possible to determine the chemical features that are responsible for the correlation between the NIR spectra of the samples and the predicted characteristics. Fig. 5 shows the regression coefficients for the PLS models devel-
a
Intensity (A.U.)
0.008
0.004
0.000
-0.004 1250
1500
1750
2000
2250
References
Intentsity (A.U.)
0.012
0.008
0.004
0.000
-0.004 1000
4. Conclusion
2500
Wavelength (nm)
b
oped from the OSC-treated data. The regression coefficients for the KPLS models are difficult to obtain because of the complicated non-linear structure of the model. The regression coefficients of the PLS models obtained for the ash content are very similar to those obtained for the char content. A quick analysis between the char content and the ash content reveals that there is a linear correlation of r2 = 0.83 between the two parameters, suggesting that the two characteristics are correlated. This could be a reasonable explanation for the similarities found in the regression coefficients. A jack-knifing routine (Martens and Martens, 2000) was performed to detect the most significant variables for both models. Jack-knifing can help select important peaks and guide the assignment work. Yet, each variable from both models was marked as significant. Indeed, the spectral data were pre-processed using OSC, which removes parts of the spectral data that are not correlated to ash or char contents. However, some bands can be attributed to chemical groups such as the bands at 1439, 1915 nm can be assigned to hydroxyls groups present in the biomass and the band at 2263 nm can be attributed to C–H stretching vibration. Inorganic constituents are not directly detected by NIR spectroscopy, because most of the ash forming atom species do not absorb radiation in the near infrared region. Therefore, there must be some association of these minerals with organic compounds that allow NIR spectroscopy to be used to predict ash content in biomass. According to the NIR models, these interactions occurred in O–H and C–H groups.
4. Conclusion

In this work, various approaches were tested to classify biomass and to predict its ash and char content by near infrared spectroscopy. Models were developed on the raw near infrared data and on data subjected to pretreatments such as multiplicative scatter correction and orthogonal signal correction. Moreover, kernel PCA and PLS were employed to classify the various biomasses and to predict char and ash contents. The best models were obtained by combining an OSC treatment with the kernel method, demonstrating that near infrared spectral data coupled with an appropriate multivariate method can accurately predict biomass properties such as ash and char content. This article reports a proof of concept for predicting ash and char content in unknown biomass with the developed models. The number of samples and types of biomass was limited in this study; more samples and types of biomass need to be analyzed to obtain highly robust calibration models.
Fig. 5. Regression coefficients obtained from OSC-treated linear PLS: (a) ash and (b) char. Both panels plot intensity (A.U.) against wavelength (nm) over 1000–2500 nm.

References
Agblevor, F.A., Besler, S., 1996. Inorganic compounds in biomass feedstocks. 1. Effect on the quality of fast pyrolysis. Energy Fuels 10, 293–298.
André, N., Labbé, N., Rials, T.G., Kelley, S.S., 2006a. Assessment of wood load condition by near infrared spectroscopy. J. Mater. Sci. 41, 1879–1886.
André, N., Young, T.M., Rials, T.G., 2006b. On-line monitoring of the buffer capacity of particleboard furnish by near-infrared spectroscopy. Appl. Spectrosc. 60, 1204–1209.
Barker, M., Rayens, W., 2003. Partial least squares for discrimination. J. Chemom. 17, 166–173.
Barnes, R.J., Dhanoa, M.S., Lister, S.J., 1989. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc. 43, 737–891.
Chiang, L.H., Russell, E.L., Braatz, R.D., 2000. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemom. Intell. Lab. Syst. 50, 243–252.
Cogdill, R.P., Schimleck, L.R., Jones, P.D., Peter, G.F., Daniels, R.F., Clark, A., 2004. Estimation of the physical wood properties of Pinus taeda L. radial strips using least squares support vector machines. J. Near Infrared Spectrosc. 12, 263–269.
Curran, P.J., Dungan, J.L., Macler, B.A., Plummer, S.E., Peterson, D.L., 1992. Reflectance spectroscopy of fresh whole leaves for the estimation of chemical concentration. Remote Sens. Environ. 39, 153–166.
Eriksson, L., Trygg, J., Johansson, E., Bro, R., Wold, S., 2000. Orthogonal signal correction, wavelet analysis, and multivariate calibration of complicated process fluorescence data. Anal. Chim. Acta 420, 181–195.
Fearn, T., 2000. On orthogonal signal correction. Chemom. Intell. Lab. Syst. 50, 47–52.
Frank, I., Friedman, J.J., 1993. A statistical view of some chemometrics regression tools. Technometrics 35, 109–135.
Geladi, P., MacDougall, D., Martens, H., 1985. Linearization and scatter-correction for near-infrared reflectance spectra of meat. Appl. Spectrosc. 3, 491–500.
Gindl, W., Teischinger, A., Schwanninger, M., Hinterstoisser, B., 2001. The relationship between near infrared spectra of radial wood surfaces and wood mechanical properties. J. Near Infrared Spectrosc. 9, 255–261.
Hoffmeyer, P., Pedersen, J., 1985. Evaluation of density and strength of Norway spruce wood by near infrared reflectance spectroscopy. Holz Roh Werkst. 53, 165–170.
Hofmeyer, P., Pedersen, J.G., 1996. Evaluation of density and strength of Norway spruce wood by near infrared reflectance spectroscopy. Holz Roh Werkst. 53, 165–170.
Hoskuldsson, A., 1988. PLS regression methods. J. Chemom. 2, 211–228.
Jackson, J.E., 1980. A User’s Guide to Principal Component Analysis. John Wiley & Sons, New York, NY.
Kelley, S.S., Jellison, J., Goodell, B., 2002. Use of NIR and MBMS coupled with multivariate analysis for detecting the chemical changes associated with brown-rot biodegradation of spruce wood. FEMS Microbiol. Lett. 209, 107–111.
Kelley, S.S., Rials, T.G., Snell, R., Groom, L.R., Sluiter, A., 2004a. Use of near infrared spectroscopy to measure the chemical and mechanical properties of solid wood. Wood Sci. Technol. 38, 257–276.
Kelley, S.S., Rowell, R.M., Davis, M., Jurich, C.K., Ibach, R., 2004b. Measuring the chemical composition of agricultural fibers with near infrared spectroscopy and pyrolysis molecular beam mass spectrometry. Biomass Bioenerg. 27, 77–88.
Kelley, S.S., Rials, T.G., Groom, L.R., So, C.L., 2004c. Use of near infrared spectroscopy to predict the mechanical properties of six softwoods. Holzforschung 58, 252–260.
Kourti, T., 2005. Application of latent variable methods to process control and multivariate statistical process control in industry. Int. J. Adapt. Control. Sig. Proc. 19, 213–246.
Martens, H., Martens, M., 2000. Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Qual. Prefer. 11, 5–16.
Martens, H., Naes, T., 1989. Multivariate Calibration. Wiley, New York, NY.
Moore, K.A., Owen, N.L., 2001. Infrared spectroscopy studies of solid wood. Appl. Spectrosc. Rev. 36, 65–86.
Qin, S.J., 2003. Statistical process monitoring: basics and beyond. J. Chemom. 17, 480–502.
Rials, T.G., Kelley, S.S., So, C.L., 2002. Use of advanced spectroscopic techniques for predicting the mechanical properties of wood composites. Wood Fiber Sci. 34, 398–407.
Rosipal, R., Trejo, L.J., 2001. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2, 97–123.
Sanderson, M.A., Agblevor, F., Collins, M., Johnson, D.K., 1996. Compositional analysis of biomass feedstocks by near infrared reflectance spectroscopy. Biomass Bioenerg. 11, 365–370.
Schimleck, L.R., Michell, A.J., Raymond, C.A., Muneri, A., 1999. Estimation of basic density of Eucalyptus globulus using near-infrared spectroscopy. Can. J. For. Res. 29, 194–201.
Schimleck, L.R., Evans, R., Illic, J., 2001. Estimation of Eucalyptus delegatensis wood properties by near infrared spectroscopy. Can. J. For. Res. 31, 1671–1675.
Schimleck, L.R., Evans, R., Illic, J., Matheson, A.C., 2002. Estimation of wood stiffness of increment cores by near-infrared spectroscopy. Can. J. For. Res. 32, 129–135.
Schölkopf, B., Smola, A.J., Müller, K., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299–1319.
Shenk, J.S., Workman, J.J., Westerhaus, M.O., 2001. Application of NIR spectroscopy to agricultural products. In: Burns, D.A., Ciurczak, E.W. (Eds.), Handbook of Near-Infrared Analysis. Marcel Dekker, New York, pp. 419–474.
Sjoblom, J., Svensson, O., Josefson, M., Kullberg, H., Wold, S., 1998. An evaluation of orthogonal signal correction applied to calibration transfer of near infrared spectra. Chemom. Intell. Lab. Syst. 44, 229–244.
So, C.L., Lebow, S.T., Groom, L.H., 2004a. The application of near infrared (NIR) spectroscopy to inorganic preservative-treated wood. Wood Fiber Sci. 36, 330–336.
So, C.-L., Via, B.K., Groom, L.H., Schimleck, L.R., Shupe, T.F., Kelley, S.S., Rials, T.G., 2004b. Near infrared spectroscopy in the forest products industry. Forest Prod. J. 54, 6–11.
Starks, J.P., Zuli, D., Phillips, W.A., Coleman, W., 2006. Development of canopy reflectance algorithms for real-time prediction of bermudagrass pasture biomass and nutritive values. Crop Sci. 46, 927–934.
Sun, J., 1997. Statistical analysis of NIR data: data pretreatment. J. Chemom. 11, 525–532.
Tan, P., Steinbach, M., Kumar, V., 2006. Introduction to Data Mining. Pearson Addison Wesley.
Thumm, A., Meder, R., 2001. Stiffness prediction of radiata pine clearwood test pieces using near infrared spectroscopy. J. Near Infrared Spectrosc. 9, 117–122.
Vamvuka, B., Troulinos, S., Kastanaki, E., 2006. The effects of mineral matters on the physical and chemical activation of low rank coal and biomass materials. Fuel 85, 1763–1771.
Westerhuis, J.A., de Jong, S., Smilde, A.K., 2001. Direct orthogonal signal correction. Chemom. Intell. Lab. Syst. 56, 13–25.
Wold, S., Antti, H., Lindgren, F., Öhman, J., 1998. Orthogonal signal correction of near-infrared spectra. Chemom. Intell. Lab. Syst. 44, 175–185.