Journal of Analytical and Applied Pyrolysis 95 (2012) 95–100
Contents lists available at SciVerse ScienceDirect
Journal of Analytical and Applied Pyrolysis journal homepage: www.elsevier.com/locate/jaap
Multivariate curve resolution provides a high-throughput data processing pipeline for pyrolysis-gas chromatography/mass spectrometry
Lorenz Gerber a,∗, Mattias Eliasson b, Johan Trygg b, Thomas Moritz a, Björn Sundberg a
a Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Department of Forest Genetics and Plant Physiology, SE-901 83 Umeå, Sweden
b Computational Life Science Centre (CLiC), Umeå University, SE-901 87 Umeå, Sweden
Article info
Article history: Received 17 October 2011; Accepted 17 January 2012; Available online 24 January 2012
Keywords: Py-GC/MS; High-throughput; Multivariate analysis; Data processing; Lignocellulose; Wood
Abstract
We present a data processing pipeline for Pyrolysis-Gas Chromatography/Mass Spectrometry (Py-GC/MS) data that is suitable for high-throughput analysis of lignocellulosic samples. The approach applies multivariate curve resolution by alternate regression (MCR-AR) and automated peak assignment. MCR-AR employs parallel processing of multiple chromatograms, as opposed to the sequential processing used in prevailing applications. Parallel processing provides a global peak list that is consistent for all chromatograms, and therefore does not require tedious manual curation. We evaluated this approach on wood samples from aspen and Norway spruce, and found that parallel processing results in an overall higher precision of peak area from integrated peaks. To further increase the speed of data processing, we evaluated automated peak assignment based solely on basepeak mass. This approach gave estimates of the proportion of lignin (as syringyl-, guaiacyl- and p-hydroxyphenyl-type lignin) and carbohydrate polymers in the wood samples that were in high agreement with those where peak assignments were based on full spectra. This method establishes Py-GC/MS as a sensitive, robust and versatile high-throughput screening platform well suited to a non-specialist operator. © 2012 Elsevier B.V. All rights reserved.
1. Introduction
Analytical pyrolysis coupled to gas chromatography/mass spectrometry (Py-GC/MS) has been extensively used for characterization of complex polymeric matrices typically found in biological samples such as lignocellulosics [1]. Py-GC/MS is fast, sensitive and requires minimal sample preparation while yielding highly reproducible and comprehensive chemical fingerprints [2]. These features make Py-GC/MS attractive as a high-throughput screening platform for large sample series. Combined with multivariate data analysis methodologies, chemical fingerprinting methods are ideally suited for classification and identification of differences in lignocellulosic materials according to their chemical compositions [3–5]. However, few high-throughput applications for Py-GC/MS have been demonstrated so far. A major bottleneck for Py-GC/MS setups when handling a large number of samples is the lack of an appropriate data processing pipeline. Such data processing methods must provide fast and accurate deconvolution, identification and integration of peaks. Current methods commonly used for Py-GC/MS data, such as software solutions from instrument suppliers as well as freely
∗ Corresponding author. Tel.: +46 0 90786 8411; fax: +46 0 907868165. E-mail address:
[email protected] (L. Gerber). 0165-2370/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jaap.2012.01.011
available tools like AMDIS or OpenChrome, process chromatograms sequentially (one after the other) [6,7]. Sequential data processing, however, has several drawbacks when applied to large datasets. An inconsistent peak table between samples is often a major issue, as it is time-consuming and difficult to match the respective peaks from sample to sample. Furthermore, it is usually necessary to adapt the extracted mass spectra to a common form. Thus, tedious curation is needed, either manually or by a complex framework of rules [8]. Obtaining high-quality data from sequential processing requires that the whole data set is reintegrated using the curated and unified global peak list. Sequential data processing pipelines are therefore less suitable for automated high-throughput applications. An additional bottleneck when using Py-GC/MS for high-throughput applications is that manual curation of peak lists is required for each sample batch. When Py-GC/MS is used in the characterization of lignocellulosic samples, the processed data are normally presented as peak tables representing 50–100 compounds with molecular identifications along with the numeric peak integration values [2,9–11]. The molecular identifications are generated semi-automatically by tools such as MS-SEARCH [12], followed by manual reconciliation of the matches. Preparation of a curated peak list with molecular identifications often takes the majority of the total time in a Py-GC/MS project. Furthermore, it requires substantial operator training.
L. Gerber et al. / Journal of Analytical and Applied Pyrolysis 95 (2012) 95–100
We have established and evaluated a data processing pipeline for Py-GC/MS analysis based on multivariate curve resolution by alternate regression (MCR-AR). It makes use of a custom-made solution for multivariate-based parallel processing of chromatograms adopted from mass spectrometry-based metabolomics [13]. By applying MCR-AR, large amounts of data can be processed simultaneously, which overcomes the problems associated with sequential data processing. In addition, we present an automated pyrolysis data analysis approach for lignocellulosic samples where pyrolytic degradation products are grouped according to a broad class of precursor polymers (i.e. guaiacyl, syringyl, p-hydroxyphenyl or carbohydrates). The classifications are based solely on the most abundant m/z channel of each identified compound (the basepeak). MCR-AR based sample processing in combination with automated group-wise peak identification proved to be a highly reproducible and robust method, which provides an attractive solution for high-throughput characterization of lignocellulosic materials. The MCR-AR processing pipeline is also generally applicable to the characterization of other polymers by Py-GC/MS.
[Fig. 1 flow diagram: Raw data (NetCDF) → Smoothing → Alignment → Background subtraction → Setting processing windows → MCR-AR processing → Data analysis]
2. Materials and methods
2.1. Wood samples
Aspen wood was obtained from the base of the stem of greenhouse-grown trees (Populus tremula × tremuloides) about 1.5 m in height. Wood from Norway spruce (Picea abies) was sampled from the five outermost annual rings of ca. 20-year-old field-grown trees. The freeze-dried wood was ground to powder in a ball mill (MM400, Retsch, Germany). The powder was weighed to 50 μg (±10 μg) (XP6, Mettler-Toledo, Switzerland) and transferred to autosampler containers (Eco-cup SF, Frontier Laboratories, Japan). Ten technical replicates were prepared for each sample type.
2.2. Pyrolysis-GC/MS instrument
The analytical setup consisted of an oven pyrolyzer equipped with an autosampler (PY-2020iD and AS-1020E, FrontierLabs, Japan) connected to a GC/MS system (Agilent, 7890A/5975C, Agilent Technologies AB, Sweden). The pyrolysis oven was set to 450 °C, the interface to 340 °C and the injector to 320 °C. The injector was operated with a split ratio of 16:1, with helium as the carrier gas. After one minute the gas saver mode was switched on to vent away pyrolysate bleed from the sample remaining in the pyrolyzer oven. The pyrolysate was separated on a DB-5MS capillary column (30 m × 0.25 mm i.d., 0.25 μm film thickness; J&W, Agilent Technologies AB, Sweden). The GC temperature program started at 40 °C and was increased by 32 °C min−1 to 100 °C, by 6 °C min−1 to 120 °C, by 15 °C min−1 to 250 °C and finally by 32 °C min−1 to 320 °C, where the temperature was kept for 3 min, which resulted in a total run time of 19 min. The interface to the MS was kept at 280 °C. The mass spectrometer with a quadrupole-type analyzer scanned the range from m/z 35 to m/z 250, resulting in a scan rate of 6.22 scans s−1. The mass spectrometer was operated at unit mass resolution.
2.3. Raw data processing
Raw data files were converted to NetCDF format in Agilent ChemStation Data Analysis (Version E.02.00.493).
The employed data processing pipeline was described elsewhere [13]. Briefly, it consists of chromatogram smoothing and alignment, background correction and MCR-AR (Fig. 1). The data was smoothed by application of a moving average (length = 3). For initial background correction, the minimum value of each mass channel was subtracted from the respective mass channel. Chromatograms were
Fig. 1. Flow diagram of the data processing pipeline. The steps within the bracket were done with a Matlab script [13]. Data analysis, mainly peak identification and calculation of figures of merit, was done in R [18].
globally and linearly aligned by maximizing the covariance of the TIC between samples [14]. For performance reasons, and as a preparative step for background correction, the set of overlaid TIC chromatograms was automatically divided at points of local minima into non-overlapping windows of 200 to 450 scans. Each mass channel of each chromatogram was then baseline corrected again per processing window by linear interpolation between the first and the last data point in the window and subsequent subtraction of the area under that line. Each such processing window represents a data cube of size N × K × L, where N = number of samples, K = number of scans and L = number of mass channels. The data cube was then unfolded to the matrix (N × K) × L, to which the alternating regression (AR) algorithm was applied [15]. This iterative method, applied to curve resolution of GC/MS data, alternates between deconvoluting the chromatographic and the mass spectral profiles until the two solutions converge. The algorithm starts by assuming one distinct compound (rank = 1). A number of constraints, such as non-negativity, are applied to the solution found. Then the rank is increased by one and a new solution is calculated. This procedure is repeated until three consecutive rounds result in compounds that do not elute in the same sequence in every sample. The last solution with a correct sequence of eluting compounds is used as the final solution. Applying MCR-AR to every processing window yielded, for each window, one matrix with peak areas (N = samples, K = peaks) and one with the mass spectrum of each peak. The MCR-AR processing of 20 samples took about 30 minutes (Windows XP, Matlab 7.0, Intel Core 2 6600 2.4 GHz, 2 GB RAM), including the pre-processing of the data.
2.4. Data analysis
To demonstrate the performance of the instrumental analysis and to compare with conventional data processing pipelines, peak identification was initially done manually using a library from the literature containing 186 spectra [16,17].
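The pre-processing steps of Section 2.3 (moving-average smoothing, minimum subtraction per mass channel, and linear baseline subtraction per processing window) can be illustrated with a minimal Python sketch. This is not the Matlab script of [13]: the function names are hypothetical, the handling of the first and last scan in the moving average is an assumption, and the window baseline is interpreted here as the straight line through the first and last data point.

```python
def moving_average(signal, length=3):
    """Smooth one mass channel with a centred moving average.

    Assumption: at the edges the averaging window simply shrinks.
    """
    half = length // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def subtract_minimum(signal):
    """Initial background correction: subtract the channel minimum."""
    low = min(signal)
    return [value - low for value in signal]


def subtract_linear_baseline(signal):
    """Per-window correction: subtract the straight line that joins the
    first and the last data point of the processing window."""
    n = len(signal)
    if n < 2:
        return [0.0] * n
    first, last = signal[0], signal[-1]
    return [value - (first + (last - first) * i / (n - 1))
            for i, value in enumerate(signal)]


# Hypothetical intensities of one mass channel within one processing window.
channel = [12.0, 13.0, 30.0, 55.0, 28.0, 15.0, 14.0]
corrected = subtract_linear_baseline(subtract_minimum(moving_average(channel)))
```

Each function works on a single mass channel, so applying them to a whole chromatogram means looping over all channels (and, for the last step, over all processing windows).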
[Fig. 2 panels: (a) signal vs. RT (minutes, 0–15) for aspen and Norway spruce; (b) and (c) enlargements covering RT 5.4–6.6 min]
Fig. 2. Chromatograms of pyrolyzed aspen and Norway spruce samples. (a) Typical total ion count (TIC) chromatograms of both sample types. (b,c) Enlargement of a region from the TIC in (a) where the peaks detected and resolved by MCR-AR are shown below the TIC for aspen (b) and Norway spruce (c) samples.
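The unfolding of the data cube and the alternation between elution profiles and mass spectra described in Section 2.3 can be sketched in plain Python as follows. This is an illustrative toy, not the implementation of [13,15]: the rank is fixed, the initial spectra are supplied by the caller (here taken from scans assumed to contain a single compound), non-negativity is enforced by simple clipping, and the rank-increase loop with its elution-order stopping rule is omitted. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regress(targets, basis):
    """Least-squares coefficients of each target vector on the basis
    vectors, clipped at zero (a crude non-negativity constraint)."""
    gram = [[sum(p * q for p, q in zip(u, v)) for v in basis] for u in basis]
    fitted = []
    for t in targets:
        rhs = [sum(p * q for p, q in zip(t, u)) for u in basis]
        fitted.append([max(0.0, c) for c in solve(gram, rhs)])
    return fitted


def unfold(cube):
    """Stack an N x K x L cube (samples x scans x channels) into (N*K) x L."""
    return [scan for sample in cube for scan in sample]


def alternating_regression(x, spectra, max_iter=50, tol=1e-12):
    """Alternate between elution profiles and spectra until the squared
    residual stops changing. Returns (profiles, spectra, residual)."""
    columns = [list(c) for c in zip(*x)]
    previous = None
    for _ in range(max_iter):
        profiles = regress(x, spectra)                        # rows x rank
        loadings = regress(columns, [list(p) for p in zip(*profiles)])
        spectra = [list(s) for s in zip(*loadings)]            # rank x channels
        residual = sum(
            (x[i][j] - sum(profiles[i][r] * spectra[r][j]
                           for r in range(len(spectra)))) ** 2
            for i in range(len(x)) for j in range(len(columns)))
        if previous is not None and abs(previous - residual) < tol:
            break
        previous = residual
    return profiles, spectra, residual


# Synthetic window: two overlapping peaks with different (made-up) spectra.
c1 = [1, 2, 3, 2, 1, 0, 0, 0]
c2 = [0, 0, 1, 2, 3, 2, 1, 0]
s1 = [5, 1, 0, 2]
s2 = [0, 3, 4, 1]
sample = [[a * p + b * q for p, q in zip(s1, s2)] for a, b in zip(c1, c2)]
cube = [sample, [[2 * v for v in scan] for scan in sample]]  # two samples
x = unfold(cube)
profiles, spectra, residual = alternating_regression(x, [sample[0], sample[6]])
```

Because both samples are resolved against the same set of spectra, every sample receives a value for every peak, which is the property that makes the global peak list consistent across chromatograms.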
In the data analysis method adopted for high-throughput screening applications, a script written in the programming environment R (supplementary data) generated the identification list by assigning every peak to a broad class of precursor polymers of the pyrolytic degradation products, namely carbohydrates, syringyl-type lignin (S), guaiacyl-type lignin (G) or p-hydroxyphenyl-type lignin (H), according to the basepeak of the extracted mass spectra [18].
3. Results and discussion
High-throughput chemical fingerprinting aims to screen and classify a large number of samples with high sensitivity. Here, sensitivity is defined as the ability to detect small differences in the chemical fingerprints between samples. Sensitivity depends on the technical reproducibility with which specific peaks are identified and quantified. Experimental variation can be introduced not only during sample preparation and analytical measurement, but also during data processing routines. To assess the overall performance of the method we analyzed ten technical replicates from milled wood of both aspen and Norway spruce. The analytical procedure was adapted from Bjurhager et al. [19]. Minor modifications were necessary to optimize operation of the autosampler-fed oven pyrolyzer. Further, to improve peak shapes we adjusted the quadrupole mass analyzer to high scan rates by limiting the scan range. A higher number of data points across peaks increases both the precision of the analysis and the performance of the data processing algorithm. The raw data from ChemStation were exported as NetCDF files and then processed by the MCR-AR algorithm. In screening applications the curve resolution algorithm must often cope with a substantial variation in the number and intensity of abundant peaks. To mimic this situation we analyzed and processed the samples from aspen and spruce together.
The MCR-AR algorithm yielded deconvoluted mass spectra where overlapping peaks were well resolved (Fig. 2). The extracted mass spectra were of high quality and in good agreement with published spectra from lignocellulosic materials (Fig. 3). A detailed evaluation showed that 37% of the raw signal consisted of resolved peaks, and 58% was attributed to background signal from unavoidable sample bleed after pyrolysis. The remaining 5% was noise. The MCR-AR algorithm resolved a total number of 96 peaks in the current dataset. Average relative standard deviation (RSD) for peaks larger than 0.1% of total peak area was 8.7% for both sample types (aspen 8.7% ± 1.8, spruce 8.7% ± 1.2, 95% C.I.). These peaks represented more than 99% of the integrated peak area. Thus, we concluded that application of MCR-AR to analytical pyrolysis produced highly reproducible chemical fingerprints well suited for further sample discrimination applications, for example by multivariate analysis. We found several advantages of MCR-AR when it was compared to AMDIS, a software initially developed to automatically detect and resolve peaks in complex GC/MS chromatograms. A major problem with AMDIS was inconsistent peak tables as a result of sequential
Table 1
Five mass channels for which the basepeak of the degradation product cannot unambiguously be assigned to a specific precursor polymer using a library of 186 mass spectra [16,17]. Fourteen of the library spectra had basepeaks corresponding to these 5 m/z values; C = carbohydrate, G = guaiacyl, S = syringyl, H = p-hydroxyphenyl, P = phenolic, U = unknown.
m/z    Possible assignments
108    G, P, H
110    G, C
122    G, H, H, H, H
138    G, U
168    S, G
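A basepeak-based classification that respects the ambiguous channels of Table 1 could look like the following Python sketch. It is not the authors' R script from the supplementary data: the lookup table shown holds only a few illustrative placeholder entries, the function names are hypothetical, and reporting ambiguous channels as a separate group is an assumption about how such peaks might be handled.

```python
# Basepeak channels listed in Table 1 as having more than one possible class.
AMBIGUOUS_BASEPEAKS = {108, 110, 122, 138, 168}

# Illustrative placeholder mapping (basepeak m/z -> precursor class).
# The real table belongs to the supplementary R script and is not shown here.
EXAMPLE_CLASS_BY_BASEPEAK = {60: "C", 94: "H", 124: "G", 154: "S"}


def basepeak(spectrum):
    """Return the m/z channel with the highest intensity.

    spectrum: dict mapping m/z to intensity.
    """
    return max(spectrum, key=spectrum.get)


def class_proportions(peaks, class_by_basepeak):
    """Percent of total peak area per precursor class.

    peaks: iterable of (spectrum, peak_area) pairs.
    """
    totals = {}
    for spectrum, area in peaks:
        mz = basepeak(spectrum)
        if mz in AMBIGUOUS_BASEPEAKS:
            label = "ambiguous"
        else:
            label = class_by_basepeak.get(mz, "unassigned")
        totals[label] = totals.get(label, 0.0) + area
    grand_total = sum(totals.values())
    return {label: 100.0 * area / grand_total for label, area in totals.items()}


# Hypothetical resolved peaks: (mass spectrum, integrated area).
peaks = [
    ({60: 100, 57: 40}, 60.0),
    ({154: 100, 139: 45}, 30.0),
    ({168: 100, 153: 50}, 10.0),
]
proportions = class_proportions(peaks, EXAMPLE_CLASS_BY_BASEPEAK)
```

Keeping the ambiguous and unassigned areas visible, rather than silently dropping them, makes it easy to see how much of the total signal a basepeak-only rule cannot place.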
[Fig. 3 panels: paired mass spectra 1a–8a (resolved by MCR-AR) and 1b–8b (library); x-axis m/z, y-axis relative signal intensity (0–100)]
Fig. 3. Mass spectra resolved by MCR-AR compare well with spectra obtained by traditional approaches. Spectra resolved by the MCR-AR algorithm (indicated with the letter a) are compared to the corresponding library spectra from literature (indicated with the letter b) [16,17]. The compounds are: 2-butenal (cis or trans) (1), dihydro-methylfuranone (2), 4-hydroxy-5,6-dihydro-(2H)-pyran-2-one (3), 2-hydroxy-1-methyl-1-cyclopentene-3-one (4), Guaiacol (5), Guaiacol, 4-methyl- (6), Guaiacol, 4-vinyl- (7), Syringol (8).
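The agreement between a resolved spectrum and a library spectrum, as shown in Fig. 3, can be expressed as a number. The text does not state which measure, if any, was used; the Python sketch below uses cosine similarity, a common choice for comparing mass spectra, purely as an assumption. The intensities in the example are made up.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra given as {m/z: intensity}.

    Returns 1.0 for identical relative intensities and 0.0 when the two
    spectra share no channel.
    """
    channels = set(spectrum_a) | set(spectrum_b)
    dot = sum(spectrum_a.get(mz, 0.0) * spectrum_b.get(mz, 0.0)
              for mz in channels)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical resolved and library spectra (relative intensities).
resolved = {154: 100.0, 139: 48.0, 111: 20.0}
library = {154: 100.0, 139: 45.0, 111: 22.0, 96: 5.0}
score = cosine_similarity(resolved, library)
```

The measure ignores absolute scale, so a resolved spectrum and a library spectrum normalized to different maxima can still be compared directly.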
sample processing. Several approaches for post-processing have been developed to overcome this problem, but they all require tedious data handling [8,20]. Moreover, the integrals of corresponding peaks in technical replicates produced by AMDIS showed, in our hands, a much higher RSD than could be expected from the analytical setup. When we performed manual integration of these peaks it was evident that the unexpectedly high variation resulted from an artifact of the peak detection and integration algorithm. To improve the robustness of AMDIS, several add-on tools have been presented [21,22]. However, these workarounds add more processing steps, which increase processing time and add complexity by including additional parameters to be optimized. In contrast, MCR-AR solves the robustness issue by parallel processing with a minimum number of process parameters to determine. This is an advantage when working with a wide range of different sample types, as optimization of process parameters is quick and straightforward. In many applications where Py-GC/MS is used to analyze lignocellulosic material, the focus is to estimate the relative quantities of
Table 2
Manually curated peak list from aspen and Norway spruce samples. ID numbers, compound names and classifications are according to [16,17]. C = carbohydrate, G = guaiacyl, S = syringyl, H = p-hydroxyphenyl.
ID    RT (s)    Type    Name
2     135.284   C       Carbon dioxide
18    154.756   C       Acetic acid
24    156.365   C       1-Hydroxy-2-butanone
16    166.503   C       2-Butenal (cis or trans)
19    168.434   C       Hydroxypropanone
12    178.411   C       Unknown: similar to 3-Pentanone
42    187.905   C       2-Butenoic acid methyl ester or Methoxy-dihydrofuran
10    191.767   C       Unknown: similar to 1-Penten-3-one
27    191.928   C       3-Butenal-2-one
26    194.181   C       3-Hydroxypropanal (isomer of compound no. 19)
28    198.686   C       (3H)-Furan-2-one
32    199.652   C       Butanedial
21    200.617   C       Acetic anhydride or 2-Oxo-propanoic acid methyl ester
29    206.732   C       (2H)-Furan-3-one
43    216.548   C       3-Furfuryl alcohol
34    222.341   C       2-Furaldehyde
38    232.318   C       1-Acetyloxypropane-2-one
37    233.606   C       2-Furfuryl alcohol
40    236.663   C       Dihydro-methyl-furanone or Dimethyl-dihydro-furan
44    245.674   C       4-Cyclopentene-1,3-dione
41    258.065   C       2-Acetylfuran
54    261.927   C       (5H)-Furan-2-one
47    263.536   C       Dihydro-methyl-furanone (isomer of compound no. 36)
51    284.295   C       Isomer of compound no. 57: unknown
2     294.594   H       Phenol
57    306.823   C       4-Hydroxy-5,6-dihydro-(2H)-pyran-2-one
60    308.593   C       Methyl-dihydro-(2H)-pyran-2-one
61    326.455   C       2-Hydroxy-1-methyl-1-cyclopentene-3-one
70    352.685   C       gamma-Lactone derivative: unknown
24    370.064   G       Guaiacol
72    374.248   C       Anhydrosugar: unknown
66    390.823   C       3-Hydroxy-2-methyl-pyran-4-one
79    393.558   C       4-Hydroxy-3-methyl-(5H)-furanone or 3-Methyl-2,4-furandione
84    433.144   C       1,5-Anhydro-arabinofuranose
77    437.972   C       Overlapping spectra (Mw 116, 128, 144)
34    442.96    G       Guaiacol, 4-methyl-
68    458.569   C       Compound similar to no. 59
40    503.305   G       Guaiacol, 4-ethyl-
38    527.442   G       Guaiacol, 4-vinyl-
42    551.741   S       Syringol
46    554.316   G       Isoeugenol (cis)
49    559.787   G       Homovanillin
51    607.258   S       Syringol, 4-methyl-
47    609.028   G       Isoeugenol (trans)
48    617.717   G       Guaiacol, 4-propyl-
43    627.212   G       C10H10O2
44    629.625   G       C10H10O2
50    633.487   G       Acetoguaiacone
101   638.637   C       1,6-Anhydro-beta-D-glucopyranose = Levoglucosan
100   646.844   C       1,6-Anhydro-beta-D-mannopyranose
62    648.292   S       Syringol, 4-ethyl-
56    654.89    G       Guaiacyl acetone
55    667.28    S       Syringol, 4-vinyl-
58    668.729   G       Coniferyl alcohol structure isomer
57    680.958   G       Propioguaiacone
53    683.855   G       Guaiacol, 4-(oxy-allyl)-
69    685.625   S       Syringol, 4-propenyl- (cis)
102   704.774   C       Anhydrosugar: unknown
63    712.337   G       Dihydroconiferyl alcohol
64    720.705   S       Syringaldehyde
59    724.889   G       Coniferyl alcohol (cis)
70    732.13    S       Syringol, 4-propenyl- (trans)
54    753.05    G       Coniferaldehyde
60    758.36    G       Coniferyl alcohol (trans)
76    768.337   S       Syringyl acetone
75    853.946   S       Sinapaldehyde
80    857.808   S       Sinapyl alcohol (trans)
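The relative standard deviation (RSD) used in the Results to judge the reproducibility of replicate peak areas can be computed as in the Python sketch below. The exact procedure is not given in the text, so two points are assumptions: the sample standard deviation (n − 1 denominator) is used, and the 0.1% threshold is applied to each peak's mean share of the summed mean peak areas. The function names and the example areas are hypothetical.

```python
import statistics


def relative_standard_deviation(areas):
    """RSD in percent of one peak across technical replicates."""
    return 100.0 * statistics.stdev(areas) / statistics.mean(areas)


def mean_rsd(peak_table, min_share=0.001):
    """Average RSD over peaks whose mean area exceeds min_share of the
    summed mean areas (0.001 corresponds to 0.1% of total peak area).

    peak_table: dict mapping a peak label to its list of replicate areas.
    """
    means = {peak: statistics.mean(areas) for peak, areas in peak_table.items()}
    total = sum(means.values())
    selected = [peak for peak, mean in means.items() if mean / total > min_share]
    return statistics.mean(
        relative_standard_deviation(peak_table[peak]) for peak in selected)


# Hypothetical replicate areas for one large and one negligible peak.
table = {
    "peak_a": [90.0, 100.0, 110.0],
    "peak_b": [0.01, 0.01, 0.01],
}
average = mean_rsd(table)
```

With a consistent global peak list every replicate contributes an area for every peak, so this calculation needs no peak matching between samples.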
lignin types and carbohydrate polymers in the sample. This means that the pyrolytic degradation products are classified according to their precursor polymers. In the prevailing standard workflow, tools like MS-SEARCH are used to search and match the extracted mass spectra with pure spectra of standard compounds in libraries.
Afterwards, the peak areas of matched spectra are assigned to their precursor polymers according to the literature. The whole process is rather time consuming and requires a trained operator. However, more than 90% of the tabulated peaks in our mass spectral library of lignocellulosic materials have a basepeak unique for only one of
the four pyrolytic degradation product precursor classes: guaiacyl, syringyl, p-hydroxyphenyl or carbohydrates. Only five m/z channels represent basepeaks from more than one precursor polymer (Table 1). We implemented peak assignment based solely on the basepeak in a short R script (Supplementary Data) and evaluated it by comparing the results to those obtained from traditional peak assignments using mass spectral library searches (Table 2). Such a simplistic approach inevitably compromises accuracy to a certain extent. Still, the estimated percentages of S, G and H lignin agreed well between the two approaches, although a minor difference was found in the quantification of the carbohydrate fraction (Table 3). To a certain extent, the presented method is similar to the data analysis approach employed in pyrolysis-molecular beam mass spectrometry (Py-MBMS). This approach often disregards signal variation in the time domain, and sample characterization is based only on the averaged or summed signal per mass channel. It has been shown that this approach can provide information on S, G, H lignin and the carbohydrate fraction [23,24]. Applying this strategy to Py-GC/MS provides a much quicker workflow from raw data to interpretable results than conventional molecular assignments, and it does not compromise the much higher sensitivity achieved by the chromatographic separation of Py-GC/MS setups compared to Py-MBMS. Our data analysis protocol is therefore well suited to large-scale screening applications where samples of interest are identified by multivariate approaches. The broad peak classification of lignins and carbohydrates can help to identify appropriate approaches for further in-depth characterization of the separating trait(s).

Table 3
Comparison between automated and manual peak classification. Automated peak classification is based on basepeak only. Values show percent of total peak area (mean ± 95% C.I.).
Sample type     Aspen                        Norway spruce
                Manual        Automated      Manual         Automated
Carbohydrate    62.7 ± 0.8    66.3 ± 0.8     52.51 ± 0.5    57.5 ± 0.6
G Lignin        10.4 ± 0.4    10.4 ± 0.4     40.8 ± 0.6     40.8 ± 0.6
S Lignin        20.4 ± 0.4    20.3 ± 0.4     0.1 ± 0.0      0.1 ± 0.0
H Lignin        1.7 ± 0.1     1.66 ± 0.1     0.0 ± 0.0      0.2 ± 0.0

4. Conclusions
Application of MCR-AR to the data processing pipeline for Py-GC/MS analysis overcomes the bottleneck that prevents wide application of this technique as a tool for processing large sets of complex biological samples. Applications may range from screening of wood and cell walls of natural and mutant populations to off-line quality control in the pulp and paper or dietary fibre processing industries. To date, mostly FT-IR spectroscopy or Py-MBMS instruments are used for these purposes [25–27]. Advantages of using Py-GC/MS are increased sensitivity compared to Py-MBMS, because of the additional chromatographic separation, and the ability to quickly quantify S-, G- and H-lignin and carbohydrate proportions of biomass samples compared to FT-IR. These advantages are contrasted by a lower sample throughput compared to both Py-MBMS and FT-IR.
Acknowledgements
This work was supported by grants from FORMAS (FuncFiber/Bioimprove – centre of excellence in wood science), Vetenskapsrådet, the Swedish Energy Agency, EU Programme Renewall (FP7/2007–2013), VINNOVA, the Kempe foundation and Bio4Energy, the Swedish Programme for renewable energy.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jaap.2012.01.011.
References
[1] S.C. Moldoveanu, Techniques and Instrumentation in Analytical Chemistry 20 (1998) 510.
[2] D. Meier, I. Fortmann, J. Odermatt, O. Faix, Journal of Analytical and Applied Pyrolysis 74 (2005) 129.
[3] M. Bylesjö, M. Rantalainen, O. Cloarec, J.K. Nicholson, E. Holmes, J. Trygg, Journal of Chemometrics 20 (2006) 341.
[4] A. Gorzsas, H. Stenlund, P. Persson, J. Trygg, B. Sundberg, Plant J (2011).
[5] M. Hedenstrom, S. Wiklund-Lindstrom, T. Oman, F.C. Lu, L. Gerber, P. Schatz, B. Sundberg, J. Ralph, Molecular Plant 2 (2009) 933.
[6] S. Stein, Journal of The American Society for Mass Spectrometry 10 (1999) 770.
[7] P. Wenig, J. Odermatt, BMC Bioinformatics 11 (2010).
[8] P. Wenig, J. Odermatt, Journal of Analytical and Applied Pyrolysis 87 (2010) 85.
[9] G. Telysheva, T. Dizhbite, G. Dobele, A. Arshanitsa, O. Bikovens, A. Andersone, V. Kampars, Journal of Analytical and Applied Pyrolysis 90 (2011) 126.
[10] J.W. Choi, I.Y. Eom, K.H. Kim, J.Y. Kim, S.M. Lee, H.M. Yeo, I.G. Choi, Bioresource Technology 102 (2011) 3437.
[11] A.M. Patten, M. Jourdes, C.L. Cardenas, D.D. Laskar, Y. Nakazawa, B.Y. Chung, V.R. Franceschi, L.B. Davin, N.G. Lewis, Molecular Biosystems 6 (2010) 499.
[12] MS-SEARCH, in http://chemdata.nist.gov/mass-spc/ms-search/downloads/NISTDEMO 08.exe, last visited 10-10-2011.
[13] P. Jonsson, A.I. Johansson, J. Gullberg, J.J.A. Trygg, B. Grung, S. Marklund, M. Sjostrom, H. Antti, T. Moritz, Analytical Chemistry 77 (2005) 5635.
[14] G. Malmquist, R. Danielsson, Journal of Chromatography A 687 (1994) 71.
[15] E.J. Karjalainen, Chemometrics and Intelligent Laboratory Systems 7 (1989) 31.
[16] O. Faix, I. Fortmann, J. Bremer, D. Meier, Holz als Roh- und Werkstoff 49 (1991) 213.
[17] O. Faix, D. Meier, I. Fortmann, Holz als Roh- und Werkstoff 48 (1990) 281.
[18] R Development Core Team, R Foundation for Statistical Computing, 2011, Vienna, Austria.
[19] I. Bjurhager, A.M. Olsson, B. Zhang, L. Gerber, M. Kumar, L.A. Berglund, I. Burgert, B. Sundberg, L. Salmen, Biomacromolecules 11 (2010) 2359.
[20] R. Aggio, S.G. Villas-Boas, K. Ruggiero, Bioinformatics 27 (2011) 2316.
[21] V. Behrends, G.D. Tredwell, J.G. Bundy, Analytical Biochemistry 415 (2011) 206.
[22] C.D. Broeckling, I.R. Reddy, A.L. Duran, X. Zhao, L.W. Sumner, Analytical Chemistry 78 (2006) 4334.
[23] M.H. Studer, J.D. DeMartini, M.F. Davis, R.W. Sykes, B. Davison, M. Keller, G.A. Tuskan, C.E. Wyman, Proceedings of the National Academy of Sciences of the United States of America 108 (2011) 6300.
[24] G. Tuskan, D. West, H.D. Bradshaw, D. Neale, M. Sewell, N. Wheeler, B. Megraw, K. Jech, A. Wiselogel, R. Evans, C. Elam, M. Davis, R. Dinus, Applied Biochemistry and Biotechnology 77–9 (1999) 55.
[25] J.L. Wegrzyn, A.J. Eckert, M. Choi, J.M. Lee, B.J. Stanton, R. Sykes, M.F. Davis, C.J. Tsai, D.B. Neale, New Phytologist 188 (2010) 515.
[26] Y. Zhang, C.Y. Cao, W.Y. Feng, M. Xu, Z.H. Su, X.M. Liu, W.J. Lu, Spectroscopy and Spectral Analysis 31 (2011) 652.
[27] A.B. Champagne, K.V. Emmel, Vibrational Spectroscopy 55 (2011) 216.