Multivariate near infrared spectroscopy models for predicting methanol and water content in biodiesel


Analytica Chimica Acta 595 (2007) 107–113

Pedro Felizardo a, Patrícia Baptista a, José C. Menezes b, M. Joana Neiva Correia a,∗

a Centre of Chemical Processes, IST, Technical University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
b Centre for Biological and Chemical Engineering, IST, Technical University of Lisbon, Av. Rovisco Pais, 1049-001 Lisbon, Portugal

Received 18 October 2006; received in revised form 31 January 2007; accepted 21 February 2007
Available online 24 February 2007

Abstract

The transesterification of vegetable oils, animal fats or waste oils with an alcohol (such as methanol) in the presence of a homogeneous catalyst (sodium hydroxide or methoxide) is commonly used to produce biodiesel. Quality control of the final product is an important issue, and near infrared (NIR) spectroscopy has recently emerged as an appealing alternative to conventional analytical methods. Using NIR spectroscopy for this purpose first requires calibration models that relate the near infrared spectrum of biodiesel to the analytical data. The pre-processing technique applied to the data before calibration may greatly influence the performance of the model. This work analyses the effect of some commonly used pre-processing techniques, applied prior to partial least squares (PLS) and principal components regression (PCR), on the quality of the calibration models developed to relate the near infrared spectrum of biodiesel to its methanol and water content. The results confirm the importance of testing various pre-processing techniques. For water content, the smallest validation and prediction errors were obtained with a second order Savitzky–Golay derivative followed by mean centring prior to PLS and PCR, whereas for methanol calibration the best results were obtained with a first order Savitzky–Golay derivative plus mean centring followed by orthogonal signal correction.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Biodiesel; Near infrared; Calibration models; Data pre-processing

∗ Corresponding author. Tel.: +351 21 8417344; fax: +351 21 8417246. E-mail address: [email protected] (M.J.N. Correia).
0003-2670/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2007.02.050

1. Introduction

Using biodiesel as a fuel in diesel engines has several advantages, such as the reduction of greenhouse gas emissions, increased eco-efficiency and, if waste frying oils (WFO) are used as the raw material for biodiesel production, the treatment of industrial and household wastes [1,2]. Biodiesel may be produced by a homogeneously catalysed (sodium hydroxide or methoxide) transesterification reaction between a lipid (vegetable oils and fats) and a short chain alcohol, such as methanol, to give an ester and a by-product, glycerol. This reaction occurs stepwise, with mono- and diglycerides as intermediate products [3]. At the end of the reaction period, the glycerol-rich phase is separated from the ester layer by decantation or centrifugation. After separation, the biodiesel phase is contaminated with mono-, di- and triglycerides, methanol, catalyst, free glycerol and soaps, and has to be purified to comply with the European Standard EN 14214 [4]. Washing the ester phase with water followed by vacuum drying is the most commonly used process for biodiesel purification [5].

Since biodiesel can be produced from several different feedstocks and technologies, quality control of the final product is of great concern, and the European Standard EN 14214 [4] establishes 25 parameters that have to be analysed to certify biodiesel quality. Among these, the contents of water and methanol are two important parameters [3]. Biodiesel contaminated with water can cause corrosion in the engine, or the water can react with the glycerides to produce soaps and glycerine. The EN therefore imposes a maximum water content of 0.05% (m/m) in fuels. Methanol is responsible for metal corrosion, particularly of aluminium, and for lowering the fuel flash point. As such, a maximum methanol content of 0.2% (m/m) in biodiesel is specified [4]. Biodiesel analyses are very expensive and time consuming, and NIR spectroscopy appears as a cheaper and faster alternative for the quality control of biodiesel [6–9].

The use of NIR spectroscopy in combination with multivariate data analysis for the analysis of biofuels and other complex matrices has been reported in recent papers [3–9]. NIR spectroscopy is a well-established analytical technique based on the absorption of electromagnetic energy in the region from 700 to 2500 nm. This technique enables the analysis of multicomponent samples in a fast and non-destructive way, without requiring complex pre-treatments. The use of partial least squares (PLS) or principal components regression (PCR) allows the development of calibration models between spectral and analytical data [10–13]. This work analyses the effect of applying several commonly used pre-processing techniques, prior to PLS and PCR, on the quality of the calibration models developed to relate the near infrared spectrum of biodiesel to its methanol and water content.

2. Experimental

Industrial-scale and laboratory-scale samples of biodiesel produced from soybean, mixtures of soybean and palm, and waste frying oils were prepared according to the procedure presented elsewhere [3,9]. Industrial samples of biodiesel produced from soybean, palm and waste frying oils were supplied by two Portuguese industrial companies. The reference method for water determination was Karl Fischer titration [14], performed in a Metrohm 682 titroprocessor, while methanol content was analysed by headspace gas chromatography [15] using an HP 5890 equipped with a PoraPlot Q packed column (3 m long). The near-infrared diffuse transflectance spectra of the biodiesel samples were acquired using an ABB BOMEM MB160 spectrometer equipped with an InGaAs detector and a transflectance probe from SOLVIAS.
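As a quick unit check, the NIR limits quoted in the Introduction (700–2500 nm) can be expressed in wavenumbers, the unit in which Fourier-transform instruments usually report spectra. The sketch below is illustrative only: the function names are assumptions introduced here and are not part of the original study, which used Matlab.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 back to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm1


# NIR region as quoted in the Introduction: 700 to 2500 nm.
for wavelength in (700.0, 2500.0):
    print(f"{wavelength:.0f} nm = {nm_to_wavenumber(wavelength):.0f} cm^-1")
```

The long-wavelength limit of 2500 nm corresponds to 4000 cm−1, and the short-wavelength limit of 700 nm to roughly 14,300 cm−1.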
Spectra were recorded in duplicate for each sample at room temperature (22–24 °C), with the aid of the Galactic Grams software package, in the wavenumber range of 12,000–4000 cm−1, with a spectral resolution of 16 cm−1. The average of the two measurements was used for model development.

2.1. Data analyses and calibration development

All calculations were carried out using Matlab Version 6.5 (MathWorks, Natick, MA) and the PLS Toolbox Version 3.0 (Eigenvector Research Inc., USA) for Matlab. Partial least squares (PLS) and principal components regression (PCR) were used to develop the calibration models from the spectral and analytical data. Both methods search for linear combinations of the original X-values, called factors or components, and use only these linear combinations in the regression equation that relates the spectra X to a given property of interest, y, in this case the water or methanol content of biodiesel [11]. However, the two methods derive the model differently. PCR selects the components according to their ability to account for the variability in X, without using information about y. PLS, on the other hand, uses factors determined from both X and y: each PLS component is obtained by maximizing the covariance between y and all possible linear combinations of the columns of X. This leads to components that are more directly related to the variability in y than the principal components of the PCR approach [11].

Prior to PCR or PLS regression, various widely used pre-processing techniques described in the literature [12] were applied to the data. This work presents the calibration results obtained from untreated data (identified in the tables as none) and from data pre-treated with the following methods: Mean Centring (MC); Multiplicative Scatter Correction (MSC) followed by MC (MSC + MC); Standard Normal Variate scaling (SNV) plus MC (SNV + MC); first and second order Savitzky–Golay derivative followed by mean centring (SV1 + MC and SV2 + MC, respectively); MC followed by the Orthogonal Signal Correction method (MC + OSC); and, finally, SV1 + MC and SV2 + MC followed by OSC (SV1 + MC + OSC and SV2 + MC + OSC, respectively). The Orthogonal Signal Correction is a method developed to reduce the variance in the spectra (X) due to light scatter effects and to more general types of interference that have no correlation with the measured property y (water or methanol content). The idea is that all the information in the spectrum related to y should be retained rather than removed [11].

One of the most important steps in developing a reliable calibration model between the NIR spectrum and the analytical data is the selection of the optimum number of latent variables (LV) or principal components (PC) to be used. Several methods for selecting this number are described in the literature, such as the Akaike information criterion [16], bootstrap [17,18], cross-validation [11,17,19], the ICOMP criterion for PCR [20] and the conditional model dimensionality test for PLS [21].
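Three of the pre-processing steps listed in this section (MC, SNV and the first order Savitzky–Golay derivative) can be sketched in a few lines. This is a minimal illustration in plain Python, not the PLS Toolbox routines used by the authors. The function names, the toy spectra and the default 5-point window are assumptions, and the derivative assumes a local polynomial of order 1 or 2, which the paper does not specify. MSC and OSC are omitted.

```python
from statistics import mean, stdev


def mean_center(spectra):
    """MC: subtract the column (per-wavenumber) mean from every spectrum."""
    col_means = [mean(col) for col in zip(*spectra)]
    return [[x - m for x, m in zip(row, col_means)] for row in spectra]


def snv(spectra):
    """SNV: scale each spectrum (row) to zero mean and unit standard deviation."""
    scaled = []
    for row in spectra:
        m, s = mean(row), stdev(row)
        scaled.append([(x - m) / s for x in row])
    return scaled


def sg_first_derivative(spectrum, window=5, step=1.0):
    """Savitzky-Golay first derivative for an odd window width.

    Assumes a linear or quadratic local polynomial, for which the filter
    coefficients are i / sum(i^2). Only interior points are returned.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    norm = sum(i * i for i in range(-half, half + 1)) * step
    return [
        sum(i * spectrum[k + i] for i in range(-half, half + 1)) / norm
        for k in range(half, len(spectrum) - half)
    ]


# Toy absorbance values (invented for illustration, not measured data).
spectra = [
    [0.10, 0.12, 0.16, 0.22, 0.30, 0.40, 0.52],
    [0.20, 0.23, 0.28, 0.35, 0.44, 0.55, 0.68],
]

# SV1 + MC: derivative of each spectrum first, then mean centring.
sv1_mc = mean_center([sg_first_derivative(s, window=5) for s in spectra])
# SNV + MC: row-wise scaling first, then mean centring.
snv_mc = mean_center(snv(spectra))
print(sv1_mc)
print(snv_mc)
```

The order of operations follows the abbreviations in the text: in SV1 + MC the derivative is taken per spectrum and the centring is then done per variable across samples.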
Among the methods for selecting the number of components, leave-one-out cross-validation (LOOCV), the one used in this work, is one of the most common. All the PCR and PLS regressions were therefore developed using LOOCV to determine the optimum number of latent variables or principal components. Additionally, the conditional model dimensionality test was used for the PLS regressions to check for possible overfitting. For both the water and methanol models, this test confirmed the number of latent variables obtained by cross-validation, and the calculated risk of overfitting was lower than 1%.

Outliers were detected based on the leverage values, Q-residuals and Studentized y-residuals. A sample was considered an outlier if its leverage exceeded twice the average leverage value (a threshold given by 2(1 + LV)/N, where LV is the number of latent variables and N the number of samples), if its Q-residual fell above the 95% confidence limit for the model considered, or if its y-residual was larger than twice the residual standard deviation [12].

The calibration models were evaluated taking into account the number of LV or PC, the root mean square errors of cross-validation (RMSECV) and of external validation (RMSEP), and the determination coefficients, Q²Y, between the predicted and the measured values [10,12]. The latter coefficient, calculated using Eq. (1), quantifies the amount of variance in y predicted by the model.

$$Q_Y^2 = \left(1 - \frac{(y - \hat{y})^T (y - \hat{y})}{y^T y}\right) \times 100 \qquad (1)$$
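The quantities defined in this section can be illustrated with a short sketch. It implements Eq. (1) literally, the root mean square error, the leverage limit 2(1 + LV)/N, and a leave-one-out loop. To keep the example self-contained, a one-variable least-squares line stands in for the PLS or PCR model. That stand-in, the function names and the toy data are assumptions and not the authors' implementation. Eq. (1) is applied to a mean-centred y here, an interpretation the text does not spell out.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (RMSECV or RMSEP, depending on the predictions)."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def q2y(y_true, y_pred):
    """Eq. (1): 100 * (1 - (y - yhat)'(y - yhat) / y'y), y assumed mean-centred."""
    press = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    total = sum(a * a for a in y_true)
    return (1.0 - press / total) * 100.0


def leverage_limit(n_latent, n_samples):
    """Outlier threshold 2(1 + LV)/N, i.e. twice the average leverage."""
    return 2.0 * (1 + n_latent) / n_samples


def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for a PLS/PCR fit)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Toy, mean-centred data (invented for illustration).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]
y_cv = loocv_predictions(x, y)
print("RMSECV:", rmse(y, y_cv))
print("Q2Y (%):", q2y(y, y_cv))
print("Leverage limit for 5 LV, 38 samples:", leverage_limit(5, 38))
```

In the study the same loop would be repeated for each candidate number of latent variables or principal components, keeping the number that minimises RMSECV.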

The samples were randomly split several times using the Shuffle function of Matlab. After randomly reordering the matrix rows, the complete data set was divided into the calibration set (the first 3/4 of the matrix rows) and the external validation set (the last 1/4 of the matrix rows). This procedure was repeated several times to obtain different calibration/validation sets that were submitted to PLS and PCR regression. The differences between the average errors of these models and the highest and lowest ones were smaller than 2 mg kg−1 for water and 6 mg kg−1 for methanol. Therefore, the final calibration models were developed using the data split whose RMSECV and RMSEP values were equal to the mean values of these errors over the several runs of the Shuffle function.

3. Results and discussion

3.1. Calibration model for water

Fifty industrial and laboratory scale samples of biodiesel produced from soybean oil, mixtures of soybean and palm oils, and waste frying oils were analysed. The water and methanol contents of the samples ranged from 218 to 1859 and from 106 to 1283 mg kg−1, respectively. The measured data were used for the development of the calibration models for both chemicals. It is worth emphasizing that the contents of methanol and water in these samples were unrelated: some samples had a high (or low) content of both components, whereas others had high water and low methanol contents (or vice versa). After excluding the noisy and non-informative ranges of the spectra (<4500 and >9000 cm−1, respectively), only the region between 9000 and 4500 cm−1 was used for calibration. Of the 50 samples, 38 were used for calibration and 12 for validation. After the removal of three outliers (two laboratory scale samples produced from soybean and waste frying oils and one industrial sample produced from WFO), the PCR and PLS models were developed by applying the above-mentioned pre-processing methods.

3.1.1. PCR models development

The influence of using pre-processing methods prior to PCR on the quality of the calibration models developed for water is presented in Table 1. As seen in Table 1, the untreated data (identified as none), the mean centred data (MC) and the first order Savitzky–Golay derivative (SV1 + MC) produced similar values for the validation (RMSECV) and prediction (RMSEP) errors. In contrast, the Multiplicative Scatter Correction (MSC + MC) and Standard Normal Variate (SNV + MC) methods led to higher prediction errors. The second order Savitzky–Golay derivative, with or without the Orthogonal Signal Correction (OSC), reduced the RMSEP values by about 10 mg kg−1 when compared with the untreated data. In this case, the use of OSC reduced the number of PC from 6 to 5.

Table 1
Cross validation and external validation results for the prediction of water concentration using PCR models after several pre-processing techniques (spectral region: 9000–4500 cm−1)

| Parameter | None | MC | MSC + MC | SV1 + MC | SV2 + MC | SNV + MC | MC + OSC | SV1 + MC + OSC | SV2 + MC + OSC |
|---|---|---|---|---|---|---|---|---|---|
| Latent variables | 5 | 5 | 5 | 5 | 6 | 5 | 2 | 3 | 5 |
| Filter width | – | – | – | 25 | 15 | – | – | 25 | 15 |
| R² Cross validation | 0.972 | 0.977 | 0.975 | 0.975 | 0.979 | 0.975 | 0.976 | 0.975 | 0.979 |
| R² External validation | 0.954 | 0.958 | 0.945 | 0.953 | 0.966 | 0.945 | 0.956 | 0.957 | 0.965 |
| RMSECV (mg kg−1) | 74 | 68 | 71 | 71 | 64 | 70 | 69 | 70 | 64 |
| RMSEP (mg kg−1) | 88 | 87 | 99 | 90 | 77 | 99 | 87 | 87 | 77 |
| Q²Y Cross validation (%) | 97.21 | 97.65 | 97.52 | 97.50 | 97.93 | 97.52 | 97.59 | 97.52 | 97.91 |
| Q²Y External validation (%) | 95.35 | 95.70 | 94.39 | 95.27 | 96.53 | 94.39 | 95.53 | 95.65 | 96.49 |

Table 2
Variance captured (%) by the best PCR models for the prediction of water concentration (selected number of PC: 6 for SV2 + MC and 5 for SV2 + MC + OSC)

| PC | SV2 + MC: X, Var./PC | SV2 + MC: X, Cum. | SV2 + MC: y, Var./PC | SV2 + MC: y, Cum. | SV2 + MC + OSC: X, Var./PC | SV2 + MC + OSC: X, Cum. | SV2 + MC + OSC: y, Var./PC | SV2 + MC + OSC: y, Cum. |
|---|---|---|---|---|---|---|---|---|
| 1 | 56.43 | 56.43 | 5.04 | 5.04 | 83.04 | 83.04 | 93.84 | 93.84 |
| 2 | 36.02 | 92.45 | 88.68 | 93.72 | 8.55 | 91.59 | 1.25 | 95.09 |
| 3 | 3.81 | 96.26 | 1.26 | 94.98 | 5.68 | 97.27 | 0.78 | 95.87 |
| 4 | 2.53 | 98.79 | 0.79 | 95.78 | 1.17 | 98.44 | 1.20 | 97.07 |
| 5 | 0.52 | 99.31 | 1.05 | 96.83 | 0.97 | 99.41 | 1.68 | 98.75 |
| 6 | 0.42 | 99.74 | 1.88 | 98.71 | 0.23 | 99.63 | 0.02 | 98.77 |
| 7 | 0.10 | 99.84 | 0.02 | 98.73 | 0.12 | 99.75 | 0.02 | 98.79 |
| 8 | 0.05 | 99.89 | 0.02 | 98.75 | 0.08 | 99.83 | 0.01 | 98.81 |
| 9 | 0.04 | 99.93 | 0.01 | 98.77 | 0.04 | 99.88 | 0.01 | 98.82 |
| 10 | 0.02 | 99.95 | 0.01 | 98.77 | 0.03 | 99.91 | 0.03 | 98.85 |

Table 3
Cross validation and external validation results for the prediction of water concentration using PLS models after several pre-processing techniques (spectral region: 9000–4500 cm−1)

| Parameter | None | MC | MSC + MC | SV1 + MC | SV2 + MC | SNV + MC | MC + OSC | SV1 + MC + OSC | SV2 + MC + OSC |
|---|---|---|---|---|---|---|---|---|---|
| Latent variables | 5 | 4 | 4 | 4 | 5 | 4 | 1 | 2 | 4 |
| Filter width | – | – | – | 25 | 15 | – | – | 25 | 15 |
| R² Cross validation | 0.975 | 0.974 | 0.976 | 0.977 | 0.979 | 0.973 | 0.974 | 0.976 | 0.979 |
| R² External validation | 0.957 | 0.960 | 0.949 | 0.955 | 0.967 | 0.949 | 0.958 | 0.959 | 0.967 |
| RMSECV (mg kg−1) | 70 | 71 | 74 | 68 | 65 | 74 | 72 | 69 | 65 |
| RMSEP (mg kg−1) | 85 | 83 | 95 | 88 | 75 | 95 | 85 | 84 | 75 |
| Q²Y Cross validation (%) | 97.52 | 97.44 | 97.26 | 97.71 | 97.87 | 97.26 | 97.37 | 97.62 | 97.87 |
| Q²Y External validation (%) | 95.56 | 95.82 | 95.48 | 95.27 | 96.61 | 94.48 | 95.57 | 95.70 | 96.56 |

Using the pre-processing SV2 + MC, the lowest value for RMSECV was obtained with 5 PC and, as shown in Table 2, most of the variance of the data is captured by the first 2 PC. However, since increasing from 2 to 6 PC captured more than 98% of the data variance and reduced the external prediction error of the model, it was decided to consider 6 PC in the model. From Table 2 it may be concluded that the first PC obtained after SV2 + MC pre-processing is poorly correlated with the water-dependent variance of the data, since 56% of the variance in the spectra (X) corresponds to only 5% of the y variance. As previously mentioned, this is a direct consequence of PCR, which selects the components according to their ability to account for the variability in X, without using information about y. As reported elsewhere [9], in this case PC1 is related to matrix effects associated with the feedstock oil used in biodiesel production, whereas only PC2 accounts for the water-dependent variance of the data. When OSC is used this behaviour disappears, and fewer PC are needed to capture the variance of the data. In this case, the first PC captures 83% of the variance in X and almost 94% of the variance in water content. According to the data presented in Table 2, the PCR models were developed with 6 and 5 PC, respectively.
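The component counts discussed for Table 2 can be cross-checked with simple arithmetic on the per-PC y-variance values. The sketch below accumulates those values and reports how many leading PC are needed to pass the 98% level mentioned in the text. The helper name is an assumption, and the check is only an illustration: the authors selected the number of components by LOOCV, not by a variance threshold.

```python
from itertools import accumulate

# Per-PC variance captured in y (%), copied from Table 2.
Y_VAR_SV2_MC = [5.04, 88.68, 1.26, 0.79, 1.05, 1.88, 0.02, 0.02, 0.01, 0.01]
Y_VAR_SV2_MC_OSC = [93.84, 1.25, 0.78, 1.20, 1.68, 0.02, 0.02, 0.01, 0.01, 0.03]


def components_needed(var_per_pc, target_percent):
    """Smallest number of leading components whose cumulative variance reaches the target."""
    for n, cumulative in enumerate(accumulate(var_per_pc), start=1):
        if cumulative >= target_percent:
            return n
    return None  # target not reached with the listed components


print(components_needed(Y_VAR_SV2_MC, 98.0))      # 6
print(components_needed(Y_VAR_SV2_MC_OSC, 98.0))  # 5
```

Both results agree with the 6 and 5 PC retained for the SV2 + MC and SV2 + MC + OSC models.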

3.1.2. PLS models development

The effect of the spectral pre-processing prior to PLS regression on the performance of the models is reported in Table 3. As for the PCR calibration, the lowest validation and prediction errors were obtained using the second order Savitzky–Golay derivative, with 4 or 5 LV depending on the use of OSC. However, despite the smaller number of latent variables, OSC did not significantly improve the calibration models, because it led to similar validation and prediction errors. The percentage of variance captured by the latent variables using the SV2 + MC pre-treatment, with and without OSC, is presented in Table 4. From this table it may be concluded that, without OSC, 99.26% of the variance in the spectra (X) and 98.73% of the variance in the water content of the samples (y) were captured with five latent variables. Fig. 1 presents the performance of the PLS model after the SV2 + MC pre-treatment of the data and shows that there is an excellent agreement between the water content of the samples determined by Karl Fischer titration and that predicted by NIR spectroscopy. It is worth noting that the results presented in Tables 2 and 4 illustrate the differences between the PCR and PLS approaches. The latter singles out the components that are more strongly related to the variability in y: even without OSC, the first LV of the PLS model already captured 92.72% of the y variance, whereas PC1 captured only 5.04%. Thus, as mentioned by Martens and Næs [12], PLS regression is more efficient at extracting the information in X that is strongly correlated with y.
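The contrast between PC1 and the first PLS latent variable can be reproduced on a deliberately simple data set. In the sketch below, one X column has a large variance unrelated to y (standing in for a matrix effect) and the other has a small variance that follows y. The data, the function names and the use of power iteration are assumptions made for illustration, and only the first component of each method is computed.

```python
import math

# Toy, mean-centred data (invented for illustration, not from the paper).
y = [-1.0, -1.0, 1.0, 1.0]
X = [[-3.0, -1.0], [3.0, -1.0], [3.0, 1.0], [-3.0, 1.0]]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def first_pc_direction(X, iterations=200):
    """First PCA loading: leading eigenvector of X'X, found by power iteration."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    v = unit([1.0] * p)
    for _ in range(iterations):
        v = unit([sum(xtx[i][j] * v[j] for j in range(p)) for i in range(p)])
    return v


def first_pls_direction(X, y):
    """First PLS1 weight vector, proportional to X'y (maximum covariance with y)."""
    p = len(X[0])
    return unit([sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)])


def variance_captured(X, y, w):
    """Percent of the X and y variation captured by the scores t = Xw."""
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    tt = sum(a * a for a in t)
    # X loading for this component: X't / t't.
    load = [sum(row[j] * ti for row, ti in zip(X, t)) / tt for j in range(len(w))]
    x_total = sum(a * a for row in X for a in row)
    x_pct = 100.0 * tt * sum(a * a for a in load) / x_total
    ty = sum(a * b for a, b in zip(t, y))
    y_pct = 100.0 * ty * ty / (tt * sum(a * a for a in y))
    return x_pct, y_pct


pca_x, pca_y = variance_captured(X, y, first_pc_direction(X))
pls_x, pls_y = variance_captured(X, y, first_pls_direction(X, y))
print(f"PC1: {pca_x:.1f}% of X, {pca_y:.1f}% of y")
print(f"LV1: {pls_x:.1f}% of X, {pls_y:.1f}% of y")
```

In this constructed case PC1 captures most of the X variance and none of the y variance, while the first PLS component does the opposite, which is the qualitative pattern seen when comparing Tables 2 and 4.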

Table 4
Variance captured (%) by the best PLS models for the prediction of water concentration (selected number of LV: 5 for SV2 + MC and 4 for SV2 + MC + OSC)

| LV | SV2 + MC: X, Var./LV | SV2 + MC: X, Cum. | SV2 + MC: y, Var./LV | SV2 + MC: y, Cum. | SV2 + MC + OSC: X, Var./LV | SV2 + MC + OSC: X, Cum. | SV2 + MC + OSC: y, Var./LV | SV2 + MC + OSC: y, Cum. |
|---|---|---|---|---|---|---|---|---|
| 1 | 38.51 | 38.51 | 92.72 | 92.72 | 83.03 | 83.03 | 94.26 | 94.26 |
| 2 | 53.69 | 92.20 | 1.69 | 94.41 | 7.69 | 90.72 | 2.41 | 96.67 |
| 3 | 3.65 | 95.85 | 2.22 | 96.63 | 4.00 | 94.72 | 1.40 | 98.07 |
| 4 | 1.77 | 97.62 | 1.41 | 98.03 | 3.61 | 98.33 | 0.70 | 98.77 |
| 5 | 1.63 | 99.26 | 0.70 | 98.73 | 1.02 | 99.35 | 0.04 | 98.82 |
| 6 | 0.46 | 99.72 | 0.04 | 98.76 | 0.20 | 99.55 | 0.15 | 98.96 |
| 7 | 0.09 | 99.81 | 0.11 | 98.88 | 0.13 | 99.68 | 0.17 | 99.13 |
| 8 | 0.06 | 99.86 | 0.15 | 99.03 | 0.09 | 99.77 | 0.13 | 99.25 |
| 9 | 0.03 | 99.90 | 0.16 | 99.19 | 0.06 | 99.83 | 0.08 | 99.34 |
| 10 | 0.03 | 99.93 | 0.10 | 99.28 | 0.05 | 99.88 | 0.10 | 99.44 |

Fig. 1. PLS regression and external validation for the water content in biodiesel using the pre-processing SV2 + MC. (x) calibration set; () validation set.

By comparing the results presented in Tables 1 and 3 it may also be concluded that the PLS and PCR models for water determination perform similarly, both allowing the prediction of the water content in biodiesel with an error of at most 77 mg kg−1. The results presented above clearly indicate that the calibration models developed to predict the water concentration in biodiesel from NIR spectra perform very well, meaning that the amount of water in industrial and lab-scale biodiesel samples produced from different feedstock oils may be easily determined by NIR spectroscopy. Furthermore, the use of orthogonal signal correction in the pre-processing of the data led only to a reduction in the number of principal components or latent variables of the PCR or PLS regressions, respectively, for water determination.

3.2. Calibration model for methanol

The methanol calibration models were developed using 101 samples of biodiesel from both industrial and laboratory scale production. These samples contained between 2 and 2860 mg kg−1 of methanol and between 180 and 1859 mg kg−1 of water. As previously mentioned for the water calibration, the concentrations of the two contaminants in the samples are unrelated. Only the region between 9000 and 4500 cm−1 was used for calibration. Initially, 72 samples were used for calibration and 29 for model validation. However, because outlier detection, performed according to the procedure explained above, excluded five samples (two industrial samples produced from soybean oil, one from WFO and two laboratory scale samples), the calibration model was based on the remaining 67 samples. PLS and PCR models were developed after pre-processing the data.

3.2.1. PCR models development

As previously described, the influence of the pre-processing techniques was tested in the development of methanol calibration models using PCR. The results presented in Table 5 show that data pre-processing prior to PCR is very important. Without pre-treatment of the data, 10 PC are required to capture the variance of the data and, even so, the errors of cross-validation (RMSECV) and external validation (RMSEP) are high. Although they reduced the number of PC, the MC, MSC and SNV pre-processing methods did not significantly improve the calibration models. The first and second order Savitzky–Golay derivatives (SV1 + MC and SV2 + MC) applied before PCR yielded similar results, and both decreased the number of PC from 10 to 5, at the cost of an increase in the model prediction error from 75 to 118–119 mg kg−1. Table 5 also shows that, for methanol calibration, the orthogonal signal correction allowed a significant reduction of the cross and external validation errors, as well as of the number of PC used for modelling. It can also be concluded that with OSC the first order derivative (SV1 + MC + OSC) gives better results than the second order one (SV2 + MC + OSC).

The benefits of using OSC for pre-processing the data set in methanol calibration models are also shown in Table 6. Without OSC, 5 PC are necessary to capture 98.27% of the variance of the methanol concentration. Furthermore, ≈99% of the X variance (4 PC) corresponds to only ≈23% of that variance. As for water, the first principal component captures the variance of the spectra without using the information about the methanol content of biodiesel, y, and is probably related to the type of oil used for biodiesel production. The need for 5 PC is eliminated by the OSC filter, which captured 99.6% of the variance in y with only 1 PC. Moreover, the use of OSC allowed an important reduction of the cross-validation and prediction errors of the methanol concentration.

Table 5
Cross validation and external validation results for the prediction of methanol concentration using PCR models after several pre-processing techniques (spectral region: 9000–4500 cm−1)

| Parameter | None | MC | MSC + MC | SV1 + MC | SV2 + MC | SNV + MC | MC + OSC | SV1 + MC + OSC | SV2 + MC + OSC |
|---|---|---|---|---|---|---|---|---|---|
| Latent variables | 10 | 9 | 8 | 5 | 5 | 8 | 3 | 1 | 1 |
| Filter width | – | – | – | 25 | 25 | – | – | 25 | 25 |
| R² Cross validation | 0.984 | 0.981 | 0.983 | 0.979 | 0.979 | 0.983 | 0.975 | 0.992 | 0.988 |
| R² External validation | 0.989 | 0.987 | 0.987 | 0.973 | 0.974 | 0.987 | 0.973 | 0.991 | 0.988 |
| RMSECV (mg kg−1) | 92 | 99 | 96 | 105 | 105 | 96 | 118 | 64 | 79 |
| RMSEP (mg kg−1) | 75 | 82 | 83 | 119 | 118 | 82 | 116 | 67 | 76 |
| Q²Y Cross validation (%) | 98.4 | 98.12 | 98.27 | 97.91 | 97.91 | 98.27 | 97.45 | 99.23 | 98.81 |
| Q²Y External validation (%) | 98.91 | 98.65 | 98.71 | 97.32 | 97.40 | 98.71 | 97.25 | 99.08 | 98.83 |

Table 6
Variance captured (%) by the best PCR models for the prediction of methanol concentration (selected number of PC: 5 for SV1 + MC and 1 for SV1 + MC + OSC)

| PC | SV1 + MC: X, Var./PC | SV1 + MC: X, Cum. | SV1 + MC: y, Var./PC | SV1 + MC: y, Cum. | SV1 + MC + OSC: X, Var./PC | SV1 + MC + OSC: X, Cum. | SV1 + MC + OSC: y, Var./PC | SV1 + MC + OSC: y, Cum. |
|---|---|---|---|---|---|---|---|---|
| 1 | 69.20 | 69.20 | 9.82 | 9.82 | 74.31 | 74.31 | 99.61 | 99.61 |
| 2 | 27.64 | 96.84 | 0.03 | 9.86 | 14.39 | 88.70 | 0.03 | 99.65 |
| 3 | 1.44 | 98.29 | 1.35 | 11.20 | 7.40 | 96.11 | 0.01 | 99.66 |
| 4 | 0.76 | 99.05 | 11.36 | 22.57 | 1.74 | 97.85 | 0.00 | 99.66 |
| 5 | 0.58 | 99.62 | 75.71 | 98.27 | 0.81 | 98.66 | 0.00 | 99.66 |
| 6 | 0.17 | 99.79 | 0.04 | 98.32 | 0.38 | 99.04 | 0.01 | 99.67 |
| 7 | 0.08 | 99.87 | 0.36 | 98.68 | 0.28 | 99.32 | 0.00 | 99.67 |
| 8 | 0.04 | 99.91 | 0.04 | 98.72 | 0.16 | 99.48 | 0.00 | 99.67 |
| 9 | 0.03 | 99.94 | 0.21 | 98.92 | 0.13 | 99.61 | 0.00 | 99.67 |
| 10 | 0.01 | 99.95 | 0.16 | 99.08 | 0.11 | 99.72 | 0.00 | 99.68 |

3.2.2. PLS models development

The results presented in Table 7 reinforce the importance of pre-treatment for the performance of the methanol calibration model. Unlike PCR, PLS regression takes the y vector (methanol or water content) into account when developing the models, which reduces the number of latent variables needed to describe the model when compared with PCR. Different behaviours were also observed for the different data pre-treatment methods, with the untreated and mean centred data yielding the worst results. The PLS models derived after pre-treatment with MSC and SNV followed by mean centring gave reasonable results, with low RMSECV and RMSEP values, although six latent variables were necessary to develop these models. The models obtained using the first and second order Savitzky–Golay filters followed by mean centring gave similar values with only four latent variables. As presented before for PCR, the use of orthogonal signal correction prior to PLS allowed a significant decrease in the number of latent variables. Concerning the cross-validation and external validation errors, very good results were obtained using the first order Savitzky–Golay filter followed by mean centring and OSC. The model uses only 1 LV and gives an error of only 61 mg kg−1 in the prediction of methanol concentration in biodiesel. This value is comparable to the error of 50 mg kg−1 mentioned in the European Standard EN 14110 for the determination of methanol by gas chromatography [22].

Table 7
Cross validation and external validation results for the prediction of methanol concentration using PLS models after several pre-processing techniques (spectral region: 9000–4500 cm−1)

| Parameter | None | MC | MSC + MC | SV1 + MC | SV2 + MC | SNV + MC | MC + OSC | SV1 + MC + OSC | SV2 + MC + OSC |
|---|---|---|---|---|---|---|---|---|---|
| Latent variables | 7 | 6 | 6 | 4 | 4 | 6 | 2 | 1 | 1 |
| Filter width | – | – | – | 11 | 21 | – | – | 11 | 21 |
| R² Cross validation | 0.977 | 0.975 | 0.982 | 0.982 | 0.974 | 0.982 | 0.975 | 0.995 | 0.990 |
| R² External validation | 0.977 | 0.976 | 0.987 | 0.98 | 0.972 | 0.987 | 0.974 | 0.993 | 0.986 |
| RMSECV (mg kg−1) | 110 | 114 | 97 | 97 | 117 | 97 | 117 | 50 | 74 |
| RMSEP (mg kg−1) | 106 | 111 | 84 | 104 | 117 | 83 | 114 | 61 | 85 |
| Q²Y Cross validation (%) | 97.70 | 97.54 | 98.22 | 98.21 | 97.42 | 98.23 | 97.41 | 99.52 | 98.98 |
| Q²Y External validation (%) | 95.69 | 97.50 | 98.58 | 97.8 | 97.22 | 98.58 | 97.36 | 99.25 | 98.53 |

Table 8 presents the variance captured by the latent variables of two PLS models using the first order Savitzky–Golay derivative and mean centred data, with and without orthogonal signal correction. Table 8 confirms the importance of using orthogonal signal correction as pre-processing before developing the calibration for methanol. Without OSC, the variance of the spectra (X) captured by the first LV is poorly related to the variance of the methanol concentration, whereas for the second LV a small variation in the spectra (X) represents an important variation in methanol concentration (y). In contrast, with OSC pre-processing, the first LV captured 73.80% of the variance of the spectra and 99.80% of the variance of the methanol concentration. Fig. 2 presents the performance of the PLS model, after SV1 + MC + OSC pre-processing of the data, developed for the prediction of methanol concentration in real industrial and lab-scale biodiesel samples. As seen in this figure, the model has a very good predictive ability over a wide range of methanol concentrations in biodiesel.

Table 8
Variance captured (%) by the best PLS models for the prediction of methanol concentration (selected number of LV: 4 for SV1 + MC and 1 for SV1 + MC + OSC)

| LV | SV1 + MC: X, Var./LV | SV1 + MC: X, Cum. | SV1 + MC: y, Var./LV | SV1 + MC: y, Cum. | SV1 + MC + OSC: X, Var./LV | SV1 + MC + OSC: X, Cum. | SV1 + MC + OSC: y, Var./LV | SV1 + MC + OSC: y, Cum. |
|---|---|---|---|---|---|---|---|---|
| 1 | 71.40 | 71.40 | 11.40 | 11.40 | 73.80 | 73.80 | 99.80 | 99.80 |
| 2 | 5.07 | 76.47 | 66.80 | 78.20 | 7.12 | 80.92 | 0.06 | 99.87 |
| 3 | 20.93 | 97.40 | 13.65 | 91.85 | 10.92 | 91.84 | 0.00 | 99.87 |
| 4 | 0.65 | 98.05 | 6.70 | 98.55 | 2.64 | 94.48 | 0.01 | 99.87 |
| 5 | 1.12 | 99.17 | 0.34 | 98.89 | 3.10 | 97.58 | 0.00 | 99.88 |
| 6 | 0.29 | 99.46 | 0.36 | 99.24 | 0.38 | 97.96 | 0.01 | 99.89 |
| 7 | 0.31 | 99.76 | 0.18 | 99.43 | 0.46 | 98.42 | 0.01 | 99.89 |
| 8 | 0.04 | 99.81 | 0.21 | 99.64 | 0.22 | 98.65 | 0.02 | 99.91 |
| 9 | 0.03 | 99.84 | 0.15 | 99.78 | 0.24 | 98.89 | 0.02 | 99.92 |
| 10 | 0.03 | 99.87 | 0.04 | 99.83 | 0.21 | 99.11 | 0.01 | 99.93 |

Fig. 2. PLS regression and external validation for the methanol content in biodiesel using the pre-processing SV1 + MC + OSC. (x) calibration set; () validation set.

4. Conclusions

The aim of this work was to analyse the effect of the pre-processing technique used prior to partial least squares and principal components regression on the quality of the calibration models developed to relate the near infrared spectrum of a biodiesel sample to its methanol and water content. The results indicate that, when compared with the first and second order Savitzky–Golay derivative followed by mean centring alone, combining these pre-processing methods with orthogonal signal correction, prior to PLS and PCR regression, decreased the number of latent variables. The effect of a pre-processing method like OSC is particularly important in the development of PCR models and for methanol calibration. For the latter, the use of orthogonal signal correction prior to PCR and PLS regression reduced the number of latent variables from four to one and the prediction errors from ≈100 to 60 mg kg−1.

Acknowledgements

Thanks are due to Iberol and Space for supplying industrial samples of biodiesel and oils for the lab-scale prepared samples. Pedro Felizardo would also like to thank Fundação para a Ciência e Tecnologia (SFRH/BDE/15566/2005) and Space for his Ph.D. financial support.

References

[1] Y. Ulusoy, Y. Tekin, M. Cetinkaya, F. Karaosmanoglu, Energy Sources 26 (2004) 927–932.
[2] M.P. Dorado, E. Ballesteros, J.M. Arnal, J. Gómez, F.J. López, Fuel 82 (2003) 1311–1315.
[3] P. Felizardo, M.J. Neiva Correia, I. Raposo, J.F. Mendes, R. Berkemeier, J.M. Bordado, Waste Manage. 26 (5) (2006) 487–494.
[4] European Standard EN 14214, CEN – European Committee for Standardization, Brussels, Belgium, 2003.
[5] Y. Zhang, M.A. Dubé, D.D. McLean, M. Kates, Bioresour. Technol. 89 (2003) 1–16.
[6] G. Knothe, J. Am. Oil Chem. Soc. 76 (7) (1999) 795–800.
[7] G. Knothe, J. Am. Oil Chem. Soc. 77 (5) (2000) 489–493.
[8] G. Knothe, J. Am. Oil Chem. Soc. 83 (10) (2006) 823–833.
[9] P. Felizardo, P. Baptista, M.S. Uva, J.C. Menezes, M.J.N. Correia, J. Near Infrared Spectrosc. 15 (2007) 97–105.
[10] A.P. Ferreira, T.P. Alves, J.C. Menezes, Biotechnol. Bioeng. 91 (2005) 474–481.
[11] T. Næs, T. Isaksson, T. Fearn, T. Davies, A User Friendly Guide to Multivariate Calibration and Classification, NIR Publications, Chichester, 2002.
[12] H. Martens, T. Næs, Multivariate Calibration, John Wiley & Sons, New York, USA, 1991.
[13] M. Otto, Chemometrics: Statistics and Computer Application in Analytical Chemistry, Wiley-VCH, New York, 1999.
[14] European Standard EN 12937, CEN – European Committee for Standardization, Brussels, Belgium, 2000.
[15] European Standard EN 14110, CEN – European Committee for Standardization, Brussels, Belgium, 2003.
[16] B. Li, J. Morris, E.B. Martin, Chemom. Intell. Lab. Syst. 64 (2002) 79–89.
[17] M.C. Denham, J. Chemom. 14 (2000) 351–361.
[18] Vitor V. Lopes, José C. Menezes, Chemom. Intell. Lab. Syst. 78 (2005) 1–10.
[19] Q.S. Xu, Y.Z. Liang, Chemom. Intell. Lab. Syst. 56 (2001) 1–11.
[20] X. Capron, B. Walczak, O.E. de Noord, D.L. Massart, J. Chemom. 19 (2005) 308–316.
[21] N.M. Faber, R. Rajkó, Spectrosc. Eur. 18 (2006) 24–28.
[22] European Standard EN 14110, Fat and Oil Derivatives – Fatty Acid Methyl Esters (FAME) – Determination of Methanol Content, CEN – European Committee for Standardization, Brussels, Belgium, 2003.