Analytica Chimica Acta 642 (2009) 6–11
Contribution of external parameter orthogonalisation for calibration transfer in short waves—Near infrared spectroscopy application to gasoline quality

S. Amat-Tosello a,∗, N. Dupuy b, J. Kister b

a SP3H, Domaine du Petit Arbois, Bât Laënnec, 13545 Aix-en-Provence Cedex 4, France
b UMR CNRS 6263 ISM2, Equipe AD2EM, groupe Systèmes Chimiques Complexes, case 451, Université Paul Cézanne, 13397 Marseille Cedex 20, France
Article info

Article history:
Received 5 August 2008
Received in revised form 23 December 2008
Accepted 5 January 2009
Available online 9 January 2009

Keywords: Calibration transfer; Near infrared; External parameter orthogonalisation; Gasoline; Octane number; Research Octane Number; Motor Octane Number

Abstract

The octane number rating of a gasoline gives an indication of its performance under various engine conditions. Two different ratings are used: Research Octane Number (RON) and Motor Octane Number (MON). The standard laboratory method for octane number determination is the knock engine method, in which a gasoline is burned and its combustion characteristics are compared to those of known standards. This method is time consuming and labor intensive, and provides no ability for real time control of production. NIR can be applied in real time, directly in process monitoring, or as a laboratory procedure. Near infrared spectra of gasoline samples were collected with four different short-wavelength near infrared analysers, built with strictly the same technology. The aim of this study was to transfer the calibration built on one spectrometer to the others. We applied the external parameter orthogonalisation (EPO) correction to remove the influence of the apparatus on the information contained in the spectra. With this method, we improved the predicted values of two major gasoline properties, the Research and Motor Octane Numbers. © 2009 Elsevier B.V. All rights reserved.
∗ Corresponding author: S. Amat-Tosello.
0003-2670/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2009.01.003

1. Introduction

Engine fuel specifications are becoming more and more stringent all over the world. In the European Union, for example, besides the reduction of the maximum sulfur content from 50 ppm since 2005 to 10 ppm in 2009, the total aromatics content of gasolines is also being reduced from 42% (v/v) to 35% (v/v), which results in lower emissions of regulated pollutants in the exhaust gas [1–6]. From an industrial point of view, a number of fuel properties must be analysed before the products can be commercialized, because of either engine specifications or environmental requirements. Concerning gasolines, directives have been approved in recent years on the limits of the Research Octane Number (RON) [7], the Motor Octane Number (MON) [8] and benzene. Octane number is a fuel-performance property of gasoline that indicates the resistance of a motor fuel to knock. The reference methods for these properties are usually time consuming, require relatively large samples, and are not suitable for on-line and in situ operation. Indeed, they usually require laboratory analysis by skilled technicians. For example, RON and MON measurements are carried out on a knock engine and could lead to an error-prone procedure that might take half an hour or longer. Other non-spectrometric methods are also available. These methods, which are based on combustion measurements, have been reviewed by Clevett [9]. They require smaller samples, but are still time consuming.

Multivariate techniques combined with vibrational spectroscopic data such as NIR [10–12], FTIR [13] and FT-Raman [14] spectra have already provided reliable predictions of many physical properties of petroleum products. The NIR region of the electromagnetic spectrum (800–2500 nm) is related to overtone and combination bands of molecular vibrations. In addition, the literature reports the use of NIR as a quality control tool for the analysis of raw ingredients during the production of rocket propellant [3]. It was therefore considered that NIR spectroscopy had the potential to offer a solution to this quality control issue. In this paper we analysed gasolines supplied within a small area around Fos-sur-Mer, the first French refining site.

NIR has been found to be a suitable method for the non-destructive evaluation of numerous analytes in many matrices, including moisture in powdered milk and aromatics in Diesel fuel. In NIR spectroscopy, NIR radiation is guided into the product, and some of the backscattered radiation is captured and related to variables of interest via multivariate statistical techniques. Because of their high acquisition speed, diode array systems are more suitable than Fourier transform instruments for mounting on high-speed grading lines. The calibration model developed on one instrument
cannot be directly used on another one, even between two devices of the same type, which is a significant limitation of this technique. This seriously hampers widespread application, as it would mean that the calibration model has to be rebuilt for every spectrophotometer. As other authors have pointed out [4–6], three main causes introduce variation in newly recorded spectra that was not considered in the calibration of the equipment: (i) changes in the physical and/or chemical composition of the samples, (ii) changes in the instrumental response function (different instrument, ageing of sources, replacement of some parts, etc.) and (iii) changes in the environment of the instrument over time (temperature, humidity). Some approaches solve the transfer problem without the need for standardization. Robust calibration models that are not sensitive to instrumental responses can be built by using appropriate pre-treatment techniques (derivatives, multiplicative scatter correction (MSC) [15], orthogonal signal correction (OSC) [16], external parameter orthogonalisation (EPO) [17], etc.), or by including several instruments in the calibration set [18].

The aim of this study was to test the ability of EPO to transfer PLS calibrations of physico-chemical parameters of gasoline between four diode array spectrophotometers. One primary spectrometer was used to build the calibration and the three others, the secondary ones, were used to predict RON and MON values. Two methodologies were tested. First, a global EPO correction was applied to all spectrometers to remove apparatus-related information from the spectral data. Then, an EPO correction was computed for each secondary spectrometer against the primary one.

2. Experimental

2.1. Sampling

The samples were divided into three groups, so that three sets of measurements were obtained.
The first set, Set A, was used to characterise the differences between analysers. We chose five samples whose physico-chemical properties differ markedly from one another: an unleaded commercial gasoline, a commercial Diesel, n-heptane, cyclohexane and toluene. We decided to focus in this study on gasolines, to determine whether the gasoline property values predicted by our four NIR spectrometers can be improved by an orthogonal projection method. The second set, Set B, was used to calibrate the partial least squares (PLS) models for RON and MON. It consists of 49 samples: half are unleaded gasolines and half are blends of unleaded gasolines with an ether (methyl tert-butyl ether or ethyl tert-butyl ether) or absolute ethanol. The final set, Set C, was used as a prediction set to test the performance of the PLS models. It comprises 13 commercial gasoline samples, different from those used in the calibration matrix, collected at gas stations of different brands and covering a wide range of physico-chemical properties. Each sample was measured on all four near infrared spectrometers.

2.2. Gasoline properties

In this study, we focused on two important gasoline properties: RON [7] and MON [8]. The octane number is used to evaluate the resistance of a gasoline to knock. It is determined in a specific engine with a variable compression ratio, the Cooperative Fuel Research (CFR) engine. First, with the engine running on the fuel under evaluation, the compression ratio is adjusted until the onset of knock is observed. Then, without modifying the compression ratio,
the engine is supplied with a binary iso-octane/n-heptane mixture, and the proportion of iso-octane is varied until the engine knocks. The octane number is defined as the percentage of iso-octane in the binary mixture that causes this knock to appear. Two operating conditions of the engine can be chosen, which correspond to two octane numbers: the RON, measured at 600 rpm, and the MON, measured at 900 rpm. The samples used cover a range from 79 to 102.5 for RON and from 73 to 90.5 for MON. The measurement errors of the reference methods are 0.2 for repeatability for both RON and MON, and 0.7 and 0.9 for reproducibility for RON and MON, respectively.

2.3. Near infrared spectrometers

Four portable near infrared fuel analysers, each equipped with NIR emitting diodes, a detector and a fully integrated microprocessor, were used. The absorbance is measured between 604 and 1045 nm at 32 short wavelengths (SW), with a resolution of about 2.5 nm. Spectra were collected and computed with our own software, Hydrocarbons Profiler [19–21], which is suitable for different NIR spectrometers. Samples were placed in a 200 mL jar made of ordinary glass. A background signal from the empty chamber was measured before each spectrum was collected; each spectrum was obtained in only 1 min.

2.4. Standard normal variate (SNV)

The standard normal variate (SNV) transform was originally devised to reduce spectral noise and eliminate background effects in NIR data. In NIR measurements, non-specific scattering of radiation at the surface of particles, variable spectral path length through the sample and the chemical composition of the sample typically cause baseline shifting or tilting. The influence is greater at longer wavelengths. Such multiplicative interference of scatter and particle size can be eliminated or minimized by applying a standard normal variate correction. The standard normal variate algorithm [22] is designed to work on individual sample spectra.
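As a concrete illustration of this per-spectrum operation (each spectrum is centred on its own mean and divided by its own standard deviation), a minimal NumPy sketch is given here. The function name `snv`, the synthetic spectra and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration; they are not taken from the software used in this study.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate, applied to each row (spectrum) independently."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    # Assumption: sample standard deviation (ddof=1); the text does not name the estimator.
    sdev = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sdev

# Synthetic example (not measured data): one profile over 32 wavelengths,
# recorded with three different offsets and positive multiplicative factors.
profile = np.sin(np.linspace(0.0, 3.0, 32))
raw = np.vstack([1.0 * profile + 0.1,
                 1.5 * profile + 0.4,
                 0.7 * profile - 0.2])
corrected = snv(raw)

# After SNV the three rows coincide: the offset and the scale factor are removed.
print(np.allclose(corrected[0], corrected[1]), np.allclose(corrected[0], corrected[2]))
```

Because the correction uses only the spectrum itself, it can be applied in exactly the same way to calibration and prediction spectra, whatever the instrument.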
The transformation centres each spectrum and then scales it by its own standard deviation (Eq. (1)):

$$A_{ij}(\mathrm{SNV}) = \frac{A_{ij} - \bar{x}_i}{\mathrm{SDev}} \qquad (1)$$
where $i$ is the spectrum counter; $j$ is the absorbance value counter of the $i$th spectrum; $A_{ij}(\mathrm{SNV})$ is the corrected absorbance value; $A_{ij}$ is the measured absorbance value; $\bar{x}_i$ is the mean absorbance value of the uncorrected $i$th spectrum; and SDev is the standard deviation of the absorbance values of the $i$th spectrum. Treated spectra always have zero mean and unit variance and are thus independent of the original absorbance values.

2.5. Principal component analysis (PCA)

Principal component analysis (PCA) is an “unsupervised” method that describes a dataset without a priori knowledge of the data structure [23]. The procedure establishes a linear spectral model that converts the original, correlated variables (absorbances) into uncorrelated variables called principal components or loadings. These latent variables contain the main information and are calculated from the differences and similarities between spectra. PCA also reduces the data without a significant loss of information: generally, a small number of principal components is sufficient to summarise the available spectral information. Every spectrum can then be considered as a sum of principal components weighted by scores. Plotted against wavelength, these components appear as spectral profiles (spectral decomposition
model) which is meaningful to a spectroscopist. PCA models the variance/covariance structure of the data matrix in terms of the significant spectral differences (significant scores [24]) and treats noise as an error. The number of principal components depends on the model complexity, but the loadings retained must represent the spectral variance as well as possible and explain a large percentage of the total variance. Generally, the first component extracts the largest source of variance and the last ones contain little more than random noise.

2.6. Partial least squares regression (PLS)

Partial least squares regression is based on the relation between the signal intensity and the modification of the sample [25]. Interference and overlapping of the spectral information may be overcome by using powerful multicomponent analysis such as PLS [26,27]. This method allows a sophisticated statistical approach using the full or partial spectral region rather than unique and isolated analytical bands. The algorithm is based on the ability to mathematically correlate spectral data to a property matrix of interest while simultaneously accounting for all other significant spectral factors that perturb the spectrum. It is thus a multivariate regression method that uses the full spectral region selected and is based on the use of latent variables [28]. The model was built with a block cross-validation method during calibration development. The calibration error is estimated by computing the root mean square error of calibration (RMSEC), which compares the real value with the computed one for each component. The formula for the standard error of calibration is
$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{N} (C_i - \hat{C}_i)^2}{N - 1 - p}} \qquad (2)$$

where $C_i$ is the known value, $\hat{C}_i$ is the calculated value, $N$ is the number of samples and $p$ is the number of independent variables in the regression, optimized by cross-validation. The root mean square error of prediction (RMSEP) gives an estimate of the prediction performance during the validation of the calibration equation:

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{M} (C_i - \hat{C}_i)^2}{M}} \qquad (3)$$

where $C_i$ is the known value, $\hat{C}_i$ is the value calculated by the calibration equation, and $M$ is the number of samples in the prediction set. The predictive ability of the model should also be expressed by the bias and by the square of the correlation coefficient ($R^2$), also called the determination coefficient. As the RMSEP is a standard deviation, $(\mathrm{RMSEP})^2$ can be considered as a variance and, to a first approximation, can be analysed as such. The Fisher–Snedecor test makes it possible to determine whether one RMSEP is significantly better than another. If $(\mathrm{RMSEP})_0^2$ is the smallest $(\mathrm{RMSEP})^2$, every RMSEP such that

$$F = \frac{(\mathrm{RMSEP})^2}{(\mathrm{RMSEP})_0^2} < F_c \qquad (4)$$

where $F_c$ is the critical value given by the Fisher–Snedecor tables for the number of degrees of freedom of the RMSEP, is not statistically different (at the chosen percentage of risk) from $(\mathrm{RMSEP})_0$. $F_c$ depends on this percentage of risk, generally set to 5%. This test of variance allows a comparison between the predictive qualities of an $n$ factor model and an $(n + 1)$ factor model [29].

2.7. External parameter orthogonalisation (EPO)

External parameter orthogonalisation is a “specific” orthogonal projection method, dedicated to a given influence factor G [17,30]. This factor G represents a variability that is independent of the variable of interest Y. In our case, there are two factors G: the first is the variability introduced by the use of four different near infrared spectrometers, and the second is the temperature of the samples, which cannot be measured on-line. Four steps are needed:

(i) The mean spectrum of m different samples is measured for each of q different levels of the influence factor G.
(ii) These q spectra are centred and assembled into a matrix D.
(iii) A PCA is performed on D, and a basis P representing the q points is identified as the first k components, with k < q.
(iv) The calibration database is corrected by orthogonal projection with respect to P:

$$\tilde{X}_0 = X_0 (I - P P^{T}) \qquad (5)$$

The correction rank k corresponds to the dimension of P, i.e. the number of principal components chosen to describe D. $X_0$ is the initial data matrix, built with all sample spectra after SNV pre-processing, and $\tilde{X}_0$ is the data matrix after the orthogonal projection has been applied.

The chemometric computations were performed with The Unscrambler® 9.6 from CAMO (Computer Aided MOdelling, Trondheim, Norway) and with Matlab 7.0.

3. Results and discussion

3.1. Classification results
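The four EPO steps of Section 2.7, as applied in Section 3.1 with the spectrometer as influence factor, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the Unscrambler/Matlab implementation used in the study: the function name `epo_projector`, the use of an SVD to obtain the PCA loadings and the synthetic data (a common set of sample spectra plus one instrument-specific distortion along a single direction `g`) are all hypothetical.

```python
import numpy as np

def epo_projector(D, k):
    """Return I - P P^T, where P holds the first k PCA loadings of D.

    D: (q, n_wavelengths) array, one mean spectrum per level of the influence factor.
    k: correction rank, with 0 < k < q.
    """
    D = np.asarray(D, dtype=float)
    q, n = D.shape
    if not 0 < k < q:
        raise ValueError("the correction rank k must satisfy 0 < k < q")
    Dc = D - D.mean(axis=0)                    # step (ii): centre the q mean spectra
    # Step (iii): PCA through an SVD (an implementation choice); rows of Vt are loadings.
    _, _, Vt = np.linalg.svd(Dc, full_matrices=False)
    P = Vt[:k].T                               # (n_wavelengths, k) orthonormal basis
    return np.eye(n) - P @ P.T                 # step (iv): orthogonal projector

# Synthetic data (hypothetical): 5 samples, 32 wavelengths, 4 instruments.
rng = np.random.default_rng(0)
n_wl, n_samples = 32, 5
samples = rng.normal(size=(n_samples, n_wl))   # "true" spectra, identical on all instruments
g = np.linspace(-1.0, 1.0, n_wl)               # direction of the instrument effect
gains = [0.0, 0.3, -0.2, 0.5]                  # one distortion amplitude per instrument
X_by_instrument = [samples + a * g for a in gains]

# Step (i): mean spectrum of the samples on each instrument (q = 4 levels).
D = np.vstack([X.mean(axis=0) for X in X_by_instrument])
Q = epo_projector(D, k=1)

X0 = np.vstack(X_by_instrument)                # all spectra, all instruments
X0_corrected = X0 @ Q                          # Eq. (5)

# The same sample measured on two instruments now gives the same corrected spectrum.
blocks = X0_corrected.reshape(len(gains), n_samples, n_wl)
print(np.allclose(blocks[0], blocks[3]))
```

In this toy case the instrument effect lies along a single direction, so k = 1 removes it entirely. With real spectra, k has to be chosen, and part of the useful chemical signal may be projected out together with the instrument effect.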
A PCA was performed on all the raw NIR spectra of Set A. These spectra correspond to five different samples: an unleaded gasoline, a commercial Diesel and three pure products (n-heptane, cyclohexane and toluene) that can be present in fuels. Each sample was analysed 19 times on each NIR spectrometer, in the same jar but at different temperatures ranging from 10 to 30 °C. Fig. 1 clearly shows five groups corresponding to the five samples considered. Moreover, each group is itself divided into four sub-groups, according to the apparatus used, and the 19 replicates are located within these sub-groups. The differences between score values, and thus between the different spectrometers, are readily apparent here, PC1 and PC2 together accounting for 96% of the total variance.

We decided to apply the EPO method to determine whether this influence factor (here, the NIR spectrometer used) could be eliminated or at least significantly reduced. We started from the X0 matrix, which gathers all spectra of all samples on all spectrometers. A mean spectrum of all raw spectra was calculated for each spectrometer. A PCA was then performed on the D matrix built with these four mean spectra. The loadings of the first principal components formed the P matrix. We applied the EPO correction and performed a new PCA on the resulting matrix. The result is shown in Fig. 2. The total variance explained by the first two principal components was 95%. The groups corresponding to each kind of sample still exist in this representation. Nevertheless, the sub-groups corresponding to each instrument are closer together, which demonstrates the ability of the EPO method to reduce the influence of the apparatus on the NIR spectra obtained.

Fig. 1. PCA result on raw NIR spectra (Set A).
Fig. 2. PCA result after global EPO correction (Set A).

As the difference between the spectrometers was still visible and the spectrometers are dedicated to the analysis of gasolines, we tried
to apply the EPO correction to gasoline spectra only. This could be a better fit for the future industrial application. Fig. 3 shows the NIR spectra of one unleaded gasoline collected on the four SW-NIR spectrometers, and Fig. 4 the spectra obtained after application of the EPO correction. The spectral profiles do not look like classical NIR ones because they are not continuous but built from 32 points, one for each of the 32 diodes and their 32 wavelengths. The absorbance differences related to the instrument used, which appear in Fig. 3, are very largely removed, the corrected spectra being almost identical.

Fig. 3. NIR spectra of one gasoline sample, on the four analysers.

Fig. 5 shows the PCA performed after this EPO correction; the total variance explained by the first two principal components was 97%. In this case, the apparatus influence was clearly removed efficiently for the
gasoline replicates, and the correction also gave good results on the other samples (toluene, Diesel, n-heptane and cyclohexane), as the sub-groups corresponding to each analyser were much closer together than those obtained after the correction built on all samples, gasolines and others.

Fig. 4. NIR spectra of the same gasoline sample, after EPO correction on gasoline spectra, on the four analysers.
Fig. 5. PCA result after EPO correction, only on gasoline samples.

3.2. Prediction results

Since EPO led to a better distribution of the samples once the influence of the instrument had been removed, we evaluated the effect of this orthogonal correction on the values predicted by PLS regression. We chose to test this correction on gasoline properties, using the EPO correction computed on the gasoline samples. The maximum error on RON and MON values that
can be accepted for our systems was 1.5. We had to transfer the PLS calibrations of physico-chemical parameters of gasoline between the different devices. We began by choosing a primary spectrometer as our reference apparatus. Analyser 2 was chosen because many more samples had been analysed on it, which led to a better calibration. Analysers 1, 3 and 4 thus became the secondary spectrometers, and we tried to transfer to them the models established on the primary one.

The first approach tested was a global transfer. The primary spectrometer was calibrated and models were defined for RON and MON (see Fig. 6). These properties were then predicted on all spectrometers, using a global correction computed from all samples on all prototypes at the same time. The second approach tested was a two by two transfer. The primary calibration was carried out in the same way as before. The RMSEC values are very close, so we concluded that the EPO correction has no significant influence on the calibration quality. This time, however, the property values were predicted on each secondary spectrometer with a correction defined only between the primary and that secondary spectrometer, i.e. primary with analyser 1, primary with analyser 3 and primary with analyser 4. Figs. 7 and 8 compare the results obtained without correction, after the global EPO correction and after the two by two EPO correction. The two by two correction clearly gave the best results.

Fig. 7. RON values predicted before and after EPO correction.
Fig. 8. MON values predicted before and after EPO correction.

We compared the predictions of the RON and MON values on the four analysers, with and without the global EPO correction. This correction improved the prediction results. In this study, the RMSEP has 12 degrees of freedom. The critical value Fc
(12, 12) given by the Fisher–Snedecor tables for a 5% risk is 2.69. RMSEP values for which F < 2.69 are considered statistically equivalent. For the global EPO correction, the calculation of the F values (Table 1) and their comparison with the critical value led to the following result: the model built with the global EPO correction significantly improved the results for all spectrometers. Nevertheless, a prediction error higher than 1.5 remains for analysers 1 and 3 on the RON values. We then proceeded to the two by two transfer, and compared the results obtained without correction, after the global EPO correction and after this new two by two correction. For the two by two EPO correction, the calculation of the F values (Table 1) and their comparison with the critical value led to the following result: the model built with the two by two EPO correction significantly improved the results for secondary spectrometers 1 and 3. In this last case, the maximum error of 1.5 for RON and MON values was respected, except for analyser 3, which would have been rejected in a quality control context.

Fig. 6. Calibration results for RON and MON values obtained on the master spectrometer for each EPO correction.

Table 1
F values for RON and MON predictions (Fc = 2.69).

                                        Analyser 1   Analyser 2   Analyser 3   Analyser 4
RON
  F after global EPO correction            0.24         0.57         0.55         0.26
  F after two by two EPO correction        0.06         0.48         0.41         0.75
MON
  F after global EPO correction            0.21         0.62         0.27         0.14
  F after two by two EPO correction        0.17         0.59         0.18         0.89

4. Conclusions

We used four different NIR spectrometers, developed in the same way and with strictly the same technology, equipped with NIR emitting diodes covering 32 wavelengths. Variations in the absorbance measured from one instrument to another were observed. Once the spectra had been corrected by the EPO method, we obtained a better classification of the samples, according to their nature and not to the instrument used for the analysis. The property values predicted by PLS regression before and after EPO correction showed that this correction improves the prediction, with both transfer approaches, global and two by two. Moreover, the best results were obtained with the two by two transfer, on all analysers. Such an improvement in the prediction of gasoline properties calls for a larger study, covering all physico-chemical properties of fuels, on different types of near infrared spectrometers.
References

[1] G.F. Asselin, Isomerization of paraffins, Prepr. Pap. Am. Chem. Soc. Div. Pet. Chem. 17 (1972) B4–B18.
[2] S. Chunshan, An overview of new approaches to deep desulfurization for ultraclean gasoline, diesel fuel and jet fuel, Catal. Today 86 (2003) 211.
[3] E.S. David, Octane and the environment, Sci. Total Environ. 299 (2002) 37.
[4] M.V. Twigg, Appl. Catal. B-Environ. 70 (2007) 2.
[5] Z. Theodoros, Energy Policy 34 (2006) 1773.
[6] J.P. William, Fuel Process. Technol. 71 (2001) 167.
[7] N. Pasadakis, V. Gaganis, C. Foteinopoulos, Fuel Process. Technol. 87 (2006) 505.
[8] D. Bradley, R.A. Head, Combust. Flame 147 (2006) 171.
[9] K.J. Clevett, Process Analyzer Technology, Wiley, New York, 1986, p. 659.
[10] Z. Sikora, W. Salacki, Fuel Energy 39 (1998) 98.
[11] Y. Lee, H. Chung, N. Kim, Appl. Spectrosc. 60 (2006) 892.
[12] D. Ozdemir, Petrol. Sci. Technol. 23 (2005) 1139.
[13] C. Felicio, L. Bras, J. Lopes, L. Cabrita, J. Menezes, Chem. Intel. Lab. Sys. 78 (2005) 74.
[14] J. Cooper, Chem. Intel. Lab. Sys. 46 (1999) 231.
[15] J. Gabrielsson, J. Trygg, Crit. Rev. Anal. Chem. 36 (2006) 243.
[16] S. Wold, H. Antti, F. Lindgren, J. Ohman, Chem. Intel. Lab. Sys. 44 (1998) 175.
[17] J.M. Roger, F. Chauchard, V. Bellon-Maurel, Chem. Intel. Lab. Sys. 66 (2003) 191.
[18] P. Tillmann, T.-C. Reinhardt, C. Paul, J. Near Infrared Spec. 8 (2000) 101.
[19] A. Lunati, J. Fournel, Patent WO/2006/100377 (2006).
[20] A. Lunati, DEER, Detroit, USA (2007) http://www1.eere.energy.gov/vehiclesandfuels/pdfs/deer_2007/session8/deer07_lunati.pdf.
[21] J. Fournel, DEER, Detroit, USA (2008) http://www1.eere.energy.gov/vehiclesandfuels/pdfs/deer_2008/session8/deer08_fournel.pdf.
[22] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Appl. Spectrosc. 43 (5) (1989) 737.
[23] H. Martens, T. Naes, Multivariate Calibration, John Wiley and Sons, New York, 1989.
[24] S. Millar, P. Robert, M.F. Devaux, R.C.E. Guy, P. Maris, Appl. Spectrosc. 9 (1996) 1134.
[25] H. Martens, Anal. Chim. Acta 112 (1979) 423.
[26] M. Fuller, P.R. Griffiths, Anal. Chem. 50 (1978) 1906.
[27] Y.-L. Liang, O.M. Kvalheim, Chem. Intel. Lab. Sys. 32 (1996) 1.
[28] D.M. Haaland, E.V. Thomas, Anal. Chem. 60 (1988) 1193.
[29] C. Bauer, B. Amram, M. Agnely, D. Charmot, J. Sawatzki, N. Dupuy, J.P. Huvenne, Appl. Spectrosc. 54 (2000) 528.
[30] S. Preys, J.M. Roger, J.C. Boulet, Chem. Intel. Lab. Sys. (2007).