Analytica Chimica Acta 537 (2005) 349–358
First order Rayleigh scatter as a separate component in the decomposition of fluorescence landscapes ˚ Asmund Rinnan a, ∗ , Karl S. Booksh b , Rasmus Bro a b
a Quality and Technology, Department of Food Science, Royal Veterinary and Agricultural University, Rolighedsvej 30, DK-1958 Frederiksberg C, Denmark Department of Chemistry and Biochemistry, Arizona State University, MC-1604, Tempe, AZ 85287, USA
Received 28 October 2004; received in revised form 20 January 2005; accepted 20 January 2005 Available online 17 February 2005
Abstract The application of chemometric tools to analyze multi-way data has become popular, especially for excitation–emission matrix (EEM) fluorescence spectroscopy where PARAFAC often is employed to resolve the pure component profiles of mixtures of fluorophores. There are, however, some features in a typical EEM that do not follow the required low-rank tri-linear structure required for PARAFAC to work optimally. The most significant of these features is the light scatter effects that form diagonal lines in the landscapes. These cannot be modeled by one (or a few) factor(s) in the decomposition step, and has so far been removed by subtracting a standard, inserting missing values, or weighting these areas down. This paper suggests a novel method, which models the first order Rayleigh scatter as a separate set of factor(s) in the decomposition step by shifting the Rayleigh scatter into becoming low-rank bi-linear. This method is easier to implement than current methods and provides good results; both requirements for increasing the use of fluorescence spectroscopy by the non-expert user. © 2005 Elsevier B.V. All rights reserved. Keywords: EEM; Scatter model; Multi-way; Fluorescence spectroscopy; PARAFAC; PCA
1. Introduction During recent years, the use of fluorescence spectroscopy has increasingly focused on total luminescence spectra, or excitation–emission matrices (EEM). The use of chemometrics and more specifically parallel factor analysis (PARAFAC) [1–3] has become recognized as a good and reliable tool for extracting chemical information from EEM spectra. By using PARAFAC to extract more information from the collected data, it is of vital importance that the data is low-rank tri-linear. This is the case for the behavior of all fluorophores in a sample, as long as no quenching or inner-filter effects are present. However, light scatter effects in fluorescence do not conform to the required tri-linearity. Instead ∗ Corresponding author. Present address: FOSS A/S, Product Management, Slangerupgade 69, P.O. Box 260, DK-3400 Hillerød, Denmark. Tel.: +45 48208563. ˚ Rinnan). E-mail address:
[email protected] (A.
0003-2670/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2005.01.044
these spectral features lie on a diagonal of the EMM landscape. The light scattering effects are called Rayleigh (first and second order are most common) and Raman scatter. The first order Rayleigh scatter line is centered at the emission equals excitation line, second order Rayleigh scatter at the emission equals twice the excitation, and the Raman is at a certain energy difference from the first order Rayleigh scatter line. This energy difference is dependent upon the solvent of the sample. First order Rayleigh scatter is the most intense of the light scattering, and therefore has a higher influence on the PARAFAC model. If the signal from the fluorophores themselves lie away from these scatter lines, the scatter can simply be omitted from the data passed to the PARAFAC algorithm. However, e.g. in many samples of biological origin, the signal from the fluorophores of interest lies close to, or on top of, one or more of the scatter effects. If these scatter effects are not accounted for in the modeling process they may lead to bias in the decomposition of these spectra. Several ways
350
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
of handling the scatter effects have been proposed in literature: subtraction of a standard [4–6], inserting missing values [7–10], constraints in the PARAFAC decomposition [11,12], weighted PARAFAC [13–15] and replacing the scatter areas by three-dimensional Delaunay interpolation [16]. None of these methods though, have been generally adopted and no consensus exists as to which to prefer in a given situation. In this paper a novel way of treating the first order Rayleigh scatter is proposed, namely modeling it as a separate component in the decomposition step. This is done through reorganization of the data so that the first order Rayleigh scatter line is low-rank bi-linear. The following bi-linear Rayleigh scatter is either modeled by PCA (one model per sample) or PARAFAC (one model for all samples).
2. Experimental 2.1. Data sets For analyzing the ability of the new method, three different fluorescent data sets are used. 2.1.1. Data set 1 Fifteen mixtures of five fluorophores were recorded, with three or four fluorophores present in each sample. The spectra of these fluorophores are highly overlapping. The fluorophores are: catechol, hydroquinone, indole, tryptophane and tyrosine. All samples are a mixture of between two and four fluorophores. Every sample was measured with an excitation range of 230–320 nm (every 5 nm) and an emission range of 230–500 nm (every 2 nm). The instrumental setup and design of the data is explained in more detail by Rinnan and Andersen [15]. 2.1.2. Data set 2 This data set contains EEMs of 16 samples containing different concentrations of four fluorophores with rather similar spectral properties [17]. The four compounds are: 3,4-dihydroxyphenylalanine (DOPA), 1,4dihydroxybenzene (hydroquinone), phenylalanine and tryptophan. They were measured in the excitation range 200–315 nm (every 5 nm) and emission range 241–481 nm (every nm). 2.1.3. Data set 3 Sugar samples taken from the final unit operation from a sugar plant in Scandinavia were obtained as described by Bro [12]. In total 268 samples were obtained. They were dissolved in un-buffered water (2.25 g/15mL). Fluorescence was measured on a Perkin Elmer LS50 B fluorescence spectrofluorometer with emission ranging from 275 to 560 nm (recorded every 0.5 nm). Seven excitation wavelengths were used. These were 230, 240, 255, 290, 305, 325 and 340 nm.
2.2. Reduction of original data The goal of this work was to enable reliable PARAFAC calibration models that are resistant to the effects of first order Rayleigh scatter interferences. Therefore, the parts of the fluorescence landscape that was affected by second order Rayleigh scatter, or other artifacts were removed prior to any analysis. Little fluorescence information was lost because no significant overlap with the second order Rayleigh scattering occurred. Data set 1 was reduced from 15 × 136 × 19 (samples × emission × excitation) down to 15 × 106 × 19, by removing the 30 last emission wavelengths due the presence of second order Rayleigh scatter. Data set 2 was reduced from 16 × 243 × 24 down to 16 × 101 × 18, the six first excitation wavelengths due to high noise-level, and the 20 last emission wavelengths due to the presence of second order Rayleigh scatter. It was further reduced by removing every second emission wavelength. Data set 3 was reduced from 268 × 571 × 7 down to 194 × 165 × 7. The last 242 points was removed due to the presence of second order Rayleigh. The data set was further reduced by removing every second column, indexed by emission wavelength, from all EEM profiles. Seventy one samples were removed because of intensities reaching max in the scatter areas, and three samples were removed as outliers. The algorithm to solve the problem of saturation of the detector in the scatter area has later been included to the algorithm used in this work and is available at http://www.model. kvl.dk/source [18], see also under “Software”. 2.3. Constraints For all data sets, both models without constraints and models with non-negativity constraints were investigated. Fluorescence spectra can readily be decomposed by PARAFAC, and an unconstrained solution is often also in accordance with the physical/chemical premises of the data. Sometimes it may, however, be valuable to enforce some constraints on the PARAFAC solution. In the data used in this investigation, there were no missing values, and thus no danger of artifacts in the areas with missing values. Constraints such as non-negativity [19,20] can be used in order to ensure that estimated concentrations and spectra are strictly positive. The effect of non-negativity constraints in all modes in the PARAFAC model was tested together with the parameters mentioned above. 2.4. Methods to model Rayleigh Three different techniques of modeling the first order Rayleigh scatter have been investigated: 1. Shifting the different emission spectra according to the excitation wavelength, cutting off the part away from the Rayleigh scatter line, making the Rayleigh scatter lowrank bi-linear.
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
2. Rotating the spectra into a new coordinate system such that the Rayleigh scatter makes a line in the landscape and not a diagonal. This was done by the use of a rotation matrix for each sample. Linear interpolation between measured points was employed to find the values in the new coordinate system.
351
3. Modeling the first order Rayleigh peak at each emission spectrum for each sample by a Gauss-Lorentz curve fitting. However, the two latter methods did not work satisfactory and will thus not be mentioned further in this paper.
Fig. 1. The EEM of sample 1 from data set 1 seen as 19 emission spectra, before (a) and after (b) the shifting procedure. The grey area in (b) is the area which is removed prior to modeling the Rayleigh by PARAFAC or PCA.
352
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
The Rayleigh scatter is modeled by either PCA or PARAFAC. By using PCA on each individual EEM, the Rayleigh scatter of the different samples are allowed to have different shapes, while by the use of PARAFAC on the total three-way array, all the samples are approximated as having similar shapes of the Rayleigh scatter. The estimated Rayleigh model is then reshaped into the original data matrix and subtracted from the original data. In the total procedure (described below), the estimation of the scatter peaks is done on the residuals of the PARAFAC model of the chemical data. 2.5. Shifting the EEM For the shifting method (technique number 1 above) to work it is essential that the uncertainty in the excitation
and the emission wavelengths are negligible. The uncertainty mentioned is the uncertainty in the calibration of the excitation and emission axis in the instrument. This is because this method is based on a simple idea: the Rayleigh scatter peak should be centered at the excitationequals-emission diagonal. As it is only necessary to find the excitation–emission pairs, the shifting method is optimization free. The idea is to shift the Rayleigh scatter peak for each excitation wavelength, transforming the excitation–emission line from a diagonal in the raw excitation–emission matrix into a straight line in the shifted matrix. This will cause the Rayleigh scatter peak to become low-rank bi-linear, and thus possible to model by methods such as PCA or PARAFAC. However, if there are significant uncertainties in the excitation and emission wavelengths, this shifting is
Fig. 2. Excitation–emission matrix of one of the samples from data set 3. (a) The raw spectra, and (b) the raw spectra with the Rayleigh scatter model subtracted.
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
not reasonable, and the scatter line will not be low-rank bilinear. The step by step procedure of the shifting method is as follows: • For each emission row in an EEM, the theoretical position of the Rayleigh peak is located where the emission wavelength equals the excitation wavelength. Each row in the spectrum is shifted such that the Rayleigh scattering forms a vertical structure in the matrix. The smaller wavelength spectra are shifted to the right (higher wavelengths), while the larger wavelength spectra are shifted to the left (shorter wavelengths), as shown in Fig. 1. • This procedure is repeated for all the samples in the data set. • The shifted landscape is truncated by removing points far away from the Rayleigh scatter line, shown as the grey area in Fig. 1b. • Either PCA or PARAFAC is used to model the Rayleigh scatter line. • The modeled Rayleigh line is shifted back to the original matrix indexing. 2.6. The complete model The complete model was calculated in an iterative fashion. First the Rayleigh scatter was modeled by either PCA or PARAFAC. Then the Rayleigh scatter model was subtracted from the original data, and this Rayleigh corrected data was then modeled by PARAFAC for the fluorophores. This fluorophore model was again subtracted from the original data, and the fluorophore corrected data was then put through the Rayleigh modeling step again. This procedure was repeated until convergence. The convergence criterion was set to 10−6
353
relative change in the residual between two consecutive models – one model being both the Rayleigh and the fluorophore part. An example of how the data looks after the first order Rayleigh scatter is modeled can be seen in Fig. 2 for one sample of data set 3. The choice of whether to start the iterations by first modeling the scatter or the fluorophores was tested and it was found that it was best to start with the scatter model. The reason is that it is vital to get a good initial estimation of both the fluorophores and the Rayleigh scatter for the algorithm to reach a good solution. The starting model of the scatter is less influenced by the fluorophores than the fluorophores are influenced by the scatter, because of the reduced data area used for modeling the scatter (see Fig. 1b). An option for doing this latter method is to use hard weights with weights set to 0 for the Rayleigh area. This, however, did not improve the results, and time before convergence was higher than for starting with modeling the Rayleigh scatter peak, which requires no weighting scheme. 2.7. Number of factors to model the Rayleigh scatter It was questioned whether the modeling of the Rayleigh scatter line should be performed with one or more factors in the PCA or PARAFAC model. This was investigated by re-computing the same model of the entire data set with a different number of factors for the Rayleigh scatter model. Doing this, the number of factors explaining the fluorophores was kept constant. The criterion to decide the right number of components for the modeling of the Rayleigh scatter was the minimum cumulative residual of the modeled Rayleigh scatter compared to the measured Rayleigh scatter peak. In other words, only the part of the EEM containing the Rayleigh scatter peak was used to estimate the number of factors needed to
Fig. 3. The bias of the Rayleigh scatter model compared to the true Rayleigh scatter for data set 2 using PARAFAC to model the Rayleigh scatter.
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
354
model the Rayleigh peak. As the number of estimated factors increases, the summed residual (bias) of the raw Rayleigh minus the modeled Rayleigh flattens or increases when the optimal number of factors is exceeded, see Fig. 3. 2.8. Quality of a model The stability of each of the models was investigated by the use of a jack-knife like procedure [21,22] for all the data sets. Twenty models were built on the data, with a special form of jack-knifing for all the data sets. While in normal jack-knifing every sample is removed once, here each sample is left out a predefined number of times. For data set 1, each sample was removed four times, for data set 2, five times and data set 3, once each. The jack-knife estimates of the standard deviations of the excitation and emission parameters were pooled to one overall standard deviation. In addition to the stability of a model it is of interest to determine how well the model estimates the pure spectra. For data sets 1 and 2 this was quantified by comparing the result from one model with the pure spectra. There were no known spectra for data set 3, and instead the best model with hard-weights, constraints and a band of missing values from Rinnan and Andersen [15], which is a refined decomposition of what Bro shows in [12], was used as the reference. As by Rinnan and Andersen [15] the quality is defined using the Q2 -statistics: • The mean of the between the normalized factors in the model and the normalized reference is calculated – both emission and excitation modes are taken into account. The loadings of the tested model and the reference are compared in order to find the best match. Q2 is defined as follows: (yi − yˆ i )2 2 Qm,f = 1 − (yi − y¯ )2 Q2
where y is one of the loadings of one factor under investigation, yˆ is the reference loading for this factor, while y¯ is the average of the loading under investigation. The m and f indexes on the Q2 denote the mode and factor number investigated (m = 2 indicates the emission-loadings, m = 3 the excitation-loadings). If Q2m,f equals one, the two loadings are equal, if it is zero or less, then there is no similarity between the loading under investigation and the loadings of the reference model. If Q2m,f is below zero, it is set to zero. • The average Q2 -value for all the factors and loadings (excitation and emission) is calculated, giving a number between zero and one. In addition the standard deviation of the Q2 -values was calculated. Thus if a high average value is accompanied by a low standard deviation the model is good at resolving all the factors. For example, for a four factor PARAFAC model, the average and the standard deviation would be calculated from 4 × 2 = 8 different Q2m,f -values.
Twenty models are computed for each of the three data sets on different subsets of the original data leaving out one or more samples in each subset as described above. This gives 20 pairs of average and standard deviation values for each method, based on the Q2 -values. The grand mean and the average standard deviation (calculated as the square root of the mean variance) can be found from these. In order to get a better picture of the uncertainties involved, both the standard deviation of the average and the standard deviation of the standard deviations are calculated, and reported together with the grand mean and the average standard deviation. It should be noted that there are no known spectra for data set 3 (see above). Thus the standard deviation of the Q2 values is of more importance than the mean Q2 -value in order to define a stable, good model. 2.9. Software MATLAB (The Mathworks, Natick, MA), Version 6.5 was used during the calculations. The algorithms in use were from the N-way toolbox Version 2.10 together with the algorithm of the novel method for decomposing fluorescence spectra [18].
3. Results and discussion The first step was to decide the number of factors to use in order to model the Rayleigh scatter. The number of factors to use varied between the data sets and are given in Table 1. Two factors can be used as a default but as can be seen from Table 1 this can be optimized. However, the quality of the chemical part of the model does not greatly improve with minor changes in the scatter model complexity. In order to evaluate the results obtained here, a reference method is needed. As a reference method, a “MILES” PARAFAC [13] with insertion of zeros below the first order Rayleigh scatter [23] was used, as explained by Rinnan and Andersen [15]. The width of the Rayleigh peak was estimated to 10, 20 and 40 nm for the three data sets, respectively. The PARAFAC model was both computed without any constraints and with non-negativity constraints in all modes. The results for the MILES weighting and modeling the Rayleigh scatter peak as a separate factor is given in Table 2. Table 1 Number of components for Rayleigh modeling by shifting Modeling Rayleigh
PARAFAC PARAFAC PCA PCA
Constraints
– NN – NN
Data set 1
2
3
2 2 3 1
1 2 2 2
2 1 1 2
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
355
Table 2 Stability and quality results from decomposing the fluorescence data by MILES weighting and modeling the Rayleigh scatter peak as a separate component (mean with the standard deviation in brackets) Reference
Modeling Rayleigh
MILES
Data set 1 JK 1 − Q2 Std(Q2 ) Data set 2 JK 1 − Q2 Std(Q2 ) Data set 3 JK 1 − Q2 Std(Q2 )
PARAFAC
–
NNa
93 (43) 78 (4) 144 (7)
PCA
–
NNa
–
NNa
83 (45) 71 (4) 129 (3)
109 (64) 79 (5) 142 (6)
91 (57) 75 (5) 127 (3)
93 (46) 77 (4) 143 (6)
78 (50) 78 (5) 126 (4)
222 (99) 109 (15) 113 (18)
160 (80) 82 (7) 93 (6)
264 (129) 117 (19) 114 (18)
161 (78) 81 (7) 92 (4)
210 (104) 109 (15) 111 (17)
151 (82) 81 (7) 89 (5)
134 (41) 933 (55) 981 (45)
75 (36) 713 (131) 1592 (337)
223 (56) 710 (50) 895 (35)
186 (41) 187 (14) 169 (20)
1699 (1322) 1494 (691) 1319 (582)
1809 (1873) 1381 (553) 2120 (803)
Jack-knife (JK), 1 − Q2 and Std(Q2 ) is given as 10−4 . a Non-negativity in all modes.
Data sets 1 and 2 behave very similar for the reference method and the new method. It seems that modeling the Rayleigh peak as a separate factor gives as good results as with using weights and inserting zeros into the data matrix – the best method in literature. The results from data set 3 indicate that the best method to model the Rayleigh scatter is by PARAFAC and constraining the scores and loading from modeling the fluorophores by non-negativity constraints. However, this conclusion is largely dependent upon the reference spectra chosen for this data set. The reference spectra are from a constrained PARAFAC solution with hard weights, with non-negativity on the scores and the excitation wavelength, while the emission wavelengths are constrained to uni-modality. The MILES method and modeling the Rayleigh scatter by PARAFAC without any constraints both give very similar results, both different from the reference. All these emission loadings are shown in Fig. 4. Since these data are from a sugar process, the pure spectra are not known; only that they should not contain any Rayleigh scatter. Neither MILES nor modeling the Rayleigh shows any sign of the Rayleigh scatter peaks, and thus it can be concluded that it is removed successfully. However, deciding which of the methods is correct is a more difficult task. The loadings found by modeling the Rayleigh scatter with PARAFAC and constraining the result to non-negativity gives four spectral estimates that resemble the chosen reference spectra. There are few differences between the resolved spectral estimates and the reference spectra such as some artifacts around 350 nm and along the baseline. For both the two other methods shown in Fig. 4c and d, the loadings are almost identical, indicating that this may also be a reasonable solution; both with some artifacts at the baseline and a large negative area for the black dotted loading. The main goal of this work, however, is not to decide what solution gives the
correct intrinsic spectral profiles for this data, but rather to show that this new method of modeling the Rayleigh scatter peak as a separate factor is feasible. Modeling the Rayleigh scatter by PCA or PARAFAC gives similar results for data sets 1 and 2. For data set 3, however, PARAFAC is more successful in making a stable model. Modeling the Rayleigh scatter by PARAFAC instead of PCA enforces equality between the basic shape of the Rayleigh scatter peaks of the different samples, while PCA treats them all separately. It seems that constraining the Rayleigh scatter peak to behave similarly across the samples is a better method, and does not disturb the tri-linearity of the data as can be seen in all the parameters in Table 2 for data set 3. The shifting method is straightforward, fast and stable. The above tests indicate it to be a reliable and good method for modeling the Rayleigh scatter as the results are of equal quality as for the reference method. The time needed before convergence is between 4 and 10 times longer for the shifting methods, but on the other hand, there are fewer metaparameters to estimate. For the MILES algorithm to work optimally, the Rayleigh width and the bandwidth of missing values are both important to estimate correctly in order to get good decompositions, as was shown by Rinnan and Andersen [15]. On the other hand, for the shifting method, the only parameter to estimate is the width of the Rayleigh scatter. Further, it is not as important to estimate this accurately as for the MILES method. It is required that the estimated width is at least the width of the Rayleigh scatter, but it is not sensitive to moderate overestimation of the Rayleigh scatter width. This suggests that making an automatic estimation for the width of the Rayleigh scatter peak is easier for modeling the Rayleigh scatter than for the MILES method where this estimate should be quite accurate. Therefore the total time consumption can be even less than for the optimal case by the use of MILES.
356
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
Fig. 4. Emission loadings from modeling data set 3. (a) Reference spectra found by uni-modality constraints in the emission loading. (b) Loadings from Rayleigh modeled by PARAFAC with non-negativity constraints. (c) Loadings from Rayleigh modeled by PARAFAC without constraints. (d) Loadings from PARAFAC by using MILES.
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
357
Fig. 4. (Continued ).
4. Conclusion The shifting procedure suggested for modeling Rayleigh scatter has been shown to be a simple and feasible approach for handling scatter induced artifacts. For modeling the Rayleigh scatter line, PARAFAC is a better method than PCA, although the difference is not large, unless the data is of a difficult nature, as is the case for data set 3. This work has shown that modeling the first order Rayleigh scatter line as a separate factor in the decomposition with the shifting method gives as good results as MILES – the best method found in the literature. The shifting method is,
model to model, slower than MILES, but taking into account the optimization steps necessary for MILES, equal or even less time is consumed for the shifting method. Further the parameter estimates for modeling the first order Rayleigh scatter is less sensitive for overestimation than the weights in MILES. It is thus also easier to automate the estimation of the width of the first order Rayleigh scatter peak. The shifting method is therefore a better method than the best method found in the literature. Thanks to the fewer meta-parameters to be estimated, the method is also easy to apply for the nonexpert chemometric user. The algorithm used in this paper is available at http://www.models.kvl.dk/source.
358
˚ Rinnan et al. / Analytica Chimica Acta 537 (2005) 349–358 A.
Modeling the second order Rayleigh scatter by the same method is a straight forward extension. Modeling of the Raman scatter on the other hand is not straight forward, as a transformation is needed from wavelength domain to wavenumber domain, followed by the modeling and a transformation back into the wavelength domain. However, Raman scatter is usually less of an issue because of lesser magnitude and because it can be partially removed by subtracting measurements of a blank sample.
Acknowledgement Rinnan wishes to thank the STVF (Danish Research Council) for financial support through project 1179.
References [1] [2] [3] [4]
R. Bro, Chemomet. Intell. Lab. Syst. 38 (1997) 149. J.D. Carrol, J.-J. Chang, Psychometrika 35 (1970) 283. R.A. Harshman, UCLA Working Papers in Phonetics 16 (1970) 1. C.N. Ho, G.D. Christian, E.R. Davidson, Anal. Chem. 50 (1978) 1108. [5] C.N. Ho, G.D. Christian, E.R. Davidson, Anal. Chem. 52 (1980) 1071.
[6] D.M. McKnight, E.W. Boyer, P.K. Westerhoff, P.T. Doran, T. Kulbe, D.T. Andersen, Limnol. Oceanogr. 46 (2001) 38. [7] J. Christensen, V.T. Povlsen, J. Sørensen, J. Dairy Sci. 86 (2003) 1101. [8] L. Munck, L. Nørgaard, S.B. Engelsen, R. Bro, C.A. Andersson, Chemomet. Intell. Lab. Syst. 44 (1998) 31. [9] M.J. Rodriguez-Cuesta, R. Boque, F.X. Rius, D. Picon Zamora, M. Martinez Galera, F.A. Garrido, Anal. Chim. Acta 491 (2003) 47. [10] M.G. Trevisan, R.J. Poppi, Anal. Chim. Acta 493 (2003) 69. [11] C.M. Andersen, R. Bro, J. Chemomet. 17 (2003) 200. [12] R. Bro, Chemomet. Intell. Lab. Syst. 46 (1999) 133. [13] R. Bro, N.D. Sidiropoulos, A.K. Smilde, J. Chemomet. 16 (2002) 387. [14] R.D. Jiji, K.S. Booksh, Anal. Chem. 72 (2000) 718. ˚ Rinnan, C.M. Andersen, Chemomet. Intell. Lab. Syst., in press. [15] A. [16] R.G. Zepp, W.M. Sheldon, M.A. Moran, Marine Chem. 89 (2004) 15. [17] D. Baunsgaard, The Royal Veterinary and Agricultural University, Internal Report, 1999. http://www.models.kvl.dk/research/ data/dorrit/dorrit.pdf. [18] January 2005. http://www.models.kvl.dk/source. [19] R. Bro, S. de Jong, J. Chemomet. 11 (1997) 393. [20] R. Bro, N.D. Sidiropoulos, J. Chemomet. 12 (1998) 223. [21] H. Martens, M. Martens, Multivariate Analysis of Quality – An Introduction, Wiley, Chichester, 2001, 445 pp. [22] J. Riu, R. Bro, Chemomet. Intell. Lab. Syst. 65 (2003) 35. ˚ Rinnan, S. Barsberg, J.K.S. Møller, Chemomet. [23] L.G. Thygesen, A. Intell. Lab. Syst. 71 (2004) 97.