EEMizer: Automated modeling of fluorescence EEM data

EEMizer: Automated modeling of fluorescence EEM data

Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92 Contents lists available at ScienceDirect Chemometrics and Intelligent Laboratory S...

1MB Sizes 0 Downloads 44 Views

Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / c h e m o l a b

EEMizer: Automated modeling of fluorescence EEM data Rasmus Bro a,⁎, Maider Vidal b a b

Department of Food Science, Quality and Technology, Faculty of Life, Sciences, University of Copenhagen, Rolighedsvej 30, DK-1958, Frederiksberg C, Denmark Departamento de Química Aplicada, Facultad de Química, Universidad, del País Vasco, Apdo. 1072, 20080 San Sebastián, Spain

a r t i c l e

i n f o

Article history: Received 14 February 2010 Received in revised form 10 May 2010 Accepted 15 June 2010 Available online 25 June 2010 Keywords: Fluorescence EEM PARAFAC Multi-way Tensor

a b s t r a c t For many years it has been known that PARAFAC offers a very attractive approach for modeling fluorescence excitation–emission matrices. Due to the uniqueness of the PARAFAC model and analogy between the structure of fluorescence data and the PARAFAC model, it is apparent that PARAFAC can resolve overlapping signals into pure spectra and relative concentrations under mild conditions. There are hundreds of applications exemplifying this, but still the use of PARAFAC has not spread from chemometrics to more mainstream analytical chemistry. Many reasons can be offered to explain this, but one seems to be that in practice it is difficult for chemometric novices to make use of PARAFAC. Selection of wavelengths, handling of scatter and of outliers are all issues that must be dealt with in order to build a good PARAFAC model. In this paper, a new algorithm called EEMizer is developed that aims to automate the use of PARAFAC. Through several examples it is shown how this algorithm can provide appealing PARAFAC models of data that would otherwise be hard to model. © 2010 Elsevier B.V. All rights reserved.

1. Introduction Fluorescence excitation–emission measurements are used in many different fields such as skin analysis [1], fermentation monitoring [2–4], environmental analysis [5–7], food [8] and clinical analysis [9]. Common to many of these fields is that the spectroscopic measurements are performed directly on complex mixtures rather than on simple purified samples. This has posed severe problems in the subsequent analysis of the data, because the contributions of the individual fluorophores usually overlap. Multivariate methods have made it possible to develop calibration models for specific properties but not to extract the full information available in the data. The introduction of tensor-based modeling through the use of PARAFAC has revolutionized the information-retrieval from such data [10–13]. Using PARAFAC, it is possible under some circumstances, to perform so-called mathematical chromatography; that is, to separate the mixture measurements into the contributions from the underlying individual chemical analytes. For each analyte, the pure excitation and emission spectra are obtained as well as the relative concentration. Even though PARAFAC has improved the possibilities for extracting information from EEM data, the use of PARAFAC is by no means simple. Skilled operators are needed, that have a thorough understanding of fluorescence data and of tensor modeling. Naturally, this puts a limit on how widely applied PARAFAC is.

There are several issues that data analysts have to take into account when analyzing a specific data set. Some of these issues are fairly trivial but a number of decisions on the route to a good model are taken on complex grounds and based on specific experience. Hence developing an automated tool for building a sound model is not a trivial task. Some of the issues that must be addressed is handling of scattering; especially Rayleigh scattering, possible exclusion of non-informative or misleading variables and decision on possible outlying samples. Furthermore, the number of PARAFAC components has to be determined. As all these issues affect each other, the end point; a sound PARAFAC model, is not easy to achieve and it is likely that an inexperienced data analyst fails in building a good model even though the data supports it. Some papers describe how to achieve good PARAFAC models of EEM data, but still leave it to the analyst to make the needed choices to produce a good model [14,15]. It is the purpose in this paper, to develop a tool that can automate the PARAFAC model building in such a way that PARAFAC can be more widely applied and especially so that PARAFAC can be accessible to chemists with less background in data analysis. 2. Theory 2.1. Requirements In order to be able to automate the modeling, it is necessary to make certain assumptions on the data available. These are listed below:

⁎ Corresponding author. E-mail address: [email protected] (R. Bro). 0169-7439/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2010.06.005

• A sufficient number of adequate samples are measured • A sufficient resolution is used both for excitation and emission

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

• Appropriate measures have been taken to ensure that the EEM data is meaningful in the context of Beers law. The above list seems somewhat fluffy in its requirements but this is purposely so. What constitutes a sufficient number of samples depends highly on the complexity of the problem. For very welldefined and simple problems where the fluorophores are reasonably different with respect to excitation and emission spectra and where their signal-to-noise ratio is high, as few as two samples can be sufficient. In other cases, many more samples may be required to be able to accurately estimate the signals from the underlying fluorophores. It is a premise for this paper and the developed algorithm that it is assumed that it is possible to estimate a meaningful PARAFAC model and this is left for the experimenter to ensure. The term ‘adequate samples’ refers to the fact that the samples should be such that the underlying fluorophores are varying independently in the samples. Imagine a situation where hundreds of samples are supplied but these samples are all identical. Such a data set would not allow PARAFAC to estimate the underlying variation. Hence, ‘adequate samples’ implies that the samples are chosen in such a way that all fluorophores vary independently. This may be difficult to ensure in practice, but if the conditions are not met it is expected that the outcome of the algorithm will not be chemically meaningful. It is then the responsibility of the experimenter to conclude that the requirements were not fulfilled and take proper actions. Proper actions could e.g. be to use spiking of samples to provide independent variation. As with the number and sort of samples needed, similar statements can be made about the spectral resolution. Ideally, even down to two excitation and two emission wavelengths can be sufficient, but in practice, more wavelengths will allow separating more signals. It is a general experience that having many more than ten wavelengths is often useful and that having many hundreds of wavelengths is typically not critical. Still, even within these limits, the nature and similarity of the fluorophores is what determines what a sufficient number of wavelengths is. A prerequisite for PARAFAC to provide meaningful results is that Beers law is approximately obeyed, e.g. by assuring that no or little inner filter effect occurs, that pH and temperature is kept constant etc. [15]. It is assumed in the following that the data can be meaningfully described by a low number of PARAFAC components. 3. Overall factors to investigate Given that the above requirements are fulfilled or at least approximately fulfilled, the decision on what is an appropriate PARAFAC model includes at least the following decisions: • • • •

Wavelengths to include How to handle Rayleigh scattering Number of components to use Samples to exclude.

These four factors all depend on each other, so they cannot be chosen sequentially. For now, a fairly costly approach is adopted where many combinations of each are investigated. It may likely be possible to speed up computations by using more intelligent designs to investigate different combinations. 3.1. Included wavelengths It is characteristic of many fluorescent compounds that the spectra are quite broad relative to the spectral range measured. Therefore, it is mostly not necessary to consider variable selection in much detail. The only part of an EEM that can sometimes beneficially be removed is the low excitation wavelength area. Due to the characteristics of many instruments, excitations below approximately 240 nm can be very noisy and have a high variance which can disturb modeling. Therefore it is

87

reasonable to try to eliminate the lowest excitations and verify if this improves the model. The parameter to optimize is called LOWEXREMOVE and indicates the number of excitations to remove (starting from the lowest). 3.2. Rayleigh Rayleigh (and Raman) scattering is inconsistent with PARAFAC and has to be dealt with. Many approaches have been developed for this [16–21]. Here we adopt a simple approach setting the scatter bands to missing and inserting zeros at emission below the excitation wavelength (below the ridge of missing values). The only choice to be made is then how wide the band of missing values should be. For the given paper, only Rayleigh is considered and the parameter to optimize is called SCATWIDTH which gives the extent in nanometers around the diagonal of equal excitation and emission which is set to missing. For many applications, any possible Raman scattering can be handled by simply subtracting a blank measurement from all samples. 3.3. Components In general PARAFAC models are tested from smaller number of components and upwards. This is so because using excessive number of components in PARAFAC can give meaningless results for mathematical reasons [22]. To determine the number of components, models with increasing number of components are tested and the one with the highest number that is still providing a good model is chosen. The parameter FAC determines the number of components. 3.4. Outliers For any data set and choice of parameters (such as the above), outlying samples must be identified and handled in order for the model to be meaningful. The following approach is taken to identify outliers. For a particular model, samples with a high leverage or high sum-squared residual are removed one by one until no samples are assessed as outliers. Leverages for the samples are defined as the elements on the diagonal of the hat matrix of the score matrix. The removal is done by converting both leverage and sum-squared residuals into relative measures by dividing with the median value of these. For a sample set with e.g. 70 samples, there will be 70 leverages and these are all divided with the median of these 70 values. A level is defined (default five) and a sample with either a relative leverage or sum-squared residual above the level (five) is a candidate for removal. The one with the highest relative measure is removed and the analysis repeated. If there are very few samples in the data set (default 20) then the level is increased in order to allow more variability. Also, if more than a certain fraction of the samples (default 10%) have been removed, the level is increased to avoid too many samples being removed. These settings may seem fairly conservative in assessing when a sample is considered outlying, but in practice they ensure that very extreme, hence completely dissimilar, samples are identified. At the same time, samples are retained that would otherwise be deemed extreme e.g. when having a designed data set (where all samples may be extremes to some extent). More modest outliers are often not crucial to exclude as they help span similar variation, and hence just lead to extreme score values but an otherwise sound model. The main factor to optimize is LEVEL that defines when a sample is considered an outlier. In the following though, this factor is kept constant at five, but it is perfectly possible to optimize LEVEL. 4. Objective An objective is needed in order to automatically decide which possible models are good and which are bad. There are several

88

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

possibilities for evaluating if a model is good but here the following three measures are chosen: fit-values, core consistency and split-half quality. FIT values are given as a relative measure:

J

I

K

2

∑ ∑ ∑ eijk

FIT = 1−

i=1 j=1 k=1 J

I

K

∑ ∑ ∑ x2ijk

i=1 j=1 k=1

where eijk is the residual of sample i at emission j and excitation k. Core consistency is calculated as described in the original article [22] and called COREC here. The core consistency is calculated as 0

2 1 F F F  ∑ ∑ ∑ gdef −tdef C B d=1 e=1 f =1 B C COREC = 100B1− C @ A F

another model on samples [C|D]. The alternative split is to use samples [A|C] and [B|D]. These two alternative splits are both investigated and only the best one is used. This is reasonable because an unlucky split may lead to bad results due to splitting whereas it is unlikely that an otherwise bad model accidentally gives a good split-half result. The split-half quality is determined as the product of Tucker congruence [24] between the loadings of the two alternative models. For two competing models estimated on two halves of the data set, a set of loadings is obtained for each model in each mode. In the emission mode, the loadings are held in matrix B1 for one half of the data and B2 for the other half. The columns of the matrices are normalized and the Tucker congruence is calculated as the diagonal of BT1B2 upon matching the components in the two models so that similar components appear as similar columns. The same is done for the excitations loadings (C1 and C2) and the element-wise product of these two congruence vectors is then multiplied to provide the splithalf quality. This measure is called SPLTH. All three measures vary between 0 and 1, where zero is bad and one is perfect. An overall measure is defined from the product of these three measures as EEMqual = FIT*COREC*SPLTH:

where F is the number of components, tdef is an element of the target core array of a PARAFAC model which is a superdiagonal array with ones on the superdiagonal. The element gdef is the core estimated in a least squares sense using the PARAFAC components as loadings (see ref. [22] for more details). Split-half quality is determined from the similarity of the loadings obtained when fitting a PARAFAC model to different subsets of the data. In practice this is done by splitting the data into the first half and last half of the samples and comparing the loadings [23]. If the correct number of components is used, the loadings should be the same in these two sample sets given that the samples reflect the same underlying variation. Note that this assumes that the samples represented in the first and the last half are approximately equally representative which may require some reshuffling of the samples depending on the design of the data. In practice, one alternative split is also investigated where the sample set is split into four parts ([A|B|C| D]). The above split corresponds to using one model of [A|B] and

Thus, when EEMqual is close to one, all three measures are high and the model successful. In some cases, it may happen that a data set does not support e.g. splithalf validation for example because certain fluorophores occur only in one or a few samples. Then the overall measure EEMqual may be persistently low in which case, suitable models can be found from scrutinizing the three individual measures that make up the final measure. 5. Algorithm EEMizer Having defined the parameters to optimize and an overall measure of the quality of the model, it is possible to devise an algorithm for finding a good PARAFAC model. The algorithm takes as input the maximal number of components to test, maxFAC as well as the widths of windows where Rayleigh scatter is tentatively removed. Typically this is set to 5, 10, 15 and 25 nm, but others can be tested as well if more computational power is available or if the amount of scatter is deemed broader or more narrow.

Fig. 1. Typical examples of EEMs from each of the three data sets.

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

89

Above, the step of determining outliers proceeds as follows: FUNCTION: Determine optimal model by removing outliers • • • •

Estimate PARAFAC model with FAC factors and scatter removed Determine fit Determine core consistency Do split-half analysis and determine split-half quality.

From the estimated model, determine sample leverage and sumsquared residuals relative to respective median values. Find sample with highest leverage or sum-squared residual Adjust critical level

Fig. 2. EEMQual for Dorrit data.

ALGORITHM EEMizer For FAC = 1 to maxFAC For SCATWIDTH = [5, 10, 15, 25] For LOWEXREMOVE = 1 to maxValue Determine optimal model by removing outliers end end end

• If N10% of samples have been removed adjust the critical level using !2 current 1− NNoriginal fracfactor = , where Ncurrent is the current number of 0:10 samples and Noriginal is the original number of samples. • If original number of samples is lower than 20, then adjust the 30 critical level using samplenumberfactor = N current • CRITICAL LEVEL = LEVEL*samplenumberfactor*fracfactor Find sample with highest leverage or sum-squared residual. If higher than the critical level, remove it. Repeat whole procedure until no more samples are removed. Upon running EEMizer, the best model is chosen for each number of components as the one yielding the highest EEMqual value. The

Fig. 3. Emission and excitation loadings for a four-component PARAFAC model before and after EEMizer.

90

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

Fig. 6. EEMQual for fermentation data.

Fig. 4. Result of EEMizer on DOM data.

results are presented in terms of this value as a function of number of components and the user then simply has to pick the highest number of components for which the quality measure is sufficiently high. For very well-defined data, this will indeed be a simple task, but certain data sets can still provide somewhat ambiguous results. It may be that a PARAFAC model needs five components, but the fifth component has a very low signal-to-noise. In such a case, the five component model may yield an intermediate value and it may be necessary that the user compares competing models with intermediate quality levels to find the appropriate model.

The second data set consists of 65 EEMs which have been spectrally corrected for instrument biases, corrected for inner filter effects and Raman calibrated. As a result, all of the Rayleigh scatter and most of the Raman scatter has been removed. The data consists of water samples collected from a cruise on RV Gunnar Thorson in the Kattegat and Belt Sea region (at the entrance to the Baltic Sea) in August 2001. The data are part of a tutorial on doing PARAFAC on samples for characterizing DOM (dissolved organic matter) fluorescence [14] which provides a practical guideline for achieving manually what this paper tries to do automatically. Finally, the third data set is an example of low-resolution EEM data; in this case containing 338 samples of at-line measurements of fermentation broth [4] with 15 excitation and 15 emission wavelengths.

6. Materials and methods

7. Results and discussion

The EEMizer algorithm has been implemented in Matlab and is available at www.models.life.ku.dk (May, 2010). The function depends on functions from PLS_Toolbox ver. 5.5 (www.eigenvector.com, April 2010). Apart from the above-mentioned algorithmic settings, PARAFAC is implemented using nonnegativity constraints in all modes in EEMizer. Three data sets have been analyzed with EEMizer. Detailed information on the data is available in the original papers. The first data set is called Dorrit and is a set of 27 samples containing various mixtures of hydroquinone, tryptophan, phenylalanine and dopa [25]. What is characteristic of this data set is that even though it seems like a simple data set, it is typically hard for inexperienced users to find a good model because of noisy variables as well as several outliers. A typical example of an EEM from this data set is shown in Fig. 1.

7.1. Dorrit For the Dorrit data, EEMizer provides the overall result shown in Fig. 2. The results may be interpreted in the following way. Up to four components are clearly supported and anything beyond five seems to be too many components. Looking into the five component model, it is clear that this model is not adequate (results not shown). One of the five components has spectral loadings that do not look sound and hence, the fourcomponent model is chosen. For this model, EEMizer has removed the seven lowest excitations, used a scatter width of 15 nm and retained all samples. As can be seen in Fig. 3, the emission loadings look sound. Furthermore, the scores are perfectly correlated to the

Fig. 5. Example of EEM (left) illustrating the low signal-to-noise level in these data and the PARAFAC model estimate of the EEM (right) where a lot of the noise is effectively filtered off.

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

91

Fig. 7. Emission and excitation loadings for a six-component PARAFAC model before and after EEMizer.

reference concentrations (not shown). It is noteworthy, that EEMizer was able to find a good model without removing any samples. This was not the case in the original analysis of the data [25].

publication. This is not the case, when fitting the PARAFAC model to the raw data (Fig. 7).

7.2. DOM

8. Discussion

The DOM data set is complicated because of very low signals and hence a large proportion of noise. The result of EEMizer on the DOM data set is shown in Fig. 4. It seems that three components are supported. Looking into the results, the three-component model looks excellent from a chemical point of view. No samples or excitations are removed, so the result is essentially identical to what would be obtained immediately on the raw data (where Rayleigh scatter signal was already removed). It is interesting to see that the model of the data provides a significant noise-filtering of the original data (Fig. 5).

EEMizer is an attempt at developing an automatic PARAFAC model builder for EEM data. It provides good results for a number of applications of quite diverse nature which is very encouraging. The process of determining a good PARAFAC model is significantly simplified compared to manual determination of optimal settings. Even for data as diverse as shown here, building a PARAFAC model was simplified to at most assessing two competing models. Further refinements can definitely be envisioned and most of all a general speedup of the algorithm would be feasible. This can be achieved in various ways. For example, distributed computing seems like an obvious approach as the overall problem is easy to parallelize.

7.3. Fermentation The fermentation data is different from the above examples in that the instrument used is a low-resolution process instrument. Due to the selected filters in the instrument, there is no scattering as all emissions are above the excitation wavelengths. Running EEMizer provides the results shown in Fig. 6. It seems that definitely five and perhaps six components are supported. In the original publication, five components were suggested, but in fact the six-component model found by EEMizer appears valid and reasonable. Twelve samples are removed by EEMizer and the spectral loadings are reasonable and consistent with what was found in the original

Acknowledgments The VILLUM KANN RASMUSSEN foundation as well as the Danish Agency for Science, Technology and Innovations is gratefully thanked for financial support to the work reported here. Maider Vidal acknowledges financial support from Basque Government in the form of a scholar fellowship. The original contributors of the data used here are also thanked. The data sets are available at www.models.life.ku.dk (March, 2010).

92

R. Bro, M. Vidal / Chemometrics and Intelligent Laboratory Systems 106 (2011) 86–92

References [1] R.K. Lauridsen, H. Everland, L.F. Nielsen, S.B. Engelsen, L. Norgaard, Exploratory multivariate spectroscopic study on human skin, Skin Res. Technol. 9 (2003) 137–146. [2] A. Hagedorn, R.L. Legge, H. Budman, Evaluation of spectrofluorometry as a tool for estimation in fed-batch fermentations, Biotechnol. Bioeng. 83 (2003) 104–111. [3] E. Morel, K. Santamaria, M. Perrier, S.R. Guiot, B. Tartakovsky, Application of multiwavelength fluorometry for on-line monitoring of an anaerobic digestion process, Water Res. 38 (2004) 3287–3296. [4] P.P. Mortensen, R. Bro, Real-time monitoring and chemical profiling of a cultivation process, Chemom. Intell. Lab. Syst. 84 (2006) 106–113. [5] R.M. Cory, D.M. McKnight, Fluorescence spectroscopy reveals ubiquitous presence of oxidized and reduced quinones in dissolved organic matter, Environ. Sci. Technol. 39 (2005) 8142–8149. [6] G.J. Hall, J.E. Kenny, Estuarine water classification using EEM spectroscopy and PARAFAC-SIMCA, Anal. Chim. Acta 581 (2007) 118–124. [7] C.A. Stedmon, S. Markager, R. Bro, Tracing dissolved organic matter in aquatic environments using a new approach to fluorescence spectroscopy, Mar. Chem. 82 (2003) 239–254. [8] J. Christensen, L. Nørgaard, R. Bro, S.B. Engelsen, Multivariate autofluorescence of intact food systems, Chem. Rev. 106 (2006) 1979–1994. [9] E. Koller, O. Quehenberger, G. Jürgens, O.S. Wolfbeis, H. Esterbauer, Investigation of human plasma low density lipoprotein by three-dimensional fluorescence spectroscopy, FEBS Lett. 198 (1986) 229–234. [10] R.D. Jiji, G.A. Cooper, K.S. Booksh, Excitation–emission matrix fluorescence based determination of carbamate pesticides and polycyclic aromatic hydrocarbons, Anal. Chim. Acta 397 (1999) 61–72. [11] L. Moberg, G. Robertsson, B. Karlberg, Spectrofluorimetric determination of chlorophylls and pheopigments using parallel factor analysis, Talanta 54 (2001) 161–170. [12] E. Alm, R. Bro, S.B. Engelsen, B. Karlberg, R. Torgrip, Vibrational overtone combination spectroscopy (VOCSY)—a new way of using IR and NIR data, Anal. Bioanal. Chem. 388 (2007) 179–188.

[13] R. Bro, Review on multiway analysis in chemistry—2000–2005, Critical Reviews In Analytical Chemistry 36 (2006) 279–293. [14] C.A. Stedmon, R. Bro, Characterizing dissolved organic matter fluorescence with parallel factor analysis: a tutorial, Limnol. Oceanogr. 6 (2008) 572–579. [15] C.M. Andersen, R. Bro, Practical aspects of PARAFAC modelling of fluorescence excitation–emission data, J Chemom 17 (2003) 200–215. [16] D.I.C. Kells, J.D.J. O'Neil, T. Hofmann, A method for eliminating Rayleigh scattering from fluorescence spectra, Anal. Biochem. 139 (1984) 316–318. [17] R.D. Jiji, K.S. Booksh, Mitigation of Rayleigh and Raman spectral interferences in multiway calibration of excitation–emission matrix fluorescence spectra, Anal. Chem. 72 (2000) 718–725. [18] L.G. Thygesen, Å. Rinnan, S. Barsberg, J.K.S. Møller, Stabilizing the PARAFAC decomposition of fluorescence spectra by insertion of zeros outside the data area, Chemom. Intell. Lab. Syst. 71 (2004) 97–106. [19] A. Rinnan, C.M. Andersen, Handling of first-order Rayleigh scatter in PARAFAC modelling of fluorescence excitation–emission data, Chemom. Intell. Lab. Syst. 76 (2005) 91–99. [20] A. Rinnan, K. Booksh, R. Bro, First order Rayleigh as a separate component in the decomposition of fluorescence landscapes, Anal. Chim. Acta 537 (2005) 349–358. [21] M. Bahram, R. Bro, C.A. Stedmon, A. Afkhami, Handling of Rayleigh and Raman scatter for PARAFAC modeling of fluorescence data using interpolation, J Chemom 20 (2007) 99–105. [22] R. Bro, H.A.L. Kiers, A new efficient method for determining the number of components in PARAFAC models, J Chemom 17 (2003) 274–286. [23] R.A. Harshman, W.S. De Sarbo, An application of PARAFAC to a small sample problem, demonstrating preprocessing, orthogonality constraints, and split-half diagnostic techniques, in: H.G. Law, C.W. Snyder, J.A. Hattie, R.P. McDonald (Eds.), Research Methods for Multimode Data Analysis, Praeger Special Studies, New York, 1984, pp. 602–642. [24] G. Lorho, F. Westad, R. Bro, Generalized correlation loadings — extending correlation loadings to congruence and to multi-way models, Chemom. Intell. Lab. Syst. 84 (2006) 119–125. [25] J. Riu, R. Bro, Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models, Chemom. Intell. Lab. Syst. 65 (2003) 35–49.