Post processing methods (PLS–CCA): simple alternatives to preprocessing methods (OSC–PLS)


Honglu Yu, John F. MacGregor*
Department of Chemical Engineering, McMaster Advanced Control Consortium, McMaster University, 1280 Main St. West, JHE-374, Hamilton, ON, Canada L8S 4L7
Chemometrics and Intelligent Laboratory Systems 73 (2004) 199–205
Received 29 September 2003; received in revised form 2 April 2004; accepted 6 April 2004; available online 21 July 2004
*Corresponding author. Tel.: +1-905-525-9140; fax: +1-905-521-1350. E-mail address: [email protected] (J.F. MacGregor).

Abstract

Orthogonal signal correction (OSC) methods have been proposed as a way of preprocessing data prior to performing PLS regression. The purpose is generally not to improve the prediction but to remove variation in X that is uncorrelated with Y in order to simplify both the structure and interpretation of the resulting PLS regression model. This paper introduces an alternative approach based on post-processing a standard PLS model with canonical correlation analysis (CCA). It is shown that this is only one of a class of post-processing methods which have certain advantages over most preprocessing approaches using OSC.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Partial least squares; Canonical correlation analysis; Orthogonal signal correction

1. Introduction

The objective of PLS is to model the variation in both the X space and the Y space. When the X space contains a large amount of variation unrelated to the Y space, PLS often requires many components to achieve good prediction, and this can make the model difficult to interpret. To reduce the number of PLS components, orthogonal signal correction (OSC) was introduced by Wold et al. [1], and several other OSC algorithms have been presented since then [2–6]. The concept behind all OSC algorithms is to perform OSC as a preprocessing step to eliminate much of the variation in X that is not correlated with Y; a PLS model is then built to relate the remaining X space to the Y space. However, this raises an interesting question: rather than using OSC followed by PLS, why not directly perform canonical correlation analysis (CCA)? CCA is the most widely known multivariate statistical regression approach and is covered in essentially every textbook on multivariate statistics (e.g. Mardia et al. [7], Johnson and Wichern [8] and Anderson [9]). Unlike PLS, CCA inherently ignores the variation in X that is uncorrelated with Y and directly maximizes the correlation between the two spaces. As will be shown in this paper, if a sufficient number of OSC components are removed, the resulting PLS model on the residuals should approach the result of CCA.

Of course, since the X variables are often highly correlated, directly applying CCA to the raw X space and the Y space leads to an ill-conditioned regression problem, because the CCA solution involves the calculation of (X^T X)^{-1}. However, one can build a PLS model for the raw (X, Y) data and then apply CCA to the score space of the significant PLS components (T) and the Y space. In this way, a parsimonious model with the same prediction ability as the standard PLS model can be obtained.

It has been pointed out by Trygg and Wold [6] and Svensson et al. [10] that the main advantage of OSC is not the improvement of prediction, but rather the improvement of the interpretability of the model. However, the subspace of X uncorrelated with Y is often high dimensional, and it is not at all clear which part of this subspace is removed by different OSC algorithms with different numbers of OSC components. As a result, the interpretation of the OSC components may differ from one algorithm to another. The objective of this paper is to reveal the relation between OSC and CCA, and to show that PLS + CCA can be used as an alternative to OSC + PLS and that it is generally more direct and robust. In fact, what we infer is more general, namely, that a whole class of methods based on relating T (where T is obtained from an initial regression method such as PLS) and Y are simple and robust alternatives to OSC + PLS. The study begins with a theoretical analysis of the subspaces explained by OSC and CCA components; a simulated data set is then used to illustrate the ideas.
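The ill-conditioning argument is easy to check numerically. The following minimal sketch (Python with NumPy; the dimensions and variable names are illustrative, not taken from the paper) builds a collinear X from a few latent factors and compares the conditioning of X^T X with that of a low-dimensional score matrix.

```python
import numpy as np

# Minimal sketch: CCA on the raw X space is ill-conditioned when the X
# variables are highly correlated, while a low-dimensional score space is
# not. Dimensions (120 x 100, 5 latent factors) are illustrative only.
rng = np.random.default_rng(0)
K, NX, L = 120, 100, 5

scores = rng.normal(size=(K, L))              # well-conditioned latent scores
X = scores @ rng.normal(size=(L, NX)) \
    + 1e-3 * rng.normal(size=(K, NX))         # collinear predictors plus small noise

print(np.linalg.cond(X.T @ X))                # enormous: (X^T X)^(-1) is unstable
print(np.linalg.cond(scores.T @ scores))      # modest: CCA on scores is well posed
```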

2. Analysis of subspaces

2.1. Notations

It is assumed that X (K × N_X) is the matrix of predictors to be regressed against Y (K × N_Y). Both X and Y are column-wise mean-centered. PLS, OSC and CCA are all latent variable methods. For each method, the score matrix of the extracted orthogonal latent variables is denoted as T. W is the weight matrix from which T is computed (with respect to the raw X matrix), and P and Q are the corresponding loading matrices for X and Y:

$$T = XW, \qquad P^T = (T^T T)^{-1} T^T X, \qquad Q^T = (T^T T)^{-1} T^T Y$$

The prediction of Y is given by Ŷ = TQ^T. G_Ŷ is defined as the orthogonal projector onto the column space of Ŷ, i.e. G_Ŷ = ŶŶ⁺ (Ŷ⁺ is the Moore–Penrose inverse of Ŷ), and H_Ŷ is defined as the orthogonal complement of that projector, or the anti-projector with respect to the Ŷ space: H_Ŷ = I − G_Ŷ = I − ŶŶ⁺.

2.2. Analysis

The space of X can be divided into two parts: the projection of X onto Ŷ, G_Ŷ X, and the residual part, H_Ŷ X, that is

$$X = G_{\hat{Y}} X + H_{\hat{Y}} X$$

where Ŷ is the model prediction, Ŷ = TQ^T. Only the first part of X, G_Ŷ X, can contribute to the prediction, and therefore it contains the information most related to Y. G_Ŷ = ŶŶ⁺, where Ŷ⁺ depends on the rank of Ŷ. Consider K > N_Y; then rank(Ŷ) = min(N_Y, L), where L is the number of latent variables used to obtain Ŷ. It is easy to show that (see Appendix A):

$$G_{\hat{Y}} = \hat{Y}\hat{Y}^{+} =
\begin{cases}
\hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T, & L > N_Y \\
T(T^T T)^{-1}T^T, & L \le N_Y
\end{cases}$$  (1)

The following analysis will show that the space explained by the OSC components is part of H_Ŷ X and the subspace explained by the CCA components is part of G_Ŷ X, implying that CCA is the opposite operation to OSC (see Fig. 1).

Fig. 1. Summary of the subspaces explained by OSC and CCA.

2.2.1. OSC + PLS

When using an OSC + PLS approach, a number of OSC components (T_⊥ P_⊥^T) are first removed from X, and the residual of X after removing the OSC components is given by:

$$X_{rosc} = X - T_{\perp} P_{\perp}^T$$  (2)

A PLS model is then built to regress X_rosc against Y. The model combining both the OSC step and the PLS step is:

$$X = T_{\perp} P_{\perp}^T + T_{*} P_{*}^T + E, \qquad \hat{Y} = T_{*} Q_{*}^T$$

where the subscript '⊥' represents the matrices of the OSC components and '*' represents the matrices of the PLS components. Notice that T_* is orthogonal to T_⊥. Denote L_⊥ as the number of OSC components and L_* as the number of PLS components.

Property 1. The space explained by the OSC components (T_⊥ P_⊥^T) is part of H_Ŷ X.

Proof: To prove that the space explained by the OSC components (T_⊥ P_⊥^T) is part of H_Ŷ X, it is equivalent to prove that T_⊥ P_⊥^T is orthogonal to G_Ŷ X. Since T_⊥^T Ŷ = T_⊥^T T_* Q_*^T = 0, then

$$(T_{\perp} P_{\perp}^T)^T G_{\hat{Y}} X = P_{\perp} T_{\perp}^T \hat{Y}\hat{Y}^{+} X = 0$$

Therefore, T_⊥ P_⊥^T is orthogonal to G_Ŷ X and is part of H_Ŷ X.
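For readers who want to check the decomposition numerically, the following sketch (Python/NumPy; the function and variable names are ours, not from the paper) computes G_Ŷ and H_Ŷ from a score matrix T and response matrix Y according to Eq. (1) and splits X into its Y-correlated and Y-uncorrelated parts.

```python
import numpy as np

def y_hat_projectors(T, Y):
    """Return (G, H) so that X = G @ X + H @ X, as in Eq. (1).

    A minimal sketch, assuming T (K x L) and Y (K x N_Y) are mean-centered
    NumPy arrays of full column rank; names are illustrative only.
    """
    K, L = T.shape
    N_Y = Y.shape[1]
    # Q^T = (T^T T)^{-1} T^T Y and Yhat = T Q^T
    Qt = np.linalg.solve(T.T @ T, T.T @ Y)
    Y_hat = T @ Qt
    if L > N_Y:
        # rank(Yhat) = N_Y: project onto the column space of Yhat
        G = Y_hat @ np.linalg.solve(Y_hat.T @ Y_hat, Y_hat.T)
    else:
        # rank(Yhat) = L: the column spaces of Yhat and T coincide
        G = T @ np.linalg.solve(T.T @ T, T.T)
    H = np.eye(K) - G
    return G, H
```

With these projectors, G @ X is an estimate of the Y-correlated part of X and H @ X of the Y-uncorrelated part.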

H_Ŷ X is a high dimensional space, and it is therefore of interest to see which part of this space is removed by the OSC components and which part is left. Since H_Ŷ X = (I − ŶŶ⁺)X, it follows from Eqs. (1) and (2) that:

when L_* > N_Y,

$$H_{\hat{Y}} X = \left(I - \hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T\right)\left(T_{\perp} P_{\perp}^T + T_{*} P_{*}^T + E\right)
= T_{\perp} P_{\perp}^T + \left(I - \hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T\right) T_{*} P_{*}^T + E$$  (3)

when L_* ≤ N_Y,

$$H_{\hat{Y}} X = \left(I - T_{*}(T_{*}^T T_{*})^{-1} T_{*}^T\right)\left(T_{\perp} P_{\perp}^T + T_{*} P_{*}^T + E\right)
= T_{\perp} P_{\perp}^T + E$$  (4)

Therefore, when L_* > N_Y, after removal of the OSC components the remaining part of the X space is (I − Ŷ(Ŷ^TŶ)^{-1}Ŷ^T)T_*P_*^T + E + G_Ŷ X, which still contains much of the Y-uncorrelated information; when L_* ≤ N_Y the remaining space is E + G_Ŷ X and almost all of the uncorrelated information has been removed.

2.2.2. PLS + CCA

In the PLS + CCA procedure, a standard PLS model is first built between the raw X space and Y to get the predictions:

$$\hat{X} = T P^T, \qquad \hat{Y} = T Q^T$$

A CCA is then applied between T and Y. A subscript 'c' is used to represent CCA components. The objective of CCA is to extract, in each dimension, a pair of latent variables (one from the T space and one from the Y space) that have the maximum correlation, subject to the condition that the latent variables for each space are orthogonal to those in earlier dimensions. A general comparison of latent variable methods (PCA, PLS, CCA and reduced rank regression (RRR)) from an objective function framework is given in Burnham et al. [11]. The maximum number of CCA components (L_c) is min(L, N_Y), where L is the number of components in the standard PLS model.

Property 2. The space explained by the CCA components (T_c P_c^T) is part of G_Ŷ X.

Proof: Since in most cases L > N_Y, for simplicity only this situation is considered here; the property is also easy to prove when L ≤ N_Y. To prove Property 2 it is sufficient to prove that T_c P_c^T is orthogonal to H_Ŷ X. The canonical variates T_c are the solution to the following eigenvalue problem [7–9], where Λ is a diagonal matrix of the corresponding eigenvalues:

$$T(T^T T)^{-1} T^T Y (Y^T Y)^{-1} Y^T T_c = T_c \Lambda$$  (5)

Here it is assumed that the Y variables are independent, so that Y^T Y is invertible. Left multiplying both sides of Eq. (5) by Ŷ^T (and using Ŷ^T T(T^T T)^{-1}T^T = Ŷ^T and Ŷ^T Y = Ŷ^T Ŷ),

$$\hat{Y}^T T(T^T T)^{-1} T^T Y (Y^T Y)^{-1} Y^T T_c = \hat{Y}^T T_c \Lambda
\;\Rightarrow\; (Y^T Y)^{-1} Y^T T_c = (\hat{Y}^T \hat{Y})^{-1}\hat{Y}^T T_c \Lambda$$  (6)

Substituting Eq. (6) into Eq. (5), and noting that T(T^T T)^{-1}T^T Y = TQ^T = Ŷ,

$$T(T^T T)^{-1} T^T Y (\hat{Y}^T \hat{Y})^{-1}\hat{Y}^T T_c \Lambda = T_c \Lambda
\;\Rightarrow\; \hat{Y}(\hat{Y}^T \hat{Y})^{-1}\hat{Y}^T T_c \Lambda = T_c \Lambda
\;\Rightarrow\; T_c^T\left(I - \hat{Y}(\hat{Y}^T \hat{Y})^{-1}\hat{Y}^T\right) = 0$$  (7)

Using Eq. (7), we obtain

$$(T_c P_c^T)^T H_{\hat{Y}} X = P_c T_c^T\left(I - \hat{Y}(\hat{Y}^T \hat{Y})^{-1}\hat{Y}^T\right) X = 0$$

Therefore, T_c P_c^T is orthogonal to H_Ŷ X and is thus part of G_Ŷ X.

G_Ŷ X is often a low dimensional space with rank L_c = min(L, N_Y). When all L_c CCA components are used, the space explained by the CCA components is exactly G_Ŷ X and, as shown in Appendix B, the model prediction (T_c Q_c^T) is the same as the prediction of the standard PLS model (Ŷ). Therefore, an L-component PLS model can be simplified to an L_c-component CCA model with the same prediction power.
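As a computational note, Eq. (5) need not be solved as a raw eigenvalue problem; an equivalent and better-conditioned route is an SVD of the whitened score and response matrices. The sketch below (Python/NumPy; the function name and implementation details are ours) returns the canonical score vectors T_c and the eigenvalues of Eq. (5), assuming mean-centered T and Y of full column rank.

```python
import numpy as np

def pls_cca_scores(T, Y):
    """CCA between PLS scores T and Y, i.e. the solution of Eq. (5).

    A sketch only: T (K x L) and Y (K x N_Y) are assumed mean-centered with
    full column rank. Returns the canonical scores Tc and the eigenvalues
    (squared canonical correlations) of Eq. (5).
    """
    L, N_Y = T.shape[1], Y.shape[1]
    Lc = min(L, N_Y)                         # maximum number of CCA components
    # Whiten T and Y using Cholesky factors of their Gram matrices
    Rt = np.linalg.cholesky(T.T @ T)         # T^T T = Rt Rt^T
    Ry = np.linalg.cholesky(Y.T @ Y)
    Tw = np.linalg.solve(Rt, T.T).T          # Tw = T Rt^{-T}, so Tw^T Tw = I
    Yw = np.linalg.solve(Ry, Y.T).T
    # SVD of the whitened cross-product gives the canonical directions
    U, s, Vt = np.linalg.svd(Tw.T @ Yw, full_matrices=False)
    Wc = np.linalg.solve(Rt.T, U[:, :Lc])    # canonical weights for T
    Tc = T @ Wc                              # canonical score vectors
    return Tc, s[:Lc] ** 2
```

Because T from a PLS model has only a handful of well-conditioned columns, this step is cheap, and the ill-conditioning that prevents CCA on the raw X never appears.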


2.3. Comments

The above discussion shows that CCA is essentially the opposite operation to OSC: OSC components remove part of the subspace H_Ŷ X, while CCA components extract part of the subspace G_Ŷ X.

The subspace G_Ŷ X depends only on the model prediction Ŷ. The Ŷ obtained by OSC + PLS and by standard PLS are generally not identical, but they are similar when a proper number of latent variables is used (i.e. the OSC components do not cause an overfitting problem and an adequate number of PLS components is selected). In fact, when using the O-PLS algorithm in the case of a single y variable, it has been theoretically proved that the model prediction obtained by removing L − 1 O-PLS components and then using one PLS component is identical to the prediction obtained by L standard PLS components [12].

Trygg and Wold [12] suggest that interpretability can be improved with O-PLS by analyzing the correlated variation (obtained by the PLS components) and the uncorrelated variation (obtained by the O-PLS components) separately. The same idea may be adopted when using other OSC methods, but only with extreme caution. A correct interpretation can only be obtained when the OSC components properly capture the Y-unrelated variation. As shown in Eqs. (3) and (4), only when L_* ≤ N_Y do the OSC components remove most of the uncorrelated subspace H_Ŷ X (leaving only the residual E). In the literature, many OSC papers suggest removing only one or two OSC components to avoid over-fitting problems. In this situation, the OSC components may capture only part of the uncorrelated subspace, and misleading interpretations could result if one analyzes the space explained by the OSC components to find the source of the uncorrelated variation.

Generally, there is no deterministic relation between the number of OSC components and the required number of PLS components. Therefore, as a pre-processing step, it is difficult to choose a number of OSC components that ensures that the subsequent PLS model will give both good prediction results and satisfy the interpretability condition L_* ≤ N_Y. The only exception is O-PLS and Fearn's OSC in the case of a single y variable, where the required number of O-PLS components can be easily decided to satisfy both conditions. On the other hand, CCA is a post-processing step and therefore there is no concern about an overfitting problem. The number of CCA components required to extract the entire subspace G_Ŷ X is much more deterministic: L_c = min(L, N_Y). Furthermore, the dimension of the uncorrelated space is generally much larger than the dimension of the correlated space, so the computation for CCA is often much simpler than for OSC. For example, consider the single y case where 5 PLS components are needed for a standard PLS model. If O-PLS is used, 4 O-PLS components need to be removed, while if CCA is used only a one-component CCA model needs to be calculated.

For the multiple y case, no matter which OSC algorithm is used there is always a risk of over-fitting. A general strategy to avoid over-fitting is to perform PLS regression and cross-validation after removal of every OSC component. The computation of PLS + CCA is much simpler since only one PLS model with cross-validation needs to be built. Moreover, the PLS–CCA algorithm is in general more robust to potential overfitting: if a few too many small PLS components are used, this has almost no effect on Ŷ and hence little effect on the results obtained by CCA, whereas a few too many OSC components can have a large effect on OSC–PLS algorithms.

It has to be pointed out that CCA is not the only choice of post-processing step to obtain a more parsimonious model and to extract the correlated subspace G_Ŷ X. Using reduced rank regression (RRR) gives similar results. Another approach, principal components for predictions (PCP), recently presented by Langsrud and Næs [13], can also be used as an alternative. In this approach, an SVD or PCA is performed on Ŷ (which is obtained by a standard PLS model):

$$\hat{Y} \;\xrightarrow{\text{PCA/SVD}}\; T_p Q_p^T$$

where the subscript 'p' represents PCP components. The loading matrix for the X space is computed as:

$$P_p^T = (T_p^T T_p)^{-1} T_p^T X$$

In Ref. [13], scatter plots of the PCP scores (T_p) and loadings (P_p and Q_p) are used for interpretation. From another point of view, when the number of PCP components is equal to the rank of Ŷ, then

$$\hat{Y} = T_p Q_p^T$$  (8)

Eq. (8) can be seen as a parsimonious prediction model. In this situation G_Ŷ = ŶŶ⁺ = T_p(T_p^T T_p)^{-1} T_p^T, and therefore the space explained by the PCP components (T_p P_p^T) is

$$T_p P_p^T = T_p (T_p^T T_p)^{-1} T_p^T X = G_{\hat{Y}} X$$

When there is only a single y, the scores obtained by CCA, RRR or PCP are identical. For multiple Y variables, the scores are not exactly the same but are rotated versions of one another (as long as all the CCA, RRR or PCP components are used).
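A minimal sketch of the PCP post-processing step described above (Python/NumPy; the function and variable names are ours, and the rank test is a simple numerical threshold, not part of Ref. [13]):

```python
import numpy as np

def pcp_postprocess(T, Q, X):
    """Principal components of predictions (PCP) on a fitted PLS model.

    Sketch only: T are the PLS scores, Q the PLS Y-loadings (Yhat = T Q^T),
    X the mean-centered predictor matrix. Returns PCP scores Tp, X-loadings
    Pp and Y-loadings Qp, so that Yhat = Tp Qp^T and Tp Pp^T = G_Yhat X.
    """
    Y_hat = T @ Q.T
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    r = int(np.sum(s > s[0] * 1e-10))            # numerical rank of Yhat
    Tp = U[:, :r] * s[:r]                        # PCP scores
    Qp = Vt[:r, :].T                             # Y-space loadings
    Pp = np.linalg.solve(Tp.T @ Tp, Tp.T @ X).T  # X-space loadings
    return Tp, Pp, Qp
```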

3. Case study

In this section, a case study is presented to illustrate the ideas discussed above. The data sets were generated using the following equations:

$$X = \Upsilon A^T + \Xi B^T + E, \qquad Y = \Upsilon + f$$

where X contains 100 variables and Y contains only one variable (i.e. N_X = 100 and N_Y = 1). Υ (the Y-related factor) and Ξ (the Y-unrelated factors) are independent, and E and f are normally distributed random variables; their dimensions and standard deviations are given in Table 1. A is a 100 × 1 vector and B is a 100 × 4 matrix, containing the influence of the factors Υ and Ξ on X. A total of K = 120 observations are generated. A multivariate observation vector x can thus be divided into two parts: the part that is related to Y, x_Y-related = ΥA^T, and the part that is unrelated to Y, x_Y-unrelated = ΞB^T + e.

Table 1. Dimension and standard deviation of the factors

Factor | Dimension | Standard deviation
Υ      | 120 × 1   | 3
Ξ      | 120 × 4   | [2 4 3 2]^a
E      | 120 × 100 | 0.02
f      | 120 × 1   | 0.5

^a The standard deviation for each column.
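A generating script along these lines (Python/NumPy) could look as follows. The loading vectors A and B are drawn at random here, whereas the paper uses fixed but unspecified ones; the random seed and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
K, NX = 120, 100

# Factors: one Y-related column (sd 3) and four Y-unrelated columns (sd 2, 4, 3, 2)
upsilon = rng.normal(0.0, 3.0, size=(K, 1))
xi = rng.normal(0.0, 1.0, size=(K, 4)) * np.array([2.0, 4.0, 3.0, 2.0])

A = rng.normal(size=(NX, 1))              # influence of the Y-related factor on X
B = rng.normal(size=(NX, 4))              # influence of the Y-unrelated factors on X

E = rng.normal(0.0, 0.02, size=(K, NX))   # measurement noise on X
f = rng.normal(0.0, 0.5, size=(K, 1))     # noise on Y

X = upsilon @ A.T + xi @ B.T + E          # X = Upsilon A^T + Xi B^T + E
Y = upsilon + f                           # Y = Upsilon + f
```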


A standard PLS model is built to regress X against Y and, as expected, 5 PLS components are selected based on cross-validation. CCA is then applied to the PLS score matrix T and Y. Only one CCA component is obtained, since there is only one y. The part of x explained by this CCA component is denoted x_cca and the remaining part is denoted x_rcca = x − x_cca. From Trygg and Wold [12], 4 O-PLS components need to be removed to obtain a single-component PLS model that keeps the same prediction result as the standard PLS model with 5 components. Let the part removed by the 4 O-PLS components be denoted x_opls and the part left x_ropls = x − x_opls. It is expected that x_cca and x_ropls will both be estimates of x_Y-related = ΥA^T, and that x_rcca and x_opls will both be estimates of x_Y-unrelated = ΞB^T + e.

Fig. 2 plots the results for one observation vector x. From this figure, it can be seen that x_cca (line '4' in (a)) and x_ropls (line '4' in (b)) provide similar estimates of x_Y-related (line '2' in both (a) and (b)), except that x_cca is smooth while x_ropls contains noise. This is understandable because OSC is a preprocessing step and the noise contained in the original X matrix is not removed by the OSC components; another PCA or PLS would need to be performed to obtain smoother x spectra. CCA, on the other hand, is a post-processing step performed on the space explained by the PLS components, which contains much less noise than the original X space. Therefore x_cca is a smoother estimate of x_Y-related than x_ropls. Estimates of x_Y-unrelated (line '3' in both (a) and (b)) are provided by x_rcca (line '5' in (a)) or x_opls (line '5' in (b)). The error in the estimates of x_Y-related and x_Y-unrelated in Fig. 2 arises because the system contains noise and because the factors are not completely orthogonal when only a finite number of observations is used. In this situation, 'Y-orthogonal' and 'Y-unrelated' are not the same thing, and care is needed when interpreting the models, especially when the system contains a high level of noise or has very few observations.

As discussed in Section 2, the OSC components remove most of the uncorrelated subspace H_Ŷ X only when the number of PLS components is equal to (or less than) the number of Y variables (L_* ≤ N_Y). If this condition is not satisfied, the space remaining after removal of the OSC components still contains some of the uncorrelated information. Line '6' in Fig. 2(b) is the x-variable residual (x_ropls) after removing only 2 O-PLS components (3 PLS components are then required). Evidently, it is not as good an estimate of x_Y-related (line '2' in Fig. 2(b)). This exposes a weakness of OSC methods: because OSC is a pre-processing procedure, it is unknown in advance how many PLS components will be required. In the single y case, O-PLS and Fearn's OSC do not have this weakness; however, for other OSC methods or for multiple Y variables, this problem remains.
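The workflow just described is straightforward to reproduce. The sketch below (Python, scikit-learn and NumPy) fits the standard PLS model on the simulated (X, Y) from the generating script above and then post-processes its scores with the pls_cca_scores helper sketched in Section 2; the component counts follow the text, while the centering details and names are our choices.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Standard PLS model: 5 components, as selected by cross-validation in the text.
# scale=False keeps the data only mean-centered, as in the paper.
pls = PLSRegression(n_components=5, scale=False).fit(X, Y)
T = pls.x_scores_                              # K x 5 PLS score matrix

# Post-processing: CCA between T and the centered Y (one component, single y)
Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)
Tc, _ = pls_cca_scores(T, Yc)

# CCA-based prediction; by Appendix B it should match the 5-component PLS fit
Qc = np.linalg.solve(Tc.T @ Tc, Tc.T @ Yc)
Y_hat_cca = Tc @ Qc + Y.mean(axis=0)

# Y-related part of X captured by the CCA component (x_cca in Fig. 2)
Pc = np.linalg.solve(Tc.T @ Tc, Tc.T @ Xc)
X_cca = Tc @ Pc
X_rcca = Xc - X_cca                            # estimate of the Y-unrelated part
```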

Fig. 2. Results from PLS + CCA and OSC + PLS. (a) PLS + CCA (1: x, 2: x_Y-related, 3: x_Y-unrelated, 4: x_cca, 5: x_rcca); (b) OSC + PLS (1: x, 2: x_Y-related, 3: x_Y-unrelated, 4: x_ropls [4 O-PLS components], 5: x_opls [4 O-PLS components], 6: x_ropls [2 O-PLS components]).

4. Conclusions

In this paper, an alternative to using OSC + PLS is presented. Instead of using OSC as a pre-processing step to remove the orthogonal information from X, a CCA is performed on Y and the score matrix T obtained from an initial PLS model; here the CCA is used as a post-processing step. An analysis of subspaces shows that OSC and CCA are two opposite operations, with OSC components removing part of the H_Ŷ X space and CCA components extracting part of the G_Ŷ X space.

Both OSC + PLS and PLS + CCA can be used to obtain a parsimonious model and to improve interpretability when X contains a large amount of Y-unrelated variation. However, PLS + CCA has several advantages over OSC + PLS. First, in the PLS + CCA approach there is little risk of over-fitting, while with OSC + PLS one must always guess the number of OSC components, except when using O-PLS with a single y. Second, the computation of PLS + CCA is often simpler than that of OSC + PLS, because the number of CCA components required is often smaller than the number of OSC components; moreover, contrary to OSC + PLS (where a PLS regression is often needed after removing each OSC component to avoid over-fitting), the proposed approach only requires building a single cross-validated PLS model. Third, the space explained by the CCA components is a better estimate of the Y-related information than the space of X left after the OSC components have been removed, since the latter still contains the noise and the OSC components may capture only part of the Y-unrelated information. These advantages are more evident when building a model with multiple y's. It has also been shown in the paper that CCA is not the only choice of post-processing step to obtain a more parsimonious model and to extract the G_Ŷ X space; similar results can be obtained using RRR or PCP.

Appendix A

This appendix shows the validity of Eq. (1):

$$G_{\hat{Y}} = \hat{Y}\hat{Y}^{+} =
\begin{cases}
\hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T, & L > N_Y \\
T(T^T T)^{-1}T^T, & L \le N_Y
\end{cases}$$

Proof: Suppose the rank of a matrix A is R and

$$A_{K\times N} = U_{K\times R}\,\Sigma_{R\times R}\,V^T_{R\times N}$$

is the singular value decomposition (SVD) of A. It is well known (Golub and van Loan [14]) that A⁺, the Moore–Penrose inverse of A, is given by

$$A^{+} = V\,\Sigma^{-1}\,U^T$$

Since V^T V = I and U^T U = I, we have

$$A A^{+} = U(U^T U)^{-1}U^T$$  (9)

(1) When L > N_Y, rank(Ŷ) = N_Y. Performing an SVD on Ŷ, we have

$$\hat{Y}_{K\times N_Y} = U_{K\times N_Y}\,\Sigma_{N_Y\times N_Y}\,V^T_{N_Y\times N_Y}
\;\Rightarrow\; U = \hat{Y} V \Sigma^{-1} \equiv \hat{Y} Z^{-1}$$  (10)

where Z = ΣV^T (so that Z^{-1} = VΣ^{-1}). Using Eqs. (9) and (10), we have

$$\hat{Y}\hat{Y}^{+} = U(U^T U)^{-1}U^T
= \hat{Y} Z^{-1}\big[(\hat{Y} Z^{-1})^T(\hat{Y} Z^{-1})\big]^{-1}(\hat{Y} Z^{-1})^T
= \hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T$$

(2) When L ≤ N_Y, rank(Ŷ) = L, and the SVD of Ŷ = TQ^T is

$$\hat{Y}_{K\times N_Y} = T_{K\times L}\,Q^T_{L\times N_Y} = U_{K\times L}\,\Sigma_{L\times L}\,V^T_{L\times N_Y}$$

Right-multiplying by V and noting that V^T V = I,

$$T\,(Q^T V) = U\,\Sigma
\;\Rightarrow\; T = U\,\Sigma\,(Q^T V)^{-1} \equiv U Z
\;\Rightarrow\; U = T Z^{-1}$$  (11)

Using Eqs. (9) and (11), we have

$$\hat{Y}\hat{Y}^{+} = U(U^T U)^{-1}U^T
= T Z^{-1}\big[(T Z^{-1})^T(T Z^{-1})\big]^{-1}(T Z^{-1})^T
= T(T^T T)^{-1}T^T$$

Appendix B

This appendix shows that, when all the CCA components are used, the model prediction (T_c Q_c^T) is the same as the prediction of the standard PLS model (Ŷ) when the correct number of components has been used.

Proof: X̂ can be expressed as

$$\hat{X} = T P^T = T_c P_c^T + T_r P_r^T$$

where T_r P_r^T is obtained by performing a PCA on the subspace remaining after removal of the CCA components; T_r has orthogonal columns and T_c^T T_r = 0. Since T_c P_c^T = G_Ŷ X, T_r is orthogonal to G_Ŷ X, so

$$T_r^T \hat{Y}(\hat{Y}^T\hat{Y})^{-1}\hat{Y}^T X = 0
\;\Rightarrow\; T_r^T \hat{Y} = 0$$  (12)

We also have

$$[T_c \;\; T_r] = \hat{X}\,[W_c \;\; W_r] = T P^T [W_c \;\; W_r]$$

Notice that P^T[W_c  W_r] is a square matrix, so T and [T_c  T_r] span the same space. Therefore

$$\hat{Y} = T(T^T T)^{-1}T^T Y
= [T_c \;\; T_r]\left(\begin{bmatrix} T_c^T \\ T_r^T \end{bmatrix}[T_c \;\; T_r]\right)^{-1}\begin{bmatrix} T_c^T \\ T_r^T \end{bmatrix} Y
= T_c(T_c^T T_c)^{-1}T_c^T Y + T_r(T_r^T T_r)^{-1}T_r^T Y$$  (13)

Left-multiplying both sides of Eq. (13) by T_r^T and combining with Eq. (12), we obtain

$$T_r^T \hat{Y} = T_r^T Y = 0$$  (14)

Substituting Eq. (14) into Eq. (13),

$$\hat{Y} = T_c(T_c^T T_c)^{-1}T_c^T Y = T_c Q_c^T$$
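As a sanity check on Appendix B, the small script below (Python/NumPy, synthetic data, reusing the pls_cca_scores sketch from Section 2) verifies numerically that the prediction from all L_c = min(L, N_Y) CCA components matches the prediction from the full score matrix T:

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, N_Y = 60, 5, 2
T = rng.normal(size=(K, L))              # stand-in for a PLS score matrix
Y = rng.normal(size=(K, N_Y))
T -= T.mean(axis=0)
Y -= Y.mean(axis=0)

Y_hat_pls = T @ np.linalg.solve(T.T @ T, T.T @ Y)   # Yhat = T Q^T
Tc, _ = pls_cca_scores(T, Y)                        # all Lc = 2 CCA components
Y_hat_cca = Tc @ np.linalg.solve(Tc.T @ Tc, Tc.T @ Y)

print(np.allclose(Y_hat_pls, Y_hat_cca))            # expected: True
```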


References

[1] S. Wold, H. Antti, F. Lindgren, J. Öhman, Orthogonal signal correction of near-infrared spectra, Chemometrics and Intelligent Laboratory Systems 44 (1998) 175–185.
[2] J. Sjöblom, O. Svensson, M. Josefson, H. Kullberg, S. Wold, An evaluation of orthogonal signal correction applied to calibration transfer of near infrared spectra, Chemometrics and Intelligent Laboratory Systems 44 (1998) 229–244.
[3] C.A. Andersson, Direct orthogonalization, Chemometrics and Intelligent Laboratory Systems 47 (1999) 51–63.
[4] T. Fearn, On orthogonal signal correction, Chemometrics and Intelligent Laboratory Systems 50 (2000) 47–52.
[5] J.A. Westerhuis, S. de Jong, A.K. Smilde, Direct orthogonal signal correction, Chemometrics and Intelligent Laboratory Systems 56 (2001) 13–25.
[6] J. Trygg, S. Wold, Orthogonal projections to latent structures (O-PLS), Journal of Chemometrics 16 (2002) 119–128.
[7] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press, New York, 1979.
[8] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[9] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1984.
[10] O. Svensson, T. Kourti, J.F. MacGregor, An investigation of orthogonal signal correction algorithms and their characteristics, Journal of Chemometrics 16 (2002) 176–188.
[11] A.J. Burnham, R. Viveros, J.F. MacGregor, Frameworks for latent variable multivariate regression, Journal of Chemometrics 10 (1996) 31–45.
[12] J. Trygg, S. Wold, Some new properties of the components of O-PLS method, Journal of Chemometrics (in press).
[13] Ø. Langsrud, T. Næs, Optimised score plot by principal components of predictions, Chemometrics and Intelligent Laboratory Systems 68 (2003) 61–74.
[14] G.H. Golub, C.F. van Loan, Matrix Computations, third edition, Johns Hopkins University Press, Baltimore, MD, 1996.