Dimensionality reduction for metabolome data using PCA, PLS, OPLS, and RFDA with differential penalties to latent variables




Chemometrics and Intelligent Laboratory Systems 98 (2009) 136–142



Hiroyuki Yamamoto a, Hideki Yamaji b,⁎, Yuichiro Abe b, Kazuo Harada c, Danang Waluyo c, Eiichiro Fukusaki c, Akihiko Kondo b, Hiromu Ohno b, Hideki Fukuda a,d

a Department of Molecular Science and Material Engineering, Graduate School of Science and Technology, Kobe University, 1-1 Rokkodai, Nada, Kobe 657-8501, Japan
b Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai, Nada, Kobe 657-8501, Japan
c Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita 565-0871, Japan
d Organization of Advanced Science and Technology, Kobe University, 1-1 Rokkodai, Nada, Kobe 657-8501, Japan

Article history: Received 28 August 2008; Received in revised form 17 April 2009; Accepted 26 May 2009; Available online 2 June 2009

Keywords: Dimensionality reduction; Principal component analysis; Partial least squares; Fisher discriminant analysis; Smoothing; Kernel method; Metabolomics

Abstract

Dimensionality reduction is an important technique for preprocessing of high-dimensional data. Because only one side of the original data is represented in a low-dimensional subspace, useful information may be lost. In the present study, novel dimensionality reduction methods were developed that are suitable for metabolome data in which the observation varies with time; such data are often obtained in microorganism fermentation processes. However, no dimensionality reduction method that utilizes information from the original data in a positive manner has been reported to date. The ordinary dimensionality reduction methods of principal component analysis (PCA), partial least squares (PLS), orthonormalized PLS (OPLS), and regularized Fisher discriminant analysis (RFDA) were extended by introducing differential penalties to the latent variables in each class. A nonlinear extension of this approach, using kernel methods, was also proposed in the form of kernel-smoothed PCA, PLS, OPLS, and FDA. Since all of these methods are formulated as generalized eigenvalue problems, the solutions can be computed easily. These methods were then applied to intracellular metabolite data of a xylose-fermenting yeast in ethanol fermentation. Visualization in the low-dimensional subspace suggests that smoothed PCA successfully preserves the information about the time course of observations during fermentation, and that RFDA can produce high separation among different strains.

1. Introduction

Comprehensive analyses such as microarray-based gene expression analysis and metabolome analysis have recently been studied actively. These analyses often yield high-dimensional data, especially N ≪ p type data, in which the number of variables (p) exceeds the number of observations (N). A dimensionality reduction technique is required as a preprocessing step to visualize such high-dimensional data. Ordinary multivariate analyses, such as principal component analysis (PCA), partial least squares (PLS) for discrimination [1], and Fisher discriminant analysis (FDA) [2], have been applied for dimensionality reduction. However, because dimensionality reduction can represent only one side of the original data in a low-dimensional subspace, useful information may be lost. For fermentation processes, metabolome analysis has been performed to trace the time courses of intracellular metabolites.


Such experimental data include information in which the observation varies with time. However, no dimensionality reduction method that utilizes such information in a positive manner has been developed to date. In the present study, the ordinary dimensionality reduction methods of PCA, PLS, orthonormalized PLS (OPLS) [3], and regularized FDA (RFDA) [4] were extended by introducing a differential penalty to the latent variables in each class. Extensions using smoothing methods, such as smoothed PCA [5,6], have previously been proposed in the context of functional data analysis [7]. However, these methods are designed for data in which the variables, rather than the observations, vary with time. In the present study, metabolome data were collected to clarify the intracellular xylose metabolism of a xylose-fermenting yeast [8] for ethanol production. In ethanol fermentation from xylose, cell growth and ethanol production have reportedly been improved by adaptation under anaerobic conditions [9]. However, the time courses of changes in intracellular metabolites during fermentation have not been analyzed. The main objective of the present study was to develop dimensionality reduction methods suitable for metabolome data in which the observation varies with time, and to extract maximal information from the data.

2. Theory

This section explains smoothing using differential penalties; the dimensionality reduction methods PCA, PLS, OPLS, and RFDA; their extensions through the introduction of differential penalties; and their nonlinear extensions using kernel methods. All of these methods are formulated as generalized eigenvalue problems.

2.1. Differential penalty

Eilers [10] introduced smoothing using differential penalties as the 'Whittaker smoother.' This problem is formulated as the following minimization problem:

\[ \min_{s} \; \| s - t \|^2 + \kappa \| D s \|^2 \qquad (1) \]

The observation series t is fitted to the smoothed series s. The first and second differential matrices D′ and D″ are set as follows:

\[
D' = \begin{bmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1
\end{bmatrix}, \qquad
D'' = \begin{bmatrix}
1 & -2 & 1 & \cdots & 0 & 0 & 0 \\
0 & 1 & -2 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -2 & 1
\end{bmatrix}
\]

κ is a smoothing parameter. The first term in Eq. (1) is the squared error between s and t, and the second term is the differential penalty on the smoothed series s. As κ becomes smaller, the squared error between s and t becomes more important; as κ becomes larger, the smoothness of s is weighted more strongly. The minimization of Eq. (1) is thus a trade-off between the squared error and the smoothness, controlled by the smoothing parameter κ.
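For illustration, setting the gradient of Eq. (1) to zero gives the closed-form solution s = (I + κD′D)⁻¹t. The following is a minimal sketch in Python/NumPy (the authors' own programs were written in MATLAB); the function and variable names are ours, not from the paper.

```python
import numpy as np

def diff_matrix(n, order=1):
    """First- or second-order difference matrix D for a series of length n."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)   # each application drops one row
    return D

def whittaker_smooth(t, kappa=0.1, order=2):
    """Solve min_s ||s - t||^2 + kappa * ||D s||^2, i.e. s = (I + kappa D'D)^(-1) t."""
    n = len(t)
    D = diff_matrix(n, order)
    return np.linalg.solve(np.eye(n) + kappa * D.T @ D, t)

# toy usage: smooth a noisy ramp
t = np.arange(10) + np.random.default_rng(0).normal(scale=0.5, size=10)
s = whittaker_smooth(t, kappa=5.0)
```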

2.2. Linear multivariate analysis with smoothness

2.2.1. Principal component analysis (PCA)

PCA is explained as the optimization problem of maximizing the variance of the latent variable t, which is a linear combination of the explanatory variable X:

\[ \max \; \mathrm{var}(t) \quad \text{s.t.} \quad w'w = 1 \]

where w is a weight vector. X is mean-centered. PCA is written as the following eigenvalue problem:

\[ \frac{1}{N} X'X w = \lambda w \]

The differential penalty of the latent variable t is added to the constraint condition of PCA. Smoothed PCA is formulated as follows:

\[ \max \; \mathrm{var}(t) \quad \text{s.t.} \quad w'w + \kappa (Dt)'(Dt) = 1 \]

Finally, smoothed PCA is written as the following generalized eigenvalue problem of the weight vector w:

\[ \frac{1}{N} X'X w = \lambda \left( I + \kappa X'D'DX \right) w \]

where λ is a Lagrange multiplier. w_i is the weight vector that corresponds to the i-th largest eigenvalue, and the i-th latent variable t_i can be calculated as t_i = Xw_i. The weight vector should be normalized according to the constraint condition as follows:

\[ w \leftarrow w / \sqrt{w'w + \kappa w'X'D'DX w} \]

2.2.2. Partial least squares (PLS)

PLS is explained as the optimization problem of maximizing the covariance between the latent variable t = Xw_x, which is a linear combination of the explanatory variable X, and the latent variable s = Yw_y, which is a linear combination of the response variable Y:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad w_x'w_x = 1 \;\text{and}\; w_y'w_y = 1 \]

X and Y are mean-centered. Finally, PLS is written as the following eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'YY'X w_x = \lambda w_x \qquad (2) \]

The differential penalty of the latent variable t is added to the constraint condition of PLS. Smoothed PLS is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad w_x'w_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; w_y'w_y = 1 \]

In a previous study, smoothed PCA [5,6] was introduced in the context of functional data analysis [7], and penalized PLS was also proposed [11]. These methods are suitable for data in which the variables vary with time. In the present study, a differential penalty was introduced to the latent variables in each class of PCA, PLS, OPLS, and RFDA. These methods are suitable for data in which the observation varies with time. The differential matrix D is set for the g-class classification problem as follows:

\[ D = \begin{bmatrix} D_1 & & 0 \\ & \ddots & \\ 0 & & D_g \end{bmatrix} \]

Finally, smoothed PLS is written as the following generalized eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'YY'X w_x = \lambda \left( I + \kappa X'D'DX \right) w_x \]

I denotes the identity matrix. The weight vector should be normalized according to the constraint condition as follows:

\[ w_x \leftarrow w_x / \sqrt{w_x'w_x + \kappa w_x'X'D'DX w_x} \]

2.2.3. Orthonormalized partial least squares (OPLS)

OPLS, like PLS, is explained as the optimization problem of maximizing the covariance between the latent variables t and s. The constraint condition of OPLS is different from that of PLS:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad w_x'w_x = 1 \;\text{and}\; s's = 1 \]

Finally, OPLS is written as the following eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'Y(Y'Y)^{-1}Y'X w_x = \lambda w_x \qquad (3) \]

The differential penalty of the latent variable t is added to the constraint condition of OPLS. Smoothed OPLS is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad w_x'w_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; s's = 1 \]

Finally, smoothed OPLS is written as the following generalized eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'Y(Y'Y)^{-1}Y'X w_x = \lambda \left( I + \kappa X'D'DX \right) w_x \]

The weight vector should be normalized according to the constraint condition as follows:

\[ w_x \leftarrow w_x / \sqrt{w_x'w_x + \kappa w_x'X'D'DX w_x} \]
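As a concrete illustration of the machinery above, the following Python/NumPy sketch builds the class-wise block-diagonal difference matrix and solves the smoothed-PCA generalized eigenvalue problem (1/N)X′Xw = λ(I + κX′D′DX)w. It is a minimal sketch under our own naming and assumes observations within each class are ordered by time; it is not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.linalg import block_diag, eigh

def diff_matrix(n, order=1):
    """First- or second-order difference matrix for a series of length n."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)
    return D

def classwise_diff_matrix(class_sizes, order=1):
    """Block-diagonal D = diag(D_1, ..., D_g), one difference block per class."""
    return block_diag(*[diff_matrix(n, order) for n in class_sizes])

def smoothed_pca(X, class_sizes, kappa=0.1, order=1, n_components=2):
    """Solve (1/N) X'X w = lambda (I + kappa X'D'D X) w; return scores and weights."""
    X = X - X.mean(axis=0)                      # mean-center
    N, p = X.shape
    D = classwise_diff_matrix(class_sizes, order)
    A = X.T @ X / N                             # left-hand matrix
    B = np.eye(p) + kappa * X.T @ D.T @ D @ X   # right-hand (penalty) matrix
    vals, vecs = eigh(A, B)                     # generalized symmetric eigenproblem
    idx = np.argsort(vals)[::-1][:n_components] # largest eigenvalues first
    W = vecs[:, idx]
    # eigh returns B-orthonormal vectors, i.e. w'Bw = 1, matching the constraint
    return X @ W, W                             # latent variables (scores) and weights

# toy usage: 3 classes of 6 time-ordered samples each, 20 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 20))
scores, W = smoothed_pca(X, class_sizes=[6, 6, 6], kappa=0.1)
```

The same pattern applies to smoothed PLS, OPLS, and RFDA; only the left- and right-hand matrices of the generalized eigenvalue problem change.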

The relationship between PLS and OPLS should be noted. When the number of samples within each class is the same, the matrix (Y′Y)⁻¹ in Eq. (3) of OPLS becomes the identity matrix multiplied by a constant. In this case, PLS is equal to OPLS.

2.2.4. Regularized Fisher discriminant analysis (RFDA)

FDA is equal to canonical correlation analysis (CCA) when the response variables are a certain formulation of dummy variables [1]. FDA is explained, by using the formulation of CCA, as the optimization problem of maximizing the correlation between the latent variables t and s:

\[ \max \; \mathrm{corr}(t, s) \]

This conditional equation is rewritten as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad \mathrm{var}(t) = 1 \;\text{and}\; \mathrm{var}(s) = 1 \]

However, FDA is not applicable to N ≪ p type data for X or Y. In the case in which the explanatory variable X is N ≪ p type data and the l2-norm regularization term of the weight vector w_x, ‖w_x‖², is introduced to FDA, FDA with a regularization term, regularized FDA (RFDA), is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad (1-\tau)\,\mathrm{var}(t) + \tau w_x'w_x = 1 \;\text{and}\; \mathrm{var}(s) = 1 \]

τ is a regularization parameter (0 ≤ τ ≤ 1). Finally, RFDA is written as the following generalized eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'Y(Y'Y)^{-1}Y'X w_x = \lambda \left( (1-\tau)\frac{1}{N} X'X + \tau I \right) w_x \]

The differential penalty of the latent variable t is added to the constraint condition of RFDA. Smoothed RFDA is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad (1-\tau)\,\mathrm{var}(t) + \tau w_x'w_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; \mathrm{var}(s) = 1 \]

Finally, smoothed RFDA is written as the following generalized eigenvalue problem of the weight vector w_x:

\[ \frac{1}{N^2} X'Y(Y'Y)^{-1}Y'X w_x = \lambda \left( (1-\tau)\frac{1}{N} X'X + \tau I + \kappa X'D'DX \right) w_x \]

The weight vector should be normalized according to the constraint condition as follows:

\[ w_x \leftarrow w_x / \sqrt{(1-\tau)\tfrac{1}{N} w_x'X'X w_x + \tau w_x'w_x + \kappa w_x'X'D'DX w_x} \]

2.3. How to set dummy variables to response variables for supervised dimensionality reduction

In discrimination by PLS, OPLS, and RFDA, the response variable Y is set using dummy variables as follows:

\[ Y = \begin{bmatrix} 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\ \vdots & \vdots & \ddots & 1_{n_{g-1}} \\ 0_{n_g} & 0_{n_g} & \cdots & 0_{n_g} \end{bmatrix} \]

Barker and Rayens [1] suggested that FDA is equal to CCA when the response variables are dummy variables, and defined OPLS as PLS for discrimination. OPLS is explained as a special case of FDA that only maximizes the scatter among classes. The difference between PLS and OPLS is the prior probability of class: the prior probability of PLS is the square of n_i/N, while that of OPLS is n_i/N, where n_i is the number of observations within the i-th class. The prior probability of PLS was recently reported in detail [12].

2.4. Nonlinear multivariate analysis using a kernel method with smoothness

In this section, the authors propose kernel-smoothed PCA, PLS, OPLS, and FDA, which are the nonlinear extensions of smoothed PCA, PLS, OPLS, and RFDA, respectively, using kernel methods. Kernel methods start by mapping the original data onto a high-dimensional feature space corresponding to the reproducing kernel Hilbert space (RKHS). The inner product of samples ϕ_x and ϕ_y in the feature space, ⟨ϕ_x, ϕ_y⟩, can be replaced by the kernel function k(x, y), which is positive definite:

\[ \langle \phi_x, \phi_y \rangle = k(x, y) \]

This formulation is called the "kernel trick." Multivariate analysis can be extended to nonlinear multivariate analysis using the kernel trick, without explicit knowledge of the mapping into the feature space. The polynomial kernel and the Gaussian kernel have been widely used as kernel functions. A detailed theory of kernel methods has been outlined previously [4]. One of the advantages of the kernel formulation is that nonlinear multivariate analysis can be computed easily, because the computational costs for N ≪ p data, especially leave-one-out cross-validation, are reduced by using the linear kernel [13].

2.4.1. Kernel PCA

Kernel PCA [14] is explained as the optimization problem of maximizing the variance of the latent variable t = Φw, which is a linear combination of the explanatory variable Φ in the feature space. It is assumed that there exists a coefficient vector α that satisfies w = Φ′α; with K = ΦΦ′, the latent variable can be calculated as t = Kα:

\[ \max \; \mathrm{var}(t) \quad \text{s.t.} \quad \alpha'K\alpha = 1 \]

K is mean-centered. Finally, kernel PCA is written as the following generalized eigenvalue problem of the coefficient vector α:

\[ \frac{1}{N} K^2 \alpha = \lambda K \alpha \]

When K is rank-deficient, K on the right-hand side of the above equation can be approximated by K + εI, where ε is a small positive value. The differential penalty of the latent variable t is added to the constraint condition of kernel PCA. Kernel-smoothed PCA is formulated as follows:

\[ \max \; \mathrm{var}(t) \quad \text{s.t.} \quad \alpha'K\alpha + \kappa (Dt)'(Dt) = 1 \]


Finally, kernel-smoothed PCA is written as the following generalized eigenvalue problem of the coefficient vector α:

\[ \frac{1}{N} K^2 \alpha = \lambda \left( K + \kappa K D'D K \right) \alpha \]

α_i is the coefficient vector corresponding to the i-th largest eigenvalue, and the latent variable can be calculated as t_i = Kα_i. The coefficient vector should be normalized according to the constraint condition as follows:

\[ \alpha \leftarrow \alpha / \sqrt{\alpha'K\alpha + \kappa \alpha'K D'D K \alpha} \]

2.4.2. Kernel PLS

Rosipal and Trejo [15] introduced kernel PLS using a nonlinear iterative partial least squares (NIPALS) algorithm. Like PLS, kernel PLS is explained as the optimization problem of maximizing the covariance between the latent variable t, which is a linear combination of the explanatory variable Φ in the feature space, and the latent variable s, which is a linear combination of the response variable Y:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad \alpha_x'K\alpha_x = 1 \;\text{and}\; w_y'w_y = 1 \]

Finally, kernel PLS is written as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y Y' K \alpha_x = \lambda K \alpha_x \]

The differential penalty of the latent variable t is added to the constraint condition of kernel PLS. Kernel-smoothed PLS is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad \alpha_x'K\alpha_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; w_y'w_y = 1 \]

Finally, kernel-smoothed PLS is written as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y Y' K \alpha_x = \lambda \left( K + \kappa K D'D K \right) \alpha_x \]

The coefficient vector should be normalized according to the constraint condition as follows:

\[ \alpha_x \leftarrow \alpha_x / \sqrt{\alpha_x'K\alpha_x + \kappa \alpha_x'K D'D K \alpha_x} \]

2.4.3. Kernel OPLS

Like kernel PLS, kernel OPLS is explained as the optimization problem of maximizing the covariance between the latent variable t, which is a linear combination of the explanatory variable Φ in the feature space, and the latent variable s, which is a linear combination of the response variable Y. The constraint condition of kernel OPLS is different from that of kernel PLS:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad \alpha_x'K\alpha_x = 1 \;\text{and}\; s's = 1 \]

Finally, kernel OPLS is written as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y (Y'Y)^{-1} Y' K \alpha_x = \lambda K \alpha_x \]

The differential penalty of the latent variable t is added to the constraint condition of kernel OPLS. Kernel-smoothed OPLS is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad \alpha_x'K\alpha_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; s's = 1 \]

Finally, kernel-smoothed OPLS is written as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y (Y'Y)^{-1} Y' K \alpha_x = \lambda \left( K + \kappa K D'D K \right) \alpha_x \]

The coefficient vector should be normalized according to the constraint condition as follows:

\[ \alpha_x \leftarrow \alpha_x / \sqrt{\alpha_x'K\alpha_x + \kappa \alpha_x'K D'D K \alpha_x} \]

2.4.4. Kernel FDA

Akaho [16] and Bach and Jordan [17] introduced kernel CCA, which is a kernelized version of regularized CCA (RCCA). Kernel FDA [18] was introduced by using the l1-norm penalty to obtain a sparse solution for the coefficient vector. The present study proposes a kernel FDA that is different from the kernel FDA of Mika et al. [18]. In this section, as in Section 2.2.4, kernel FDA is explained by using the formulation of kernel CCA. Kernel FDA is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad (1-\tau)\,\mathrm{var}(t) + \tau \alpha_x'K\alpha_x = 1 \;\text{and}\; \mathrm{var}(s) = 1 \]

Finally, kernel FDA is formulated as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y (Y'Y)^{-1} Y' K \alpha_x = \lambda \left( (1-\tau)\frac{1}{N} K^2 + \tau K \right) \alpha_x \]

The differential penalty of the latent variable t is added to the constraint condition of kernel FDA. Kernel-smoothed FDA is formulated as follows:

\[ \max \; \mathrm{cov}(t, s) \quad \text{s.t.} \quad (1-\tau)\,\mathrm{var}(t) + \tau \alpha_x'K\alpha_x + \kappa (Dt)'(Dt) = 1 \;\text{and}\; w_y'w_y = 1 \]

Finally, kernel-smoothed FDA is written as the following generalized eigenvalue problem of the coefficient vector α_x:

\[ \frac{1}{N^2} K Y (Y'Y)^{-1} Y' K \alpha_x = \lambda \left( (1-\tau)\frac{1}{N} K^2 + \tau K + \kappa K D'D K \right) \alpha_x \]

The coefficient vector should be normalized according to the constraint condition as follows:

\[ \alpha_x \leftarrow \alpha_x / \sqrt{(1-\tau)\tfrac{1}{N} \alpha_x'K^2\alpha_x + \tau \alpha_x'K\alpha_x + \kappa \alpha_x'K D'D K \alpha_x} \]

All of these generalized eigenvalue problems can be computed easily using Cholesky decomposition and singular value decomposition (see Appendix A).

2.5. Software

The computer programs used in the present study were developed in-house by the authors with the personal computer version of MATLAB 7 (Mathworks, Natick, MA, USA). The computer system used to run the programs had a 2.4 GHz CPU and 512 MB of memory.
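To make the kernel formulation concrete, here is a minimal Python/NumPy sketch of kernel-smoothed PCA with a Gaussian kernel, following the generalized eigenvalue problem (1/N)K²α = λ(K + κKD′DK)α. The function names, the choice of Gaussian kernel width, and the small ridge added for numerical stability are our own assumptions; the authors' programs were written in MATLAB.

```python
import numpy as np
from scipy.linalg import block_diag, eigh
from scipy.spatial.distance import cdist

def gaussian_kernel(X, gamma=1.0):
    """Gram matrix K with k(x, y) = exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * cdist(X, X, metric="sqeuclidean"))

def center_kernel(K):
    """Mean-center the kernel matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    return J @ K @ J

def kernel_smoothed_pca(X, class_sizes, gamma=1.0, kappa=0.1, n_components=2, eps=1e-8):
    """Solve (1/N) K^2 a = lambda (K + kappa K D'D K) a; return latent variables t = K a."""
    N = X.shape[0]
    K = center_kernel(gaussian_kernel(X, gamma))
    # block-diagonal first-difference matrix, one block per (time-ordered) class
    D = block_diag(*[np.diff(np.eye(n), axis=0) for n in class_sizes])
    A = K @ K / N
    B = K + kappa * K @ D.T @ D @ K
    B += eps * np.eye(N)                       # small ridge, since K may be rank-deficient
    vals, vecs = eigh(A, B)
    alpha = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return K @ alpha, alpha                    # latent variables and coefficients

# toy usage with the same class layout as before
rng = np.random.default_rng(1)
X = rng.normal(size=(18, 20))
T, alpha = kernel_smoothed_pca(X, class_sizes=[6, 6, 6], gamma=0.05, kappa=0.1)
```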



Fig. 1. First latent variable by principal component analysis (PCA) and smoothed PCA for artificial dataset. (A) Result of PCA. (B) Result of smoothed PCA (κ = 0.1).

3. Data

3.1. Artificial dataset

An artificial dataset containing five samples and twenty variables was synthesized. The first variable is equal to the sample number [1 2 3 4 5], and the other variables are uniform random variables. This dataset is designed to test whether a proposed method can reduce the dimensionality without losing the information that the observation of a certain variable in the dataset varies with time.
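A sketch of how such a dataset could be generated (our own construction, matching the description above):

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_variables = 5, 20

X = rng.uniform(size=(n_samples, n_variables))  # uniform random variables
X[:, 0] = np.arange(1, n_samples + 1)           # first variable = sample number [1 2 3 4 5]
```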

3.2. Experimental dataset

The Saccharomyces cerevisiae strain MT8-1/Xyl, constructed in a previous study [8] by integrating genes for the intracellular expression of xylose reductase and xylitol dehydrogenase from Pichia stipitis and of xylulokinase from S. cerevisiae, was adapted to xylose as the sole carbon source. The strain was pre-cultured in 5 ml SD medium (20 g/l glucose, 6.7 g/l yeast nitrogen base without amino acids [Difco Laboratories, Detroit, MI, USA], and appropriate supplements), and the cells were washed twice with sterile water. Adaptation was carried out by incubating the cells three times in 50 ml SX medium (20 g/l xylose, 6.7 g/l yeast nitrogen base without amino acids, and appropriate supplements).

Fig. 2. Results of PCA (A), partial least squares (PLS) (B), orthonormalized PLS (OPLS) (C), and regularized Fisher discriminant analysis (RFDA) (τ = 0.1) (D). Symbols: (○) native strain; (▽) strain adapted under aerobic conditions; (□) strain adapted under anaerobic conditions.



Fig. 3. Results of smoothed PCA (A), PLS (B), OPLS (C), and RFDA (τ = 0.1) (D). Symbols as in Fig. 2. Symbols within each class are connected from 0 h to 96 h observations by the dotted line.

Ethanol fermentation was carried out with three strains: the native strain MT8-1/Xyl and the strains adapted to xylose under aerobic and anaerobic conditions. After pre-culture in 5 ml SD medium, these strains were cultivated in 500 ml SDC medium (SD medium with 20 g/l casamino acids [Difco]). The cells were washed twice, and ethanol fermentation was carried out in 100 ml SXC medium (50 g/l xylose, 6.7 g/l yeast nitrogen base without amino acids, appropriate supplements, and 20 g/l casamino acids). Culture samples were collected in triplicate at 0, 8, 16, 36, 60, and 96 h. The intracellular metabolites were analyzed by CE (Beckman Coulter, Fullerton, CA, USA)–MS (Applied Biosystems, Foster City, CA, USA) and GC (Agilent Technologies, Wilmington, DE, USA)–TOFMS (Leco, St. Joseph, MI, USA). Almost all of the intracellular metabolites were identified, and the concentration of each metabolite was entered as an element of the data matrix. There were 53 observations and 114 variables. Each sample was categorized into one of three classes: the native strain (Class 1), the strain adapted to xylose under aerobic conditions (Class 2), and the strain adapted under anaerobic conditions (Class 3). The numbers of samples in these classes were 18, 18, and 17, respectively, as one sample was lost through experimental error.
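As a small illustration of how the class structure just described maps onto the dummy response matrix Y of Section 2.3, the following sketch (our own construction) builds Y for the three classes of sizes 18, 18, and 17:

```python
import numpy as np

def dummy_response(class_sizes):
    """Dummy Y with g-1 columns: column i marks membership in class i (last class all zeros)."""
    g = len(class_sizes)
    blocks = []
    for i, n in enumerate(class_sizes):
        rows = np.zeros((n, g - 1))
        if i < g - 1:
            rows[:, i] = 1.0
        blocks.append(rows)
    return np.vstack(blocks)

Y = dummy_response([18, 18, 17])   # shape (53, 2)
```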

4. Results and discussion

4.1. Comparison of PCA and smoothed PCA for the artificial dataset

PCA and smoothed PCA were applied to the artificial data described in Section 3.1. The first latent variable obtained by PCA (Fig. 1(A)) and by smoothed PCA (Fig. 1(B)) was plotted against the sample number. The smoothing parameter κ was set to 0.1 and the first differential matrix D′ was used. An approximately linear relationship existed between the first latent variable of smoothed PCA and the sample number (Fig. 1(B)). These results suggest that smoothed PCA successfully preserves the information from the original data in which the observation varies with time.

4.2. Visualization results of PCA, PLS, OPLS, and RFDA with smoothness

Visualization results using PCA, PLS, OPLS, and RFDA, in which the high-dimensional data are projected onto a two-dimensional subspace, are shown in Fig. 2. The regularization parameter of RFDA was set to τ = 0.1 (Fig. 2(D)); the methodology for setting τ is described in the following section. The supervised dimensionality reduction methods PLS, OPLS, and RFDA achieved sufficient separability between classes compared with PCA. Perfect separability between the native strain and the adapted strains was achieved by RFDA on the first axis, which can therefore be interpreted as the axis concerned with adaptation. In ethanol fermentation from xylose, it has been reported that cell growth and ethanol production were increased by adaptation under anaerobic conditions [9].

Fig. 4. Plot of leave-one-out cross-validation (LOOCV).



On the second axis, the separability among the three strains was satisfactory (Fig. 2(D)). The results of smoothed PCA, PLS, OPLS, and RFDA are shown in Fig. 3. The smoothing parameter κ was set to 0.1, and the second differential matrix D″ was used in Fig. 3(D). Compared with the results of the original methods shown in Fig. 2, the effect of smoothness is apparent, especially in smoothed PCA. The first axis of smoothed PCA (Fig. 3(A)) can be interpreted as the axis concerned with the time change. The visualization results suggest that the two adapted strains have similar tendencies compared with the native strain. Smoothed PCA thus seems to preserve the information from the original data, in which the observation varies with time. The results shown in Figs. 2 and 3 suggest that there is a clear difference among strains developed by adaptation. In the present study, additional useful information was not extracted using kernel methods compared with the linear methods (results not shown). In a previous study [13], it was found that, in the case of regression, the calculation time with a linear kernel was significantly reduced compared with RCCA, which corresponds to RFDA; the length of the calculation mainly depends on the generalized eigenvalue problem for N ≪ p type data [13]. Smoothed PCA, PLS, OPLS, and RFDA are all formulated as generalized eigenvalue problems. The kernel formulation is, therefore, useful for reducing the computational time of a linear method as well as of a nonlinear method.

4.3. Determination of the regularized parameter of RFDA

The regularized parameter τ in RFDA was determined using leave-one-out cross-validation (LOOCV). One sample was removed as a test sample, and the lower-dimensional subspace was computed using the remaining samples. In this subspace, the Mahalanobis distance was used to determine the class to which the test sample belongs. This operation was iterated for all the samples. The sum of the numbers of misclassifications is plotted against the regularized parameter τ in Fig. 4; the regularized parameter corresponding to the minimum of this sum is optimal. The LOOCV errors of PLS, OPLS, and RFDA were 17, 17, and 12, respectively. This result suggests that the separability of RFDA was higher than that of PLS and OPLS.

5. Conclusion

In the present study, the ordinary dimensionality reduction methods of PCA, PLS, OPLS, and RFDA were extended by the introduction of a differential penalty to the latent variables in each class. These methods were evaluated by their application to metabolome data of intracellular metabolites of a xylose-fermenting yeast during ethanol fermentation. A nonlinear extension of these methods, using a kernel method, was also proposed. The visualization results in the low-dimensional subspace suggest that smoothed PCA successfully preserves the information about the time course of changes in observations during fermentation, and that supervised dimensionality reduction, especially RFDA, can produce high separation among classes. These methods may be useful when employed appropriately according to the data, or for the purpose of visualization of high-dimensional data.

Acknowledgement

This work was financially supported by the Special Coordination Funds for Promoting Science and Technology, Creation of Innovation Centers for Advanced Interdisciplinary Research Areas (Innovative Bioproduction Kobe), MEXT, Japan.

Appendix A

The computation of a generalized eigenvalue problem is sometimes numerically unstable. To avoid this problem, in the present study the generalized eigenvalue problem

\[ A x = \lambda B x \]

is reduced to an ordinary eigenvalue problem as follows. The matrices A and B are decomposed by Cholesky decomposition into A = P′P and B = R′R, so that

\[ P'P x = \lambda R'R x \]

Multiplying both sides on the left by the inverse of R′ gives

\[ (R')^{-1} P'P x = \lambda (R')^{-1} R'R x = \lambda R x \]

Setting y such that x = R⁻¹y yields

\[ (R')^{-1} P'P R^{-1} y = \lambda R R^{-1} y = \lambda y \]

The singular value decomposition (SVD) of PR⁻¹ is then performed, and y is the singular vector corresponding to the maximum singular value. Finally, x is determined by x = R⁻¹y.
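A minimal Python/NumPy sketch of this reduction, under our own naming (in practice A may be only positive semidefinite, so a small ridge is assumed before factorizing):

```python
import numpy as np
from scipy.linalg import cholesky, svd, solve_triangular

def generalized_eig_via_cholesky_svd(A, B, ridge=1e-10):
    """Solve A x = lambda B x via Cholesky + SVD, following the reduction in Appendix A."""
    n = A.shape[0]
    P = cholesky(A + ridge * np.eye(n), lower=False)   # A ~ P'P (ridge: A may be only PSD)
    R = cholesky(B + ridge * np.eye(n), lower=False)   # B ~ R'R
    M = P @ np.linalg.inv(R)                           # M = P R^{-1}
    U, s, Vt = svd(M, full_matrices=False)             # eigenvalues of M'M are s**2
    Y = Vt.T                                           # right singular vectors are the y's
    X = solve_triangular(R, Y, lower=False)            # x = R^{-1} y
    return s**2, X                                     # eigenvalues (descending) and eigenvectors

# toy usage
rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 6))
A = Z.T @ Z / 10
B = np.eye(6) + 0.1 * A
eigvals, eigvecs = generalized_eig_via_cholesky_svd(A, B)
```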

References

[1] M. Barker, W. Rayens, Partial least squares for discrimination, J. Chemom. 17 (2001) 166–173.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, Boston, 1990.
[3] R. Rosipal, N. Krämer, Overview and recent advances in partial least squares, in: C. Saunders, M. Grobelnik, S. Gunn, J. Shawe-Taylor (Eds.), Subspace, Latent Structure and Feature Selection Techniques, Springer, New York, 2006, pp. 34–51.
[4] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004.
[5] B.W. Silverman, Smoothed functional principal components analysis by choice of norm, Ann. Stat. 24 (1996) 1–24.
[6] Z.P. Chen, J.H. Jiang, Y. Li, H.L. Shen, Y.Z. Liang, R.Q. Yu, Determination of the number of components in mixtures using a new approach incorporating chemical information, Anal. Chim. Acta 13 (1999) 15–30.
[7] J.O. Ramsay, B.W. Silverman, Functional Data Analysis, 2nd ed., Springer, New York, 2005.
[8] S. Katahira, A. Mizuike, H. Fukuda, A. Kondo, Ethanol fermentation from lignocellulosic hydrolysate by a recombinant xylose- and cellooligosaccharide-assimilating yeast strain, Appl. Microbiol. Biotechnol. 72 (2006) 1136–1143.
[9] M. Sonderegger, U. Sauer, Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose, Appl. Environ. Microbiol. 69 (2003) 1990–1998.
[10] P.H.C. Eilers, A perfect smoother, Anal. Chem. 75 (2003) 3631–3636.
[11] N. Krämer, A.L. Boulesteix, G. Tutz, Penalized partial least squares based on B-splines transformations, Chemom. Intell. Lab. Syst. 94 (2008) 60–69.
[12] U. Indahl, H. Martens, T.J. Næs, From dummy regression to prior probabilities in PLS-DA, J. Chemom. 21 (2007) 529–536.
[13] H. Yamamoto, H. Yamaji, E. Fukusaki, H. Ohno, H. Fukuda, Canonical correlation analysis for multivariate regression and its application to metabolic fingerprinting, Biochem. Eng. J. 40 (2008) 199–204.
[14] B. Schölkopf, A.J. Smola, K.R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319.
[15] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, J. Mach. Learn. Res. 2 (2001) 97–123.
[16] S. Akaho, A kernel method for canonical correlation analysis, in: Proceedings of the International Meeting of the Psychometric Society, 2001.
[17] F.R. Bach, M.I. Jordan, Kernel independent component analysis, J. Mach. Learn. Res. 3 (2002) 1–48.
[18] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.R. Müller, Fisher discriminant analysis with kernels, in: Y.-H. Hu, J. Larsen, E. Wilson, S. Douglas (Eds.), Neural Networks for Signal Processing IX, IEEE, New York, 1999, pp. 41–48.