Robustness of PARAFAC and N-PLS regression models in relation to homoscedastic and heteroscedastic noise


Chemometrics and Intelligent Laboratory Systems 88 (2007) 35 – 40 www.elsevier.com/locate/chemolab

T. Khayamian ⁎
Department of Chemistry, Isfahan University of Technology, Isfahan 84154, Iran
Received 30 March 2006; received in revised form 8 October 2006; accepted 20 October 2006
Available online 11 December 2006

Abstract

In this study, the robustness of the parallel factor analysis (PARAFAC) and N-way partial least squares (N-PLS) regression models was investigated in relation to homoscedastic and heteroscedastic noise (flicker noise), using the Claus data with simulated noise added. The Claus data were loaded from the N-way toolbox for MATLAB [C.A. Andersson, R. Bro, The N-way Toolbox for MATLAB, Chemom. Intell. Lab. Syst. 52 (2000) 1]. The data consisted of five samples, 201 emission wavelengths and 61 excitation wavelengths. Simulated homoscedastic and heteroscedastic noise was added to the original data and the predictive ability of the models was studied. The results showed that the models are robust for these data with respect to these types of noise (without correlation). One of the reasons for the robustness of the models might be the large number of data points in the original data. This possibility was examined by constructing three-way arrays from the original data with lower dimensions in the excitation wavelengths. Three-way arrays were created with dimensions of 5 × 201 × 31, 5 × 201 × 16, 5 × 201 × 8 and 5 × 201 × 4. The performance of each model was evaluated by calculating the root mean squared errors of cross validation (RMSECV) for the analytes using the leave-one-sample-out method. The results of the N-PLS models showed that the RMSECV values increased as the dimensions were decreased, both for the original data and for the same data with the simulated noise. However, the RMSECV changes for the noisy data were much larger than those for the original data. The results of the N-PLS models for the different three-way arrays, with or without noise, were better than those of the PARAFAC models.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Homoscedastic and heteroscedastic noise; PARAFAC; N-PLS

⁎ Tel.: +98 3113912351; fax: +98 3113912350. E-mail address: [email protected]. 0169-7439/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2006.10.006

1. Introduction

Noise, or random error, is extraneous information that is unwanted because it degrades the accuracy and precision of a measurement [2]. Noise can be classified as homoscedastic or heteroscedastic. Noise is homoscedastic if it is independent of the signal, the variables and the samples, and is normally distributed with a constant variance; otherwise, it is heteroscedastic. In other words, if the variances of a signal at two different variables, obtained by repeating the measurements for one sample or several samples, are significantly different, the system has heteroscedastic noise. Noise from different samples and/or variables can also be correlated, in which case the noise is not only heteroscedastic but also correlated. In spectroscopy, heteroscedastic noise can be classified as flicker noise or shot noise. When the standard deviation of the noise is proportional to the magnitude of the signal, it is flicker noise; when it is proportional to the square root of the magnitude of the signal, it is shot noise.

There are different methods for noise reduction, such as ensemble averaging, the Savitzky–Golay smoothing filter [3], the Fourier transform and the wavelet transform [4]. However, it should be emphasized that the choice of a noise reduction technique depends on the limiting noise of the system.

In analytical chemistry, noise is important for several reasons. First, noise affects precision and can worsen (raise) the detection limit. Second, noise can affect the predictive ability of a model built between the response and the concentration. In this work, the effect of noise on the performance of the constructed models as predictive models has been investigated. Noise in analytical signals is often assumed to be homoscedastic [5]. However, in many cases noise is not homoscedastic. This is the reason for using weighted rather than unweighted linear regression in zero-order calibration, and weighted principal component analysis and maximum likelihood PCA (MLPCA) [6] in first-order calibration. The heteroscedasticity of noise has also been incorporated into models for second-order calibration, such as positive matrix factorization (PMF3) [7] and maximum likelihood parallel factor analysis (MLPARAFAC) [8].

In this work, we investigated the robustness of the conventional PARAFAC and N-PLS regression models in relation to homoscedastic and heteroscedastic noise (without correlation). Here, robustness means that the relative errors or RMSECV values of the analytes did not change significantly when the simulated noise was added to the original data. The investigation was performed on the fluorescence spectroscopy data named Claus, loaded from the N-way toolbox for MATLAB [1], with the simulated noise added to the original data. In addition, the effects of noise on the robustness of the models with lower dimensional three-way arrays, which were created from the original data, were also investigated.

1.1. Theoretical considerations

1.1.1. Multi-way methods

Parallel factor analysis (PARAFAC) and N-way partial least squares (N-PLS) are well-known trilinear models [9]. Each method has its own advantages relative to the other. PARAFAC has the uniqueness property of the solution, so it is able to estimate

the true underlying profiles, whereas trilinear PLS uses both the dependent and independent variables for constructing the model. Therefore, the predictive ability of the N-PLS model might be better than that of the PARAFAC model.

1.1.2. Wavelet transform for denoising

Wavelets can be used as a powerful tool for signal denoising [4]. The method is based on the decomposition of the signal in the wavelet domain by selecting a wavelet filter and a suitable level of decomposition. Then, a threshold is selected and applied to the wavelet coefficients. Finally, the signal is reconstructed by inverse transformation to the native domain. Wavelets can potentially be used to denoise not only homoscedastic noise but also heteroscedastic noise, using a level-independent or a level-dependent threshold, respectively.

2. Experimental

2.1. Data set and computation

The fluorescence spectroscopy data named Claus were downloaded from the website http://www.models.kvl.dk [1]. The dimensions of the data are 5 × 201 × 61. The data consist of five samples, each containing three analytes: tyrosine (analyte 1), tryptophan (analyte 2) and phenylalanine (analyte 3), measured at 201 emission and 61 excitation wavelengths. In order to add noise to the data, the data were first unfolded to a two-way matrix with dimensions of 5 × 12,261 (201 × 61 = 12,261). Then, a simulated homoscedastic noise was constructed with the same matrix dimensions

Fig. 1. A) Excitation–emission landscape for sample 1, B) simulated homoscedastic noise and C) simulated heteroscedastic noise landscapes.
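The two noise landscapes in panels B and C can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions and is not the author's MATLAB code: the function names, the toy 2 × 3 matrix and the fixed seed are hypothetical. Homoscedastic noise uses a single standard deviation, a fraction of the maximum intensity, for every element; heteroscedastic (flicker) noise uses a standard deviation proportional to each element's own intensity.

```python
import random

def add_homoscedastic_noise(X, fraction, rng):
    """Add N(0, sigma^2) noise with one sigma = fraction * max(X) for all elements."""
    sigma = fraction * max(max(row) for row in X)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in X]

def add_heteroscedastic_noise(X, fraction, rng):
    """Add flicker-type noise: N(0, 1) draws scaled element by element by fraction * X."""
    return [[x + rng.gauss(0.0, 1.0) * fraction * x for x in row] for row in X]

rng = random.Random(0)             # fixed seed (assumption) for repeatability
X = [[0.0, 50.0, 100.0],           # toy unfolded data: samples x (emission * excitation)
     [0.0, 20.0, 40.0]]
X_homo = add_homoscedastic_noise(X, 0.10, rng)      # sigma = 10% of Imax everywhere
X_hetero = add_heteroscedastic_noise(X, 0.10, rng)  # sigma = 10% of each intensity
```

In this sketch a zero-intensity element stays exactly zero under the heteroscedastic noise, whereas the homoscedastic noise perturbs it as strongly as the most intense element.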

(5 × 12,261) as the original data, with a standard deviation of 10, 15, 20 and 25% of the maximum value in the original Claus data. These two matrices were added together. The simulated heteroscedastic noise was constructed by creating a matrix of random numbers drawn from N(0, 1) with the same dimensions as the original data. Scaling matrices were then constructed whose elements were 0.10, 0.15, 0.20 and 0.25 times the corresponding elements of the original data. Finally, these matrices were multiplied by the matrix of random numbers, element by element [8]. It should be noted that the relative standard deviation of the simulated heteroscedastic noise at each emission wavelength is constant. All computations were carried out on a Pentium 4, 1.5 GHz PC. The PARAFAC and N-PLS algorithms were obtained from the N-way toolbox [1]. The wavelet filter coefficients were obtained from the wavelet toolbox in MATLAB 7.0 [10] (Math Works).

3. Results and discussion

Fig. 1 shows an example of the landscapes for sample 1 and for the simulated homoscedastic and heteroscedastic noise that were added to the sample. For preliminary studies, two data sets were constructed from the original data set. The first data set consisted of samples 1, 2, 3 and 5, and the second of samples 1, 2, 3 and 4. The PARAFAC and N-PLS models were built for the two data sets, and samples 4 and 5 were selected as the test samples for the first and second data sets, respectively. Three factors were selected for all calculations with the PARAFAC and N-PLS models. No constraints were imposed on the three modes of the PARAFAC models. The convergence value (1 × 10^−6) and the number of iterations (2500) were not changed.

Table 1
Relative errors for samples 4 and 5 using PARAFAC models

                  Sample 4                      Sample 5
Analyte           1         2         3         1         2         3

The original data
                  −2.74     −1.82     6.85      −2.25     0.78      11.60

The original data with the simulated homoscedastic noise (noise standard deviation)
10% Imax          −3.11     −2.12     6.71      −2.20     0.80      11.92
15% Imax          −3.31     −2.28     6.59      −2.18     0.77      12.02
20% Imax          −3.54     −2.40     6.43      −2.15     0.68      12.10
25% Imax          −3.75     −2.55     6.22      −2.12     0.56      12.13

Denoised data of the noisy data with the simulated homoscedastic noise
10% Imax          −3.08     −2.10     6.32      −2.18     0.86      11.56
15% Imax          −3.27     −2.22     6.39      −2.15     0.92      11.89
20% Imax          −3.47     −2.34     6.46      −2.11     0.89      12.24
25% Imax          −3.67     −2.52     6.43      −2.07     0.84      12.49

The original data with the simulated heteroscedastic noise (noise at each emission channel)
10% I             −2.88     −1.63     6.86      −2.24     0.81      11.55
15% I             −2.94     −1.54     6.86      −2.24     0.83      11.53
20% I             −3.01     −1.44     6.86      −2.23     0.84      11.50
25% I             −3.08     −1.37     6.85      −2.24     0.85      11.47

Denoised data of the noisy data with the simulated heteroscedastic noise
10% I             −2.87     −1.68     6.80      2.24      0.80      11.50
15% I             −2.94     −1.60     6.78      −2.24     0.83      11.45
20% I             −3.01     −1.51     6.72      −2.24     0.84      11.42
25% I             −3.0      −1.43     6.65      −2.24     0.85      11.40

Table 2
Relative errors for samples 4 and 5 using N-PLS model

                  Sample 4                      Sample 5
Analyte           1         2         3         1         2         3

The original data
                  −2.87     −0.43     4.86      −1.66     0.44      11.24

The original data with the simulated homoscedastic noise (noise standard deviation)
10% Imax          −3.07     −0.22     4.99      −1.56     −0.01     11.86
15% Imax          −3.18     −0.12     4.99      −1.52     −0.24     12.11
20% Imax          −3.31     −0.03     4.95      −1.48     −0.48     12.31
25% Imax          −3.44     0.04      4.88      −1.45     −0.73     12.48

Denoised data of the noisy data with the simulated homoscedastic noise
10% Imax          −3.04     −0.28     4.69      −1.56     0.01      11.60
15% Imax          −3.12     −0.17     4.84      −1.53     −0.07     12.00
20% Imax          −3.20     −0.04     4.99      −1.50     −0.26     12.45
25% Imax          −3.28     0.03      5.07      −1.47     −0.47     12.84

The original data with the simulated heteroscedastic noise (noise at each emission channel)
10% I             −2.99     −0.27     4.89      −1.61     0.42      11.22
15% I             −3.06     −0.20     4.90      −1.58     0.41      11.21
20% I             −3.12     −0.12     4.91      −1.56     0.40      11.20
25% I             −3.19     −0.04     4.91      −1.53     0.39      11.19

Denoised data of the noisy data with the simulated heteroscedastic noise
10% I             −2.99     −0.36     4.87      −1.61     0.41      11.18
15% I             −3.05     −0.29     4.87      −1.58     0.41      11.15
20% I             −3.11     −0.22     4.83      −1.56     0.39      11.14
25% I             −3.18     −0.15     4.79      −1.54     0.36      11.15

The analyte concentrations in test sample 4 were calculated with the PARAFAC model by the following procedure. First, the scores and loadings for the first data set were calculated. The result of the decomposition is three matrices A, B and C. Each column of the matrix A contains the score values for one of the analytes. Each column of the matrix A was plotted versus the analyte concentration of the corresponding compound, so that three regression equations were obtained. Then, the score values for the three analytes in sample 4 were calculated using the following equations [9]:

$$Z = \mathrm{kr}(C, B) \qquad (1)$$

$$Un = \mathrm{reshape}(Un, 12261, 1) \qquad (2)$$

$$\mathrm{Score}_{Un} = \mathrm{pinv}(Z) \, Un \qquad (3)$$

where kr is the Khatri–Rao product, Un is the result of converting the matrix of sample 4 to a vector and pinv(Z) is the pseudo-inverse of Z. The calculated scores and the corresponding regression equations were used to calculate the concentration of each compound in sample 4. This procedure was repeated for sample 5 using the second data set. The analyte concentrations in samples 4 and 5 were also predicted using the N-PLS algorithm. These procedures were repeated for the data with the homoscedastic and heteroscedastic noise as well as for the denoised data. Wavelets were used for denoising, and applying the wavelet transform requires the number of data points for each sample to be a power of two. Therefore, 4123 columns were appended to each row of the original matrix using the mirror padding method for each sample [11]. Each sample then has 16,384 (2^14) columns

Table 3
Root mean squared errors of cross validation of the analytes for different three-way arrays originated from the original data, the original data with the simulated homoscedastic noise, the original data with the simulated heteroscedastic noise and the denoised data, using N-PLS models

Analyte (1), RMSECV × 10^+7

Array             Row             1        2        3        4        5        6        7        8        9
61 wavelengths    RMSECV       0.99     1.01     1.01     1.01     1.03     1.01     1.05     1.01     1.03
                  % change               2.0      2.0      2.0      4.0      2.0      6.1      2.0      4.0
31 wavelengths    RMSECV       0.99     1.04     1.06     1.01     1.04     1.05     1.13     1.01     1.03
                  % change               5.1      7.1      2.0      5.1      6.1     14.1      2.0      4.0
16 wavelengths    RMSECV       1.00     1.14     1.24     1.05     1.10     1.16     1.34     1.04     1.09
                  % change              14.0     24.0      5.0     10.0     16.0     34.0      4.0      9.0
8 wavelengths     RMSECV       0.96     1.23     1.37     1.10     1.23     1.30     1.65     1.09     1.30
                  % change              28.1     42.7     14.6     28.1     35.4     71.9     13.5     35.4
4 wavelengths     RMSECV       0.89     1.15     1.77     1.10     1.31     1.29     1.69     1.09     1.30
                  % change              29.2     98.9     23.6     47.2     44.9     89.9     22.5     46.1

Analyte (2), RMSECV × 10^+7

Array             Row             1        2        3        4        5        6        7        8        9
61 wavelengths    RMSECV       1.06     1.38     2.63     1.05     1.11     1.16     1.54     1.05     1.08
                  % change              30.2    148.1     −0.9      4.7      9.4     45.3     −0.9      1.9
31 wavelengths    RMSECV       1.05     1.74     3.85     1.09     1.24     1.40     1.77     1.09     1.19
                  % change              65.8    266.7      3.8     18.1     33.3     68.6      3.8     13.3
16 wavelengths    RMSECV       1.06     2.26     5.94     1.21     1.47     1.61     2.03     1.24     1.49
                  % change             113.2    460.4     14.2     38.7     51.9     91.5     17.0     40.6
8 wavelengths     RMSECV       1.03     3.70    10.24     1.24     1.68     1.53     2.40     1.29     1.68
                  % change             259.2    894.2     20.4     63.1     48.5    133.0     25.2     63.1
4 wavelengths     RMSECV       1.57     7.02    19.32     1.30     2.19     2.35     4.04     1.48     2.00
                  % change             347.1   1130.6    −17.2     39.5     49.7    157.3     −5.7     27.4

Analyte (3), RMSECV × 10^+5

Array             Row             1        2        3        4        5        6        7        8        9
61 wavelengths    RMSECV       7.35     7.63     8.05     7.34     7.33     7.38     7.75     7.33     7.30
                  % change               3.8      9.5     −0.1     −0.3      0.4      5.4     −0.3     −0.7
31 wavelengths    RMSECV       7.34     7.74     8.44     7.28     7.23     7.52     7.93     7.27     7.17
                  % change               5.5     15.0     −0.8     −1.5      2.5      8.0     −1.0     −2.3
16 wavelengths    RMSECV       7.31     7.92     9.21     7.30     7.30     7.62     8.14     7.30     7.24
                  % change               8.3     26.0     −0.1     −0.1      4.2     11.4     −0.1     −1.0
8 wavelengths     RMSECV       7.60     8.89    11.79     7.71     7.84     8.32     9.12     7.75     7.86
                  % change              17.0     55.1      1.5      3.2      9.5     20.0      2.0      3.4
4 wavelengths     RMSECV       8.22    10.00    16.13     7.78     7.37     9.08     9.72     7.92     7.50
                  % change              21.7     96.2     −5.4    −10.3     10.5     18.3     −3.7     −8.8

The description of integers from 1 to 9 indicates:
1. RMSECV of analytes for different three-way arrays originated from the original data.
2. RMSECV of analytes for different three-way arrays originated from the original data with the simulated homoscedastic noise (noise standard deviation = 10% Imax).
3. RMSECV of analytes for different three-way arrays originated from the original data with the simulated homoscedastic noise (noise standard deviation = 20% Imax).
4. RMSECV of analytes for different three-way arrays originated from the original data with the simulated heteroscedastic noise (noise level at each emission wavelength is 10% of its value).
5. RMSECV of analytes for different three-way arrays originated from the original data with the simulated heteroscedastic noise (noise level at each emission wavelength is 20% of its value).
6. RMSECV of analytes for different three-way arrays originated from the denoised data. The noisy data were constructed from the original data with the simulated homoscedastic noise (noise standard deviation = 10% Imax).
7. RMSECV of analytes for different three-way arrays originated from the denoised data. The noisy data were constructed from the original data with the simulated homoscedastic noise (noise standard deviation = 20% Imax).
8. RMSECV of analytes for different three-way arrays originated from the denoised data. The noisy data were constructed from the original data with the simulated heteroscedastic noise (noise level at each emission wavelength is 10% of its value).
9. RMSECV of analytes for different three-way arrays originated from the denoised data. The noisy data were constructed from the original data with the simulated heteroscedastic noise (noise level at each emission wavelength is 20% of its value).
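As a worked check of how the percent rows of Table 3 relate to the RMSECV rows, the short Python sketch below recomputes two entries for analyte 1 with 4 excitation wavelengths (0.89 for the original data, 1.15 and 1.77 with homoscedastic noise). The function name is hypothetical, and the formula is the usual relative change with respect to the noise-free value, which is assumed here because it reproduces the tabulated percentages.

```python
def percent_change(rmsecv, reference):
    """Percent change of an RMSECV value relative to the noise-free reference."""
    return 100.0 * (rmsecv - reference) / reference

# Analyte 1, 4 excitation wavelengths, tabulated values (RMSECV x 10^7)
reference = 0.89                 # column 1: original data
noisy = {2: 1.15, 3: 1.77}       # columns 2 and 3: homoscedastic noise, 10% and 20% Imax
changes = {col: round(percent_change(v, reference), 1) for col, v in noisy.items()}
print(changes)  # {2: 29.2, 3: 98.9}
```

Because the common factor of 10^7 cancels in the ratio, the tabulated values can be used directly without rescaling.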


Analyte (1)

Analyte (1)

Analyte (2) +7

RMSECV * 10 1

2

3

Analyte (3) +6

RMSECV * 10+ 4

RMSECV * 10 4

5

6

7

8

2

3

4

5

6

7

8

9

8.77 6.96 0.3

6.98 0.3

7.01 0.7

6.96 0.0

6.96 0.0

6.98 0.3

7.00 0.5

6.96 0.0

6.96 1.36 0.0

31 wavelengths 8.76 8.80 8.84 8.78 8.79 8.79 8.86 8.78 0.4 0.9 0.2 0.3 0.3 1.1 0.2

8.79 6.96 0.3

6.96 0.0

6.97 0.1

6.96 0.0

6.96 0.0

6.96 0.0

6.97 0.1

16 wavelengths 8.72 8.76 8.80 8.73 8.73 8.75 8.80 8.73 0.4 0.9 0.1 0.1 0.3 0.9 0.1

8.73 6.96 6.94 6.91 0.1 − 0.3 − 0.7

6.96 0.0

6.99 0.1

61 wavelengths 8.74 8.78 8.81 8.76 8.77 8.77 8.84 8.76 0.4 0.8 0.2 0.3 0.3 1.1 0.2

9

1

8 wavelengths 8.72 8.76 8.89 8.72 8.72 8.74 8.80 8.72 8.71 6.98 0.4 1.9 0.0 0.0 0.2 0.9 0.0 − 0.1 4 wavelengths 8.60 8.81 9.11 2.4 5.9

8.65 8.70 8.72 8.92 8.66 0.5 1.1 1.4 3.7 0.7

6.98 6.94 0.0 − 0.5

4

5

6

1.52 11.7

1.37 0.7

1.37 0.7

1.43 1.50 5.1 10.2

6.96 6.95 1.37 0.0 − 0.1

1.46 1.56 6.5 13.8

1.37 0.0

1.37 0.0

6.96 6.93 6.91 0.0 − 0.4 −0.7

6.96 6.95 1.36 0.0 − 0.1

1.37 0.7

6.99 0.1

6.99 0.1

1.46 1.59 7.3 16.9

6.98 0.0

6.98 0.0

1

6.99 1.36 0.1

2 1.44 5.8

8.70 6.99 6.88 6.79 6.96 6.94 6.89 6.80 6.97 6.94 1.36 1.27 1.1 − 1.5 − 2.8 − 0.4 − 0.7 − 1.4 −2.7 − 0.2 − 0.7 − 6.6

3

1.36 1.35 1.35 0.0 − 0.7 −0.7

1.47 8.0

8

9

1.37 0.7

1.37 0.7

1.46 1.56 1.36 6.5 13.8 − 0.7

1.37 0.0

1.36 0.0

1.37 0.7

1.40 2.9

1.37 0.7

1.41 1.33 3.6 − 2.2

7

1.38 1.35 1.35 1.4 − 0.7 − 0.7

1.46 1.56 7.3 14.7

1.40 2.9

1.37 0.7

1.40 2.9

1.37 0.7

1.42 4.4

Table 4
Root mean squared errors of cross validation of the analytes for different three-way arrays originated from the original data, the original data with the simulated homoscedastic noise, the original data with the simulated heteroscedastic noise and the denoised data, using PARAFAC models

The description of integers from 1 to 9 is the same as in Table 3.

and the wavelet denoising method was applied to each sample separately. Symlet 4 was selected as the wavelet filter and the level of decomposition was set to 4. The Daubechies, Symlet and Coiflet wavelet filters are the most widely applied filters; Symlet 4 is the symlet with 4 vanishing moments. The wavelet filter and the level of decomposition were not optimized; they were kept constant so that the results could be compared. Level-independent and level-dependent universal thresholds [4] were used for the homoscedastic and heteroscedastic noise, respectively. The results of these investigations are shown in Tables 1 and 2. The results indicate the superiority of the N-PLS model over the PARAFAC model. Both models are robust with respect to the homoscedastic and heteroscedastic noise. The reason may be the large systematic variance of the components in the data relative to the small variance of the noise. However, it must be emphasized that the robustness of the models has been shown only for the Claus data, and these statements cannot be generalized to other data.

As mentioned, one of the reasons for the robustness of the models in relation to noise might be the large number of data points in the original data. In order to investigate this possibility, the number of excitation wavelengths was decreased from 61 to 31, from 31 to 16, from 16 to 8 and finally from 8 to 4, in steps of 2. In this way, in addition to the original data, four three-way arrays with the dimensions of (5, 201, 31), (5, 201, 16), (5, 201, 8) and (5, 201, 4) were created. The performance of each model was evaluated by calculating the RMSECV for the analytes using the leave-one-sample-out method. The RMSECV values for each analyte at these lower dimensions were calculated and compared with those for the original data (5, 201, 61).
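The reduction of the excitation mode and the leave-one-sample-out evaluation can be sketched as follows. This Python sketch is illustrative only: keeping every second channel starting from the first is an assumption about which wavelengths were retained, and the one-parameter regression is a hypothetical stand-in for the PARAFAC and N-PLS models, used only to show the cross-validation loop.

```python
import math

def halve(indices):
    """Keep every second excitation channel (steps of 2)."""
    return indices[::2]

def loo_rmsecv(X, y, fit, predict):
    """Leave-one-sample-out RMSECV for any model given as fit/predict callables."""
    squared_errors = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]          # all samples except sample i
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        squared_errors.append((predict(model, X[i]) - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Excitation-mode sizes after successive halving: 61 -> 31 -> 16 -> 8 -> 4
channels = list(range(61))
sizes = [len(channels)]
while len(channels) > 4:
    channels = halve(channels)
    sizes.append(len(channels))
print(sizes)  # [61, 31, 16, 8, 4]

# Toy stand-in model (assumption): least-squares line through the origin
def fit(X_train, y_train):
    return sum(x * y for x, y in zip(X_train, y_train)) / sum(x * x for x in X_train)

def predict(slope, x):
    return slope * x

# Five samples, as in the Claus data; the values themselves are made up
rmsecv = loo_rmsecv([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0], fit, predict)
```

With only five samples, each cross-validation model is built from four samples, which is why the RMSECV can react strongly to noise once few excitation channels remain.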
When the N-PLS model was used, the maximum RMSECV changes caused by the dimensional reduction of the original data were 11.1% for analyte 1, 32.5% for analyte 2 and 10.6% for analyte 3, all for the three-way array with 4 excitation wavelengths. The maximum differences with the PARAFAC model were −1.7% for analyte 1 and −0.4% for analyte 2 for the three-way array with 4 excitation wavelengths, and 0.6% for analyte 3 for the three-way array with 16 excitation wavelengths. These values show the effect on the RMSECV of eliminating excitation wavelengths together with their corresponding emission information. The procedure of dimensional reduction was repeated for the data sets consisting of (original data + homoscedastic noise), (denoised data for homoscedastic noise), (original data + heteroscedastic noise) and (denoised data for heteroscedastic noise). When the N-PLS models were used, the results showed that the RMSECV values increased as the dimensions of the three-way arrays were decreased. The RMSECV for each analyte at these lower dimensions was calculated, together with its change relative to the RMSECV of the data with the same dimensions but without simulated noise. The maximum RMSECV changes for the N-PLS models were 99.1% for analyte 1, 12.29 times for analyte 2 and 96.1% for analyte 3, for the three-way arrays with 4 excitation wavelengths. The results show that the RMSECV changes for the noisy data are much larger than the RMSECV changes for the original data. When the PARAFAC models were used, the RMSECV changes were 5.9% for analyte 1 and −2.8% for analyte 2 when 4 excitation wavelengths were used, and 17.1% for analyte 3 when 8 excitation wavelengths were used. It is necessary to emphasize that, although the RMSECV changes for the N-PLS models are larger than those for the PARAFAC models, the RMSECV values of the N-PLS models in all cases (with and without added noise, and for the denoised data) in the 5 different three-way arrays are smaller than those of the PARAFAC models, except for analyte 3 with the three-way array with 4 excitation wavelengths. As previously mentioned, the N-PLS models use both the dependent and independent variables for constructing the models. Therefore, their predictive ability is better than that of the PARAFAC models. The results of these investigations are listed in Tables 3 and 4. In these tables, the values in the second rows, below the numbers of the excitation wavelengths, are the percent RMSECV changes relative to the RMSECV of the data with the same dimensions without added noise. The RMSECV for the original data (5, 201, 61) is shown at the top of the tables for comparison with the RMSECV of the lower dimensional data with and without simulated noise.
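The PARAFAC prediction step of Eqs. (1)–(3) (Khatri–Rao product of the loadings, vectorization of the new sample and projection with the pseudo-inverse) can be sketched in plain Python. This is an illustrative sketch and not the N-way toolbox code: the helper names and the toy loadings are assumptions, and the pseudo-inverse is applied through the normal equations, which gives the same result only when Z has full column rank.

```python
def khatri_rao(C, B):
    """Column-wise Kronecker product: row (k*J + j) holds C[k][f] * B[j][f]."""
    F = len(B[0])
    return [[C[k][f] * B[j][f] for f in range(F)]
            for k in range(len(C)) for j in range(len(B))]

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][c] * x[c] for c in range(i + 1, n))) / A[i][i]
    return x

def project_scores(B, C, X_new):
    """Least-squares scores of a new sample, pinv(Z) u, via the normal equations."""
    Z = khatri_rao(C, B)
    J, K, F = len(B), len(C), len(B[0])
    u = [X_new[j][k] for k in range(K) for j in range(J)]  # emission index varies fastest
    ZtZ = [[sum(z[p] * z[q] for z in Z) for q in range(F)] for p in range(F)]
    Ztu = [sum(z[p] * ui for z, ui in zip(Z, u)) for p in range(F)]
    return solve(ZtZ, Ztu)

# Toy check (assumed loadings): build a sample from known scores and recover them
B = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]   # 3 emission channels x 2 factors
C = [[1.0, 0.2], [0.3, 1.0]]               # 2 excitation channels x 2 factors
true_scores = [2.0, 3.0]
X_new = [[sum(true_scores[f] * B[j][f] * C[k][f] for f in range(2)) for k in range(2)]
         for j in range(3)]
scores = project_scores(B, C, X_new)       # close to [2.0, 3.0]
```

The ordering of the vectorized sample has to match the row ordering of the Khatri–Rao product; here the emission index varies fastest in both, which corresponds to column-wise reshaping of an emission × excitation matrix.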

4. Conclusions

The robustness of the PARAFAC and N-PLS models in relation to homoscedastic and heteroscedastic noise (flicker noise) was investigated for the Claus data loaded from the N-way toolbox for MATLAB [1]. It was demonstrated that the models are robust for these data in relation to these types of noise (uncorrelated noise). The robustness of the models was also investigated using three-way arrays with lower dimensions. The results showed that the RMSECV changes caused by dimensional reduction were not the same for the noisy data and the original data: the RMSECV changes for the noisy data were larger than those for the original data.

Acknowledgement

The author acknowledges the Research Council of Isfahan University of Technology for support of this work. The author would also like to thank Mr. Sharifian for his assistance.

References

[1] C.A. Andersson, R. Bro, The N-way Toolbox for MATLAB, Chemom. Intell. Lab. Syst. 52 (2000) 1.
[2] D.A. Skoog, F.J. Holler, T.A. Nieman, Principles of Instrumental Analysis, Harcourt Brace & Company, Florida, 1998.
[3] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627.
[4] B. Walczak, Wavelets in Chemistry, Elsevier, Amsterdam, 2000.
[5] C. Perrin, B. Walczak, D.L. Massart, Anal. Chem. 73 (2001) 4903.
[6] P. Wentzell, D.T. Andrews, D.C. Hamilton, K. Faber, B. Kowalski, J. Chemom. 11 (1997) 339.
[7] P. Paatero, Chemom. Intell. Lab. Syst. 38 (1997) 223.
[8] L. Vega-Montoto, P. Wentzell, J. Chemom. 17 (2003) 237.
[9] A. Smilde, R. Bro, P. Geladi, Multi-way Analysis with Applications in the Chemical Sciences, John Wiley & Sons, Inc., Chichester, 2004.
[10] MATLAB 7.0, The Math Works, Inc, Natick, MA.
[11] M. Leger, P. Wentzell, Appl. Spectrosc. 58 (2004) 855.