Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets


CHEMOM-02795; No of Pages 12 Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemolab

Amrita Malik ⁎, Roma Tauler

Institute of Environmental Assessment and Water Research (IDAEA), Spanish Council for Scientific Research (CSIC), Jordi Girona 18-26, 08034 Barcelona, Catalunya, Spain

Article info

Article history: Received 17 January 2014; Received in revised form 3 April 2014; Accepted 4 April 2014; Available online xxxx

Keywords: Four-way datasets; MCR-ALS; MLPCA; Noisy data; Quadrilinear constraint

Abstract

This study explores the effect of noise propagation on the resolution capability of multivariate curve resolution-alternating least squares with a recently developed quadrilinearity constraint (MCR-ALSQ). To investigate the effect of applying the quadrilinearity constraint, four environmental profiles were simulated and three types of noise, viz. homoscedastic, heteroscedastic, and constant-proportional noise, were added to the simulated dataset at three different levels. The profiles recovered with MCR-ALSQ were compared with those recovered by bilinear MCR-ALS (MCR-ALSB). The effect of maximum likelihood principal component analysis (MLPCA) as a pre-processing step in MCR-ALSQ (MLPCA–MCR-ALSQ) and MCR-ALSB (MLPCA–MCR-ALSB) analysis was also studied, and the results were compared with those obtained with the MCR-ALSQ and MCR-ALSB models. The recovery of the resolved profiles and their similarity to the theoretical ones were assessed in terms of the similarity coefficient (r²) and the similarity angle (θ). The results indicate that MCR-ALSQ is appropriate for analyzing four-way quadrilinear datasets, and that the use of MLPCA as a pre-processing step before MCR-ALSQ improves the resolved profiles to a great extent, even in the presence of high levels of heteroscedastic and constant-proportional noise. © 2014 Elsevier B.V. All rights reserved.

⁎ Corresponding author (A. Malik).

1. Introduction

MCR-ALS, a proven tool for solving mixture analysis problems, can be used to analyze any dataset, consisting of either single or multiple data matrices, that is described by a bilinear model [1]. Bilinear modeling solutions for two-way datasets are usually associated with resolution ambiguities and rank deficiency problems. This aspect can be improved significantly when data structures containing richer information (like multi-way datasets) are analyzed by the extension of multivariate curve resolution methods [1,2]. In the case of multi-set and multi-way datasets, MCR-ALS is applied to augmented data matrices, and unique solutions can be obtained more easily by implementing different constraints such as non-negativity, closure, unimodality, selectivity, local rank and trilinearity, depending on the data characteristics [3]. MCR-ALS using data matrix augmentation schemes and the implementation of constraints during the alternating least squares (ALS) optimization can be customized according to the specific features of each data matrix, allowing for the fulfillment of trilinear or multi-linear models [1]. In MCR-ALS analysis of multi-way datasets, the multi-linear constraints can be applied independently and selectively to each component of the dataset, providing more flexibility to the data analysis and allowing for multi-linear, partial multi-linear or mixed models [4,5]. Hence, MCR-ALS can easily be adapted to datasets of different complexities and structures (bilinear, trilinear or multi-linear), providing optimal least squares solutions [6]. After the successful extension and application of MCR-ALS to different kinds of three-way datasets using a trilinear constraint [3,4,7–9], the method has recently been extended to analyze a four-way dataset using non-negativity and a newly developed quadrilinear constraint [10].

Whenever a new method or model is proposed for data analysis, it needs to be validated, and a practical approach to method validation is to use simulated data with known structure, characteristics, and noise [11–13]. The main sources of uncertainty associated with curve resolution results are the degree of rotation ambiguity of the recovered profiles and the propagation of experimental noise [1], both of which need to be considered in the quality assessment of the final results. Real-life datasets usually contain some amount of noise with specific characteristics that depend on the origin of the dataset. When these datasets are processed and modeled with data analysis methods, propagation of noise to the resolved profiles may distort them significantly, resulting in erroneous interpretations of the obtained profiles. Therefore, simulated datasets with structures as close as possible to real datasets, and with noise or error matrices of known characteristics, can be a good approach to validate the tested methods. Most bilinear model based methods, including MCR-ALS, assume that the dataset to be modeled has inherent noise which is independently and identically distributed (i.i.d.) with an approximately normal (Gaussian) distribution

http://dx.doi.org/10.1016/j.chemolab.2014.04.002 0169-7439/© 2014 Elsevier B.V. All rights reserved.

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002


A. Malik, R. Tauler / Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

to provide optimal solutions. This is a rather good approximation for experimental measurements obtained from spectroscopic and chromatographic methods, which are rather precise, with low and relatively uniform uncertainties [14]. In practice, however, this assumption is often not fulfilled by large complex environmental datasets or DNA microarray datasets [14–17]. As the measurement error variances become non-uniform (heteroscedastic noise), the projection subspace estimation for the given data matrix becomes suboptimal. Maximum likelihood principal component analysis (MLPCA) has been used to deal with these kinds of errors [18–20]. MLPCA is a generalization of PCA for analyzing data with non-ideal error structures that may range from simple heteroscedasticity to more complex structures [14]. The use of MLPCA as an initial projection step in MCR-ALS analysis of noisy data has previously been proposed and reported by Dadashi et al. [21].

This work explores the performance of MCR-ALS with a quadrilinear constraint with respect to different types of noise and the effects of noise propagation on the profiles in the four modes of a four-way dataset. For this work, an environmental dataset with four contamination sources was generated with theoretical profiles based on a previous study [10], and different types of noise, viz. homoscedastic, heteroscedastic, and constant-proportional noise, were introduced into the simulated quadrilinear dataset.

2. Methods

2.1. MCR-ALS method

The MCR-ALS method has recently been extended to analyze and resolve the profiles of four-way data under a quadrilinearity constraint. Details of the MCR-ALS method and its extension with a quadrilinear constraint can be found elsewhere [1–3,10]. The first step in the MCR-ALS algorithm is the projection of the original dataset into a subspace defined by its principal components and initial estimates of scores (U) or loadings ($V^T$) [5,9,21,22]:

$$\hat{D}_{F,PCA} = D\,V_F V_F^T \tag{1}$$

Here, $V_F$ is the PCA loading matrix for the F components and $\hat{D}_{F,PCA}$ is the projection of the original dataset onto the loading subspace. After the initial data projection, the U and $V^T$ matrices are estimated iteratively using the ALS algorithm under the desired natural constraints:

$$\min_{\hat{U},\,\text{constraints}} \left\| \hat{D}_{F,PCA} - \hat{U}\hat{V}^T \right\| \tag{2}$$

$$\hat{U} = \hat{D}_{F,PCA}\,(\hat{V}^T)^{+} = \hat{D}_{F,PCA}\,\hat{V}\,(\hat{V}^T\hat{V})^{-1} \tag{3}$$

$$\min_{\hat{V}^T,\,\text{constraints}} \left\| \hat{D}_{F,PCA} - \hat{U}\hat{V}^T \right\| \tag{4}$$

$$\hat{V}^T = \hat{U}^{+}\hat{D}_{F,PCA} = (\hat{U}^T\hat{U})^{-1}\hat{U}^T\hat{D}_{F,PCA} \tag{5}$$

The solutions obtained by ordinary PCA or MCR-ALS are only optimal in the case of independent and identically distributed (i.i.d.) errors or random homoscedastic errors [16–18]. This assumption cannot be made in general and is not satisfied when the measurements carry relatively large uncertainties that are proportional to the measured values. In these cases, the MLPCA projected dataset can be used for MCR-ALS analysis. The MLPCA method accounts for known measurement errors in the estimates of the model subspace parameters and can deal with different types of error structures, which is not the case with PCA. The method is first presented in terms of a classical measurement error regression model and then transformed to principal component space to provide a closer relationship with PCA [18]. By making optimal use of measurement errors, it separates measurement noise variance from other sources of variance and therefore gives a more accurate estimation of the component subspace than PCA based methods. MLPCA as a pre-processing step can improve the quality of the results if reasonably accurate estimates of the measurement noise are provided [14,18–20].

The main purpose of this work is to analyze the stability and resolution of four-way profiles recovered by MCR-ALS under a quadrilinearity constraint in the presence of different types and amounts of noise. For this, four variants of the MCR-ALS model, based on the constraints applied and the use of MLPCA as a pre-processing step, were used to investigate the efficiency of the MCR-ALS resolution method against the introduction of different noise levels and patterns in the dataset. These models can be described as follows:

(i) MCR-ALSB: ordinary MCR-ALS without any multi-linearity (trilinearity or quadrilinearity) constraint, but still under the bilinear model assumption and non-negativity constraints,
(ii) MLPCA–MCR-ALSB: MCR-ALSB applied to the MLPCA projected dataset,
(iii) MCR-ALSQ: MCR-ALS with quadrilinearity and non-negativity constraints, and
(iv) MLPCA–MCR-ALSQ: MCR-ALSQ applied to the MLPCA projected dataset.

The simulated datasets with homoscedastic noise were modeled with MCR-ALSB and MCR-ALSQ only, whereas the datasets with heteroscedastic and constant-proportional noise were modeled with the MCR-ALSB, MLPCA–MCR-ALSB, MCR-ALSQ, and MLPCA–MCR-ALSQ models. To check the efficiency of the MCR-ALS methods in resolving the true profiles from noisy data, the obtained R² (variance explained) and lof (lack of fit) values were compared with the theoretical ones ($lof_{th}$ and $R^2_{th}$), and the obtained profiles were compared with the theoretical ones (those used for data simulation).

2.2. Implementation of the quadrilinearity constraint in MCR-ALS

The MCR-ALS model for a four-way dataset $D_{IJKL}$, of dimensions I, J, K, and L in the 1st, 2nd, 3rd, and 4th modes respectively, augmented column-wise (Daug), can be represented as:

$$D_{aug} = U_{aug} V^T + E_{aug} \tag{6}$$

where Uaug is the augmented scores matrix containing the intermixed loadings for the first, third and fourth modes, $V^T$ is the loading matrix for the second mode, and Eaug is the error term. A schematic representation of the quadrilinearity constraint implemented in the MCR-ALS model to analyze the four-way dataset is provided in Fig. 1. In brief, the quadrilinear constraint is applied during the ALS optimization of the augmented scores matrix (Uaug). As in any other ALS procedure, the first step is to provide the number of components and an initial estimate of either the scores (Uaug) or the loading ($V^T$) matrix. These initial estimates are then optimized iteratively by ALS, and at each iteration a new estimate of the augmented scores and loading matrices is obtained. At each iteration, constraints such as non-negativity, normalization (of the second-mode loadings $V^T$) and, optionally, quadrilinearity are applied. The constrained iterative optimization is carried out until convergence is achieved or until a preselected number of cycles is reached. The quadrilinear constraint in MCR-ALS can be applied independently and optionally to each component of the dataset, giving more flexibility to the whole data analysis and allowing full and partial quadrilinear models to be tested. Further details can be found elsewhere [10]. The whole procedure is shown schematically in Fig. 1 and can be summarized as follows:

(1) First, the bilinear model decomposition of the augmented data Daug is performed according to Eq. (6).


Fig. 1. Implementation of the quadrilinearity constraint in the MCR-ALS model, and recovery of the profiles in the 1st, 3rd, and 4th modes from augmented scores ‘Uaug’.

(2) The score profile of one selected component f ($u^f_{IKL}$), of dimensions (IKL,1), is folded to give a one-component profile matrix ($U^f_{IK,L}$) with IK rows and L columns, where f = 1,…,F.

(3) Matrix $U^f_{IK,L}$ is then approximated by its first-component bilinear SVD decomposition:

$$U^f_{IK,L} \approx u^f_{IK,1}\,x^f_{1,L} \tag{7}$$

giving a loading vector $u^f_{IK,1}$ of dimensions (IK,1), which holds the (intermixed) loadings of the fth component in the first and third modes, and a vector $x^f_{1,L}$ of dimensions (1,L), which holds the loadings of the fth component in the fourth mode. This process is repeated for every component, giving the loading matrix X in the fourth mode, of dimensions (F,L), where F is the number of desired or selected components (f = 1,…,F) and L (l = 1,…,L) is the total number of conditions in the fourth mode.

(4) To recover the loading profiles in the first and third modes, the previously obtained one-component profile vector $u^f_{IK,1}$ of dimensions (IK,1) is folded again to give the single-component profile matrix $u^f_{I,K}$ with I rows and K columns.

(5) Matrix $u^f_{I,K}$ is then approximated by its first-component bilinear SVD decomposition:

$$u^f_{I,K} \approx y^f_{I,1}\,z^f_{1,K} \tag{8}$$

where vector $y^f_{I,1}$ of dimensions (I,1) holds the loadings of the fth component in the first mode, and $z^f_{1,K}$ of dimensions (1,K) holds the loadings of the fth component in the third mode. This process is repeated for every component targeted to be quadrilinear, giving the loading matrices Y of dimensions (I,F) and Z of dimensions (F,K) for the first and third modes, respectively. The full augmented scores matrix Uaug of dimensions (IKL,F) can then be reconstructed using the Kronecker tensor product [23] of the X, Y, and Z loading matrices, with elements $x^f_{L,1}$, $y^f_{I,1}$, and $z^f_{K,1}$, according to the following equations:

$$u^f_{IK,1} = y^f_{I,1} \otimes z^f_{K,1} \tag{9}$$

$$u^f_{IKL,1} = u^f_{IK,1} \otimes x^f_{L,1} \tag{10}$$
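As an illustration only, steps (2) to (5) and the reconstruction of Eqs. (9) and (10) can be sketched in Python with NumPy. This is not the authors' MATLAB routine: the function names `rank_one` and `quadrilinear_constraint` are hypothetical, the loading matrices are stored with one column per component, and the rows of Uaug are assumed to be ordered so that the fourth-mode index varies fastest, which is the ordering implied by Eq. (10).

```python
import numpy as np


def rank_one(M):
    """Return (left, right) with outer(left, right) the best rank-one fit of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    left, right = U[:, 0] * s[0], Vt[0]
    # The SVD sign is arbitrary; choose the one that keeps 'right' mostly positive.
    if right.sum() < 0:
        left, right = -left, -right
    return left, right


def quadrilinear_constraint(U_aug, I, K, L):
    """Force every column of U_aug (IKL x F) to follow Eqs. (7)-(10).

    Assumption: row index = (i*K + k)*L + l, i.e. l varies fastest.
    Returns the constrained scores and the Y (I x F), Z (K x F), X (L x F) loadings.
    """
    F = U_aug.shape[1]
    Y, Z, X = np.empty((I, F)), np.empty((K, F)), np.empty((L, F))
    U_new = np.empty(U_aug.shape, dtype=float)
    for f in range(F):
        U_IK_L = U_aug[:, f].reshape(I * K, L)    # step (2): fold to (IK, L)
        u_IK, X[:, f] = rank_one(U_IK_L)          # step (3), Eq. (7)
        u_I_K = u_IK.reshape(I, K)                # step (4): fold to (I, K)
        Y[:, f], Z[:, f] = rank_one(u_I_K)        # step (5), Eq. (8)
        u_IK_new = np.kron(Y[:, f], Z[:, f])      # Eq. (9)
        U_new[:, f] = np.kron(u_IK_new, X[:, f])  # Eq. (10)
    return U_new, Y, Z, X
```

Applied to an augmented scores matrix that is already quadrilinear, this sketch returns it unchanged; applied to a noisy one, it returns the quadrilinear profile built from the two successive rank-one approximations, which is what the constraint imposes at each ALS iteration.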

Here, ⊗ stands for the Kronecker product, which multiplies each element of the first vector by the whole of the second vector, and f denotes the selected component. Matrix Uaug (in Eq. (6) and Fig. 1) is reconstituted from the $u^f_{IKL,1}$ augmented score column profiles of the f = 1,…,F resolved components, and the ALS optimization continues. The above approach (Eqs. (7) and (8)) can also be used to recover the loading profiles of the different components in the different modes of the dataset from the augmented scores (Uaug), irrespective of the


application of the multi-linearity constraint, i.e., by appropriate refolding of each of the augmented profiles in Uaug into a one-component matrix and subsequent SVD analysis to recover the loading profiles in the different modes [1,9]. On the other hand, Eqs. (9) and (10) can be used to build an augmented scores matrix, Uaug, which conforms to the quadrilinear model, and they are also useful for simulating data that fulfill this condition.

2.3. Figures of merit or quality characteristics

To check and compare the model fit achieved by the different models and MCR-ALS variants, the lack of fit (lof) and explained variance (R²) were calculated as:

$$\mathrm{lof}\,(\%) = \sqrt{\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\left(d_{i,j}-\hat{d}_{i,j}\right)^2}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^2}} \times 100 \tag{11}$$

$$R^2\,(\%) = \left(1-\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\left(d_{i,j}-\hat{d}_{i,j}\right)^2}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^2}\right) \times 100 \tag{12}$$

where $\hat{d}_{i,j}$ and $d_{i,j}$ are the calculated and experimental values, respectively. The quality of the recovery of the component profiles and the presence of rotational ambiguities were estimated with the similarity coefficient (r²) between the profiles obtained with the applied model and the theoretical ones used for the simulation, and with the similarity angle (θ). The angle θ is defined as the angle between the two vectors (a and b) representing the two profiles, i.e. the angle whose cosine equals the value of their correlation coefficient. A high similarity between the calculated and theoretical profiles gives a very small angle between the two vectors, i.e. the cosine of this angle is close to one [3].

$$r^2 = \cos\theta = \frac{a^T b}{\|a\|\,\|b\|} \tag{13}$$

$$\theta = \cos^{-1}\left(r^2\right) \tag{14}$$
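Eqs. (11) to (14) translate directly into a few lines of NumPy. The sketch below is only illustrative: the function names are hypothetical, the similarity coefficient is returned under the name `r2` to follow the notation used here, and reporting the angle in degrees is an assumption.

```python
import numpy as np


def lack_of_fit(D, D_hat):
    """Eq. (11): lack of fit in %, D = experimental data, D_hat = calculated data."""
    return 100.0 * np.sqrt(np.sum((D - D_hat) ** 2) / np.sum(D ** 2))


def explained_variance(D, D_hat):
    """Eq. (12): explained variance R^2 in %."""
    return 100.0 * (1.0 - np.sum((D - D_hat) ** 2) / np.sum(D ** 2))


def similarity(a, b):
    """Eqs. (13)-(14): similarity coefficient (cosine) and angle between profiles."""
    r2 = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    r2 = min(1.0, max(-1.0, r2))            # guard against rounding outside [-1, 1]
    theta = float(np.degrees(np.arccos(r2)))  # assumption: angle in degrees
    return r2, theta
```

Because lof and R² are built from the same ratio of sums of squares, they are related by R² = 100 − lof²/100 (both in %), so that, for example, a lof of 10% corresponds to an R² of 99%.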

The MCR-ALS program has been written in MATLAB and is freely available as a toolbox at http://www.mcrals.info/. A preliminary implementation of the quadrilinear constraint as a MATLAB routine is available from the authors (RT) upon request.

3. Data simulation

3.1. Simulation of pure datasets

Quadrilinearity is an optimal situation for obtaining unique solutions in the analysis of four-way datasets. However, if the quadrilinearity constraint is applied to a four-way dataset that does not fulfill the quadrilinear model, the results will be poorer in terms of explained variance and lack of fit, and the recovered profiles will be wrong. Therefore, before applying multi-way and multi-linear models, the data structure should be analyzed properly. Rank analysis using SVD of the different mode-wise (column-wise, row-wise, tube-wise) augmented data matrices can be a useful method to check whether the data are multi-linear or not [3,5]. To check this, two non-quadrilinear datasets (DB: bilinear, and DT: trilinear) and one quadrilinear dataset (DQ) were simulated. In all cases, the loading profiles in the second mode ($V^T$) were taken from four different environmental contamination profiles investigated in an earlier study [10]. These chosen loading profiles represent four environmental contamination profiles, i.e. pH, temperature, organic pollution indicators (COD, BOD, NH4-N, and TKN), and bacterial contamination (total and fecal coliforms), for the water quality of the River Yamuna, India, at 15 sampling locations over 12 months for 7 years. The details can be found elsewhere [10].

3.1.1. Bilinear dataset (DB)

The bilinear model of a two-way dataset provides scores and loadings in the first and second modes, respectively. As depicted in Fig. 1, a four-way dataset subjected to MCR-ALSB (bilinear model) analysis provides augmented scores Uaug, which contain the intermixed loadings for three modes (the first, third and fourth), and a separate loading matrix $V^T$ for the second mode. The bilinear dataset (DB) was built from a very long augmented scores matrix (Uaug,B) of dimensions (1260,4), filled with log-normally distributed random numbers, which was multiplied by the loading profiles $V^T$ of dimensions (4,9), i.e. the four environmental profiles taken from an earlier study [10], to give the bilinear augmented dataset DB of dimensions (1260,9) using the equation:

$$D_B = U_{aug,B} V^T \tag{15}$$

The bilinear data structure was confirmed through rank analysis of the augmented matrices.

3.1.2. Trilinear dataset (DT)

To fulfill the trilinear condition, the loadings in any one of the three modes should be invariant in shape over the other two modes. To obtain the trilinear three-way dataset (DT), seven different individual score matrices of dimensions (180,4) were prepared as follows. First, one common sample scores matrix (U) of dimensions (180,4) was created with random log-normally distributed values. Then, one common loading matrix X of dimensions (7,4) was created with random values. These two matrices were multiplied using the Kronecker tensor product (Eq. (10)), which gives the new augmented scores matrix Uaug,T of dimensions (1260,4). Finally, multiplication of the resulting augmented scores matrix (Uaug,T) by the second-mode loading matrix ($V^T$) gives the augmented data matrix DT, which has a trilinear data structure. The dimensions and modes of this trilinear augmented matrix are 180 sample locations (1st mode, IK = 1,…,180), 9 variables or environmental parameters (2nd mode, J = 1,…,9), and 7 years (3rd mode, L = 1,…,7), which gives a total of 1260 (IKL = 180 × 7 = 1260) samples and 9 variables or environmental parameters, i.e. DT of dimensions (1260,9). Note that in the case of the four-way dataset (see below), the first mode IK of the simulated three-way trilinear dataset is divided into two new modes: the first mode with I = 1,…,15 samples and the third mode with K = 1,…,12 measuring months. The third mode of the three-way trilinear dataset, with L = 1,…,7 measuring years, becomes the fourth mode of the four-way quadrilinear dataset.
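The construction of the bilinear and trilinear datasets, and the rank analysis used to confirm their structure, can be sketched as follows. This is a minimal sketch under stated assumptions: the second-mode profiles of ref. [10] are not reproduced here, so a random non-negative `VT` stands in for them, the helper names are hypothetical, and the rows of the augmented matrices are ordered as implied by Eq. (10) (year index varying fastest).

```python
import numpy as np

rng = np.random.default_rng(0)
IK, J, L, F = 180, 9, 7, 4

VT = rng.random((F, J))            # stand-in for the four profiles of ref. [10]
U = rng.lognormal(size=(IK, F))    # common log-normal sample scores
X = rng.random((L, F))             # common third-mode (year) loadings

# Trilinear scores: Eq. (10) applied column by column, then D_T = U_aug,T V^T
U_aug_T = np.column_stack([np.kron(U[:, f], X[:, f]) for f in range(F)])
D_T = U_aug_T @ VT                 # (1260, 9)

# Bilinear scores: unstructured log-normal numbers, Eq. (15)
U_aug_B = rng.lognormal(size=(IK * L, F))
D_B = U_aug_B @ VT                 # (1260, 9)


def numerical_rank(M, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


def augmentation_ranks(D):
    """Rank of the column-wise, sample-wise and time-wise augmentations of D."""
    T = D.reshape(IK, L, J)        # T[ik, l, j] = D[ik*L + l, j]
    return {
        "column-wise": numerical_rank(T.reshape(IK * L, J)),
        "sample-wise": numerical_rank(T.reshape(IK, L * J)),
        "time-wise": numerical_rank(T.transpose(1, 0, 2).reshape(L, IK * J)),
    }


ranks_T = augmentation_ranks(D_T)  # trilinear data: same rank in every direction
ranks_B = augmentation_ranks(D_B)  # bilinear data: rank grows in the other directions
```

For the trilinear stand-in, all three augmentations have rank four, whereas for the bilinear one only the column-wise augmentation does, which is the behavior the rank analysis relies on.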

3.1.3. Quadrilinear dataset (DQ)

To ensure that the data structure agrees with the quadrilinear model, the loading profiles in the third and fourth modes (loading matrices Z of dimensions (12,4) and X of dimensions (7,4)) were generated randomly, whereas the loading profiles in the first mode (loading matrix Y of dimensions (15,4)) were adopted from the earlier study [10]. Using these loading matrices and Eqs. (9) and (10), the augmented scores matrix Uaug,Q conforming to the quadrilinear model was generated; multiplying it by the loading matrix of the common environmental profiles $V^T$ gives the augmented data matrix DQ of dimensions (1260,9), which has a quadrilinear data structure. The dimensions and modes of this quadrilinear augmented matrix are 15 sample locations (1st mode, I = 1,…,15), 9 variables or environmental parameters (2nd mode, J = 1,…,9), 12 months per year (3rd mode, K = 1,…,12), and 7 years (4th mode, L = 1,…,7), which gives


a total number of 1260 (IKL = 15 × 12 × 7 = 1260) samples and 9 variables or environmental parameters (see ref. [10]). The fulfillment of the quadrilinear data structure was confirmed through rank analysis of the data matrices augmented according to the four modes, i.e. locations, variables, months and years [5].

3.2. Datasets with noise

To investigate the effect of applying the quadrilinearity constraint and of noise propagation on the MCR-ALS resolved profiles, error matrices with different noise structures and levels were added to the pure data matrix (DQ). In this study, three types of noise structure, i.e. homoscedastic, heteroscedastic, and constant-proportional, were considered at three noise levels equivalent to theoretical lack of fit values (lofth) of ≈10%, 20% and 30%. In all cases, the noise levels were measured in terms of lack of fit (lof) values (see Eq. (11)).

3.2.1. Datasets with homoscedastic (random) noise

The construction of a homoscedastic noisy dataset can be written in matrix notation as:

$$D_{QH} = D_Q + E_H \tag{16}$$

and,

$$E_H = N(0, \sigma) \tag{17}$$

where DQH is the noisy dataset, DQ is the pure dataset, EH is the homoscedastic noise matrix, and N(0,σ) is a matrix of normally distributed random numbers with zero mean and standard deviation σ. This type of noise occurs when the noise is homogeneously distributed across all measurements, with zero mean and constant variance [14]. To simulate the datasets with homoscedastic noise, three error matrices (EH1–EH3) were created from a matrix of normally distributed random numbers (N) with zero mean and standard deviation (σ) equal to 3.61, 7.21, and 10.96, respectively. These error matrices were added to the pure, noise-free dataset (DQ), providing three noisy datasets, viz. DQH1, DQH2 and DQH3, with lofth = 10.31%, 20.30% and 30.06%, respectively. The three standard deviation values were obtained as:

$$\sigma = s \cdot \mathrm{mean}\big(\mathrm{mean}(D_Q)\big) \tag{18}$$

where s is the fixed proportion of the mean of the pure dataset DQ; in this study, the value of s varied from 25% to 75% of the mean of the dataset (3.61, 7.21, and 10.96).

3.2.2. Datasets with heteroscedastic (proportional) noise

$$D_{QP} = D_Q + E_P \tag{19}$$

and,

$$E_P = N(0, \sigma_{ij}) \tag{20}$$

In this case, three error matrices (EP1–EP3) were obtained from a matrix of normally distributed random numbers (N) with zero mean and standard deviations $\sigma_{ij}$ that varied proportionally according to three fixed proportions of each measured data value in dataset DQ. These error matrices were added to the noise-free dataset (DQ), providing three noisy datasets, viz. DQP1, DQP2 and DQP3, with lofth = 10.11%, 20.33% and 30.23%, respectively. Here, the standard deviations were obtained as:

$$\sigma_{ij} = s\,d_{ij} \tag{21}$$

where s is the fixed proportion of each (i,j) value, $d_{ij}$, of the pure dataset DQ (9.3%, 19%, and 29%, respectively, for the three error matrices). In this way, each value in the noisy datasets has an uncertainty whose standard deviation is proportional to the intensity of that measured value.

3.2.3. Datasets with constant-proportional noise

$$D_{QCP} = D_Q + E_{CP} \tag{22}$$

and,

$$E_{CP} = N(0, \sigma_{ij}) \tag{23}$$

Three error matrices (ECP1–ECP3) were obtained from a matrix of normally distributed random numbers (N) with zero mean and standard deviations $\sigma_{ij}$, which were calculated according to the equation

$$\sigma_{ij} = \sqrt{a^2 + \left(s\,d_{ij}\right)^2} \tag{24}$$
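The three noise models of Eqs. (17), (18), (21) and (24) can be sketched with NumPy as below. The function names and the `a_frac` parameter are hypothetical. The sketch also assumes that the theoretical lack of fit is Eq. (11) evaluated with the noisy data as the experimental values and the pure data as the calculated ones, which is an interpretation rather than something stated explicitly here.

```python
import numpy as np


def homoscedastic_noise(D, s, rng):
    """Eqs. (17)-(18): constant sigma equal to a fraction s of the overall mean of D."""
    sigma = s * D.mean()
    return rng.normal(0.0, sigma, size=D.shape)


def proportional_noise(D, s, rng):
    """Eqs. (20)-(21): sigma_ij = s * d_ij, one standard deviation per data value."""
    return rng.standard_normal(D.shape) * (s * D)


def constant_proportional_noise(D, s, a_frac, rng):
    """Eqs. (23)-(24): sigma_ij = sqrt(a^2 + (s * d_ij)^2), with a = a_frac * mean(D)."""
    a = a_frac * D.mean()
    sigma = np.sqrt(a ** 2 + (s * D) ** 2)
    return rng.standard_normal(D.shape) * sigma


def theoretical_lof(D_pure, E):
    """Assumed lof_th (%): Eq. (11) with d = D_pure + E and d_hat = D_pure."""
    return 100.0 * np.linalg.norm(E) / np.linalg.norm(D_pure + E)
```

In practice the proportion s would be tuned until `theoretical_lof` reaches the target level (about 10%, 20% or 30%), since the value obtained depends on the particular random draw.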

$\sigma_{ij}$ has two parts. One part varies proportionally according to three fixed proportions of each measured data value in dataset DQ, as in the previous case of proportional noise; the three levels of proportional noise were taken as 8.5%, 18.7%, and 28.7% of DQ. The other, constant part a was taken as 10% of the mean of the noise-free data matrix DQ. These error matrices were added to the pure dataset (DQ), providing three noisy datasets, viz. DQCP1, DQCP2 and DQCP3, with three levels of noise (lofth) = 10.11%, 20.33% and 30.23%, respectively.

4. Results and discussion

4.1. Rank analysis

The DB, DT, and DQ datasets have previously been described as second-mode column-wise augmented matrices. However, they could also have been augmented in other modes or directions (column-wise, row-wise, time-wise, …), and SVD can be applied to each augmentation for rank analysis and to check the multi-linear structure. A two-way dataset can be rearranged in two modes only (column-wise (CW) or row-wise (RW)); in the case of multi-way datasets, however, the samples can be arranged in as many modes as the dataset has. The possible augmentations of all three types of datasets investigated in this work and the corresponding rank analysis results are provided in Table SI-1. In the case of DB, SVD analysis of the column-wise augmented matrices presented four distinct components (after the first four components the singular values were zero), whereas SVD analysis of the row-wise augmented matrices required more than four singular values to explain the same amount of data variance, suggesting the lack of a higher multi-linear structure. In the case of the DT dataset, four singular values are observed for column-wise (CW) and row-wise (sample-wise (SW) and time-wise (TW)) matrix augmentation, indicating the presence of a trilinear structure in the dataset.
Similarly, the DQ dataset exhibited four distinctive values in all of the four modes (column-wise (CW), sample-wise (SW), month-wise (MW) and year-wise (YW)) of matrix augmentation confirming the quadrilinear structure of the data (Table SI-1). The trilinear dataset (DT) was checked for quadrilinearity by augmenting the dataset in four modes like the quadrilinear dataset followed by SVD analysis. SVD of column-wise, sample-wise, month-

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002

6

A. Malik, R. Tauler / Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

however, as expected in this case also profiles recovered with MCRALSB are associated with some ambiguity which is completely removed with the implementation of multi-linearity constraints.

wise and year-wise augmented dataset (DT) proposed a different number of significant components in different modes confirming that the dataset does not follow a quadrilinear structure (Table SI-1). The rank of all column-wise matrices from bilinear, trilinear, and quadrilinear datasets was always found to be four through SVD analysis, in agreement with the fact that the profile matrices used for simulation in the four modes were of rank four.

4.3. Noise structure

When noise is added to data, it sometimes becomes more difficult to differentiate between the significant components and those explaining noise only. If the error matrix is known, SVD analysis of the error matrix can provide information about the noise variance and its level. Table 1 gives the results of the SVD analysis of the simulated column-wise augmented matrices: the pure data matrix (DQ), three datasets with homoscedastic error (DQH1, DQH2, DQH3), three datasets with heteroscedastic error (DQP1, DQP2, DQP3), and three datasets with constant-proportional error (DQCP1, DQCP2, DQCP3), along with the corresponding error matrices (EH1, EH2, EH3, EP1, EP2, EP3, ECP1, ECP2, and ECP3). The pure data DQ presented four distinct singular values, confirming the presence of four distinguishable components. In the case of the datasets with homoscedastic errors, the first singular values of the error matrices (EH1 and EH2) are larger than the fifth singular value of the corresponding data matrices (DQH1 and DQH2), showing that in these cases (noise ≈ 10% or 20% lofth) the noise is at the level of the fifth component and also propagates some variance to the fourth component. In the case of data matrix DQH3, however, the first singular value of the error matrix EH3 is almost at the level of the fourth singular value of DQH3 (noise ≈ 30% lofth), suggesting that this fourth component will be difficult to distinguish from noise. For the datasets with heteroscedastic and constant-proportional errors at a noise level of ≈10% lofth, the first singular value of the corresponding error matrices was larger than the third singular value of the respective dataset. When the noise level increased (≈20% or 30% lofth), the first singular value of the error matrices was always greater than the second singular value of the corresponding datasets.
This suggests that noise is incorporated into the variance explained by the components or resolved profiles, and it calls for an appropriate strategy to diminish the effect of noise on the resolved profiles, such as the proposed maximum likelihood method.
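The comparison made above, between the leading singular value of an error matrix and the trailing singular values of the corresponding noisy dataset, can be sketched as follows. This is a minimal illustration on a hypothetical rank-four matrix; the matrix sizes, the noise level, and the helper name `n_components_above_noise` are assumptions made for the example and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-4 bilinear data: 60 rows x 9 variables (sizes are illustrative).
C = rng.uniform(0.0, 1.0, size=(60, 4))
S = rng.uniform(0.0, 1.0, size=(4, 9))
D_pure = C @ S

# Homoscedastic noise: the same standard deviation for every element (assumed level).
E = rng.normal(0.0, 0.05 * D_pure.std(), size=D_pure.shape)
D_noisy = D_pure + E

# Singular values only (no singular vectors needed for rank analysis).
sv_pure = np.linalg.svd(D_pure, compute_uv=False)
sv_noisy = np.linalg.svd(D_noisy, compute_uv=False)
sv_error = np.linalg.svd(E, compute_uv=False)


def n_components_above_noise(sv_data, sv_err):
    """Count the data singular values that exceed the largest error singular value."""
    return int(np.sum(sv_data > sv_err[0]))


print("pure :", np.round(sv_pure, 3))
print("noisy:", np.round(sv_noisy, 3))
print("error:", np.round(sv_error, 3))
print("components above noise:", n_components_above_noise(sv_noisy, sv_error))
```

When the largest error singular value approaches one of the leading singular values of the noisy data, the corresponding component becomes hard to separate from noise, which is the situation described for DQH3.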

4.2. Multi-linear data structure

To see the effect of the multi-linearity constraint on datasets with different kinds of structures, all simulated datasets (DB, DT, DQ) were analyzed with MCR-ALS under the different multi-linearity constraints. To keep this study concise, only the results obtained in the second (variable) mode by the different bilinear and multi-linear MCR-ALS models are discussed in detail. The bilinear dataset (DB), when subjected to MCR-ALSB without the multi-linearity constraint, gave about 100% explained variance (R2) with a 0.0006% lof (this value depends on the convergence criterion and on computer accuracy). In contrast, when the trilinearity (MCR-ALST) and quadrilinearity (MCR-ALSQ) constraints were applied to the same dataset, R2 decreased to 53.51% (lof = 68.19%) and 50.53% (lof = 70.34%), respectively (Table SI-1). As anticipated, the profiles recovered with MCR-ALSB, when compared with the theoretical ones, revealed a nearly perfect similarity coefficient (r2) and small similarity angles (θ), though associated with some ambiguity (Table SI-2). On the contrary, when the bilinear dataset (DB) was fitted with MCR-ALST (trilinearity constraint) and MCR-ALSQ (quadrilinearity constraint), profile recovery was very poor (Table SI-2). These results support the view that the multi-linearity constraint should be applied to a dataset only when it follows the postulated multi-linear structure; otherwise, poorer model performance and unreliable solutions will be obtained. When the trilinear dataset (DT) was modeled with MCR-ALS implementing the different linearity constraints, the MCR-ALSB and MCR-ALST models fitted the dataset very well, with R2 = 100% in both cases and lof = 0.0002% for MCR-ALST and less than 0.0002% for MCR-ALSB. However, as expected, the profiles recovered with the MCR-ALSB model again showed some ambiguity inherent to bilinear models, which is not the case for the profiles obtained with MCR-ALST.
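The two fit criteria quoted above can be computed as in the following sketch. It assumes the definitions commonly used with MCR-ALS, which are not restated in this section: lof is the relative norm of the residuals, and R2 is the fraction of the data sum of squares reproduced by the model. The function names are illustrative. Under these definitions R2 = 100·(1 − (lof/100)²), which agrees with the reported pairs, e.g. lof = 68.19% with R2 = 53.51%.

```python
import numpy as np


def lack_of_fit(D, D_hat):
    """Lack of fit in percent: 100 * sqrt(sum(e_ij^2) / sum(d_ij^2))."""
    E = np.asarray(D, dtype=float) - np.asarray(D_hat, dtype=float)
    return 100.0 * np.sqrt(np.sum(E**2) / np.sum(np.asarray(D, dtype=float) ** 2))


def explained_variance(D, D_hat):
    """Explained variance in percent: 100 * (1 - sum(e_ij^2) / sum(d_ij^2))."""
    E = np.asarray(D, dtype=float) - np.asarray(D_hat, dtype=float)
    return 100.0 * (1.0 - np.sum(E**2) / np.sum(np.asarray(D, dtype=float) ** 2))


# Toy check: a model that reproduces 90% of every element leaves 10% lof.
D = np.ones((5, 4))
D_hat = 0.9 * D
print(lack_of_fit(D, D_hat), explained_variance(D, D_hat))
```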
The MCR-ALSQ model when applied to trilinear dataset (DT) resolved profiles with only 55.68% R2, a high 66.57% lof and a poor recovery in terms of r2 and θ (Table SI-2). To reconfirm the effect of model and data structure on the reliability of results, the quadrilinear dataset (DQ) was also modeled with all of the above variants of MCR-ALS. The results (Table SI-2) showed a perfect fitting of DQ with all models of MCR-ALS used in the study,

4.4. MCR-ALSQ results for the pure dataset

The MCR-ALS models, when applied to the pure dataset (DQ), exhibited excellent model performance in terms of explained variance (R2 = 100%) and lack of fit (lof ≈ 0%), with and without the

Table 1 Results of singular value decomposition for rank-analysis of different simulated datasets and corresponding noise matrices. Singular values are s1–s9; lofth and R2th are the theoretical lof and R2.

Noise type             Matrix        s1       s2       s3       s4       s5       s6       s7       s8       s9   lofth (%)  R2th (%)
None                   DQ       3680.41   540.71   330.37   119.19        -        -        -        -        -   0          1e−013
Homoscedastic          EH1       138.71   134.45   134.20   131.97   128.91   126.85   125.42   121.66   117.82
                       DQH1     3682.07   555.73   355.18   174.61   135.78   132.17   128.28   123.98   118.77   10.31      98.94
                       EH2       277.41   268.89   268.40   263.93   257.82   253.71   250.85   243.32   235.65
                       DQH2     3688.61   599.40   419.43   284.18   271.46   264.34   255.87   247.90   236.96   20.30      95.88
                       EH3       421.67   408.72   407.96   401.17   391.88   385.63   381.29   369.84   358.18
                       DQH3     3700.57   669.36   512.72   412.48   411.84   401.75   387.71   376.50   359.60   30.06      90.96
Heteroscedastic        EP1       364.56    74.92    38.04    35.71    31.91    27.37    25.66    10.54     9.18
                       DQP1     3695.13   545.23   339.85   120.12    58.39    34.75    30.90    24.96     9.48   10.11      98.98
                       EP2       744.80   153.07    77.71    72.96    65.20    55.92    52.43    21.52    18.76
                       DQP2     3746.29   561.07   368.49   123.08   115.40    71.04    62.78    50.89    19.35   20.33      95.87
                       EP3      1136.80   233.63   118.61   111.35    99.52    85.36    80.02    32.85    28.63
                       DQP3     3836.10   593.12   406.37   169.40   127.77   108.47    94.60    77.48    29.51   30.23      90.86
Constant-proportional  ECP1      337.45    85.63    63.65    61.83    59.61    56.43    55.34    52.72    49.96
                       DQCP1    3693.04   546.56   342.53   129.92    74.81    61.53    57.63    55.39    51.03   10.11      98.98
                       ECP2      734.98   159.17    92.98    88.90    81.83    75.06    72.07    56.36    53.34
                       DQCP2    3744.66   562.24   371.53   133.19   124.77    87.46    79.06    71.21    54.57   20.41      95.83
                       ECP3     1126.31   236.86   128.73   122.03   110.63    98.74    93.90    61.50    57.68
                       DQCP3    3833.33   593.55   409.10   175.91   137.30   119.48   105.57    91.91    59.00   30.19      90.89

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002

A. Malik, R. Tauler / Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

quadrilinearity constraint. When MCR-ALSQ was applied to the pure data, the recovered profiles showed a high similarity coefficient (r2 = 1) and a negligible angle (θ = 0 or ≈ 0) in all cases when compared with the theoretical profiles, implying that rotation ambiguities were completely resolved when the quadrilinearity constraint was used. In contrast, the profiles resolved with MCR-ALSB showed lower r2 and larger θ values than those for MCR-ALSQ. This confirms the presence of rotation ambiguity in the solutions obtained by MCR-ALSB, which was fully eliminated with MCR-ALSQ. In the absence of noise, differences observed in the profile recoveries should be due to rotational ambiguities inherent to curve resolution methods [24]. Additionally, feasible bands [25] were also calculated, and the results (not shown) confirmed that rotation ambiguities in the profiles were present only in the case of the bilinear MCR-ALSB model, whereas unique solutions were obtained when multi-linearity (trilinear or quadrilinear) constraints were applied. Application of the quadrilinearity constraint in the MCR-ALS model (MCR-ALSQ) provided excellent profile recoveries, confirming the complete match of the applied model and data structure (Table 2) and supporting the appropriateness of the MCR-ALS model with the quadrilinearity constraint when the data follow a quadrilinear structure.

4.5. Effect of noise on resolved profiles

The efficiency of MCR-ALSQ was assessed by fitting the different simulated noisy datasets. To better compare the results obtained by the different MCR-ALS methods, the profiles in the 1st, 3rd, and 4th modes were always refolded from the augmented score profiles (in the bilinear model) obtained by the MCR-ALSB and MLPCA–MCR-ALSB methods (Fig. 1, Section 2.2). The profiles in all the modes as resolved by the different methods were compared with the theoretical ones, and all the results are provided in Tables 3–5.
However, to avoid repetition and for a clearer presentation of the findings, only the profiles recovered in the variable (second) mode are discussed in detail.

4.5.1. Homoscedastic (random) noise

The results obtained by the MCR-ALSB and MCR-ALSQ methods, for the datasets with different levels of homoscedastic noise are

Table 2 Analysis of pure dataset (DQ) with MCR-ALSB and MCR-ALSQ models, and comparison of resolved profiles with theoretical ones.

Data mode            Profiles(a)   MCR-ALSB(b)            MCR-ALSQ
                                   r2       θ             r2       θ
1st mode             u11           1        0.166         1        0
                     u12           0.997    4.803         1        0
                     u13           1        0.103         1        0
                     u14           0.992    7.046         1        1.2e−06
2nd (variable) mode  u21           0.988    8.82          1        0
                     u22           1        0.017         1        0
                     u23           0.99     6.51          1        0
                     u24           1        0.0001        1        0
3rd mode             u31           0.997    4.737         1        0
                     u32           0.865    30.144        1        0
                     u33           0.994    6.426         1        0
                     u34           0.994    6.316         1        1.5e−06
4th mode             u41           0.996    4.990         1        0
                     u42           0.905    25.177        1        0
                     u43           0.997    4.750         1        0
                     u44           0.986    9.658         1        0
lof (%)                            9.9e−04                1.0e−13
R2 (%)                             100                    100

(a) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.
(b) The scores obtained by MCR-ALSB were refolded (Section 2.2, Fig. 1) to get scores in three modes (1st, 3rd, and 4th modes) of the dataset.


summarized in Table 3. The profiles obtained by the MCR-ALSQ model in the loading mode at the higher noise levels, 20.30% lofth (dataset DQH2) and 30.06% lofth (dataset DQH3), are presented in Fig. 2 along with the theoretical ones for comparison. When the dataset DQH1 (noise level = 10.31% lofth) was fitted with MCR-ALSB, a lof value of 8.08% and an R2 value of 99.35% were obtained, whereas MCR-ALSQ fitted the dataset with 10.26% lof and 98.95% R2. For MCR-ALSB, the similarity coefficient (r2) between the resolved and theoretical profiles in the loading mode ranged from 0.97 to 0.99, and the angle θ ranged from 7.67 to 13.82. In the case of MCR-ALSQ, better values of r2 (0.996 to 1.00) and much smaller values of θ (0.1–4.93) were obtained. For the dataset DQH2 as well, MCR-ALSQ was able to recover the profiles better than MCR-ALSB (Table 3). MCR-ALSB provided a lower lof value (16.01%) and a higher R2 (97.44%) than the MCR-ALSQ model; however, its profile recovery was poorer. In both cases, the lof obtained by MCR-ALSQ is closer to the theoretical lof of the dataset, whereas MCR-ALSB produced a lower lof value and a higher R2. The lower lof values obtained with the bilinear model are probably the effect of its larger number of degrees of freedom. Profiles obtained with MCR-ALSB were associated with some ambiguity, which is even more obvious in the loadings of the other modes. The profile recoveries by MCR-ALSQ were better owing to the elimination of rotational ambiguity by the quadrilinearity constraint; however, they were still affected by the presence of noise. As the level of noise increased to a lofth of about 30% (dataset DQH3), neither MCR-ALSQ nor MCR-ALSB could clearly recover all the profiles any longer (Table 3).
Still, MCR-ALSQ recovered slightly better profiles for the first three components than MCR-ALSB. The profile recovery was worst for the fourth component. The high r2 values and low angles θ between the theoretical profiles and those obtained by MCR-ALSQ indicate that MCR-ALSQ is a good choice for recovering the true profiles when the homoscedastic noise is at a level of about 10% lofth, and even up to about 20% lofth; if the noise is increased further, however, it becomes difficult for MCR-ALSQ to recover the true profiles (Fig. 2). This may be due to the propagation of a higher level of noise and the increase of ambiguity in the solutions. Similar conclusions can be drawn by comparing the r2 and θ values in the other three modes (scores, Table 3). These results confirm that, for quadrilinear data, the MCR-ALSQ method resolves the profiles better than MCR-ALSB.
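The two similarity measures used in this comparison can be sketched as below. The sketch assumes that the similarity coefficient is the cosine of the angle between the recovered and the theoretical profile vectors and that θ is this angle in degrees; the definitions themselves are given outside this section, but the assumption is consistent with the tabulated pairs (for example, r2 = 0.865 with θ ≈ 30.1). The function name `similarity` is illustrative.

```python
import numpy as np


def similarity(recovered, reference):
    """Return (cosine similarity, angle in degrees) between two profile vectors."""
    a = np.asarray(recovered, dtype=float)
    b = np.asarray(reference, dtype=float)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Guard against round-off pushing the cosine slightly outside [-1, 1].
    cos = min(1.0, max(-1.0, cos))
    theta = float(np.degrees(np.arccos(cos)))
    return cos, theta


# Toy profiles: a scaled copy is a perfect match, since only the shape matters.
reference = np.array([0.1, 0.4, 0.9, 0.4, 0.1])
print(similarity(3.0 * reference, reference))
print(similarity(reference + np.array([0.2, 0.0, -0.1, 0.0, 0.2]), reference))
```

Because both measures are scale-invariant, they compare profile shapes only, which suits curve resolution results where intensity ambiguity is expected.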

4.5.2. Heteroscedastic (proportional) noise

As mentioned earlier, three datasets, DQP1, DQP2, and DQP3, with noise levels of about 10%, 20%, and 30% lofth, respectively, were fitted with the different MCR-ALS methods (Section 2.2), and the resolved profiles were compared with the theoretical ones to check their recovery. The results are summarized in Table 4. In the case of DQP1 (lofth = 10.11%), MCR-ALSB produced a lof value of 2.11% and R2 = 99.96%, whereas the MCR-ALSQ model provided a lof value of 10.00% and R2 = 99.00%. The lof obtained by MCR-ALSQ is in accordance with the level of noise introduced, i.e. the theoretical lof (10.11%). In terms of r2 and θ, profile recoveries were better for MCR-ALSQ than for MCR-ALSB. When MLPCA was used as a pre-processing step prior to MCR-ALS, the model fitted well for both the bilinear and quadrilinear MCR-ALS models (MLPCA–MCR-ALSB: lof = 7.72%, R2 = 99.40%; MLPCA–MCR-ALSQ: lof = 10.10%, R2 = 98.98%). As the level of heteroscedastic noise increased (dataset DQP2, lofth = 20.33%), MCR-ALSB provided a lof value of 4.19% with 99.82% R2, and with application of the quadrilinearity constraint (MCR-ALSQ) the lof value was 20.15% and R2 was 95.94%. The lof (20.28%) and R2 (95.88%) values were almost equal to the theoretical ones when MLPCA–MCR-ALSQ was applied. On the other hand, MLPCA–MCR-


Table 3 MCR-ALS analysis (with different constraints) of datasets with homoscedastic noise and comparison of resolved profiles with theoretical ones.

                                  DQH1                                  DQH2                                  DQH3
                                  MCR-ALSB          MCR-ALSQ            MCR-ALSB          MCR-ALSQ            MCR-ALSB          MCR-ALSQ
Constraint(a)                     1                 1,2                 1                 1,2                 1                 1,2
lof (%)                           8.08              10.26               16.01             20.42               24.24             30.27
R2 (%)                            99.35             98.95               97.44             95.83               94.12             90.84
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.000    0.149    1.000    0.024      1.000    0.144    1.000    0.077      0.9999   0.612    1.000    0.144
                     u12          0.9987   2.937    0.9995   1.799      0.9946   5.954    0.9969   4.538      0.9899   8.131    0.9911   7.664
                     u13          0.9958   5.272    0.9993   2.137      0.9959   5.192    0.9975   4.089      0.9961   5.032    0.9949   5.799
                     u14          0.9425   19.530   0.9591   16.441     0.8666   29.940   0.8941   26.603     0.6412   50.118   0.3894   67.081
2nd (variable) mode  u21          0.9876   9.032    1.0000   0.107      0.9876   9.027    0.9998   1.003      0.9874   9.107    0.9996   1.545
                     u22          0.9899   8.157    0.9991   2.433      0.9716   13.679   0.9994   1.999      0.9534   17.570   0.9943   6.127
                     u23          0.9910   7.671    0.9996   1.557      0.9830   10.581   0.9986   3.036      0.9860   9.600    0.9963   4.908
                     u24          0.9710   13.824   0.9963   4.928      0.9351   20.752   0.9785   11.916     0.3225   71.184   0.0009   89.949
3rd mode             u31          0.9949   5.797    0.9998   1.016      0.9940   6.273    0.9987   2.973      0.9894   8.336    0.9824   10.757
                     u32          0.9927   6.928    0.9992   2.312      0.9876   9.027    0.9966   4.759      0.9695   14.187   0.9781   12.025
                     u33          0.8622   30.437   0.9946   5.944      0.8609   30.586   0.9774   12.20      0.8799   28.371   0.9626   15.712
                     u34          0.9865   9.423    0.9890   8.498      0.9631   15.618   0.9713   13.77      0.9050   25.181   0.8148   35.429
4th mode             u41          0.9938   6.382    0.9998   1.279      0.9911   7.642    0.9977   3.90       0.9919   7.317    0.9896   8.270
                     u42          0.9949   5.787    0.9995   1.887      0.9907   7.819    0.9972   4.32       0.9765   12.455   0.9767   12.383
                     u43          0.9411   19.758   0.9994   1.982      0.9255   22.250   0.9950   5.71       0.9341   20.918   0.9868   9.312
                     u44          0.9791   11.72    0.9898   8.184      0.9650   15.200   0.9709   13.85      0.9290   21.728   0.8883   27.342

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.

ALSB, again, resulted in a higher R2 (97.58%) and a lower lof value (15.57%) than the theoretical ones. For the dataset DQP3 (noise level = 30.23% lofth), lof = 5.38% and R2 = 99.71% were obtained by MCR-ALSB, whereas MCR-ALSQ provided lof = 30.10% and R2 = 90.94% for the same dataset. The lof obtained by MLPCA–MCR-ALSQ (30.17%) is closest to the level of noise introduced (lofth = 30.23%). In terms of r2 and θ, the profiles obtained by MCR-ALSQ for the first three components were relatively better than those obtained by MCR-ALSB, but neither method recovered the fourth component (Table 4). This is due to the high level of noise present in the dataset; the SVD analysis of the error and dataset matrices also showed that at this level it was difficult to distinguish the noise from data variance. The profiles recovered by MLPCA–MCR-ALSQ are superior to those recovered by MCR-ALSQ (Fig. 3a) owing to the capability of MLPCA to deal with heteroscedastic noise. For the datasets with heteroscedastic noise, the pairing of MLPCA with MCR-ALSB could improve the fitting of the bilinear MCR-ALS; however, it could not eliminate the ambiguity associated with the resolved profiles. At a low level of noise (lofth = 10.11%), MCR-ALSQ was able to recover the profiles rather well, and the use of MLPCA helped it to recover the profiles near to perfection (Table 4). In terms of similarity parameters, the profile recoveries obtained by MLPCA–MCR-ALSQ were comparable with those of MCR-ALSQ. The success of MCR-ALSQ in the presence of heteroscedastic noise (as well as other types of noise) at moderate levels (up to ≈20% lofth) may be attributed to the fact that the quadrilinear model is resistant to over-fitting and noise propagation. At higher levels of proportional noise, the error completely destroys the data structure and MCR-ALSQ is no longer able to recover the right model. When the noise levels were higher, the use of MLPCA as a pre-processing step before MCR-ALSQ improved the profile recoveries to a greater extent (Table 4, Fig. 3a).
On the other hand, the use of MLPCA with bilinear MCR-ALS could improve the profile recovery to some extent, although the results obtained by MLPCA–MCR-ALSQ were superior to those of the other models applied. Although, for the sake of brevity, only the resolved profiles in the loading mode have been discussed, the same kind of inferences can be drawn from the recovery of the resolved profiles in the other modes (Table 4). The overall comparison of the fit criteria and the recovery of the resolved profiles by the various methods therefore supports the importance of the quadrilinearity constraint when the dataset under study follows a quadrilinear structure. Further, the resolution and recovery of the profiles can be greatly enhanced if MCR-ALSQ is coupled with the MLPCA estimates, even in the case of high heteroscedastic noise (lofth ≈ 30%) (Fig. 3a).

4.5.3. Constant-proportional noise

The effect of the quadrilinearity constraint on MCR-ALS resolved profiles in the presence of constant and proportional noise was investigated by modeling the datasets with the different MCR-ALS methods mentioned earlier (Section 2). The resolved profiles obtained by the different models were compared with the theoretical profiles, and the results are summarized in Table 5. At a noise level of about 10.11% lofth (dataset DQCP1), lower lof values and higher R2 values than the theoretical lof and explained variance were obtained by MCR-ALSB and MLPCA–MCR-ALSB (Table 5). In the case of MCR-ALSQ and MLPCA–MCR-ALSQ, the obtained lof values were 10.03% (R2 = 98.99%) and 10.08% (R2 = 98.99%), respectively. Comparison of the resolved profiles in the loading mode, in terms of r2 and θ, showed the superiority of MCR-ALSQ (r2: >0.99–1.00, θ: 0.20–1.32) over MCR-ALSB (r2: >0.98–0.99, θ: 7.67–8.77), and the superiority of MLPCA–MCR-ALSQ (r2: ~1.00–1.00, θ: 0.15–1.23) over MLPCA–MCR-ALSB (r2: 0.99–~1.00, θ: 1.36–8.09). Overall, at this noise level, the profiles recovered by MCR-ALSQ and MLPCA–MCR-ALSQ were near perfect when compared to the other MCR-ALS models used in the study (Table 5). On increasing the noise level (lofth = 20.41%, dataset DQCP2), lof values closer to the theoretical one were again obtained with application of the quadrilinearity constraint (20.36% lof by MLPCA–MCR-ALSQ and 20.24% lof by MCR-ALSQ), as in the case of DQCP1. MCR-ALSB and MLPCA–MCR-ALSB fitted the model with lof values of 5.18% (R2 = 99.73%) and 13.91% (R2 = 98.06%), respectively.
The best recovery of the resolved profiles (loading mode), in terms of r2 and θ, was achieved by MLPCA–MCR-ALSQ (r2: ~1.00–1.00, θ: 0.41–2.07), followed by MCR-ALSQ (r2: >0.99–~0.99, θ: 0.67–3.49). MCR-ALSB (r2: 0.79–0.99, θ: 9.08–38.10) and MLPCA–MCR-ALSB (r2: 0.97–>0.99, θ: 3.02–14.23), in spite of their low lof and high R2 values, could not recover the true profiles. This again implies the presence of ambiguity in the solutions, which is eliminated with the inclusion of the quadrilinearity


Table 4 MCR-ALS analysis (with different constraints) of datasets with heteroscedastic (proportional) noise and comparison of resolved profiles with theoretical ones.

Dataset DQP1
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           2.11              7.72                10.00             10.10
R2 (%)                            99.96             99.40               98.9995           98.98
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.0000   0.133    1.000    0.179      1.0000   0.044    1.0000   0.036
                     u12          0.9998   1.073    0.9999   0.879      0.9998   1.148    0.9999   0.955
                     u13          0.9964   4.858    0.9967   4.655      0.9994   2.012    0.9996   1.619
                     u14          0.9915   7.456    0.9921   7.184      0.9995   1.766    0.9997   1.360
2nd (variable) mode  u21          0.9922   7.169    0.9881   8.837      1.0000   0.202    1.0000   0.057
                     u22          0.9876   9.027    0.9933   6.657      0.9997   1.402    1.0000   0.517
                     u23          0.9892   8.410    1.000    0.158      0.9998   1.076    0.9999   0.821
                     u24          0.9996   0.676    0.9999   0.8326     0.9998   1.075    0.9996   1.583
3rd mode             u31          0.9962   5.026    0.9966   4.730      1.0000   0.436    1.0000   0.479
                     u32          0.9930   6.781    0.9929   6.809      0.9999   0.972    0.9999   0.956
                     u33          0.8607   30.604   0.8629   30.352     0.9993   2.218    0.9997   1.466
                     u34          0.9932   6.682    0.9936   6.479      0.9998   1.192    0.9998   1.098
4th mode             u41          0.9946   5.954    0.9959   5.164      1.0000   0.379    1.0000   0.382
                     u42          0.9963   4.903    0.9964   4.860      1.0000   0.184    1.0000   0.173
                     u43          0.9353   20.716   0.9350   20.768     0.9996   1.596    1.0000   0.505
                     u44          0.9865   9.409    0.9868   9.310      0.9998   1.060    0.9998   1.005

Dataset DQP2
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           4.19              15.57               20.15             20.28
R2 (%)                            99.82             97.58               95.94             95.88
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.0000   0.116    1.0000   0.209      1.0000   0.088    1.0000   0.074
                     u12          0.9990   2.601    0.9995   1.816      0.9991   2.395    0.9994   1.950
                     u13          0.9947   5.887    0.9967   4.639      0.9977   3.896    0.9985   3.135
                     u14          0.9898   8.205    0.9929   6.828      0.9981   3.488    0.9988   2.804
2nd (variable) mode  u21          0.9875   9.072    0.9882   8.809      0.9999   0.6775   1.0000   0.251
                     u22          0.9695   14.18    0.9932   6.697      0.9988   2.787    0.9998   1.063
                     u23          0.9871   9.225    1.0000   0.355      0.9980   3.612    0.9995   1.764
                     u24          0.8354   33.34    0.9996   1.688      0.9997   1.386    0.9985   3.160
3rd mode             u31          0.9961   5.094    0.9966   4.735      0.9999   0.9674   0.9998   1.001
                     u32          0.9920   7.237    0.9920   7.273      0.9993   2.142    0.9993   2.088
                     u33          0.8544   31.302   0.8612   30.552     0.9944   6.073    0.9974   4.094
                     u34          0.9912   7.602    0.9933   6.621      0.9990   2.610    0.9992   2.315
4th mode             u41          0.9944   6.058    0.9957   5.322      0.9999   0.748    0.9999   0.820
                     u42          0.9961   5.085    0.9963   4.945      1.0000   0.440    1.0000   0.437
                     u43          0.9378   20.317   0.9103   24.453     0.9970   4.656    0.9992   2.234
                     u44          0.9855   9.769    0.9879   8.909      0.9995   1.775    0.9994   2.002

Dataset DQP3
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           5.38              23.20               30.10             30.17
R2 (%)                            99.71             94.62               90.94             90.90
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.000    0.188    1.0000   0.248      0.9999   0.600    1.000    0.117
                     u12          0.9969   4.485    0.9988   2.801      0.9964   4.893    0.9986   2.987
                     u13          0.9967   4.631    0.9966   4.693      0.9948   5.830    0.9968   4.534
                     u14          0.4121   65.661   0.9935   6.552      0.4229   64.980   0.9972   4.294
2nd (variable) mode  u21          0.9830   10.591   0.9891   8.484      0.9917   7.374    1.000    0.535
                     u22          0.9688   14.347   0.9930   6.760      0.9946   5.930    0.9995   1.726
                     u23          0.9883   8.761    0.9999   0.576      0.9975   4.021    0.9988   2.770
                     u24          0.0226   88.706   0.9990   2.575      0.0097   89.447   0.9965   4.812
3rd mode             u31          0.9729   13.369   0.9967   4.681      0.9830   10.587   0.9996   1.560
                     u32          0.9531   17.610   0.9909   7.739      0.9684   14.431   0.9983   3.321
                     u33          0.9363   20.559   0.863    30.350     0.9850   9.927    0.9919   7.320
                     u34          0.6694   47.982   0.9929   6.814      0.6527   49.252   0.9980   3.633
4th mode             u41          0.9783   11.947   0.9956   5.372      0.9923   7.115    0.9997   1.308
                     u42          0.9701   14.054   0.9962   4.969      0.9924   7.046    0.9999   0.790
                     u43          0.9765   12.451   0.9291   21.704     0.9934   6.603    0.9971   4.374
                     u44          0.8600   30.685   0.9894   8.341      0.8266   34.251   0.9986   2.981

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.


Table 5 MCR-ALS analysis (with different constraints) of datasets with constant-proportional error and comparison of resolved profiles with theoretical ones.

Dataset DQCP1
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           3.71              6.30                10.03             10.08
R2 (%)                            99.86             99.60               98.99             98.99
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.0000   0.15     1.000    0.18       1.0000   0.04     1.0000   0.04
                     u12          0.9997   1.44     0.9997   1.32       0.9998   1.23     0.9998   1.14
                     u13          0.9967   4.68     0.9972   4.32       0.9993   2.09     0.9993   2.07
                     u14          0.9833   10.47    0.9835   10.43      0.9955   5.42     0.9957   5.29
2nd (variable) mode  u21          0.9911   7.67     0.9900   8.09       1.0000   0.20     1.0000   0.15
                     u22          0.9883   8.77     0.9925   7.04       0.9997   1.32     0.9999   0.72
                     u23          0.9896   8.27     0.9997   1.36       0.9998   1.12     1.000    0.57
                     u24          0.9891   8.45     0.9976   4.00       0.9998   1.15     0.9998   1.23
3rd mode             u31          0.9963   4.95     0.9970   4.43       0.9999   0.58     0.9999   0.59
                     u32          0.9932   6.705    0.9937   6.45       0.9997   1.40     0.9997   1.40
                     u33          0.8646   30.16    0.8749   28.97      0.9988   2.85     0.999    2.54
                     u34          0.9919   7.28     0.9925   7.04       0.9992   2.28     0.9993   2.18
4th mode             u41          0.9944   6.06     0.9959   5.20       0.9999   0.74     0.9999   0.68
                     u42          0.9962   4.97     0.9966   4.72       1.0000   0.34     1.0000   0.35
                     u43          0.9312   21.38    0.919    23.27      0.9996   1.62     0.9998   1.17
                     u44          0.9877   9.01     0.9887   8.61       0.9994   1.93     0.9995   1.86

Dataset DQCP2
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           5.18              13.91               20.24             20.36
R2 (%)                            99.73             98.06               95.91             95.86
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.0000   0.11     1.000    0.14       1.0000   0.09     1.0000   0.08
                     u12          0.9980   2.85     0.9987   2.91       0.9991   2.43     0.9993   2.09
                     u13          0.9940   6.31     0.9959   5.19       0.9976   3.94     0.9980   3.65
                     u14          0.9785   11.90    0.9908   7.77       0.9924   7.07     0.9954   5.51
2nd (variable) mode  u21          0.9875   9.08     0.9874   9.11       0.9999   0.67     1.0000   0.41
                     u22          0.9691   14.28    0.9693   14.23      0.9989   2.69     0.9999   0.92
                     u23          0.9864   9.48     0.9889   8.55       0.9982   3.48     0.9998   1.27
                     u24          0.7869   38.10    0.9986   3.02       0.9999   0.87     0.9993   2.07
3rd mode             u31          0.9959   5.21     0.9956   5.37       0.9998   1.11     0.9998   1.09
                     u32          0.9916   7.42     0.9928   6.89       0.9991   2.42     0.9992   2.31
                     u33          0.8533   31.43    0.8597   30.72      0.9935   6.54     0.9958   5.24
                     u34          0.9884   8.72     0.9921   7.20       0.9979   3.72     0.9986   2.81
4th mode             u41          0.9941   6.21     0.9933   6.63       0.9999   0.98     0.9999   0.98
                     u42          0.9954   5.49     0.9966   4.75       1.0000   0.49     1.000    0.48
                     u43          0.9374   20.38    0.9095   24.56      0.9966   4.75     0.9984   3.24
                     u44          0.9847   10.02    0.9888   8.57       0.9993   2.14     0.9992   2.23

Dataset DQCP3
                                  MCR-ALSB          MLPCA–MCR-ALSB      MCR-ALSQ          MLPCA–MCR-ALSQ
Constraint(a)                     1                 1                   1,2               1,2
lof (%)                           6.12              21.41               30.07             30.12
R2 (%)                            99.63             95.42               90.96             90.93
Data mode            Profiles(b)  r2       θ        r2       θ          r2       θ        r2       θ
1st mode             u11          1.0000   0.22     1.000    0.24       0.9987   2.97     1.0000   0.12
                     u12          0.9968   4.56     0.9986   3.07       0.9971   4.36     0.9985   3.14
                     u13          0.9968   4.59     0.9972   4.29       0.9953   5.54     0.9962   4.99
                     u14          0.4137   65.56    0.9863   9.50       0.3986   66.51    0.9946   5.97
2nd (variable) mode  u21          0.9824   10.78    0.9914   7.53       0.9982   3.40     0.9999   0.77
                     u22          0.9858   9.68     0.9924   7.06       0.9945   6.03     0.9998   1.19
                     u23          0.9886   8.66     0.9983   3.39       0.9975   4.03     0.9995   1.79
                     u24          0.0221   88.74    0.9964   4.87       0.0056   89.68    0.9985   3.16
3rd mode             u31          0.9744   12.99    0.9973   4.22       0.9784   11.92    0.9996   1.64
                     u32          0.9538   17.49    0.9920   7.25       0.9680   14.54    0.9982   3.45
                     u33          0.9362   20.58    0.8795   28.41      0.9823   10.79    0.9887   8.61
                     u34          0.6754   47.52    0.9923   7.11       0.6464   49.73    0.9979   3.72
4th mode             u41          0.9794   11.66    0.9956   5.35       0.9947   5.92     0.9997   1.40
                     u42          0.9702   14.03    0.9966   4.74       0.9916   7.44     0.9999   0.80
                     u43          0.9766   12.41    0.9286   21.78      0.9932   6.70     0.9952   5.60
                     u44          0.8640   30.23    0.9912   7.61       0.8288   34.02    0.9988   2.75

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.


Fig. 2. Recovery of profiles by MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols) in the presence of homoscedastic noise from dataset (a) DQH2 (lofth ≈ 20%), and (b) DQH3 (lofth ≈ 30%), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

constraint (Table 5). At a higher level of noise (lofth = 30.19%, dataset DQCP3), only MLPCA–MCR-ALSQ could recover reliable profiles comparable to the theoretical ones (Table 5, Fig. 4). MCR-ALSQ also recovered the profiles to some extent, but they were completely unreliable after the third component (Fig. 4). MCR-ALSB and MLPCA–MCR-ALSB provided lower lof values (6.12% and 21.41%) and relatively higher R2 values (99.63% and 95.42%), but again the recovered profiles were associated with ambiguity, which was eliminated with the inclusion of the quadrilinearity constraint in the algorithm (Table 5).

Overall, the recovery of the profiles and their comparison with the theoretical ones, in the presence of different types and levels of noise, suggest that MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) is an efficient way to deal with four-way quadrilinear data. Analysis of the recovered profiles shows that MCR-ALSQ is able to handle noise efficiently up to a level of about 20% lofth. At this noise level (lof ≈ 20%), very little difference was observed between the profiles recovered by the MCR-ALSQ and MLPCA–MCR-ALSQ models. This confirms the property of MLPCA to separate measurement noise variance from other sources of variance. Inclusion of MLPCA as a pre-processing step in the MCR-ALS algorithm with the quadrilinearity constraint (MLPCA–MCR-ALSQ) increases the resolution efficiency of the method, providing reliable true profiles even at a noise level of about 30% lof. MLPCA uses a maximum likelihood projection, separates measurement noise variance from other sources of variance, and obtains a more accurate estimate of the component subspace [14]; this in turn helps the applied model to achieve better recovery of the profiles. These results further suggest that, in general, the resolved profiles are affected more by the level of noise than by the type of noise in the three cases under study.
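The three noise types compared in this section can be simulated as in the following sketch, which rescales each noise matrix to a chosen noise level. Several points are assumptions made for illustration and are not taken from the study: the noise level is expressed relative to the noise-free data, the constant term of the constant-proportional noise is scaled by the mean of the data, and the function names are hypothetical.

```python
import numpy as np


def scale_to_lof(D, E, target_lof):
    """Rescale E so that 100 * ||E|| / ||D|| equals target_lof (in percent)."""
    return E * (target_lof / 100.0) * np.linalg.norm(D) / np.linalg.norm(E)


def simulate_noise(D, kind, target_lof, rng):
    """Return a noise matrix of the requested kind at the requested level."""
    Z = rng.standard_normal(D.shape)
    if kind == "homoscedastic":
        E = Z                      # same standard deviation everywhere
    elif kind == "proportional":
        E = Z * D                  # standard deviation proportional to the signal
    elif kind == "constant-proportional":
        # Sum of a proportional and a constant term; the relative weight of the
        # constant term (here the mean of D) is an arbitrary illustrative choice.
        E = Z * D + rng.standard_normal(D.shape) * D.mean()
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return scale_to_lof(D, E, target_lof)


rng = np.random.default_rng(1)
D = rng.uniform(0.1, 1.0, size=(20, 9))   # hypothetical noise-free data
for kind in ("homoscedastic", "proportional", "constant-proportional"):
    E = simulate_noise(D, kind, 20.0, rng)
    print(kind, round(100.0 * np.linalg.norm(E) / np.linalg.norm(D), 2))
```

Fixing the overall level in this way makes it possible to vary the noise type and the noise level independently, which is the comparison the conclusion above relies on.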

5. Conclusions

This work shows that reliable profiles can only be resolved when the applied model and the data structure match, and that the ambiguity associated with bilinear models can be eliminated by the use of higher linearity constraints depending on the structure of

Fig. 3. Recovery of profiles in variable mode of dataset DQP3 in the presence of heteroscedastic noise (lofth ≈ 30%), by: (a) MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols) and (b) MCR-ALS with the quadrilinearity constraint applied on MLPCA projected data (MLPCA–MCR-ALSQ) (black line with symbols), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 4. Recovery of profiles in variable mode of dataset DQCP3 in the presence of constant-proportional noise (lofth ≈ 30%), by: (a) MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols), and (b) MCR-ALS with the quadrilinearity constraint applied on MLPCA projected data (MLPCA–MCR-ALSQ) (black line with symbols), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

the dataset to be analyzed. Further, this work validates the efficiency and reliability of the MCR-ALSQ model for analyzing four-way quadrilinear datasets in the presence of different levels and types of noise. MLPCA applied as a pre-processing step prior to MCR-ALSQ analysis was also shown to resolve adequately the profiles in the presence of high levels of heteroscedastic and constant-proportional noise. The results suggest that MCR-ALS with the quadrilinear constraint can be used efficiently to analyze four-way quadrilinear datasets. Moreover, MLPCA can be conveniently used before MCR-ALS when high levels of heteroscedastic and constant-proportional noise are present.

Acknowledgements

AM gratefully acknowledges the Juan de la Cierva Post-Doctoral research grant (JCI-2011-10895) and support from the Ministerio de Economía y Competitividad, Spain (CTQ2012-38616-C02-01 grant).

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.chemolab.2014.04.002.

References

[1] R. Tauler, M. Maeder, A. de Juan, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, 2, Elsevier, Amsterdam, 2009, pp. 473–505, (Chapter 2.24).
[2] R. Tauler, Multivariate curve resolution applied to second order data, Chemometr. Intell. Lab. Syst. 30 (1995) 133–146.
[3] R. Tauler, I. Marques, E. Casassas, Multivariate curve resolution applied to three-way trilinear data: study of a spectrofluorimetric acid–base titration of salicylic acid at three excitation wavelengths, J. Chemometr. 12 (1998) 55–75.
[4] M. Alier, M. Felipe, I. Hernández, R. Tauler, Trilinearity and component interaction constraints in the multivariate curve resolution investigation of NO and O3 pollution in Barcelona, Anal. Bioanal. Chem. 399 (2011) 2015–2029.
[5] R. Tauler, A. Smilde, B. Kowalski, Selectivity, local rank, 3-way data analysis and ambiguity in multivariate curve resolution, J. Chemometr. 9 (1995) 31–58.
[6] J. Jaumot, R. Gargallo, A. de Juan, R. Tauler, A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curve resolution in MATLAB, Chemometr. Intell. Lab. Syst. 76 (2005) 101–110.
[7] A. de Juan, R. Tauler, Comparison of three-way resolution methods for non-trilinear data sets, J. Chemometr. 15 (2001) 749–771.
[8] E. Peré-Trepat, A. Ginebreda, R. Tauler, Comparison of different multiway methods for the analysis of geographical metal distributions in fish, sediments and river waters in Catalonia, Chemometr. Intell. Lab. Syst. 88 (2007) 69–83.
[9] M. Alier, R. Tauler, Multivariate curve resolution of incomplete data multisets, Chemometr. Intell. Lab. Syst. 127 (2013) 17–28.
[10] A. Malik, R. Tauler, Extension and application of multivariate curve resolution-alternating least squares to four-way quadrilinear data obtained in the investigation of pollution patterns on Yamuna River, India: a case study, Anal. Chim. Acta 794 (2013) 20–28.
[11] J. Quackenbush, Computational analysis of microarray data, Nat. Rev. Genet. 2 (2001) 418–427.
[12] C.K. Wierling, M. Steinfath, T. Elge, S. Schulze-Kremer, P. Aanstad, M. Clark, H. Lehrach, R. Herwig, Simulation of DNA array hybridization experiments and evaluation of critical parameters during subsequent image and data analysis, BMC Bioinforma. 3 (2002).
[13] M. Nykter, T. Aho, M. Ahdesmäki, P. Ruusuvuori, A. Lehmussola, O. Yli-Harja, Simulation of microarray data with realistic characteristics, BMC Bioinforma. 7 (2006) 349.
[14] P.D. Wentzell, S. Hou, Exploratory data analysis with noisy measurements, J. Chemometr. 26 (2012) 264–281.
[15] P.D. Wentzell, T.K. Karakach, S. Roy, M.J. Martinez, C.P. Allen, M. Werner-Washburne, Multivariate curve resolution of time course microarray data, BMC Bioinforma. 7 (2006) 343.
[16] R. Tauler, M. Viana, X. Querol, A. Alastuey, R.M. Flight, P.D. Wentzell, P.K. Hopke, Comparison of the results obtained by four receptor modelling methods in aerosol source apportionment studies, Atmos. Environ. 43 (2009) 3989–3997.
[17] M. Dadashi, H. Abdollahi, R. Tauler, Application of maximum likelihood multivariate curve resolution to noisy data sets, J. Chemometr. 27 (2013) 34–41.
[18] P.D. Wentzell, D.T. Andrews, D.C. Hamilton, K. Faber, B.R. Kowalski, Maximum likelihood principal component analysis, J. Chemometr. 11 (1997) 339–366.
[19] P.D. Wentzell, M.T. Lohnes, Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations, Chemometr. Intell. Lab. Syst. 45 (1999) 65–85.
[20] P.D. Wentzell, Other topics in soft-modelling: maximum likelihood-based soft-modelling methods, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Elsevier Ltd., Amsterdam, 2009, pp. 507–558.
[21] M. Dadashi, H. Abdollahi, R. Tauler, Maximum likelihood principal component analysis as initial projection step in multivariate curve resolution analysis of noisy data, Chemometr. Intell. Lab. Syst. 118 (2012) 33–40.
[22] R. Tauler, D. Barcelo, Multivariate curve resolution applied to liquid chromatography–diode array detection, Trends Anal. Chem. 12 (1993) 319–327.
[23] A. Smilde, R. Bro, P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences, Wiley, 2004, ISBN 978-0-471-98691-1.
[24] M. Terrado, D. Barcelo, R. Tauler, Quality assessment of the multivariate curve resolution alternating least squares method for the investigation of environmental pollution patterns in surface water, Environ. Sci. Technol. 43 (2009) 5321–5326.
[25] J. Jaumot, R. Tauler, MCR-BANDS: a user friendly MATLAB program for the evaluation of rotation ambiguities in multivariate curve resolution, Chemometr. Intell. Lab. Syst. 103 (2010) 96–107.
