Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets



CHEMOM-02795; No of Pages 12 Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemolab

Amrita Malik ⁎, Roma Tauler

Institute of Environmental Assessment and Water Research (IDAEA), Spanish Council for Scientific Research (CSIC), Jordi Girona 18-26, 08034 Barcelona, Catalunya, Spain

Article info

Article history: Received 17 January 2014; received in revised form 3 April 2014; accepted 4 April 2014; available online xxxx.

Keywords: Four-way datasets; MCR-ALS; MLPCA; Noisy data; Quadrilinear constraint

Abstract

This study explores the effect of noise propagation on the resolution capability of multivariate curve resolution-alternating least squares with a recently developed quadrilinearity constraint (MCR-ALSQ). To investigate the effect of applying the quadrilinearity constraint, four environmental profiles were simulated, and three types of noise (homoscedastic, heteroscedastic, and constant-proportional) were added to the simulated dataset at three different levels. The profiles recovered with MCR-ALSQ were compared with those recovered by bilinear MCR-ALS (MCR-ALSB). The effect of maximum likelihood principal component analysis (MLPCA) as a pre-processing step before MCR-ALSQ (MLPCA–MCR-ALSQ) and MCR-ALSB (MLPCA–MCR-ALSB) was also studied, and the results were compared with those obtained with the MCR-ALSQ and MCR-ALSB models. The recovery of the resolved profiles and their similarity to the theoretical ones were assessed in terms of the similarity coefficient ($r^2$) and the similarity angle ($\theta$). The results indicate that MCR-ALSQ is appropriate for analyzing four-way quadrilinear datasets, and that the use of MLPCA as a pre-processing step before MCR-ALSQ greatly improves the resolved profiles even in the presence of high levels of heteroscedastic and constant-proportional noise.

© 2014 Elsevier B.V. All rights reserved.

⁎ Corresponding author.

http://dx.doi.org/10.1016/j.chemolab.2014.04.002
0169-7439/© 2014 Elsevier B.V. All rights reserved.

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002

1. Introduction

MCR-ALS, a proven tool for solving mixture analysis problems, can be used to analyze any dataset, consisting of either single or multiple data matrices, that is described by a bilinear model [1]. Bilinear modeling solutions for two-way datasets are usually associated with resolution ambiguities and rank deficiency problems. This situation can be improved significantly when data structures containing richer information (such as multi-way datasets) are analyzed by extensions of multivariate curve resolution methods [1,2].

In the case of multi-set and multi-way datasets, MCR-ALS is applied to augmented data matrices, and unique solutions can be obtained more easily by implementing different constraints such as non-negativity, closure, unimodality, selectivity, local rank and trilinearity, depending on the data characteristics [3]. The data matrix augmentation scheme and the constraints applied during the alternating least squares (ALS) optimization can be customized according to the specific features of each data matrix, allowing for the fulfillment of trilinear or multi-linear models [1]. In MCR-ALS analysis of multi-way datasets, the multi-linear constraints can be applied independently and selectively to each component of the dataset, providing more flexibility to the data analysis and allowing for multi-linear, partially multi-linear or mixed models [4,5]. Hence, MCR-ALS can easily be adapted to datasets of different complexities and structures (bilinear, trilinear or multi-linear), providing optimal least squares solutions [6].

After the successful extension and application of MCR-ALS to different kinds of three-way datasets using a trilinear constraint [3,4,7–9], the method has recently been extended to the analysis of a four-way dataset using non-negativity and a newly developed quadrilinear constraint [10]. Whenever a new method or model is proposed for data analysis, it needs to be validated, and a practical approach to method validation is to use simulated data with known structure, characteristics, and noise [11–13]. The main sources of uncertainty associated with curve resolution results are the degree of rotation ambiguity of the recovered profiles and the propagation of experimental noise [1], both of which need to be considered when assessing the quality of the final results. Real-life datasets usually contain some amount of noise, with specific characteristics that depend on the origin of the dataset. When these datasets are processed and modeled, noise propagated to the resolved profiles may distort them significantly, resulting in erroneous interpretations. Therefore, simulated datasets with structures as close as possible to real datasets, and with noise or error matrices of known characteristics, are a good way to validate the tested methods.

Most bilinear model based methods, including MCR-ALS, assume that the noise in the dataset is independently and identically distributed (i.i.d.) with an approximately normal (Gaussian) distribution in order to provide optimal solutions. This is a rather good approximation for experimental measurements obtained from spectroscopic and chromatographic methods, which are rather precise, with low and relatively uniform uncertainties [14]. In practice, however, this assumption is often not fulfilled by large complex environmental datasets or DNA microarray datasets [14–17]. As the measurement error variances become non-uniform (heteroscedastic noise), the estimation of the projection subspace for the given data matrix becomes suboptimal. Maximum likelihood principal component analysis (MLPCA) has been used to deal with these kinds of errors [18–20]. MLPCA is a generalization of PCA for data with non-ideal error structures, which may range from simple heteroscedasticity to more complex structures [14]. The use of MLPCA as an initial projection step in the MCR-ALS analysis of noisy data has previously been proposed and reported by Dadashi et al. [21].

This work explores the performance of MCR-ALS with a quadrilinear constraint with respect to different types of noise, and the effects of noise propagation on the four-mode profiles of a four-way dataset. For this purpose, an environmental dataset with four contamination sources was generated with theoretical profiles based on a previous study [10], and different types of noise (homoscedastic, heteroscedastic, and constant-proportional) were introduced into the simulated quadrilinear dataset.

2. Methods

2.1. MCR-ALS method

The MCR-ALS method has recently been extended to analyze and resolve the profiles of four-way data under a quadrilinearity constraint. Details of the MCR-ALS method and its extension with a quadrilinear constraint can be found elsewhere [1–3,10]. The first step in the MCR-ALS algorithm is the projection of the original dataset onto a subspace defined by its principal components, together with initial estimates of the scores ($\mathbf{U}$) or loadings ($\mathbf{V}^{T}$) [5,9,21,22]:

$$\hat{\mathbf{D}}_{F,PCA} = \mathbf{D}\,\mathbf{V}_{F}\mathbf{V}_{F}^{T} \tag{1}$$

Here, $\mathbf{V}_{F}$ is the PCA loading matrix for $F$ components and $\hat{\mathbf{D}}_{F,PCA}$ is the projection of the original dataset onto the loading subspace. After the initial data projection, the $\mathbf{U}$ and $\mathbf{V}^{T}$ matrices are estimated iteratively using the ALS algorithm under the desired natural constraints:

$$\min_{\hat{\mathbf{U}},\,\mathrm{constraints}} \left\| \hat{\mathbf{D}}_{F,PCA} - \hat{\mathbf{U}}\hat{\mathbf{V}}^{T} \right\| \tag{2}$$

$$\hat{\mathbf{U}} = \hat{\mathbf{D}}_{F,PCA}\,\hat{\mathbf{V}}\left(\hat{\mathbf{V}}^{T}\hat{\mathbf{V}}\right)^{-1} = \hat{\mathbf{D}}_{F,PCA}\left(\hat{\mathbf{V}}^{T}\right)^{+} \tag{3}$$

$$\min_{\hat{\mathbf{V}}^{T},\,\mathrm{constraints}} \left\| \hat{\mathbf{D}}_{F,PCA} - \hat{\mathbf{U}}\hat{\mathbf{V}}^{T} \right\| \tag{4}$$

$$\hat{\mathbf{V}}^{T} = \left(\hat{\mathbf{U}}^{T}\hat{\mathbf{U}}\right)^{-1}\hat{\mathbf{U}}^{T}\hat{\mathbf{D}}_{F,PCA} = \hat{\mathbf{U}}^{+}\hat{\mathbf{D}}_{F,PCA} \tag{5}$$

The solutions obtained by ordinary PCA or MCR-ALS are only optimal in the case of independent and identically distributed (i.i.d.) errors, i.e. random homoscedastic errors [16–18]. This assumption cannot be made in general and is not satisfied when the measurement uncertainties are relatively large and proportional to the measured values. In these cases, the MLPCA projected dataset can be used for the MCR-ALS analysis. The MLPCA method accounts for known measurement errors in the estimation of the model subspace parameters and can deal with different types of error structures, which is not the case with PCA. The method is first presented in terms of a classical measurement error regression model and then transformed to principal component space to provide a closer relationship with PCA [18]. By making optimal use of the measurement error information, it separates measurement noise variance from other sources of variance and therefore gives a more accurate estimation of the component subspace than PCA based methods. MLPCA as a pre-processing step can improve the quality of the results if reasonably accurate estimates of the measurement noise are provided [14,18–20].

The main purpose of this work is to analyze the stability and resolution of the four-way profiles recovered by MCR-ALS under a quadrilinearity constraint in the presence of different types and amounts of noise. For this, four variants of the MCR-ALS model, differing in the constraints applied and in the use of MLPCA as a pre-processing step, were used to investigate the efficiency of the MCR-ALS resolution method when different noise levels and patterns are introduced into the dataset. These models are:

(i) MCR-ALSB: ordinary MCR-ALS without any multi-linearity (trilinearity or quadrilinearity) constraint, which still assumes the bilinear model and applies non-negativity constraints;
(ii) MLPCA–MCR-ALSB: MCR-ALSB applied to the MLPCA projected dataset;
(iii) MCR-ALSQ: MCR-ALS with quadrilinearity and non-negativity constraints; and
(iv) MLPCA–MCR-ALSQ: MCR-ALSQ applied to the MLPCA projected dataset.

The simulated datasets with homoscedastic noise were modeled with MCR-ALSB and MCR-ALSQ only, whereas the datasets with heteroscedastic and constant-proportional noise were modeled with the MCR-ALSB, MLPCA–MCR-ALSB, MCR-ALSQ, and MLPCA–MCR-ALSQ models. To check the efficiency of the MCR-ALS methods in resolving the true profiles from noisy data, the obtained $R^{2}$ (explained variance) and lof (lack of fit) values were compared with the theoretical ones ($R^{2}_{th}$ and lof$_{th}$), and the obtained profiles were compared with the theoretical ones (those used for data simulation).

2.2. Implementation of the quadrilinearity constraint in MCR-ALS

The MCR-ALS model for a four-way dataset $\mathbf{D}_{IJKL}$, of dimensions $I$, $J$, $K$, and $L$ in the 1st, 2nd, 3rd, and 4th modes respectively, augmented column-wise ($\mathbf{D}_{aug}$), can be represented as:

$$\mathbf{D}_{aug} = \mathbf{U}_{aug}\mathbf{V}^{T} + \mathbf{E}_{aug} \tag{6}$$

where $\mathbf{U}_{aug}$ is the augmented scores matrix containing the loadings for the first, third and fourth modes, $\mathbf{V}^{T}$ is the loading matrix for the second mode, and $\mathbf{E}_{aug}$ is the error term. A schematic representation of the quadrilinearity constraint implemented in the MCR-ALS model to analyze the four-way dataset is provided in Fig. 1. In brief, the quadrilinear constraint is applied during the ALS optimization of the augmented scores matrix ($\mathbf{U}_{aug}$). As in any other ALS procedure, the first step is to provide the number of components and an initial estimate of either the scores ($\mathbf{U}_{aug}$) or the loadings ($\mathbf{V}^{T}$) matrix. These initial estimates are then optimized iteratively by ALS, and at each iteration a new estimate of the augmented scores and loading matrices is obtained. At each iteration, constraints such as non-negativity, normalization (of the second mode loadings $\mathbf{V}^{T}$) and, optionally, quadrilinearity are applied. The constrained iterative optimization is carried out until convergence is achieved or until a preselected number of cycles is reached. The quadrilinear constraint in MCR-ALS can be applied independently and optionally to each component of the dataset, giving more flexibility to the whole data analysis and allowing full and partial quadrilinear models to be tested. Further details can be found elsewhere [10]. The whole procedure is shown schematically in Fig. 1 and can be summarized as follows:

(1) First, the bilinear model decomposition of the augmented data $\mathbf{D}_{aug}$ is performed according to Eq. (6).


Fig. 1. Implementation of the quadrilinearity constraint in the MCR-ALS model, and recovery of the profiles in the 1st, 3rd, and 4th modes from the augmented scores $\mathbf{U}_{aug}$.

(2) The score profile of one selected component $f$ ($\mathbf{u}^{f}_{IKL,1}$), of dimensions ($IKL$,1), is folded to give a one-component profile matrix $\mathbf{U}^{f}_{IK,L}$ with $IK$ rows and $L$ columns, where $f = 1,\ldots,F$.

(3) Matrix $\mathbf{U}^{f}_{IK,L}$ is then approximated by the first component of its SVD (a rank-one bilinear decomposition):

$$\mathbf{U}^{f}_{IK,L} \approx \mathbf{u}^{f}_{IK,1}\,\mathbf{x}^{f}_{1,L} \tag{7}$$

giving a loading vector $\mathbf{u}^{f}_{IK,1}$ of dimensions ($IK$,1), which contains the intermixed loadings of the $f$th component in the first and third modes, and a vector $\mathbf{x}^{f}_{1,L}$ of dimensions (1,$L$) with the loadings of the $f$th component in the fourth mode. This process is repeated for every component, giving the loading matrix $\mathbf{X}$ in the fourth mode, of dimensions ($F$,$L$), where $F$ is the number of selected components ($f = 1,\ldots,F$) and $L$ ($l = 1,\ldots,L$) is the total number of conditions in the fourth mode.

(4) To recover the loading profiles in the first and third modes, the previously obtained one-component profile vector $\mathbf{u}^{f}_{IK,1}$ of dimensions ($IK$,1) is folded again to give the single-component profile matrix $\mathbf{u}^{f}_{I,K}$ with $I$ rows and $K$ columns.

(5) Matrix $\mathbf{u}^{f}_{I,K}$ is then approximated by the first component of its SVD:

$$\mathbf{u}^{f}_{I,K} \approx \mathbf{y}^{f}_{I,1}\,\mathbf{z}^{f}_{1,K} \tag{8}$$

where vector $\mathbf{y}^{f}_{I,1}$ of dimensions ($I$,1) contains the loadings of the $f$th component in the first mode, and $\mathbf{z}^{f}_{1,K}$ of dimensions (1,$K$) contains the loadings of the $f$th component in the third mode. This process is repeated for every component targeted to be quadrilinear, giving the loading matrices $\mathbf{Y}$ of dimensions ($I$,$F$) and $\mathbf{Z}$ of dimensions ($F$,$K$) for the first and third modes, respectively.

Reconstruction of the full augmented scores matrix $\mathbf{U}_{aug}$ of dimensions ($IKL$,$F$) is easily achieved using the Kronecker tensor product [23] of the $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ loading matrices, with elements $\mathbf{x}^{f}_{L,1}$, $\mathbf{y}^{f}_{I,1}$, and $\mathbf{z}^{f}_{K,1}$, according to the following equations:

$$\mathbf{u}^{f}_{IK,1} = \mathbf{y}^{f}_{I,1} \otimes \mathbf{z}^{f}_{K,1} \tag{9}$$

$$\mathbf{u}^{f}_{IKL,1} = \mathbf{u}^{f}_{IK,1} \otimes \mathbf{x}^{f}_{L,1} \tag{10}$$

where $\otimes$ stands for the Kronecker product, which multiplies each element of the first factor by every element of the second, and $f$ denotes the selected component. Matrix $\mathbf{U}_{aug}$ (in Eq. (6) and Fig. 1) is rebuilt from the augmented score column profiles $\mathbf{u}^{f}_{IKL,1}$ of the $f = 1,\ldots,F$ resolved components, and the ALS optimization continues.

The above approach (Eqs. (7) and (8)) can also be used to recover the loading profiles of the different components in the different modes of the dataset from the augmented scores ($\mathbf{U}_{aug}$), irrespective of the application of the multi-linearity constraint, i.e. by appropriate refolding of each of the augmented profiles in $\mathbf{U}_{aug}$ into a one-component matrix and subsequent SVD analysis to recover the loading profiles in the different modes [1,9]. On the other hand, Eqs. (9) and (10) can be used to build an augmented scores matrix $\mathbf{U}_{aug}$ that conforms to the quadrilinear model, and they are also useful for simulating data that fulfill this condition.

2.3. Figures of merit or quality characteristics

To check and compare the fit achieved by the different models and MCR-ALS variants, the lack of fit (lof) and the explained variance ($R^{2}$) were calculated as:

$$\mathrm{lof}\,(\%) = 100\,\sqrt{\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\left(d_{i,j}-\hat{d}_{i,j}\right)^{2}}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^{2}}} \tag{11}$$

$$R^{2}\,(\%) = 100\left(1-\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\left(d_{i,j}-\hat{d}_{i,j}\right)^{2}}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^{2}}\right) \tag{12}$$

where $\hat{d}_{i,j}$ and $d_{i,j}$ are the calculated and experimental values, respectively.

The quality of the recovery of the component profiles and the presence of rotational ambiguities were assessed with the similarity coefficient ($r^{2}$) between the profiles obtained with the applied model and the theoretical ones used for the simulation, and with the similarity angle ($\theta$). The angle $\theta$ is defined as the angle between the two vectors ($\mathbf{a}$ and $\mathbf{b}$) representing the two profiles, i.e. the angle whose cosine equals their correlation coefficient. A high similarity between the calculated and theoretical profiles gives a very small angle between the two vectors, i.e. the cosine of this angle is close to one [3].

$$r^{2} = \cos\theta = \frac{\mathbf{a}^{T}\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} \tag{13}$$

$$\theta = \cos^{-1}\left(r^{2}\right) \tag{14}$$
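The constraint of Section 2.2 (Eqs. (7)–(10)) and the figures of merit of Section 2.3 (Eqs. (11)–(14)) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB routine: the function names are hypothetical, the test profiles are random placeholders, and the rows of the augmented scores are assumed to be ordered exactly as Eqs. (9) and (10) imply under NumPy's `kron` convention (fourth-mode index varying fastest). Non-negativity and the surrounding ALS loop are left out.

```python
import numpy as np

def rank_one(M):
    """First SVD component of M as (left, right), so that M ≈ outer(left, right)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    left, right = U[:, 0] * s[0], Vt[0]
    if right.sum() < 0:  # SVD signs are arbitrary; prefer a positive right profile
        left, right = -left, -right
    return left, right

def quadrilinear_constraint(U_aug, I, K, L):
    """Force every column of U_aug (I*K*L rows) to follow the quadrilinear model."""
    F = U_aug.shape[1]
    U_new = np.empty((I * K * L, F))
    Y, Z, X = np.empty((I, F)), np.empty((F, K)), np.empty((F, L))
    for f in range(F):
        u_ik, x = rank_one(U_aug[:, f].reshape(I * K, L))  # fold, then Eq. (7)
        y, z = rank_one(u_ik.reshape(I, K))                # refold, then Eq. (8)
        U_new[:, f] = np.kron(np.kron(y, z), x)            # Eqs. (9) and (10)
        Y[:, f], Z[f], X[f] = y, z, x
    return U_new, Y, Z, X

def lack_of_fit(D, D_hat):
    """Eq. (11), in percent."""
    return 100.0 * np.sqrt(np.sum((D - D_hat) ** 2) / np.sum(D ** 2))

def explained_variance(D, D_hat):
    """Eq. (12), in percent."""
    return 100.0 * (1.0 - np.sum((D - D_hat) ** 2) / np.sum(D ** 2))

def similarity(a, b):
    """Eqs. (13) and (14): cosine between two profiles and the angle in degrees."""
    r2 = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    theta = float(np.degrees(np.arccos(np.clip(r2, -1.0, 1.0))))
    return r2, theta

# Placeholder quadrilinear scores (random profiles, not the profiles of ref. [10]).
rng = np.random.default_rng(0)
I, K, L, F = 15, 12, 7, 4
Y_true, Z_true, X_true = rng.random((I, F)), rng.random((F, K)), rng.random((F, L))
U_true = np.column_stack(
    [np.kron(np.kron(Y_true[:, f], Z_true[f]), X_true[f]) for f in range(F)]
)
# Perturb the scores, then pull them back onto the quadrilinear model.
U_noisy = U_true + 0.05 * U_true.mean() * rng.standard_normal(U_true.shape)
U_con, Y, Z, X = quadrilinear_constraint(U_noisy, I, K, L)

print("lof of perturbed scores (%): ", lack_of_fit(U_true, U_noisy))
print("lof after the constraint (%):", lack_of_fit(U_true, U_con))
print("4th-mode profile 1 (r2, theta):", similarity(X[0], X_true[0]))
```

Because each fold is replaced by its first SVD component, scores that are already quadrilinear pass through unchanged, while perturbed scores are moved back towards the model. Note that Eqs. (11) and (12) are linked by $R^{2} = 100\,(1 - (\mathrm{lof}/100)^{2})$, which gives a quick consistency check on reported values.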

The MCR-ALS program is written in MATLAB and is freely available as a toolbox at http://www.mcrals.info/. A preliminary implementation of the quadrilinear constraint as a MATLAB routine is available from the authors (RT) upon request.

3. Data simulation

3.1. Simulation of pure datasets

Quadrilinearity is an optimal situation for obtaining unique solutions in the analysis of four-way datasets. However, if the quadrilinearity constraint is applied to a four-way dataset that does not fulfill the quadrilinear model, the results will be poorer in terms of explained variance and lack of fit, and the recovered profiles will be wrong. Therefore, before applying multi-way and multi-linear models, the data structure should be analyzed properly. Rank analysis using SVD of the different mode-wise (column-wise, row-wise, tube-wise) augmented data matrices can be a useful way to check whether the data are multi-linear or not [3,5]. To check this, two non-quadrilinear datasets ($\mathbf{D}_{B}$: bilinear, and $\mathbf{D}_{T}$: trilinear) and one quadrilinear dataset ($\mathbf{D}_{Q}$) were simulated. In all cases, the loading profiles in the second mode ($\mathbf{V}^{T}$) were taken from four different environmental contamination profiles investigated in an earlier study [10]. These loading profiles represent four environmental contamination profiles, i.e. pH, temperature, organic pollution indicators (COD, BOD, NH4-N, and TKN), and bacterial contamination (total and fecal coliforms), for the water quality of the River Yamuna, India, at 15 sampling locations over 12 months for 7 years. The details can be found elsewhere [10].

3.1.1. Bilinear dataset ($\mathbf{D}_{B}$)

The bilinear model of a two-way dataset provides scores and loadings in the first and second modes, respectively. As depicted in Fig. 1, a four-way dataset subjected to MCR-ALSB (bilinear model) analysis provides augmented scores $\mathbf{U}_{aug}$, which contain the intermixed loadings for three modes (the first, third and fourth), and a separate loading matrix $\mathbf{V}^{T}$ for the second mode. The bilinear dataset $\mathbf{D}_{B}$ was built from a long augmented scores matrix $\mathbf{U}_{aug,B}$ of dimensions (1260,4), filled with log-normally distributed random numbers, which was multiplied by the loading profiles $\mathbf{V}^{T}$ of dimensions (4,9) (the four environmental profiles taken from an earlier study [10]) to give the bilinear augmented dataset $\mathbf{D}_{B}$ of dimensions (1260,9):

$$\mathbf{D}_{B} = \mathbf{U}_{aug,B}\mathbf{V}^{T} \tag{15}$$

The bilinear data structure was confirmed through rank analysis of the augmented matrices.

3.1.2. Trilinear dataset ($\mathbf{D}_{T}$)

To fulfill the trilinear condition, the loadings in any one of the three modes should be invariant in shape over the other two modes. To obtain the trilinear three-way dataset ($\mathbf{D}_{T}$), seven individual score matrices of dimensions (180,4) were prepared as follows. First, one common sample scores matrix ($\mathbf{U}$) of dimensions (180,4) was created from random log-normally distributed numbers. Then, one common loading matrix $\mathbf{X}$ of dimensions (7,4) was created with random values. These two matrices were combined using the Kronecker tensor product (Eq. (10)), giving the new augmented scores matrix $\mathbf{U}_{aug,T}$ of dimensions (1260,4). Finally, multiplication of $\mathbf{U}_{aug,T}$ by the second mode loading matrix ($\mathbf{V}^{T}$) gave the augmented data matrix $\mathbf{D}_{T}$, which has a trilinear data structure. The dimensions and modes of this trilinear augmented matrix are 180 samples (15 locations × 12 months; 1st mode, $IK = 1,\ldots,180$), 9 variables or environmental parameters (2nd mode, $J = 1,\ldots,9$), and 7 years (3rd mode, $L = 1,\ldots,7$), which gives a total of 1260 ($IKL = 180 \times 7 = 1260$) samples and 9 variables or environmental parameters, i.e. $\mathbf{D}_{T}$ of dimensions (1260,9). Note that in the four-way dataset described below, the first mode $IK$ of the three-way trilinear dataset is divided into two new modes, the first mode with $I = 1,\ldots,15$ sampling locations and the third mode with $K = 1,\ldots,12$ measuring months, and the third mode of the three-way trilinear dataset, with $L = 1,\ldots,7$ measuring years, becomes the fourth mode of the four-way quadrilinear dataset.

3.1.3. Quadrilinear dataset ($\mathbf{D}_{Q}$)

To ensure that the data structure agrees with the quadrilinear model, the loading profiles in the third and fourth modes (loading matrices $\mathbf{Z}$ of dimensions (12,4) and $\mathbf{X}$ of dimensions (7,4)) were generated randomly, whereas the loading profiles in the first mode (loading matrix $\mathbf{Y}$ of dimensions (15,4)) were adopted from the earlier study [10]. Using these loading matrices and Eqs. (9) and (10), the augmented scores matrix $\mathbf{U}_{aug,Q}$ conforming to the quadrilinear model was generated; when multiplied by the loading matrix of the common environmental profiles $\mathbf{V}^{T}$, it gives the augmented data matrix $\mathbf{D}_{Q}$ of dimensions (1260,9), which has a quadrilinear data structure. The dimensions and modes of this quadrilinear augmented matrix are 15 sampling locations (1st mode, $I = 1,\ldots,15$), 9 variables or environmental parameters (2nd mode, $J = 1,\ldots,9$), 12 months per year (3rd mode, $K = 1,\ldots,12$), and 7 years (4th mode, $L = 1,\ldots,7$), which gives a total of 1260 ($IKL = 15 \times 12 \times 7 = 1260$) samples and 9 variables or environmental parameters (see ref. [10]). The fulfillment of the quadrilinear data structure was confirmed through rank analysis of the data matrices augmented according to the four modes, i.e. locations, variables, months and years [5].

3.2. Datasets with noise

To investigate the effect of the quadrilinearity constraint and of noise propagation on the MCR-ALS resolved profiles, error matrices with different noise levels and structures were added to the pure data matrix ($\mathbf{D}_{Q}$). Three types of noise structure (homoscedastic, heteroscedastic, and constant-proportional) were considered, each at three noise levels equivalent to theoretical lack of fit values (lof$_{th}$) of approximately 10%, 20% and 30%. In all cases, the noise levels were measured in terms of lack of fit (lof) values (see Eq. (11)).

3.2.1. Datasets with homoscedastic (random) noise

In matrix notation, the construction of a homoscedastic noisy dataset can be written as:

$$\mathbf{D}_{QH} = \mathbf{D}_{Q} + \mathbf{E}_{H} \tag{16}$$

and

$$\mathbf{E}_{H} = N(0,\sigma) \tag{17}$$

where $\mathbf{D}_{QH}$ is the dataset with homoscedastic noise, $\mathbf{D}_{Q}$ is the pure dataset, $\mathbf{E}_{H}$ is the homoscedastic noise matrix, and $N(0,\sigma)$ is a matrix of normally distributed random numbers with zero mean and standard deviation $\sigma$. Homoscedastic noise occurs when the noise is distributed homogeneously across all measurements, with zero mean and constant variance [14]. To simulate the datasets with homoscedastic noise, three error matrices ($\mathbf{E}_{H1}$–$\mathbf{E}_{H3}$) were created from normally distributed random numbers ($N$) with zero mean and standard deviations ($\sigma$) equal to 3.61, 7.21, and 10.96, respectively. These error matrices were added to the pure (noise-free) dataset ($\mathbf{D}_{Q}$), providing three noisy datasets, $\mathbf{D}_{QH1}$, $\mathbf{D}_{QH2}$ and $\mathbf{D}_{QH3}$, with lof$_{th}$ = 10.31%, 20.30% and 30.06%, respectively. The three standard deviation values were obtained as:

$$\sigma = s \cdot \mathrm{mean}\left(\mathrm{mean}\left(\mathbf{D}_{Q}\right)\right) \tag{18}$$

where $s$ is a fixed proportion of the mean of the pure dataset $\mathbf{D}_{Q}$; in this study, $s$ ranged from 25% to 75% of the mean of the dataset, giving $\sigma$ = 3.61, 7.21, and 10.96.

3.2.2. Datasets with heteroscedastic (proportional) noise

$$\mathbf{D}_{QP} = \mathbf{D}_{Q} + \mathbf{E}_{P} \tag{19}$$

and

$$\mathbf{E}_{P} = N(0,\sigma_{ij}) \tag{20}$$

In this case, three error matrices ($\mathbf{E}_{P1}$–$\mathbf{E}_{P3}$) were obtained from normally distributed random numbers ($N$) with zero mean and standard deviations $\sigma_{ij}$ that varied proportionally to each measured data value in dataset $\mathbf{D}_{Q}$, using three fixed proportions. These error matrices were added to the noise-free dataset ($\mathbf{D}_{Q}$), providing three noisy datasets, $\mathbf{D}_{QP1}$, $\mathbf{D}_{QP2}$ and $\mathbf{D}_{QP3}$, with lof$_{th}$ = 10.11%, 20.33% and 30.23%, respectively. Here, the standard deviations were obtained as:

$$\sigma_{ij} = s\,d_{ij} \tag{21}$$

where $s$ is the fixed proportion of each ($i$,$j$) value, $d_{ij}$, of the pure dataset $\mathbf{D}_{Q}$ (9.3%, 19%, and 29%, respectively). In this way, each value in the noisy datasets had an uncertainty with a standard deviation proportional to the intensity of the measured value.

3.2.3. Datasets with constant-proportional noise

$$\mathbf{D}_{QCP} = \mathbf{D}_{Q} + \mathbf{E}_{CP} \tag{22}$$

and

$$\mathbf{E}_{CP} = N(0,\sigma_{ij}) \tag{23}$$

Three error matrices ($\mathbf{E}_{CP1}$–$\mathbf{E}_{CP3}$) were obtained from normally distributed random numbers ($N$) with zero mean and standard deviations $\sigma_{ij}$ calculated according to the equation

$$\sigma_{ij} = \sqrt{a^{2} + \left(s\,d_{ij}\right)^{2}} \tag{24}$$

Here $\sigma_{ij}$ has two parts. One part varies proportionally to each measured data value in dataset $\mathbf{D}_{Q}$, as in the previous case of proportional noise; the three levels of proportional noise were taken as 8.5%, 18.7%, and 28.7% of $\mathbf{D}_{Q}$. The other, constant part $a$ was taken as 10% of the mean of the noise-free data matrix $\mathbf{D}_{Q}$. These error matrices were added to the pure dataset ($\mathbf{D}_{Q}$), providing three noisy datasets, $\mathbf{D}_{QCP1}$, $\mathbf{D}_{QCP2}$ and $\mathbf{D}_{QCP3}$, with three noise levels of lof$_{th}$ = 10.11%, 20.33% and 30.23%, respectively.

4. Results and discussion

4.1. Rank analysis

The $\mathbf{D}_{B}$, $\mathbf{D}_{T}$, and $\mathbf{D}_{Q}$ datasets have been described above as second mode column-wise augmented matrices. However, they can also be augmented in other modes or directions (column-wise, row-wise, time-wise, …), and SVD can be applied to the resulting matrices for rank analysis and to check their multi-linear structure. A two-way dataset can be arranged in two modes only (column-wise (CW) or row-wise (RW)), whereas the samples of a multi-way dataset can be arranged according to each of its additional modes. The possible augmentations of the three types of datasets investigated in this work and the corresponding rank analysis results are provided in Table SI-1. In the case of $\mathbf{D}_{B}$, SVD analysis of the column-wise augmented matrix showed four distinct components (after the first four components the singular values were zero), whereas SVD analysis of the row-wise augmented matrices required more than four singular values to explain the same amount of data variance, indicating the lack of a higher multi-linear structure. In the case of the $\mathbf{D}_{T}$ dataset, four singular values were observed for column-wise (CW) and row-wise (sample-wise (SW) and time-wise (TW)) matrix augmentation, indicating the presence of a trilinear structure in the dataset.
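The mode-wise rank analysis described here can be illustrated with a short NumPy sketch. It is a hedged illustration only: all profiles are random placeholders rather than the environmental profiles of ref. [10], the helper names are hypothetical, and each "augmentation" is approximated by a matricization that keeps one mode intact and stacks the other three (the rank does not depend on whether the kept mode indexes rows or columns). The rows of each augmented matrix are assumed to be ordered as the Kronecker products of Eqs. (9) and (10) imply.

```python
import numpy as np

def kron_columns(A, B):
    """Column-wise Kronecker product: column f is kron(A[:, f], B[:, f])."""
    return np.column_stack([np.kron(A[:, f], B[:, f]) for f in range(A.shape[1])])

def to_four_way(D_aug, I, K, L):
    """Fold an (I*K*L, J) augmented matrix into an (I, J, K, L) array."""
    J = D_aug.shape[1]
    return D_aug.reshape(I, K, L, J).transpose(0, 3, 1, 2)

def unfold(T, mode):
    """Matricize a four-way array so that `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def numerical_rank(M, tol=1e-10):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(1)
I, J, K, L, F = 15, 9, 12, 7, 4
V = rng.random((J, F))                                   # placeholder 2nd-mode profiles
Y, Z, X = rng.random((I, F)), rng.random((K, F)), rng.random((L, F))

# Augmented scores for the three data structures of Section 3.1.
scores = {
    "bilinear": rng.lognormal(size=(I * K * L, F)),                # Eq. (15)
    "trilinear": kron_columns(rng.lognormal(size=(I * K, F)), X),  # Eq. (10)
    "quadrilinear": kron_columns(kron_columns(Y, Z), X),           # Eqs. (9), (10)
}

ranks = {}
for name, U_aug in scores.items():
    T = to_four_way(U_aug @ V.T, I, K, L)
    ranks[name] = [numerical_rank(unfold(T, mode)) for mode in range(4)]
    print(f"{name:>12}: ranks by mode (I, J, K, L) = {ranks[name]}")
```

With noise-free data of this kind, only the quadrilinear array should show rank four in every mode. The trilinear array should show rank four in the second and fourth modes only, and the bilinear array in the second mode only, which mirrors the pattern reported in Table SI-1.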
Similarly, the DQ dataset exhibited four distinctive values in all of the four modes (column-wise (CW), sample-wise (SW), month-wise (MW) and year-wise (YW)) of matrix augmentation conﬁrming the quadrilinear structure of the data (Table SI-1). The trilinear dataset (DT) was checked for quadrilinearity by augmenting the dataset in four modes like the quadrilinear dataset followed by SVD analysis. SVD of column-wise, sample-wise, month-

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002

6

A. Malik, R. Tauler / Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

however, as expected in this case also proﬁles recovered with MCRALSB are associated with some ambiguity which is completely removed with the implementation of multi-linearity constraints.

wise and year-wise augmented dataset (DT) proposed a different number of signiﬁcant components in different modes conﬁrming that the dataset does not follow a quadrilinear structure (Table SI-1). The rank of all column-wise matrices from bilinear, trilinear, and quadrilinear datasets was always found to be four through SVD analysis, in agreement with the fact that the proﬁle matrices used for simulation in the four modes were of rank four.

4.3. Noise structure When noise is added to data, sometimes, it becomes more difﬁcult to differentiate between the signiﬁcant components and those explaining noise only. If the error matrix is known, then the SVD analysis of the error matrix can be useful to get the information about the noise variance and its level. The results of the SVD analysis of the simulated column-wise augmented matrices are as follows (as provided in Table 1): pure data matrix (DQ), three datasets with homoscedastic error (DQH1, DQH2, DQH1), three datasets with heteroscedastic error (DQP1, DQP2, DQP1) and three datasets with constant-proportional error (DQCP1, DQCP2, DQCP3), along with corresponding error matrices (EH1, EH2, EH3, EP1, EP2, EP3, ECP1, ECP2, and ECP3). The pure data DQ presented four distinct singular values conﬁrming the presence of four distinguishable components. In the case of datasets with homoscedastic errors, the ﬁrst singular values of error matrices (EH1 & EH2) are larger than the ﬁfth value of the corresponding data matrices (DQH1 & DQH2) showing that in these cases (noise ≈ 10% or 20% lofth), the noise is at the level of the ﬁfth component and propagates some variance to the fourth component also; however, in the case of data matrix DQH3 the ﬁrst singular value of error matrix EH3 is almost at the level of the fourth singular value of DQH3 (noise ≈ 30% lofth) suggesting that this fourth component will be difﬁcult to distinguish from noise. For the datasets with heteroscedastic errors, and constantproportional errors with noise level ≈10% lofth, the ﬁrst singular value of corresponding error matrices was larger than the third singular value of the respective dataset. Whereas, when the noise level increased (≈20% or 30% lofth), the ﬁrst singular value of error matrices was always greater than the second singular value of the corresponding datasets. 
These SVD results suggest that noise is incorporated into the variance explained by the components or resolved profiles, and they call for an appropriate strategy to diminish the effect of noise on the resolved profiles, such as the proposed maximum likelihood method.
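As an illustration of what a maximum likelihood projection can look like, the sketch below implements a simplified alternating scheme for the special case of uncorrelated measurement errors with known standard deviations. It is an assumption-laden example, not the implementation used in this study: the function names, the convergence rule and the simulated data are all hypothetical.

```python
import numpy as np

def _ml_project_rows(x, var, basis):
    """Project each row of x onto span(basis), weighting by inverse variances."""
    x_hat = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        w = 1.0 / var[i]                      # inverse variances of row i
        bw = basis * w[:, None]               # Sigma_i^-1 @ basis
        coef = np.linalg.solve(basis.T @ bw, bw.T @ x[i])
        x_hat[i] = basis @ coef
    return x_hat

def mlpca(x, sd, n_components, n_iter=50, tol=1e-10):
    """Alternating maximum likelihood projections (uncorrelated errors only)."""
    var = sd ** 2
    v = np.linalg.svd(x, full_matrices=False)[2][:n_components].T
    previous = np.inf
    for _ in range(n_iter):
        x_hat = _ml_project_rows(x, var, v)                 # rows onto row space
        u = np.linalg.svd(x_hat, full_matrices=False)[0][:, :n_components]
        x_hat = _ml_project_rows(x.T, var.T, u).T           # columns onto column space
        objective = np.sum((x - x_hat) ** 2 / var)
        if abs(previous - objective) <= tol * max(objective, 1.0):
            break
        previous = objective
        v = np.linalg.svd(x_hat, full_matrices=False)[2][:n_components].T
    return x_hat, objective

rng = np.random.default_rng(0)
pure = rng.random((30, 4)) @ rng.random((4, 12))    # hypothetical rank-4 data
sd = 0.1 * pure + 0.01                              # assumed known error standard deviations
noisy = pure + rng.standard_normal(pure.shape) * sd

x_ml, objective_ml = mlpca(noisy, sd, n_components=4)
u, s, vt = np.linalg.svd(noisy, full_matrices=False)
x_pca = (u[:, :4] * s[:4]) @ vt[:4]
objective_pca = np.sum((noisy - x_pca) ** 2 / sd ** 2)
print(objective_ml, objective_pca)
```

In this sketch the weighted residual of the maximum likelihood estimate cannot exceed that of the ordinary truncated SVD, because every projection step is the weighted least-squares optimum within the current subspace.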

4.2. Multi-linear data structure
To see the effect of the multi-linearity constraint on datasets with different kinds of structure, all simulated datasets (DB, DT, DQ) were analyzed with MCR-ALS under the different multi-linearity constraints. To keep this study concise, only the results obtained by the different bilinear and multi-linear MCR-ALS models in the second (variable) mode are discussed in detail. The bilinear dataset (DB), when subjected to MCR-ALSB without the multi-linearity constraint, gave about 100% explained variance (R2) with a lof of 0.0006% (this value depends on the convergence criterion and on computer accuracy). In contrast, when the trilinearity (MCR-ALST) and quadrilinearity (MCR-ALSQ) constraints were applied to the same dataset, R2 decreased to 53.51% (lof = 68.19%) and 50.53% (lof = 70.34%), respectively (Table SI-1). As anticipated, the profiles recovered with MCR-ALSB, when compared with the theoretical ones, revealed a nearly perfect similarity coefficient (r2) and small similarity angles (θ), though associated with some ambiguity (Table SI-2). On the contrary, when the bilinear dataset (DB) was fitted with MCR-ALST (with trilinear constraints) and MCR-ALSQ (with quadrilinear constraints), profile recovery was very poor (Table SI-2). These results support the view that the multi-linearity constraint should be applied to a dataset only when it follows the postulated multi-linear structure; otherwise, poorer model performance and unreliable solutions will be obtained. When the trilinear dataset (DT) was modeled with MCR-ALS implementing the different linearity constraints, the MCR-ALSB and MCR-ALST models fitted the dataset very well, with R2 = 100% in both cases and lof = 0.0002% for MCR-ALST and less than 0.0002% for MCR-ALSB. However, as expected, the profiles recovered with the MCR-ALSB model again showed some of the ambiguity inherent to bilinear models, which is not the case for the profiles obtained with MCR-ALST.
The MCR-ALSQ model when applied to trilinear dataset (DT) resolved proﬁles with only 55.68% R2, a high 66.57% lof and a poor recovery in terms of r2 and θ (Table SI-2). To reconﬁrm the effect of model and data structure on the reliability of results, the quadrilinear dataset (DQ) was also modeled with all of the above variants of MCR-ALS. The results (Table SI-2) showed a perfect ﬁtting of DQ with all models of MCR-ALS used in the study,

4.4. MCR-ALSQ results for the pure dataset
The MCR-ALS models, when applied to the pure dataset (DQ), exhibited excellent model performance in terms of explained variance (R2 = 100%) and lack of fit (lof ≈ 0%), with and without the

Table 1 Results of singular value decomposition for rank analysis of the different simulated datasets and corresponding noise matrices. Singular values are s1 to s9; the last two columns give the theoretical lof and R2.

Noise type              Matrix   s1       s2      s3      s4      s5      s6      s7      s8      s9      lofth (%)  R2th (%)
None                    DQ       3680.41  540.71  330.37  119.19  –       –       –       –       –       0          1e−013
Homoscedastic           EH1      138.71   134.45  134.20  131.97  128.91  126.85  125.42  121.66  117.82  –          –
                        DQH1     3682.07  555.73  355.18  174.61  135.78  132.17  128.28  123.98  118.77  10.31      98.94
                        EH2      277.41   268.89  268.40  263.93  257.82  253.71  250.85  243.32  235.65  –          –
                        DQH2     3688.61  599.40  419.43  284.18  271.46  264.34  255.87  247.90  236.96  20.30      95.88
                        EH3      421.67   408.72  407.96  401.17  391.88  385.63  381.29  369.84  358.18  –          –
                        DQH3     3700.57  669.36  512.72  412.48  411.84  401.75  387.71  376.50  359.60  30.06      90.96
Heteroscedastic         EP1      364.56   74.92   38.04   35.71   31.91   27.37   25.66   10.54   9.18    –          –
                        DQP1     3695.13  545.23  339.85  120.12  58.39   34.75   30.90   24.96   9.48    10.11      98.98
                        EP2      744.80   153.07  77.71   72.96   65.20   55.92   52.43   21.52   18.76   –          –
                        DQP2     3746.29  561.07  368.49  123.08  115.40  71.04   62.78   50.89   19.35   20.33      95.87
                        EP3      1136.80  233.63  118.61  111.35  99.52   85.36   80.02   32.85   28.63   –          –
                        DQP3     3836.10  593.12  406.37  169.40  127.77  108.47  94.60   77.48   29.51   30.23      90.86
Constant-proportional   ECP1     337.45   85.63   63.65   61.83   59.61   56.43   55.34   52.72   49.96   –          –
                        DQCP1    3693.04  546.56  342.53  129.92  74.81   61.53   57.63   55.39   51.03   10.11      98.98
                        ECP2     734.98   159.17  92.98   88.90   81.83   75.06   72.07   56.36   53.34   –          –
                        DQCP2    3744.66  562.24  371.53  133.19  124.77  87.46   79.06   71.21   54.57   20.41      95.83
                        ECP3     1126.31  236.86  128.73  122.03  110.63  98.74   93.90   61.50   57.68   –          –
                        DQCP3    3833.33  593.55  409.10  175.91  137.30  119.48  105.57  91.91   59.00   30.19      90.89
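The theoretical lof and R2 pairs in Table 1 are consistent with the definitions customarily used with MCR-ALS, in which lof is the square root of the ratio of the residual to the data sum of squares and R2 is one minus that ratio, both expressed as percentages. These definitions are not restated in this part of the text, so the sketch below should be read as an assumption; the function names are hypothetical.

```python
import numpy as np

def lack_of_fit(data, residuals):
    """lof (%) = 100 * sqrt(sum(e_ij^2) / sum(d_ij^2))."""
    data = np.asarray(data, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    return 100.0 * np.sqrt(np.sum(residuals ** 2) / np.sum(data ** 2))

def explained_variance(data, residuals):
    """R2 (%) = 100 * (1 - sum(e_ij^2) / sum(d_ij^2))."""
    data = np.asarray(data, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    return 100.0 * (1.0 - np.sum(residuals ** 2) / np.sum(data ** 2))

def r2_from_lof(lof_percent):
    """Under the definitions above, R2 = 100 * (1 - (lof / 100)^2)."""
    return 100.0 * (1.0 - (lof_percent / 100.0) ** 2)

# (lofth, R2th) pairs copied from Table 1.
table_1_pairs = [(10.31, 98.94), (20.30, 95.88), (30.06, 90.96),
                 (10.11, 98.98), (20.33, 95.87), (30.23, 90.86),
                 (20.41, 95.83), (30.19, 90.89)]
for lof, r2 in table_1_pairs:
    print(lof, r2, round(r2_from_lof(lof), 2))
```

Under this relation a lof of about 10%, 20% and 30% corresponds to an R2 of about 99%, 96% and 91%, which is why a large change in lof is accompanied by a comparatively small change in R2.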

Please cite this article as: A. Malik, R. Tauler, Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets, Chemometr. Intell. Lab. Syst. (2014), http://dx.doi.org/10.1016/j.chemolab.2014.04.002

A. Malik, R. Tauler / Chemometrics and Intelligent Laboratory Systems xxx (2014) xxx–xxx

quadrilinearity constraint. When MCR-ALSQ was applied to the pure data, the recovered profiles showed a high similarity coefficient (r2 = 1) and a negligible angle (θ ≈ 0) in all cases when compared with the theoretical profiles, implying that rotation ambiguities were completely solved when the quadrilinearity constraint was used. However, the profiles resolved with MCR-ALSB showed poorer r2 and larger θ values than those for MCR-ALSQ. This confirms the presence of rotation ambiguity in the solutions obtained by MCR-ALSB, which was fully eliminated with MCR-ALSQ. In the absence of noise, differences observed in the profile recoveries should be due to the rotational ambiguities inherent to curve resolution methods [24]. Additionally, feasible bands [25] were calculated, and the results (not shown) confirmed that rotation ambiguities in the profiles were present only in the case of the bilinear MCR-ALSB model, whereas unique solutions were obtained when multi-linearity (trilinear or quadrilinear) constraints were applied. Application of the quadrilinearity constraint in the MCR-ALS model (MCR-ALSQ) provided excellent profile recoveries, confirming the complete match of the applied model and the data structure (Table 2) and supporting the appropriateness of the MCR-ALS model with the quadrilinearity constraint when the data follow a quadrilinear structure.
4.5. Effect of noise on resolved profiles
The efficiency of MCR-ALSQ was assessed by fitting the different simulated noisy datasets. To better compare the results obtained by the different MCR-ALS methods, the profiles in the 1st, 3rd, and 4th modes were always refolded from the augmented score profiles (in the bilinear model) obtained by the MCR-ALSB and MLPCA–MCR-ALSB methods (Fig. 1, Section 2.2). The profiles in all modes as resolved by the different methods were compared with the theoretical ones, and all the results are provided in Tables 3–5.
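The refolding step mentioned above can be sketched as follows, assuming that one column of the augmented score matrix is reshaped into a three-way array over the 1st, 3rd and 4th modes and that each mode profile is taken as the first left singular vector of the corresponding unfolding. The row ordering, the function name `refold_rank_one` and the example sizes are hypothetical, and the procedure of Section 2.2 may differ in detail.

```python
import numpy as np

def refold_rank_one(aug_column, shape):
    """Refold one augmented score column and extract one profile per mode.

    shape = (I, K, L) are the sizes of the 1st, 3rd and 4th modes; the column
    is assumed to be stored in C order, i.e. row index = (i * K + k) * L + l.
    """
    tensor = np.asarray(aug_column, dtype=float).reshape(shape)
    profiles = []
    for mode in range(3):
        unfolded = np.moveaxis(tensor, mode, 0).reshape(shape[mode], -1)
        u = np.linalg.svd(unfolded, full_matrices=False)[0][:, 0]
        if u.sum() < 0:                 # fix the arbitrary sign of the singular vector
            u = -u
        profiles.append(u)
    # Scale of the rank-one term, then the constrained (rank-one) column.
    scale = np.einsum("ikl,i,k,l->", tensor, *profiles)
    constrained = scale * np.einsum("i,k,l->ikl", *profiles).reshape(-1)
    return profiles, constrained

rng = np.random.default_rng(1)
a, c, d = rng.random(6), rng.random(4), rng.random(3)     # hypothetical mode profiles
column = np.einsum("i,k,l->ikl", a, c, d).reshape(-1)     # exactly rank-one column
mode_profiles, constrained_column = refold_rank_one(column, (6, 4, 3))
print(np.allclose(constrained_column, column))
```

For a column that already has rank-one structure the constrained column equals the input and the extracted profiles equal the true ones up to scale; for a noisy column the same steps return the rank-one approximation built from the leading singular vectors.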
However, to avoid repetition and for a clearer presentation of the findings, only the profiles recovered in the variable (second) mode are discussed in detail.
4.5.1. Homoscedastic (random) noise
The results obtained by the MCR-ALSB and MCR-ALSQ methods for the datasets with different levels of homoscedastic noise are

Table 2 Analysis of pure dataset (DQ) with MCR-ALSB and MCR-ALSQ models, and comparison of resolved profiles with theoretical ones.

                                    MCR-ALSB (b)         MCR-ALSQ
Data mode            Profile (a)    r2      θ            r2      θ
1st mode             u11            1       0.166        1       0
                     u12            0.997   4.803        1       0
                     u13            1       0.103        1       0
                     u14            0.992   7.046        1       1.2e−06
2nd (variable) mode  u21            0.988   8.82         1       0
                     u22            1       0.017        1       0
                     u23            0.99    6.51         1       0
                     u24            1       0.0001       1       0
3rd mode             u31            0.997   4.737        1       0
                     u32            0.865   30.144       1       0
                     u33            0.994   6.426        1       0
                     u34            0.994   6.316        1       1.5e−06
4th mode             u41            0.996   4.990        1       0
                     u42            0.905   25.177       1       0
                     u43            0.997   4.750        1       0
                     u44            0.986   9.658        1       0
lof (%)                             9.9e−04              1.0e−13
R2 (%)                              100                  100

(a) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.
(b) The scores obtained by MCR-ALSB were refolded (Section 2.2, Fig. 1) to get scores in three modes (1st, 3rd, and 4th modes) of the dataset.
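The two similarity measures reported in Table 2 can be sketched as below, under the assumption that the similarity coefficient is the cosine between the recovered and theoretical profile vectors and that θ is the corresponding angle in degrees. This reading is inferred from the tabulated pairs (for example, a coefficient of 0.865 paired with θ = 30.144), not from a definition given in this part of the text; the function name is hypothetical.

```python
import numpy as np

def similarity(recovered, reference):
    """Cosine between two profiles and the angle (in degrees) between them."""
    x = np.asarray(recovered, dtype=float)
    y = np.asarray(reference, dtype=float)
    r = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    r = min(1.0, max(-1.0, r))          # guard against rounding just outside [-1, 1]
    return r, float(np.degrees(np.arccos(r)))

# Two unit vectors 30 degrees apart, as a simple check of the convention.
reference_profile = np.array([1.0, 0.0])
recovered_profile = np.array([np.cos(np.radians(30.0)), np.sin(np.radians(30.0))])
r_example, theta_example = similarity(recovered_profile, reference_profile)
print(round(r_example, 4), round(theta_example, 3))
```

Both measures are insensitive to the scale of the profiles, so they compare shapes only; an angle near 0 means the recovered profile is proportional to the theoretical one.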


summarized in Table 3. The profiles obtained by the MCR-ALSQ model in loading mode at the higher noise levels, 20.30% lofth (dataset DQH2) and 30.06% lofth (dataset DQH3), are presented in Fig. 2 along with the theoretical ones for comparison. When the dataset DQH1 (noise level = 10.31% lofth) was fitted with MCR-ALSB, a lof value of 8.08% and an R2 value of 99.35% were obtained, whereas MCR-ALSQ fitted the dataset with 10.26% lof and 98.95% R2. For MCR-ALSB, the similarity coefficient (r2) values between the resolved and theoretical profiles in loading mode ranged from 0.97 to 0.99 and the angle ‘θ’ ranged from 7.67 to 13.82. In the case of MCR-ALSQ, better values of r2 (0.999 to 1.00) and very small values of ‘θ’ (0.1–4.93) were obtained. For the dataset DQH2, MCR-ALSQ was also able to recover the profiles better than MCR-ALSB (Table 3). MCR-ALSB provided a lower lof value (16.01%) and a higher R2 (97.44%) than the MCR-ALSQ model; however, its profile recovery was poorer. In both cases, the lof obtained by MCR-ALSQ is closer to the theoretical lof of the dataset, whereas MCR-ALSB produced a relatively lower lof value and a higher R2. The lower lof values obtained with the bilinear model are probably the effect of the larger number of degrees of freedom in this case. The profiles obtained with MCR-ALSB were associated with some ambiguity, which is even more obvious in the loadings of the other modes. The profile recoveries by MCR-ALSQ were better owing to the elimination of rotational ambiguity by the quadrilinearity constraint; however, they were still affected by the presence of noise. As the level of noise increased to a lofth of about 30% (dataset DQH3), neither MCR-ALSQ nor MCR-ALSB could clearly recover all the profiles any longer (Table 3). Still, MCR-ALSQ recovered slightly better profiles for the first three components than MCR-ALSB. The profile recovery was worst for the fourth component. The high ‘r2’ values and low angles ‘θ’ between the theoretical profiles and those obtained by MCR-ALSQ indicate that MCR-ALSQ is a good choice for recovering the true profiles when the homoscedastic noise is at the level of about 10% lofth, and even up to about 20% lofth; however, if the noise is increased further, it becomes difficult for MCR-ALSQ to recover the true profiles (Fig. 2). This may be due to the propagation of a higher level of noise and the increase of ambiguity in the solutions. Similar conclusions can be drawn by comparing the ‘r2’ and ‘θ’ values in the other three modes (scores, Table 3). These results confirm that, for quadrilinear data, the MCR-ALSQ method resolves the profiles better than MCR-ALSB.
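The degrees-of-freedom argument made above for the lower lof of the bilinear model can be made concrete with a rough count of fitted profile values, ignoring scale indeterminacies. This is an illustrative sketch only: the function name and the mode sizes are hypothetical and are not the dimensions of the simulated datasets.

```python
def fitted_profile_values(size1, size2, size3, size4, n_components):
    """Number of profile values estimated by each model.

    Bilinear model on the column-wise augmented matrix: one augmented score
    profile of length size1*size3*size4 plus one loading profile of length
    size2 per component.
    Quadrilinear model: one profile per mode per component.
    """
    bilinear = (size1 * size3 * size4 + size2) * n_components
    quadrilinear = (size1 + size2 + size3 + size4) * n_components
    return bilinear, quadrilinear

# Hypothetical mode sizes with four components.
n_bilinear, n_quadrilinear = fitted_profile_values(12, 9, 7, 5, 4)
print(n_bilinear, n_quadrilinear)
```

With these sizes the bilinear model estimates 1716 values against 132 for the quadrilinear model, so it has far more freedom to absorb noise into the fit, which lowers lof without improving profile recovery.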

Table 3 MCR-ALS analysis (with different constraints) of datasets with homoscedastic noise and comparison of resolved profiles with theoretical ones.

                                    DQH1                                DQH2                                DQH3
                                    MCR-ALSB         MCR-ALSQ         MCR-ALSB         MCR-ALSQ         MCR-ALSB         MCR-ALSQ
Constraint (a)                      1                1,2              1                1,2              1                1,2
lof (%)                             8.08             10.26            16.01            20.42            24.24            30.27
R2 (%)                              99.35            98.95            97.44            95.83            94.12            90.84
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.000   0.149    1.000   0.024    1.000   0.144    1.000   0.077    0.9999  0.612    1.000   0.144
                     u12            0.9987  2.937    0.9995  1.799    0.9946  5.954    0.9969  4.538    0.9899  8.131    0.9911  7.664
                     u13            0.9958  5.272    0.9993  2.137    0.9959  5.192    0.9975  4.089    0.9961  5.032    0.9949  5.799
                     u14            0.9425  19.530   0.9591  16.441   0.8666  29.940   0.8941  26.603   0.6412  50.118   0.3894  67.081
2nd (variable) mode  u21            0.9876  9.032    1.0000  0.107    0.9876  9.027    0.9998  1.003    0.9874  9.107    0.9996  1.545
                     u22            0.9899  8.157    0.9991  2.433    0.9716  13.679   0.9994  1.999    0.9534  17.570   0.9943  6.127
                     u23            0.9910  7.671    0.9996  1.557    0.9830  10.581   0.9986  3.036    0.9860  9.600    0.9963  4.908
                     u24            0.9710  13.824   0.9963  4.928    0.9351  20.752   0.9785  11.916   0.3225  71.184   0.0009  89.949
3rd mode             u31            0.9949  5.797    0.9998  1.016    0.9940  6.273    0.9987  2.973    0.9894  8.336    0.9824  10.757
                     u32            0.9927  6.928    0.9992  2.312    0.9876  9.027    0.9966  4.759    0.9695  14.187   0.9781  12.025
                     u33            0.8622  30.437   0.9946  5.944    0.8609  30.586   0.9774  12.20    0.8799  28.371   0.9626  15.712
                     u34            0.9865  9.423    0.9890  8.498    0.9631  15.618   0.9713  13.77    0.9050  25.181   0.8148  35.429
4th mode             u41            0.9938  6.382    0.9998  1.279    0.9911  7.642    0.9977  3.90     0.9919  7.317    0.9896  8.270
                     u42            0.9949  5.787    0.9995  1.887    0.9907  7.819    0.9972  4.32     0.9765  12.455   0.9767  12.383
                     u43            0.9411  19.758   0.9994  1.982    0.9255  22.250   0.9950  5.71     0.9341  20.918   0.9868  9.312
                     u44            0.9791  11.72    0.9898  8.184    0.9650  15.200   0.9709  13.85    0.9290  21.728   0.8883  27.342

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.

4.5.2. Heteroscedastic (proportional) noise
As mentioned earlier, three datasets, DQP1, DQP2, and DQP3, with noise levels of about 10%, 20% and 30% lof, respectively, were fitted with the different MCR-ALS methods (Section 2.2), and the resolved profiles were compared with the theoretical ones to check their recovery. Results are summarized in Table 4. In the case of DQP1 (lofth = 10.11%), MCR-ALSB produced a lof value of 2.11% and R2 = 99.96%, whereas the MCR-ALSQ model provided a lof value of 10.00% and R2 = 99.00%. The lof obtained by MCR-ALSQ is in accordance with the level of noise introduced, i.e. the theoretical lof (10.11%). In terms of ‘r2’ and angle ‘θ’, profile recoveries were better for MCR-ALSQ than for MCR-ALSB. When MLPCA was used as a pre-processing step prior to MCR-ALS, the model fitted well for both the bilinear and the quadrilinear MCR-ALS models (for MLPCA–MCR-ALSB: lof = 7.72%, R2 = 99.40%; for MLPCA–MCR-ALSQ: lof = 10.10%, R2 = 98.98%). As the level of heteroscedastic noise increased (dataset DQP2, lofth = 20.33%), MCR-ALSB provided a lof value of 4.19% with 99.82% R2, and with implementation of the quadrilinearity constraint (MCR-ALSQ) the lof value was 20.15% and R2 was 95.94%. The lof (20.28%) and R2 (95.88%) values were almost equal to the theoretical ones when MLPCA–MCR-ALSQ was applied. On the other hand, MLPCA–MCR-ALSB again resulted in a higher R2 (97.58%) and a lower lof value (15.57%) than the theoretical ones. For the dataset DQP3 (noise level = 30.23% lofth), lof = 5.38% and R2 = 99.71% were obtained by MCR-ALSB, whereas MCR-ALSQ provided lof = 30.10% and R2 = 90.94% for the same dataset. The lof obtained by MLPCA–MCR-ALSQ is much closer to the level of noise introduced (lofth = 30.23%). In terms of ‘r2’ and ‘θ’, the profiles obtained by MCR-ALSQ for the first three components were relatively better than those obtained by MCR-ALSB. The restriction to the first three components is due to the high level of noise present in the dataset; the SVD analysis of the error and dataset matrices also showed that at this level it was difficult to distinguish the noise from the data variance. The profiles recovered by MLPCA–MCR-ALSQ are superior to those of MCR-ALSQ (Fig. 3a) because of its capability to deal with the heteroscedastic type of noise. For the datasets with a heteroscedastic type of noise, the pairing of MLPCA with MCR-ALSB could improve the fitting of the bilinear MCR-ALS; however, it could not eliminate the ambiguity associated with the resolved profiles. At a low level of noise (lofth = 10.11%), MCR-ALSQ was able to recover the profiles rather well, and the use of MLPCA helped it to recover the profiles nearly perfectly (Table 4). In terms of similarity parameters, the profile recoveries obtained by MLPCA–MCR-ALSQ were comparable with those of MCR-ALSQ. The success of MCR-ALSQ in the presence of heteroscedastic noise (as well as of the other types of noise) at moderate levels (up to ≈20% lofth) may be attributed to the fact that the quadrilinear model is resistant to over-fitting and noise propagation. At higher levels of proportional noise, the error completely destroys the data structure and MCR-ALSQ is no longer able to recover the right model. When the noise levels were higher, the use of MLPCA as a pre-processing step before MCR-ALSQ improved the profile recoveries to a greater extent (Table 4, Fig. 3a).
On the other hand, the use of MLPCA with bilinear MCR-ALS could improve the profile recovery to some extent, although the results obtained by MLPCA–MCR-ALSQ were superior to those of the other models applied. For the sake of brevity, only the resolved profiles in loading mode have been discussed, but the same kind of inferences can be drawn from the recovery of the resolved profiles in the other modes (Table 4). The overall comparison of fit criteria and recovery of the resolved profiles by the various methods therefore supports the importance of the quadrilinearity constraint when the dataset under study follows a quadrilinear structure. Further, resolution and recovery of the profiles can be greatly enhanced if MCR-ALSQ is coupled with the MLPCA estimates, even in the case of high heteroscedastic noise (lofth ≈ 30%) (Fig. 3a).
4.5.3. Constant-proportional noise
The effect of the quadrilinearity constraint on MCR-ALS resolved profiles in the presence of constant and proportional noise was investigated by modeling the datasets with the different MCR-ALS methods mentioned earlier (Section 2). The resolved profiles obtained by the different models were compared with the theoretical profiles, and the results are summarized in Table 5. At a noise level of about 10.11% lofth (dataset DQCP1), MCR-ALSB and MLPCA–MCR-ALSB gave relatively lower lof values and higher R2 values than the theoretical lof and explained variance (Table 5). In the case of MCR-ALSQ and MLPCA–MCR-ALSQ, the obtained lof values were 10.03% (R2 = 98.99%) and 10.08% (R2 = 98.99%), respectively. Comparison of the resolved profiles in loading mode in terms of r2 and angle showed the superiority of MCR-ALSQ (r2: >0.99–1.00, θ: 0.20–1.32) over MCR-ALSB (r2: >0.88–0.99, θ: 7.67–8.77), and the superiority of MLPCA–MCR-ALSQ (r2: ~1.00–1.00, θ: 0.15–1.23) over MLPCA–MCR-ALSB (r2: 0.99–~1.00, θ: 1.36–8.09). Overall, at this noise level, the profiles recovered by MCR-ALSQ and MLPCA–MCR-ALSQ were near perfect when compared with the other MCR-ALS models used in the study (Table 5). On increasing the noise level (lofth = 20.41%, dataset DQCP2), lof values closer to the theoretical one were obtained with application of the quadrilinearity constraint (20.36% lof by MLPCA–MCR-ALSQ and 20.24% lof by MCR-ALSQ), as in the case of DQCP1. MCR-ALSB and MLPCA–MCR-ALSB fitted the model with lof values of 5.18% (R2 = 99.73%) and 13.91% (R2 = 98.06%), respectively. The best recovery of the resolved profiles (loading mode), in terms of r2 and θ, was achieved by MLPCA–MCR-ALSQ (r2: ~1.00–1.00, θ: 0.41–2.07), followed by MCR-ALSQ (r2: >0.99–~0.99, θ: 0.67–3.49). MCR-ALSB (r2: 0.79–0.99, θ: 9.08–38.10) and MLPCA–MCR-ALSB (r2: 0.97–>0.99, θ: 3.02–14.23), in spite of their low lof and high R2 values, could not recover the true profiles. This again implies the presence of ambiguity in the solutions, which is eliminated with the inclusion of the quadrilinearity


Table 4 MCR-ALS analysis (with different constraints) of datasets with heteroscedastic (proportional) noise and comparison of resolved profiles with theoretical ones.

Dataset DQP1
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             2.11             7.72             10.00            10.10
R2 (%)                              99.96            99.40            98.9995          98.98
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.0000  0.133    1.000   0.179    1.0000  0.044    1.0000  0.036
                     u12            0.9998  1.073    0.9999  0.879    0.9998  1.148    0.9999  0.955
                     u13            0.9964  4.858    0.9967  4.655    0.9994  2.012    0.9996  1.619
                     u14            0.9915  7.456    0.9921  7.184    0.9995  1.766    0.9997  1.360
2nd (variable) mode  u21            0.9922  7.169    0.9881  8.837    1.0000  0.202    1.0000  0.057
                     u22            0.9876  9.027    0.9933  6.657    0.9997  1.402    1.0000  0.517
                     u23            0.9892  8.410    1.000   0.158    0.9998  1.076    0.9999  0.821
                     u24            0.9996  0.676    0.9999  0.8326   0.9998  1.075    0.9996  1.583
3rd mode             u31            0.9962  5.026    0.9966  4.730    1.0000  0.436    1.0000  0.479
                     u32            0.9930  6.781    0.9929  6.809    0.9999  0.972    0.9999  0.956
                     u33            0.8607  30.604   0.8629  30.352   0.9993  2.218    0.9997  1.466
                     u34            0.9932  6.682    0.9936  6.479    0.9998  1.192    0.9998  1.098
4th mode             u41            0.9946  5.954    0.9959  5.164    1.0000  0.379    1.0000  0.382
                     u42            0.9963  4.903    0.9964  4.860    1.0000  0.184    1.0000  0.173
                     u43            0.9353  20.716   0.9350  20.768   0.9996  1.596    1.0000  0.505
                     u44            0.9865  9.409    0.9868  9.310    0.9998  1.060    0.9998  1.005

Dataset DQP2
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             4.19             15.57            20.15            20.28
R2 (%)                              99.82            97.58            95.94            95.88
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.0000  0.116    1.0000  0.209    1.0000  0.088    1.0000  0.074
                     u12            0.9990  2.601    0.9995  1.816    0.9991  2.395    0.9994  1.950
                     u13            0.9947  5.887    0.9967  4.639    0.9977  3.896    0.9985  3.135
                     u14            0.9898  8.205    0.9929  6.828    0.9981  3.488    0.9988  2.804
2nd (variable) mode  u21            0.9875  9.072    0.9882  8.809    0.9999  0.6775   1.0000  0.251
                     u22            0.9695  14.18    0.9932  6.697    0.9988  2.787    0.9998  1.063
                     u23            0.9871  9.225    1.0000  0.355    0.9980  3.612    0.9995  1.764
                     u24            0.8354  33.34    0.9996  1.688    0.9997  1.386    0.9985  3.160
3rd mode             u31            0.9961  5.094    0.9966  4.735    0.9999  0.9674   0.9998  1.001
                     u32            0.9920  7.237    0.9920  7.273    0.9993  2.142    0.9993  2.088
                     u33            0.8544  31.302   0.8612  30.552   0.9944  6.073    0.9974  4.094
                     u34            0.9912  7.602    0.9933  6.621    0.9990  2.610    0.9992  2.315
4th mode             u41            0.9944  6.058    0.9957  5.322    0.9999  0.748    0.9999  0.820
                     u42            0.9961  5.085    0.9963  4.945    1.0000  0.440    1.0000  0.437
                     u43            0.9378  20.317   0.9103  24.453   0.9970  4.656    0.9992  2.234
                     u44            0.9855  9.769    0.9879  8.909    0.9995  1.775    0.9994  2.002

Dataset DQP3
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             5.38             23.20            30.10            30.17
R2 (%)                              99.71            94.62            90.94            90.90
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.000   0.188    1.0000  0.248    0.9999  0.600    1.000   0.117
                     u12            0.9969  4.485    0.9988  2.801    0.9964  4.893    0.9986  2.987
                     u13            0.9967  4.631    0.9966  4.693    0.9948  5.830    0.9968  4.534
                     u14            0.4121  65.661   0.9935  6.552    0.4229  64.980   0.9972  4.294
2nd (variable) mode  u21            0.9830  10.591   0.9891  8.484    0.9917  7.374    1.000   0.535
                     u22            0.9688  14.347   0.9930  6.760    0.9946  5.930    0.9995  1.726
                     u23            0.9883  8.761    0.9999  0.576    0.9975  4.021    0.9988  2.770
                     u24            0.0226  88.706   0.9990  2.575    0.0097  89.447   0.9965  4.812
3rd mode             u31            0.9729  13.369   0.9967  4.681    0.9830  10.587   0.9996  1.560
                     u32            0.9531  17.610   0.9909  7.739    0.9684  14.431   0.9983  3.321
                     u33            0.9363  20.559   0.863   30.350   0.9850  9.927    0.9919  7.320
                     u34            0.6694  47.982   0.9929  6.814    0.6527  49.252   0.9980  3.633
4th mode             u41            0.9783  11.947   0.9956  5.372    0.9923  7.115    0.9997  1.308
                     u42            0.9701  14.054   0.9962  4.969    0.9924  7.046    0.9999  0.790
                     u43            0.9765  12.451   0.9291  21.704   0.9934  6.603    0.9971  4.374
                     u44            0.8600  30.685   0.9894  8.341    0.8266  34.251   0.9986  2.981

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.


Table 5 MCR-ALS analysis (with different constraints) of datasets with constant-proportional error and comparison of resolved profiles with theoretical ones.

Dataset DQCP1
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             3.71             6.30             10.03            10.08
R2 (%)                              99.86            99.60            98.99            98.99
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.0000  0.15     1.000   0.18     1.0000  0.04     1.0000  0.04
                     u12            0.9997  1.44     0.9997  1.32     0.9998  1.23     0.9998  1.14
                     u13            0.9967  4.68     0.9972  4.32     0.9993  2.09     0.9993  2.07
                     u14            0.9833  10.47    0.9835  10.43    0.9955  5.42     0.9957  5.29
2nd (variable) mode  u21            0.9911  7.67     0.9900  8.09     1.0000  0.20     1.0000  0.15
                     u22            0.9883  8.77     0.9925  7.04     0.9997  1.32     0.9999  0.72
                     u23            0.9896  8.27     0.9997  1.36     0.9998  1.12     1.000   0.57
                     u24            0.9891  8.45     0.9976  4.00     0.9998  1.15     0.9998  1.23
3rd mode             u31            0.9963  4.95     0.9970  4.43     0.9999  0.58     0.9999  0.59
                     u32            0.9932  6.705    0.9937  6.45     0.9997  1.40     0.9997  1.40
                     u33            0.8646  30.16    0.8749  28.97    0.9988  2.85     0.999   2.54
                     u34            0.9919  7.28     0.9925  7.04     0.9992  2.28     0.9993  2.18
4th mode             u41            0.9944  6.06     0.9959  5.20     0.9999  0.74     0.9999  0.68
                     u42            0.9962  4.97     0.9966  4.72     1.0000  0.34     1.0000  0.35
                     u43            0.9312  21.38    0.919   23.27    0.9996  1.62     0.9998  1.17
                     u44            0.9877  9.01     0.9887  8.61     0.9994  1.93     0.9995  1.86

Dataset DQCP2
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             5.18             13.91            20.24            20.36
R2 (%)                              99.73            98.06            95.91            95.86
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.0000  0.11     1.000   0.14     1.0000  0.09     1.0000  0.08
                     u12            0.9980  2.85     0.9987  2.91     0.9991  2.43     0.9993  2.09
                     u13            0.9940  6.31     0.9959  5.19     0.9976  3.94     0.9980  3.65
                     u14            0.9785  11.90    0.9908  7.77     0.9924  7.07     0.9954  5.51
2nd (variable) mode  u21            0.9875  9.08     0.9874  9.11     0.9999  0.67     1.0000  0.41
                     u22            0.9691  14.28    0.9693  14.23    0.9989  2.69     0.9999  0.92
                     u23            0.9864  9.48     0.9889  8.55     0.9982  3.48     0.9998  1.27
                     u24            0.7869  38.10    0.9986  3.02     0.9999  0.87     0.9993  2.07
3rd mode             u31            0.9959  5.21     0.9956  5.37     0.9998  1.11     0.9998  1.09
                     u32            0.9916  7.42     0.9928  6.89     0.9991  2.42     0.9992  2.31
                     u33            0.8533  31.43    0.8597  30.72    0.9935  6.54     0.9958  5.24
                     u34            0.9884  8.72     0.9921  7.20     0.9979  3.72     0.9986  2.81
4th mode             u41            0.9941  6.21     0.9933  6.63     0.9999  0.98     0.9999  0.98
                     u42            0.9954  5.49     0.9966  4.75     1.0000  0.49     1.000   0.48
                     u43            0.9374  20.38    0.9095  24.56    0.9966  4.75     0.9984  3.24
                     u44            0.9847  10.02    0.9888  8.57     0.9993  2.14     0.9992  2.23

Dataset DQCP3
                                    MCR-ALSB         MLPCA–MCR-ALSB   MCR-ALSQ         MLPCA–MCR-ALSQ
Constraint (a)                      1                1                1,2              1,2
lof (%)                             6.12             21.41            30.07            30.12
R2 (%)                              99.63            95.42            90.96            90.93
Data mode            Profile (b)    r2      θ        r2      θ        r2      θ        r2      θ
1st mode             u11            1.0000  0.22     1.000   0.24     0.9987  2.97     1.0000  0.12
                     u12            0.9968  4.56     0.9986  3.07     0.9971  4.36     0.9985  3.14
                     u13            0.9968  4.59     0.9972  4.29     0.9953  5.54     0.9962  4.99
                     u14            0.4137  65.56    0.9863  9.50     0.3986  66.51    0.9946  5.97
2nd (variable) mode  u21            0.9824  10.78    0.9914  7.53     0.9982  3.40     0.9999  0.77
                     u22            0.9858  9.68     0.9924  7.06     0.9945  6.03     0.9998  1.19
                     u23            0.9886  8.66     0.9983  3.39     0.9975  4.03     0.9995  1.79
                     u24            0.0221  88.74    0.9964  4.87     0.0056  89.68    0.9985  3.16
3rd mode             u31            0.9744  12.99    0.9973  4.22     0.9784  11.92    0.9996  1.64
                     u32            0.9538  17.49    0.9920  7.25     0.9680  14.54    0.9982  3.45
                     u33            0.9362  20.58    0.8795  28.41    0.9823  10.79    0.9887  8.61
                     u34            0.6754  47.52    0.9923  7.11     0.6464  49.73    0.9979  3.72
4th mode             u41            0.9794  11.66    0.9956  5.35     0.9947  5.92     0.9997  1.40
                     u42            0.9702  14.03    0.9966  4.74     0.9916  7.44     0.9999  0.80
                     u43            0.9766  12.41    0.9286  21.78    0.9932  6.70     0.9952  5.60
                     u44            0.8640  30.23    0.9912  7.61     0.8288  34.02    0.9988  2.75

(a) 1 = non-negativity, 2 = quadrilinearity.
(b) u = score profiles; the first number after ‘u’ indicates the mode of the dataset, e.g. 1 = 1st mode, 2 = 2nd mode, etc.; the second number after ‘u’ indicates the number of the component, e.g. ‘u23’ = third profile in 2nd mode.


Fig. 2. Recovery of profiles by MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols) in the presence of homoscedastic noise from dataset (a) DQH2 (lofth ≈ 20%), and (b) DQH3 (lofth ≈ 30%), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

constraint (Table 5). At a higher level of noise (lofth = 30.19%, dataset DQCP3), only MLPCA–MCR-ALSQ could recover reliable profiles comparable to the theoretical ones (Table 5, Fig. 4). MCR-ALSQ also recovered the profiles to some extent, but they were completely unreliable after the third component (Fig. 4). MCR-ALSB and MLPCA–MCR-ALSB provided lower lof values (6.12% and 21.41%) and relatively higher R2 (99.63% and 95.42%), but again the recovered profiles were associated with ambiguity, which was eliminated by including the quadrilinearity constraint in the algorithm (Table 5).
Overall, the recovery of the profiles and their comparison with the theoretical ones, in the presence of different types and levels of noise, suggest that MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) is an efficient way to deal with four-way quadrilinear data. Analysis of the recovered profiles shows that MCR-ALSQ is able to handle the noise efficiently up to a level of about 20% lofth. At this noise level (lof ≈ 20%), very little difference was still observed between the profiles recovered by the MCR-ALSQ and MLPCA–MCR-ALSQ models. This confirms the property of MLPCA to separate measurement noise variance from other sources of variance. Inclusion of MLPCA as a pre-processing step in the MCR-ALS algorithm with the quadrilinearity constraint (MLPCA–MCR-ALSQ) increases the resolution efficiency of the method, providing reliable true profiles even at a noise level of about 30% lof. MLPCA uses a maximum likelihood projection, separates measurement noise variance from other sources of variance, and obtains a more accurate estimate of the component subspace [14]; this in turn helps the applied model to achieve a better recovery of the profiles. These results further suggest that, in general, the resolved profiles are more affected by the level of noise than by the type of noise in the three cases under study.

5. Conclusions
This work shows that reliable profiles can only be resolved when the applied model and the data structure match perfectly, and that the ambiguity associated with bilinear models can be eliminated by the use of higher linearity constraints, depending on the structure of

Fig. 3. Recovery of profiles in variable mode of dataset DQP3 in the presence of heteroscedastic noise (lofth ≈ 30%), by: (a) MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols) and (b) MCR-ALS with the quadrilinearity constraint applied on MLPCA projected data (MLPCA–MCR-ALSQ) (black line with symbols), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 4. Recovery of profiles in variable mode of dataset DQCP3 in the presence of constant-proportional noise (lofth ≈ 30%), by: (a) MCR-ALS with the quadrilinearity constraint (MCR-ALSQ) (black line with symbols), and (b) MCR-ALS with the quadrilinearity constraint applied on MLPCA projected data (MLPCA–MCR-ALSQ) (black line with symbols), and comparison with pure profiles (blue line). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

the dataset to be analyzed. Further, this work validates the efficiency and reliability of the MCR-ALSQ model for the analysis of four-way quadrilinear datasets in the presence of different levels and types of noise. MLPCA applied as a pre-processing step before MCR-ALSQ analysis was also shown to resolve the profiles adequately in the presence of high levels of heteroscedastic and constant-proportional noise. The results suggest that MCR-ALS with the quadrilinear constraint can be used efficiently to analyze four-way quadrilinear datasets. Moreover, MLPCA can be conveniently used before MCR-ALS when high levels of heteroscedastic and constant-proportional noise are present.

Acknowledgements

AM gratefully acknowledges the Juan de la Cierva Post-Doctoral research grant (JCI-2011-10895), and support from the Ministerio de Economía y Competitividad, Spain (CTQ2012-38616-C02-01 grant).

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.chemolab.2014.04.002.

References

[1] R. Tauler, M. Maeder, A. de Juan, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, 2, Elsevier, Amsterdam, 2009, pp. 473–505 (Chapter 2.24).
[2] R. Tauler, Multivariate curve resolution applied to second order data, Chemometr. Intell. Lab. Syst. 30 (1995) 133–146.
[3] R. Tauler, I. Marques, E. Casassas, Multivariate curve resolution applied to three-way trilinear data: study of a spectrofluorimetric acid–base titration of salicylic acid at three excitation wavelengths, J. Chemometr. 12 (1998) 55–75.
[4] M. Alier, M. Felipe, I. Hernández, R. Tauler, Trilinearity and component interaction constraints in the multivariate curve resolution investigation of NO and O3 pollution in Barcelona, Anal. Bioanal. Chem. 399 (2011) 2015–2029.
[5] R. Tauler, A. Smilde, B. Kowalski, Selectivity, local rank, 3-way data analysis and ambiguity in multivariate curve resolution, J. Chemometr. 9 (1995) 31–58.
[6] J. Jaumot, R. Gargallo, A. de Juan, R. Tauler, A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curve resolution in MATLAB, Chemometr. Intell. Lab. Syst. 76 (2005) 101–110.
[7] A. de Juan, R. Tauler, Comparison of three-way resolution methods for non-trilinear data sets, J. Chemometr. 15 (2001) 749–771.
[8] E. Peré-Trepat, A. Ginebreda, R. Tauler, Comparison of different multiway methods for the analysis of geographical metal distributions in fish, sediments and river waters in Catalonia, Chemometr. Intell. Lab. Syst. 88 (2007) 69–83.
[9] M. Alier, R. Tauler, Multivariate curve resolution of incomplete data multisets, Chemometr. Intell. Lab. Syst. 127 (2013) 17–28.
[10] A. Malik, R. Tauler, Extension and application of multivariate curve resolution-alternating least squares to four-way quadrilinear data-obtained in the investigation of pollution patterns on Yamuna River, India: a case study, Anal. Chim. Acta 794 (2013) 20–28.
[11] J. Quackenbush, Computational analysis of microarray data, Nat. Rev. Genet. 2 (2001) 418–427.
[12] C.K. Wierling, M. Steinfath, T. Elge, S. Schulze-Kremer, P. Aanstad, M. Clark, H. Lehrach, R. Herwig, Simulation of DNA array hybridization experiments and evaluation of critical parameters during subsequent image and data analysis, BMC Bioinforma. 3 (2002).
[13] M. Nykter, T. Aho, M. Ahdesmäki, P. Ruusuvuori, A. Lehmussola, O. Yli-Harja, Simulation of microarray data with realistic characteristics, BMC Bioinforma. 7 (July 18 2006) 349.
[14] P.D. Wentzell, S. Hou, Exploratory data analysis with noisy measurements, J. Chemometr. 26 (2012) 264–281.
[15] P.D. Wentzell, T.K. Karakach, S. Roy, M.J. Martinez, C.P. Allen, M. Werner-Washburne, Multivariate curve resolution of time course microarray data, BMC Bioinforma. 7 (2006) 343.
[16] R. Tauler, M. Viana, X. Querol, A. Alastuey, R.M. Flight, P.D. Wentzell, P.K. Hopke, Comparison of the results obtained by four receptor modelling methods in aerosol source apportionment studies, Atmos. Environ. 43 (2009) 3989–3997.
[17] M. Dadashi, H. Abdollahi, R. Tauler, Application of maximum likelihood multivariate curve resolution to noisy data sets, 27 (2013) 34–41.
[18] P.D. Wentzell, D.T. Andrews, D.C. Hamilton, K. Faber, B.R. Kowalski, Maximum likelihood principal component analysis, J. Chemometr. 11 (1997) 339–366.
[19] P.D. Wentzell, M.T. Lohnes, Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations, Chemometr. Intell. Lab. Syst. 45 (1999) 65–85.
[20] P.D. Wentzell, Other topics in soft-modelling: maximum likelihood-based soft-modelling methods, in: S.D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, Elsevier Ltd., Amsterdam, 2009, pp. 507–558.
[21] M. Dadashi, H. Abdollahi, R. Tauler, Maximum likelihood principal component analysis as initial projection step in multivariate curve resolution analysis of noisy data, Chemometr. Intell. Lab. Syst. 118 (2012) 33–40.
[22] R. Tauler, D. Barcelo, Multivariate curve resolution applied to liquid chromatography–diode array detection, Trends Anal. Chem. 12 (1993) 319–327.
[23] A. Smilde, R. Bro, P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences, Wiley, 2004, ISBN 978-0-471-98691-1.
[24] M. Terrado, D. Barcelo, R. Tauler, Quality assessment of the multivariate curve resolution alternating least squares method for the investigation of environmental pollution patterns in surface water, Environ. Sci. Technol. 43 (2009) 5321–5326.
[25] J. Jaumot, R. Tauler, MCR-BANDS: a user friendly MATLAB program for the evaluation of rotation ambiguities in multivariate curve resolution, Chemometr. Intell. Lab. Syst. 103 (2010) 96–107.
