Application of a Sparseness Constraint in Multivariate Curve Resolution – Alternating Least Squares

Siewert Hugelier1,*, Sara Piqueras2,3, Carmen Bedia3, Anna de Juan2, Cyril Ruckebusch1

1 Université de Lille, Sciences et Technologies, LASIR, CNRS, F-59000 Lille, France
2 Chemometrics group, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain
3 Department of Environmental Chemistry, IDAEA-CSIC, Calle Jordi Girona 18-26, 08034 Barcelona, Spain
* Corresponding author. Tel.: +33 320 43 66 61; E-mail address: [email protected]
Abstract
The use of sparseness in chemometrics is a concept that has increased in popularity. Its advantage is, above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a constraint in multivariate curve resolution – alternating least squares (MCR–ALS), which aims at reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as is the case for some instrumental responses, such as mass spectra, or for concentration profiles linked to scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the constrained profiles, one-dimensional and/or two-dimensional numerical arrays are fitted on a basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression framework with an L0-norm penalty is applied. This penalty constrains the number of non-null coefficients in the fit of the constrained array, without any a priori assumption on their number or positions. It is shown that the sparseness constraint suppresses values linked to uninformative channels and noise in MS spectra and improves the location of scattered compounds in distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of the sparseness constraint is a lower ambiguity in the bilinear model, since the large number of null coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart matrix of the MCR bilinear model.

Keywords: MCR–ALS, Sparseness, Hyperspectral Image Analysis, Mass Spectrometry.
1. Introduction

Sparseness is a mathematical property that corresponds to the presence of many null elements in arrays of numerical data. Within chemometrics, sparseness is very often linked to the concept of simplicity. Indeed, many classical chemometric methods for exploratory analysis, regression or classification, such as Principal Component Analysis (PCA) [1],[2], Partial Least Squares Regression (PLS) [3],[4] or PLS-Discriminant Analysis (PLS-DA) [5],[6], have sparse versions. In these contexts, the purpose of using sparseness is to enhance relevant information and to facilitate the final qualitative interpretation of the results obtained [7].

The use of sparseness in multivariate curve resolution methods responds to a completely different spirit. There are indeed situations in mixture analysis problems where pure spectral signals or concentration profiles can be naturally defined as sparse. In these situations, sparseness can be proposed as a constraint to faithfully reproduce the original mixed raw data by a bilinear model of chemically meaningful profiles. For instance, mass spectra of pure compounds are sparse, since few m/z channels provide relevant non-null signals and the remaining small numerical readings are related to noise. For concentration profiles, there may be components of the mixture with only a few non-null concentration values, e.g., a component that is absent in most pixels of a hyperspectral image (HSI) or simply scattered across the total sample surface. In this scenario, applying sparseness to these profiles reflects the true nature of the measurement and is potentially useful to improve the final resolution results.
In this work, we apply sparseness as a constraint in the multivariate curve resolution – alternating least squares (MCR–ALS) [8],[9] algorithm. As with any other constraint, the main reasons to use it are to provide the resolved profiles with chemical/physical meaning and to decrease the rotational ambiguity linked to the MCR solutions [10]. As with the implementation of other constraints in the MCR–ALS framework, such as non-negativity, unimodality or closure, the sparseness constraint can be used on some or all of the profiles of the different compounds, in the concentration and/or spectral direction. The use of the sparseness constraint does not require that the original raw data are sparse by nature, but only that some or all of the compound profiles in the underlying bilinear model are. Moreover, the way the constraint is implemented assumes neither the number of non-null coefficients nor their positions when reproducing the signal. It should be noted that the use of sparseness cannot and should not be generalized: it has to be applied only when the pure compound profiles in the underlying bilinear model fulfill this property. The implementation of the sparseness constraint works by fitting the sparsest representation of the signal on a basis of Gaussian functions with fixed width, which can be achieved through regularized linear regression. The sparse reproduction of the signal obtained at each step of the ALS is then used to advance the MCR–ALS analysis. L1 regularization [11] of the spectral profiles in MCR–ALS has previously been tackled by Pomareda et al. [12]. In this work, we propose solving the problem by L0-norm penalized regression [13], as used in sparse deconvolution [14],[15]. De Rooi and Eilers [14] showed that an L0-norm penalty, as opposed to an L1-norm penalty, prevents shrinkage of the main coefficients while setting adjacent coefficients to zero, which results in a sparser fit for the same fitting error. An additional difference with the work of Pomareda et al. [12] is that the constraint is also extended to two-dimensional signals, by considering a tensor product of Gaussians as the basis. This allows the application of the proposed constraint to the distribution maps in HSI. Again, it is worth stressing that this approach does not require the raw data to be sparse, since only the profiles of the suitable compounds are constrained. This also makes the application of sparseness fast, since only a few pure compound profiles are constrained, compared with the high dimensions of the raw original data. A direct positive effect of the sparseness constraint is an improved definition of the constrained profiles, as long as sparsity is a natural property of these profiles. Besides, the presence of many null coefficients in the constrained profiles induces low-rank or selective windows in the related concentration and/or spectral matrices. As a consequence, a better recovery of all profiles in the bilinear model [16] and a decrease of the rotational ambiguity are achieved [8],[10].

Within the remote sensing community, the application of sparseness in unmixing methods is not an entirely new concept either [17]-[19]. Sparse non-negative matrix factorization (NMF) has recently attracted a lot of attention and focuses on controlling the sparseness of the factors corresponding to the abundances of the components [20],[21]. This approach generally gives more interpretable factors, because environmental elements are usually localized in limited spatial zones, and it thus reduces rotational ambiguity. Other approaches focus either on solving a regression problem between the raw data and a library of spectra [18] or on finding the sparsest combination of compound assignments to define the composition of every pixel in an image [22].
The approach presented is applied to real representative examples from mass spectrometry and FTIR imaging to test the potential of the algorithm for the definition of the constrained profiles and for the general improvement of the description of the full mixture system.

2. Theory

2.1. Multivariate curve resolution – alternating least squares (MCR–ALS)
The development and application of the MCR–ALS routine has covered the resolution of data sets of very diverse nature [23]-[26]. It assumes that the Beer-Lambert law holds, so that a bilinear model can be used to decompose a mixed data set into its pure individual contributions. Doing so allows us to write the system as in (1), in which the matrix D (m × n), containing m mixed spectra collected at n spectral channels, is reproduced by a bilinear model of k components, formed by concentration profiles, represented by the matrix C (m × k), and their related spectral profiles, represented by the matrix S^T (k × n). The matrix E (m × n) represents the error contribution of the model.

\mathbf{D} = \mathbf{C}\,\mathbf{S}^{\mathrm{T}} + \mathbf{E} \qquad (1)
MCR–ALS is an iterative alternating least squares algorithm that resolves concentration and spectral profiles under the action of constraints and can be used on a single data set or on multiset structures formed by several concatenated data matrices [9],[26]. Constraints can be very diverse (e.g., non-negativity, unimodality, closure, etc.) and can be optionally applied to all or to some of the compounds in the system analyzed, in the concentration and/or spectral direction. The experimental data matrix D is optimally fitted by the iterative procedure and the lack of fit (LOF) is minimized:

\mathrm{LOF}\,(\%) = 100 \sqrt{\frac{\sum_{i,j} e_{i,j}^{2}}{\sum_{i,j} d_{i,j}^{2}}} \qquad (2)

where d_{i,j} and e_{i,j} are the (i,j)-th elements of D and E, respectively. Additionally, the percentage of variance explained is also used to assess the results:

R^{2}\,(\%) = 100\,\frac{\sum_{i,j} d_{i,j}^{2} - \sum_{i,j} e_{i,j}^{2}}{\sum_{i,j} d_{i,j}^{2}} \qquad (3)
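For concreteness, a minimal MATLAB sketch of such an alternating scheme is given below. The function name, the use of lsqnonneg for the non-negativity constrained updates and the omission of other constraints (normalization, closure, etc.) are our own illustrative choices and do not reproduce the MCR–ALS command line code.

```matlab
% Minimal MCR-ALS sketch: alternate non-negative LS updates of C and St,
% starting from initial spectral estimates St0 (k x n), for data D (m x n).
function [C, St, lof] = mcr_als_sketch(D, St0, nIter)
    [m, n] = size(D);
    k  = size(St0, 1);
    St = St0;
    C  = zeros(m, k);
    for it = 1:nIter
        % Update concentration profiles row by row under non-negativity
        for i = 1:m
            C(i, :) = lsqnonneg(St', D(i, :)')';
        end
        % Update spectral profiles column by column under non-negativity
        for j = 1:n
            St(:, j) = lsqnonneg(C, D(:, j));
        end
        % Lack of fit, Eq. (2)
        E   = D - C * St;
        lof = 100 * sqrt(sum(E(:).^2) / sum(D(:).^2));
    end
end
```

In MCR–ALS, additional constraints, such as the sparseness constraint described in Section 2.2, are applied to the profiles after each least squares update of this kind.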
MCR–ALS has seen increased use in the analysis of HSI [27]-[29]. The main characteristic of images is that they are defined by two spatial directions (x- and y-coordinates for each pixel) and a spectral direction, i.e., as a data cube. A spectrum is collected for each pixel and contains the chemical information of the constituents present at that pixel position. The bilinear model in (1) still holds for the pixel spectra, which implies that the three-dimensional data cube has to be unfolded into a data matrix to be analyzed by MCR–ALS, as shown in Figure 1 [27]-[29]. As a consequence of this cube unfolding operation, the original pixel neighborhood information is lost in the data matrix D, which makes the use of constraints that take spatial information into consideration more difficult. Until recently, spatial constraints were almost limited to local rank constraints based on the analysis of pixel neighborhood areas, derived with auxiliary algorithms outside the MCR–ALS framework [30]. To facilitate the introduction of constraints linked to spatial information within MCR–ALS, a novel approach was proposed [31], based on refolding the concentration profiles to their corresponding distribution maps at each least squares step. In this way, the pixel neighborhood and spatial information is restored and the global spatial features of the two-dimensional image can be used, in the form of spatial constraints, to further resolve the data into its individual components. Virtually any image processing operation could be implemented in this way as a constraint (e.g., smoothness), as long as the characteristic used is related to the physical or chemical nature of the pure component profiles. The validity of the solution obtained when applying a constraint can be assessed by checking figures of merit related to the quality of the fit (see (2) and (3)) and the chemical meaning of the recovered profiles. Additionally, by introducing such characteristics, e.g., sparseness, as constraints, the MCR–ALS analysis reaches a stable solution more quickly (i.e., fewer iterations are needed for convergence).

INSERT FIGURE 1
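As an illustration of this refolding/unfolding mechanism, a minimal MATLAB sketch is given below; the variable names (Cube, C) and the component index are illustrative assumptions, not those of the MCR–ALS toolbox.

```matlab
% Unfold an nx-by-ny-by-nChannels hyperspectral cube into a pixel-by-channel matrix
[nx, ny, nCh] = size(Cube);
D = reshape(Cube, nx * ny, nCh);          % D is (nx*ny) x nCh, one spectrum per row

% ... MCR-ALS resolves D into C (nx*ny x k) and St (k x nCh) ...

% Refold the concentration profile of component c into its distribution map
c = 1;                                    % component index (illustrative)
map_c = reshape(C(:, c), nx, ny);         % spatial constraints (e.g. sparseness) act on map_c

% After constraining the map, unfold it back into the concentration column
C(:, c) = reshape(map_c, nx * ny, 1);
```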
2.2. Sparse regularization

The condition of sparseness can be applied as a constraint to model one-dimensional profiles, e.g., MS spectra, or 2-D refolded profiles, such as distribution maps from HSI, in MCR–ALS models. Below, the formulation used to constrain a signal profile (both one-dimensional and two-dimensional) is described in detail.

To set the constraint, we look for a sparse representation of the initial signal profile y = (y_1 … y_N), i.e., a representation of y with the fewest non-null coefficients x = (x_1 … x_N), for a given basis of functions Φ (N × N), so that the signal can be written as

\mathbf{y} = \boldsymbol{\Phi}\,\mathbf{x} + \mathbf{e} \qquad (4)
The problem of recovering the sparsest representation of the reconstruction coefficients in x is a very general problem, which can be tackled by solving a regularized linear inverse problem. Although this optimization problem is complex, it can be dealt with in many different ways, including compressed sensing [32] or Bayesian approaches [33]. In this work, another approach is applied, one that has often been used in a sparse deconvolution context [14],[15],[34]. The matrix Φ is a basis of Gaussians with fixed width, which translates into a matrix in which each descending diagonal from left to right is constant (i.e., a Toeplitz matrix structure, see Fig. 2a). When using such a matrix, a pseudo-inverse solution for x can be found:

\mathbf{x} = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{y} = \boldsymbol{\Phi}^{+}\mathbf{y} \qquad (5)
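A minimal sketch of how such a fixed-width Gaussian basis with Toeplitz structure could be built in MATLAB is shown below; the function name and the width parameter sigma are illustrative choices.

```matlab
% Build an N x N basis of Gaussians of fixed width sigma, centred on each channel.
% Each column is a Gaussian; shifting the centre by one channel shifts the column,
% which gives the Toeplitz structure sketched in Fig. 2a.
function Phi = gaussian_basis(N, sigma)
    g   = exp(-((0:N-1).^2) / (2 * sigma^2));   % first column/row (zero end condition)
    Phi = toeplitz(g);                          % symmetric Toeplitz matrix
end
```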
In this work, the matrix Φ is a Gaussian basis with a fixed width. To introduce sparseness in x, an L0-norm penalty can be added to the least squares objective function, leading to

S = \min_{\mathbf{x}\in\mathbb{R}^{N}} \left\{ \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|^{2} + \lambda\,\|\mathbf{x}\|_{0} \right\} \qquad (6)

in which \|\mathbf{x}\|_{0} = \sum_{j} |x_{j}|_{0}, with |x_{j}|_{0} = 1 if x_{j} \neq 0 and 0 otherwise. The parameter λ is called the sparseness factor and defines the trade-off between the signal reconstruction error and the degree of sparseness. The useful range of the penalty parameter lies between 10^-8 and 10^8, depending on the signal treated.

INSERT FIGURE 2
In (6), the L0-norm constrains the number of non-null elements of x. It forces the elements of x towards 0, unless they contribute strongly to the reconstruction, and a sparse solution for x follows. A characteristic property of this type of penalty is that no a priori information on the positions (nor on the number) of the non-null coefficients is required. In practice, solving the equation in (6) is a complex, non-convex and non-differentiable optimization problem, and it can only be solved by using an approximation of the L0-norm, as described in [35],[36]. In short, the L0 penalty is a member of the Lp-penalty family, with 0 ≤ p ≤ 2. When p = 2, the penalty added to the least squares objective function is the well-known ridge penalty (λ‖x‖²) [37], for which a closed-form solution for x exists:

\mathbf{x} = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi} + \lambda\mathbf{W})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{y} \qquad (7)

where W = I, the identity matrix. It was shown before [35],[36] that the L0-norm penalty can be approximated by a weighted form of ridge regression:

S = \min_{\mathbf{x}\in\mathbb{R}^{N}} \left\{ \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|^{2} + \lambda\,\mathbf{x}^{\mathrm{T}}\widehat{\mathbf{W}}\mathbf{x} \right\} \qquad (8)

The matrix Ŵ is a diagonal matrix with elements ŵ_j = 1/(x̂_j² + ε²). The constant ε is added to reduce the number of iterations needed for convergence and to avoid numerical instabilities. The explicit solution of (8) is then given once again by (7), but now with W = Ŵ. The iterative process updates x̂_j and ŵ_j in an alternating way, starting from the unweighted ridge regression, until convergence of the solution, thereby approximating the solution of (6).
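The following MATLAB sketch illustrates this adaptive-ridge approximation of the L0 penalty (equations (6)–(8)); the function name, the maximum number of iterations, the value of the stabilizing constant and the final hard-thresholding of negligible coefficients are our own assumptions, not necessarily those of the authors' implementation.

```matlab
% Approximate L0-penalized fit of y on the basis Phi (Eq. (6)) by iterating
% the weighted ridge solution of Eqs. (7)-(8).
function x = sparse_fit_L0(Phi, y, lambda)
    [~, N] = size(Phi);
    eps2 = 1e-6;                 % small stabilizing constant (epsilon^2)
    W    = eye(N);               % start from the unweighted ridge regression
    x    = zeros(N, 1);
    for it = 1:50
        x_old = x;
        x = (Phi' * Phi + lambda * W) \ (Phi' * y);   % Eq. (7) with the current weights
        W = diag(1 ./ (x.^2 + eps2));                 % update weights, Eq. (8)
        if norm(x - x_old) < 1e-8 * (1 + norm(x_old))
            break;
        end
    end
    x(abs(x) < sqrt(eps2)) = 0;  % set negligibly small coefficients exactly to zero
end
```

The sparse reconstruction of the profile is then Φx, and λ controls how many coefficients remain non-null.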
This framework works well in practice, as shown in Figure 3, which presents an illustrative example of sparse signal fitting on a pre-defined Gaussian basis for a series of sparse signals. Figures 3a and 3b relate to a sparse signal formed by wide bands, whereas Figure 3c shows the situation for a sparse signal with very narrow features, such as an MS spectrum. In the top panel (Fig. 3a), the signal (gray line) has been fitted on a basis of Gaussians with a width equal to half of the full width at half maximum (FWHM); five non-null coefficients have been obtained. The sum of the individual contributions (colored lines) is the reproduced signal (black line; an offset has been added), and it is clear that uninformative channels (i.e., noise) have been removed. In the middle panel (Fig. 3b), the same signal has been fitted on a basis of Gaussians with a width equal to one fifth of the FWHM. As can be seen, there are more non-null coefficients (21, to be precise), but the advantage is that the original data are reproduced better. It also shows that the wider peaks making up the spectrum can be modeled as a linear superposition of more narrow Gaussians. This can be used to advantage during analysis, as the true width of the Gaussian-like peaks that make up the signal is not known. Additionally, a basis of narrow Gaussians allows fitting mass spectrometry-like data, in which the signals are narrow by nature (lower panel; Fig. 3c). Therefore, a basis of narrow Gaussians will be used, because it adapts to any kind of sparse signal, independently of the width of the non-null signal features.

INSERT FIGURE 3
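As a toy illustration of the effect of the basis width discussed above, the sketch below fits the same synthetic signal on a broad and on a narrow Gaussian basis, reusing the illustrative gaussian_basis and sparse_fit_L0 functions defined earlier; the signal, widths and λ value are arbitrary choices and do not correspond to the data of Figure 3.

```matlab
% Fit the same noisy sparse signal with a broad and a narrow Gaussian basis
N = 200;
y = zeros(N, 1);  y(50) = 1;  y(120) = 0.6;              % two isolated features
g = exp(-((-15:15)'.^2) / (2 * 4^2));                    % Gaussian peak shape (FWHM ~ 9.4)
y = conv(y, g, 'same') + 0.01 * randn(N, 1);             % broaden the features and add noise

x_broad  = sparse_fit_L0(gaussian_basis(N, 4.7), y, 1e-2);  % sigma ~ FWHM/2
x_narrow = sparse_fit_L0(gaussian_basis(N, 1.9), y, 1e-2);  % sigma ~ FWHM/5
fprintf('non-null coefficients: broad basis %d, narrow basis %d\n', ...
        nnz(x_broad), nnz(x_narrow));
```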
When dealing with two-dimensional signals, the exact same principles as for one-dimensional sparse reproduction can be used, but the model for an image Y (I × J) is slightly different, as there are now two dimensions to be taken into account:

y_{ij} = \sum_{k}\sum_{l} \bar{\phi}_{ijkl}\, x_{kl} + e_{ij} \qquad (9)

where \bar{\boldsymbol{\Phi}} = (\bar{\phi}_{ijkl}) is a four-dimensional array that connects each element of Y to all the elements of X. In order to solve the system, the data matrix Y is vectorized to y (IJ × 1) and the array \bar{\boldsymbol{\Phi}} is mapped to a matrix C (IJ × IJ). This matrix has a special structure (see Fig. 2b), reflecting the fact that the basis is assumed not to vary with position (see [15] for more information). To construct this matrix, two-dimensional tensor products of Gaussians are used. To solve the vectorized version of (9), the same algorithms as described above (equations (6) to (8)) can be applied.
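Because the two-dimensional basis is a tensor product of one-dimensional Gaussian bases, the matrix acting on the vectorized image can be sketched with a Kronecker product, as below (again reusing the illustrative functions defined earlier, under a separable-basis assumption). For realistic image sizes the full Kronecker product is impractical and the banded structure of Fig. 2b would be exploited instead, as in [15].

```matlab
% Sparse fit of an I x J image Y on a separable 2-D Gaussian basis (Eq. (9)).
[I, J]  = size(Y);
PhiRow  = gaussian_basis(I, sigma);        % 1-D basis along the rows
PhiCol  = gaussian_basis(J, sigma);        % 1-D basis along the columns
Phi2D   = kron(PhiCol, PhiRow);            % (I*J) x (I*J) basis for the vectorized image

yVec = Y(:);                               % vectorize the image (column-major)
xVec = sparse_fit_L0(Phi2D, yVec, lambda); % same L0-penalized solver as in 1-D
Xmap = reshape(Phi2D * xVec, I, J);        % sparse reconstruction refolded to a map
```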
3. Experimental

3.1. Mass spectrometry imaging on a green bean sample

Sample preparation and acquisition. The green bean (Phaseolus vulgaris) sample used for analysis was initially humidified for 4 hours wrapped in wet cotton. The bean was then flash frozen in liquid nitrogen and stored at -80 °C. The tissue was mounted in a cutting block using Optimal Cutting Temperature embedding medium (OCT, TissueTek), sliced at 15 µm thickness with a cryostat (Leica CM 3050S) and placed directly onto indium tin oxide-coated glass slides (Bruker Daltonik GmbH, Bremen, Germany). A 2,5-dihydroxybenzoic acid matrix (Sigma-Aldrich, Spain) was applied onto the slides by sublimation [38]. Spectra were acquired using an Autoflex III MALDI-TOF/TOF instrument (Bruker Daltonik GmbH) equipped with a Smartbeam laser operated at a 200 Hz repetition rate at the "large focus" setting. The positive reflector ion mode was used in the 400 to 2000 m/z range, with an m/z resolution of 0.001 a.m.u. and a laser raster of 150 µm along both the x- and y-axis. The final acquired image size was 34 by 72 pixels.
Image preprocessing. In mass spectrometry imaging (MSI), preprocessing is crucial for a preliminary data reduction and for the removal of clearly uninformative m/z channels. An adaptation of the detection of regions of interest (ROI) method by Bedia et al. [39] was applied. This selection keeps for further analysis only the m/z channels showing a significant signal in a preset number of pixel spectra. To do so, a noise threshold, a mass tolerance error and the minimum number of spectra that should include a significant reading for a particular m/z value have to be defined. In this study, m/z channels with an intensity over 0.1 % of the maximum signal were considered different from noise, a mass tolerance error of 0.1 was accepted for two m/z values to be considered the same and, hence, to belong to the same spectral ROI, and at least five significant readings associated with a particular m/z channel were required for it to be kept as a ROI. This leads to a data size reduction from 56789 m/z values per pixel to 8627. The data compression is followed by baseline correction by asymmetric least squares [40], spectral alignment by correlation optimized warping [41] and balancing of the signal intensity variability by normalization (Euclidean norm) and scaling of the spectra. The scaling of the m/z spectral channels was performed by dividing each reading by the standard deviation of the channel readings plus 0.1 % of the maximum signal of the spectrum obtained as the sum of the selected purest mass spectra. After preprocessing, the ROIs between 1150 and 2000 m/z do not contain relevant information and, hence, this m/z range was removed from further analysis. The final dataset then contains 3750 m/z values per pixel. Figure 4a shows the compression and pre-processing of the data in a schematic way; the raw data are shown at the beginning of the processing pipeline and the compressed and corrected data at the end.
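A rough MATLAB sketch of this ROI-style channel selection is given below; the thresholds and variable names are illustrative, and the actual method of Bedia et al. [39] additionally groups m/z values within the mass tolerance into spectral ROIs.

```matlab
% Keep only m/z channels with a significant reading in at least minSpectra pixel spectra.
% D is the pixel-by-channel data matrix; mz is the corresponding m/z axis.
noiseThr    = 0.001 * max(D(:));                 % 0.1 % of the maximum signal
minSpectra  = 5;                                 % minimum number of significant readings
significant = D > noiseThr;                      % logical matrix of significant readings
keep        = sum(significant, 1) >= minSpectra; % channels to retain
D_roi  = D(:, keep);
mz_roi = mz(keep);
```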
3.2. Infrared spectroscopy imaging on a pharmaceutical sample

Sample preparation and acquisition. A pharmaceutical sample containing three products (citric acid, acetylsalicylic acid and caffeine) was carefully prepared by grinding the products together with a pestle and mortar. After a homogeneous powder mixture was obtained, it was analyzed with a Thermo Scientific Nicolet iN 10 MX infrared imaging microscope equipped with a standard single-element MCT detector. A spectrum was taken for every pixel of the image within the 4000 – 675 cm-1 range with a spectral resolution of 4 cm-1 (1725 wavenumbers). The pixel size was set at 25 µm × 25 µm and 32 scans were accumulated per pixel. The final acquired image size was 41 by 37 pixels.

Image preprocessing. As the acquired image is quite noisy and contains an almost flat baseline with an offset, some preprocessing had to be performed. In this case, the signal was smoothed using weighted least squares, followed by removal of the minimum value of each spectrum. After preprocessing, the fingerprint region of the acquired data (1800 – 675 cm-1) was selected to facilitate the analysis even further. Figure 4b shows the processing pipeline, with the acquired data at the beginning and the corrected data at the end.
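As an indication of what such a penalized least squares smoother can look like, a minimal sketch in the spirit of the smoother of ref. [40] is given below; the weights are omitted for brevity, and the function name and penalty value are illustrative assumptions rather than the exact preprocessing used here.

```matlab
% One possible penalized least squares smoother (cf. ref. [40]):
% minimize ||y - z||^2 + lambda * ||D2 z||^2, with D2 a second-order difference matrix.
function z = pls_smooth(y, lambda)
    n  = numel(y);
    D2 = diff(speye(n), 2);                   % (n-2) x n second-order differences
    z  = (speye(n) + lambda * (D2' * D2)) \ y(:);
end
```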
INSERT FIGURE 4

3.3. Sparseness constraint in MCR–ALS
The sparseness constraint was implemented in the MCR–ALS command line code (https://mcrals.wordpress.com/download/mcr-als-command-line/) and all calculations were performed in MATLAB version R2014a (The MathWorks, Natick, MA, USA) on a computer with an Intel(R) Core(TM) i7-4770 CPU @ 3.40 GHz and 16 GB RAM (64-bit Windows 7).

The critical point in applying the constraint during the MCR–ALS analysis is the sparseness factor, λ, which varies according to the degree of natural sparseness of each of the constituents analyzed, since the shape of the signal (e.g., intensity, number of peaks, etc.) may be different for each of them. Different ways to select the optimal λ have been proposed, for example the cross-validation approach used in [12], where the optimal λ is selected as the one around the change in the slope of the root mean square error (RMSE). However, automatic optimization remains an open issue for L0-norm sparsity because of the discontinuity of the penalty, and there is ongoing work on alternative approaches. Throughout this manuscript, we first run an MCR–ALS analysis with a non-negativity constraint on the distribution maps and spectral profiles and assess the results (lack of fit and explained variance obtained). We then fit the profiles of the constituents that should be sparse (whether C or ST profiles) individually, by applying the sparse regularization routine outside the MCR–ALS analysis, and optimize the sparseness factor by visual inspection. After completing this step, MCR–ALS analysis with non-negativity and sparseness constraints is performed, in which, at each iteration, the constrained profiles are fitted with the fixed sparseness factor obtained from the previous step.
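Schematically, the constraint step inside one ALS iteration could then look as follows. This is a sketch only: the variable names, the set of constrained components and the per-component sparseness factors lambda(c) are illustrative and do not reproduce the MCR–ALS command line implementation.

```matlab
% Apply the sparseness constraint to selected spectral profiles after the
% non-negative least squares update of St (k x n).
Phi = gaussian_basis(size(St, 2), sigma);          % fixed-width Gaussian basis (see Section 2.2)
for c = sparseComponents                           % indices of components constrained to be sparse
    x = sparse_fit_L0(Phi, St(c, :)', lambda(c));  % L0-penalized fit, Eq. (6)
    St(c, :) = (Phi * x)';                         % replace profile by its sparse reconstruction
end
% The constrained St is then used in the next least squares update of C.
```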
4. Results

4.1. Mass spectrometry imaging on a green bean sample

MCR–ALS analysis was performed on the compressed and corrected MSI image. First, the number of components of the preprocessed MSI image was set at five, chosen by singular value decomposition (SVD), and the initial estimates were determined from the mass spectra by a SIMPLISMA-based method [42]. Preliminary MCR–ALS with a non-negativity constraint on the concentration distribution maps and the spectral profiles was performed on the green bean data; normalization of the spectra in the ST matrix (using the Euclidean norm) was also applied. The LOF obtained was 15.3 % (explained variance 97.6 %). Figure 5a shows the distribution maps and spectral profiles of the constituents of the bean sample. The spectral profiles reveal that the main mass peaks are found together with many noisy channels. Bearing in mind the nature of MS signals, we know that the resolved spectral profiles of the constituents must be sparse signals with only a few significant sharp peaks. Therefore, the use of the sparseness constraint should improve the definition of the m/z spectral signatures and, as a consequence, the overall description of the data set.

INSERT FIGURE 5
After the preliminary MCR–ALS analysis of the bean MSI data under non-negativity constraints, the analysis was carried out applying the sparseness constraint to the spectral profiles. Optimal sparseness factors were obtained as described in Section 3.3. For a better understanding of the results, the sparseness factor chosen for each component (see the caption of Fig. 5) has been 'translated' into a sparseness ratio for the respective components, defined as the number of non-null spectral channels divided by the total number of spectral channels (in %). After applying MCR–ALS with non-negativity on C and ST and the sparseness constraint in the spectral direction, the sparseness ratios of the resolved MS profiles of the five constituents were 10.2 % – 1.2 % – 7.1 % – 3.4 % – 17.6 %, as opposed to ~ 100 % for all the constituents in the preliminary MCR–ALS analysis under only non-negativity constraints. The results are shown in Figure 5b. Introducing the sparseness constraint on the spectral profiles during the analysis clearly leads to a better definition of the resolved spectral profiles and, consequently, to a sharper definition of the distribution maps as well. The definition of the spatial structure has improved especially for the first and the fifth image constituents, whereas the spectral profiles of the other components have been cleaned up considerably. For instance, the distribution map of component 5 in Figure 5b shows a clear accumulation of its constituents in the seed coat. The spectral profile reveals two representative mass values, 710.1 m/z and 724.9 m/z, which were tentatively assigned to the flavonoids Kaempferide 3-rhamnoside-7-(6''-succinylglucoside) and Kaempferol 3-(2''-p-coumaryl-rhamnoside)-7-rhamnoside, respectively. In seeds, flavonoids have been seen to accumulate in the coats, where they have important functions in the induction of seed coat-imposed dormancy, as well as in seed longevity and quality [43]. The distribution maps of components 2 to 4 were left relatively unchanged. The LOF obtained for this analysis, 23.7 % (explained variance: 94.4 %), has increased compared to the previous analysis. This can be explained by the fact that the sparseness constraint removes many values related to non-informative m/z channels, which contribute in a noticeable way to the total variance measured. Nevertheless, the final variance explained is satisfactory given the natural S/N ratio of MS measurements.
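For reference, the sparseness ratio used here is a simple count of non-null elements; in MATLAB it amounts to the following (with s being a resolved spectral profile or a vectorized distribution map):

```matlab
% Sparseness ratio: percentage of non-null elements in a resolved profile s
sparseness_ratio = 100 * nnz(s) / numel(s);
```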
4.2. Infrared spectroscopy imaging on a pharmaceutical sample

For this data set, the number of components is known to be three and the initial estimates were determined with a SIMPLISMA-based method. An MCR–ALS analysis of the pre-processed data was first performed using non-negativity as a constraint on both the C and ST matrices, together with normalization of the ST profiles. The resolved distribution maps and spectra can be seen in Figure 6a (LOF: 7.9 %; explained variance: 99.4 %). The sparseness ratio of the component distribution maps, defined here as the ratio of non-null pixels over the total number of pixels, is 55.4 % – 76.3 % – 86.2 % for the respective components. Looking at the spectral profiles of the resolved components, the profiles of citric acid and acetylsalicylic acid (components 1 and 2) can be clearly distinguished, since the peak positions correspond to the IR spectra found in the database of the National Institute of Standards and Technology (NIST) [44]. On the other hand, the spectrum of caffeine cannot be clearly distinguished in component 3. What can be seen from the component distribution maps is that the contributions are scattered all across the map, as is expected for unstructured powdered samples. This means that the components are expected to show localized zones of presence and absence and, therefore, the sparseness constraint can be appropriately applied.

INSERT FIGURE 6
In order to enhance the zones of presence of the model compounds, and to improve the spectrum obtained for the third component, MCR–ALS was then performed applying the sparseness constraint to the distribution maps (sparseness factors tuned by visual inspection as described in Section 3.3; see the caption of Fig. 6). The results are shown in Figure 6b (LOF: 8.9 %; explained variance: 99.2 %). In this case, since IR measurements have a better S/N ratio than MS spectra and the reduction of non-null measurements compared with the initial solution is not as high as in the MS example, the LOF is practically not modified by the inclusion of the sparseness constraint. This fact supports the suitability of applying this constraint with the sparseness factor adopted in the analysis. When calculating the sparseness ratio for the solution obtained with the sparseness constraint, values of 33.1 % – 33.3 % – 86.0 % are obtained for the different compounds. This shows that the component distribution maps are sparser than before, with the exception of the third component. The removal of the very low concentration values in the citric acid and acetylsalicylic acid distribution maps has significantly reduced the zones of rank overlap among compounds and, particularly, with caffeine. As a consequence of the presence of zones with lower rank, or even selective zones, an improvement of the recovered spectral profiles can be noticed. A closer look at the spectral profiles of citric acid and acetylsalicylic acid indeed reveals an improvement in the intensity ratios of the different peaks with respect to the references found in the NIST database [44]. For the third component, an improvement can also be noticed: the resolution of the peaks in the spectrum has increased, clearly revealing two peaks (at ~ 1690 cm-1 and ~ 1725 cm-1) that can be assigned to caffeine. Additionally, it is worth noting that when the distribution map of this third component is not constrained by sparseness, its sparseness ratio increases to 93.2 % and the spectrum no longer reveals the two caffeine peaks (results not shown). This led to the conclusion that it is necessary to constrain all three component distribution maps in this data set.
5. Concluding remarks
A constraint that induces sparseness in the signals has been implemented in MCR–ALS for the resolution of complex multicomponent systems. It can be used on both spectral profiles and component distribution maps and is thus fully applicable to hyperspectral images. To induce sparseness in the fitted signal, a penalized least squares regression framework is used that constrains the number of non-null coefficients. The sparseness factor, λ, is the only parameter to regulate within this framework, making it a flexible and powerful tool to adapt the degree of sparseness to the nature of the different signals and compounds within a data set.

Sparseness can be used in combination with other appropriate constraints. The results shown correspond to cases where sparsity is desirable in either the resolved concentration profiles or the resolved spectra. The real examples tested have shown the usefulness of this constraint in two clearly sparse scenarios, i.e., MS signals and scattered compounds in distribution maps, and have covered examples linked to pure compound profiles associated with 1D and 2D numerical arrays. Whether the resolved spectra or contributions should be sparse depends on the analytical situation and on the prior knowledge available regarding the physical nature of the component profiles. It should be clear that the solution is valid only under the hypothesis that sparseness is present in the profiles (but not necessarily in the data). The constraint can, however, be envisioned as potentially applicable to other interesting problems, such as environmental source apportionment analysis, where the composition profiles of pollution sources may lack certain contaminants and the related apportionment profiles may have absent contributions at certain geographical points or in certain seasons.

A relevant asset of the sparseness constraint is the direct effect on the constrained profiles and the indirect positive effect on the rest of the resolved profiles of the system. To provide a more appropriate and simpler description of naturally sparse signals or concentration profiles, many elements of the profiles are set to null values. Compound absence in the resolution implies lowering the rank of certain concentration and signal windows. As a consequence of the decrease of rank overlap among compounds, the chances to recover unique profiles, both in the constrained profile direction and in the profiles of the counterpart matrix of the bilinear model, are greatly increased.

6. Acknowledgements

The authors thank Paul Eilers for his involvement in the development of the algorithms and appreciate the fruitful discussions on the topic. C.R. thanks the Agence Nationale de la Recherche (ANR-15-CE09-0020-01 Blink). C.B. and A.J. have received funding from the European Research Council under the European Union's Seventh Framework Program (FP/2007-2013) / ERC Grant Agreement n. 32073 (CHEMAGEB project). S.P. acknowledges the support of the Spanish government through project CTQ2015-66254-C2-2-P.
7. References

[1] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graph. Stat. 15 (2006) 265–286.
[2] A. d'Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet, A direct formulation for sparse PCA using semidefinite programming, SIAM Rev. 49 (2007) 434–448.
[3] H. Chun, S. Keles, Sparse partial least squares regression for simultaneous dimension reduction and variable selection, J. Roy. Stat. Soc. B Met. 72 (2010) 3–25.
[4] D. Lee, W. Lee, Y. Lee, Y. Pawitan, Sparse partial least squares regression and its applications to high-throughput data analysis, Chemom. Intell. Lab. Syst. 109 (2011) 1–8.
[5] K.-A. Le Cao, S. Boitard, P. Besse, Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems, BMC Bioinformatics 12 (2011) 253–268.
[6] D. Chung, S. Keles, Sparse partial least squares classification for high dimensional data, Stat. Appl. Genet. Mol. 9 (2010) article 17.
[7] I. Jolliffe, N. Trendafilov, M. Uddin, A modified principal component technique based on the LASSO, J. Comput. Graph. Stat. 12 (2003) 531–547.
[8] R. Tauler, A. Smilde, B. Kowalski, Selectivity, local rank, three-way data analysis and ambiguity in multivariate curve resolution, J. Chemometrics 9 (1995) 31–58.
[9] R. Tauler, Multivariate curve resolution applied to second order data, Chemom. Intell. Lab. Syst. 30 (1995) 133–146.
[10] A. Golshan, H. Abdollahi, S. Beyramysoltan, M. Maeder, K. Neymeyr, R. Rajkó, M. Sawall, R. Tauler, A review of recent methods for the determination of ranges of feasible solutions resulting from soft modelling analyses of multivariate data, Anal. Chim. Acta 911 (2016) 1–13.
[11] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. Roy. Stat. Soc. B Met. 58 (1996) 267–288.
[12] V. Pomareda, D. Calvo, A. Pardo, S. Marco, Hard modeling multivariate curve resolution using LASSO: application to ion mobility spectra, Chemom. Intell. Lab. Syst. 104 (2010) 318–332.
[13] A. Tikhonov, V. Arsenin, Solution of Ill-posed Problems, Winston & Sons, Washington, 1977.
[14] J. De Rooi, P. Eilers, Deconvolution of pulse trains with the L0 penalty, Anal. Chim. Acta 705 (2011) 218–226.
[15] J. De Rooi, C. Ruckebusch, P. Eilers, Sparse deconvolution in one and two dimensions: applications in endocrinology and single-molecule fluorescence imaging, Anal. Chem. 86 (2014) 6291–6298.
[16] R. Manne, On the resolution problem in hyphenated chromatography, Chemom. Intell. Lab. Syst. 27 (1995) 89–94.
[17] M.-D. Iordache, J. Bioucas-Dias, A. Plaza, Sparse unmixing of hyperspectral data, IEEE T. Geosci. Remote 49 (2011) 2014–2039.
[18] J. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, J. Chanussot, Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches, IEEE J. Sel. Top. Appl. 5 (2012) 354–379.
[19] X.-F. Zhu, B. Li, J.-D. Wang, L1-norm sparse learning and its application, Appl. Mech. Mater. 88–89 (2011) 379–385.
[20] N. Gillis, R. Plemmons, Sparse nonnegative matrix underapproximation and its application to hyperspectral image analysis, Proc. Third IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) 1 (2011) 11–14.
[21] S. Jia, Y. Qian, Constrained nonnegative matrix factorization for hyperspectral unmixing, IEEE T. Geosci. Remote 47 (2009) 161–173.
[22] M.-D. Iordache, A. Plaza, J. Bioucas-Dias, On the use of spectral libraries to perform sparse unmixing of hyperspectral data, Proc. Second IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) 1 (2010) 1–4.
[23] C. Ruckebusch, L. Blanchet, Multivariate curve resolution: A review of advanced and tailored applications and challenges, Anal. Chim. Acta 765 (2007) 28–36.
[24] J. Jaumot, A. de Juan, R. Tauler, MCR–ALS GUI 2.0: new features and applications, Chemom. Intell. Lab. Syst. 140 (2015) 1–12.
[25] A. de Juan, R. Tauler, Multivariate Curve Resolution (MCR) from 2000: Progress in Concepts and Applications, Crit. Rev. Anal. Chem. 36 (2006) 163–176.
[26] A. de Juan, S. Rutan, M. Maeder, R. Tauler, Multivariate Curve Resolution Chapters, in: D. Brown, R. Tauler, B. Walczak (Eds.), Comprehensive Chemometrics, vol. 2, Elsevier, Amsterdam, 2009, pp. 207–558.
[27] A. de Juan, S. Piqueras, M. Maeder, T. Hancewicz, L. Duponchel, R. Tauler, Chemometric tools for image analysis, in: R. Salzer, H. Siesler (Eds.), Infrared and Raman Spectroscopic Imaging, Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, 2014, pp. 57–110.
[28] L. Duponchel, W. Elmi-Rayaleh, C. Ruckebusch, J.-P. Huvenne, Multivariate curve resolution methods in imaging spectroscopy: Influence of extraction methods and instrumental perturbations, J. Chem. Inf. Comput. Sci. 43 (2003) 2057–2067.
[29] A. de Juan, R. Tauler, R. Dyson, C. Marcolli, M. Rault, M. Maeder, Spectroscopic imaging and chemometrics: a powerful combination for global and local sample analysis, TrAC – Trend. Anal. Chem. 23 (2004) 70–79.
[30] A. de Juan, M. Maeder, T. Hancewicz, R. Tauler, Use of local rank-based spatial information for resolution of spectroscopic images, J. Chemometrics 22 (2008) 291–298.
[31] S. Hugelier, O. Devos, C. Ruckebusch, On the implementation of spatial constraints in multivariate curve resolution alternating least squares for hyperspectral image analysis, J. Chemometrics 29 (2015) 557–561.
[32] D. Ge, X. Jiang, Y. Ye, A note on the complexity of Lp minimization, Math. Program. 129 (2011) 285–299.
[33] C. Soussen, J. Idier, D. Brie, J. Duan, From Bernoulli-Gaussian deconvolution to sparse signal restoration, IEEE Trans. Signal Process. 59 (2011) 4572–4584.
[34] S. Hugelier, P. Eilers, O. Devos, C. Ruckebusch, Improved super-resolution microscopy imaging by sparse deconvolution with an inter-frame penalty, J. Chemom. 31 (2017) e2847.
[35] M. Osborne, B. Presnell, B. Turlach, On the lasso and its dual, J. Comput. Graph. Stat. 9 (2000) 319–337.
[36] F. Frommlet, G. Nuel, An adaptive ridge procedure for L0 regularization, PLoS ONE 11 (2016) e0148620.
[37] A. Hoerl, R. Kennard, Ridge regression: biased estimation for non-orthogonal problems, Technometrics 12 (1970) 55–67.
[38] A. Ly, C. Schöne, M. Becker, J. Rattke, S. Meding, M. Aichler, D. Suckau, A. Walch, S.M. Hauck, M. Ueffing, High-resolution MALDI mass spectrometric imaging of lipids in the mammalian retina, Histochem. Cell Biol. 143 (2015) 453–462.
[39] C. Bedia, R. Tauler, J. Jaumot, Compression strategies for the chemometric analysis of mass spectrometry imaging data, J. Chemom. 30 (2016) 575–588.
[40] P. Eilers, A perfect smoother, Anal. Chem. 75 (2003) 3631–3636.
[41] G. Tomasi, F. van den Berg, C. Andersson, Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data, J. Chemom. 18 (2004) 231–241.
[42] W. Windig, J. Guilment, Interactive self-modeling mixture analysis, Anal. Chem. 63 (1991) 1425–1432.
[43] I. Debeaujon, K. Léon-Kloosterziel, M. Koornneef, Influence of the testa on seed dormancy, germination and longevity in Arabidopsis, Plant Physiol. 122 (2000) 403–413.
[43]I. Debeaujon, K. Léon-Kloosterziel, M. Koornneef, Influence of the testa on seed dormancy,
464
germination and longevity in Arabidopsis, Plant Physiol. 122 (2000), 403–413.
21
ACCEPTED MANUSCRIPT
465 466
[44]National Institute of Standards and Technology (NIST), http://webbook.nist.gov/chemistry/ [last accessed: 8 February 2017].
AC C
EP
TE D
M AN U
SC
RI PT
467
22
ACCEPTED MANUSCRIPT
8. Figures
469
Figure 1
M AN U
SC
RI PT
468
470
Figure 1 Schematic representation of a multivariate curve resolution – alternating least squares (MCR–
472
ALS) analysis for hyperspectral images. The refolding/unfolding step during each least squares step
473
allows the input of global spatial features as constraints.
EP AC C
474
TE D
471
23
ACCEPTED MANUSCRIPT
Figure 2
SC
RI PT
475
M AN U
476 477
Figure 2 Schematic representation of the Gaussian bases (fixed width) used for (sparse) signal fitting. In
478
(a) a Toeplitz matrix (zero end condition) for one-dimensional signals and in (b) the banded matrix for
479
two-dimensional signals.
AC C
EP
TE D
480
24
ACCEPTED MANUSCRIPT
Figure 3
M AN U
SC
RI PT
481
482
Figure 3 Illustrative example of sparse signal processing. A series of sparse signals is solved by (a) sparse
484
fitting with a broad Gaussian basis (σ = FWHM/2) (5 non-null coefficients) and (b) sparse fitting with a
485
narrow Gaussian basis (σ = FWHM/5) (21 non-null coefficients). In (c) a sparse signal for which only a
486
narrow Gaussian basis can be used (6 non-null coefficients). Gray: noisy data, black: reproduced data,
487
colors: individual contributions.
EP
AC C
488
TE D
483
25
ACCEPTED MANUSCRIPT
Figure 4
M AN U
SC
RI PT
489
490
Figure 4 Schematic representation of the pre-processing pipeline for (a) the mass spectrometry imaging
492
of a bean sample and (b) the infrared spectroscopy imaging of a pharmaceutical powdered sample. The
493
raw data is shown on the left, while the final compressed and corrected data is shown on the right.
EP AC C
494
TE D
491
26
ACCEPTED MANUSCRIPT
495
M AN U
SC
RI PT
Figure 5
496
Figure 5 MCR–ALS analysis of green bean data, obtained by mass spectrometry imaging with (a) non-
498
negativity as a constraint (calculation time: 22.6 seconds (19 iterations); LOF: 15.3 %) and (b) non-
499
negativity and sparseness of the spectral profiles as a constraint (Sparseness factors: 0.001, 0.006, 0.002,
500
0.004 and 0.001 for the respective components) (calculation time: 160.6 seconds (9 iterations); LOF: 23.7
501
%).
EP AC C
502
TE D
497
27
ACCEPTED MANUSCRIPT
Figure 6
AC C
EP
TE D
M AN U
SC
RI PT
503
504 505
Figure 6 MCR–ALS analysis of the pharmaceutical powdered sample obtained by infrared imaging, with (a) non-negativity as a constraint (calculation time: 11.2 seconds (41 iterations); LOF: 7.9 %) and (b) non-negativity and sparseness of the component distribution maps as constraints (sparseness factor: 0.005 for all components) (calculation time: 93.2 seconds (22 iterations); LOF: 8.9 %).
Highlights

1) A constraint has been developed for MCR–ALS that uses the natural sparseness of the decomposition profiles to help resolve complex multicomponent systems.
2) The constraint uses an L0-penalized least squares framework and can be applied to one-dimensional and/or two-dimensional numerical arrays.
3) The sparseness constraint has a direct effect on the constrained profiles and an indirect positive effect on the rest of the resolved profiles of the system.