Application of a sparseness constraint in multivariate curve resolution – Alternating least squares

Application of a sparseness constraint in multivariate curve resolution – Alternating least squares

Accepted Manuscript Application of a sparseness constraint in multivariate curve resolution – Alternating least squares Siewert Hugelier, Sara Piquera...

3MB Sizes 0 Downloads 26 Views

Accepted Manuscript Application of a sparseness constraint in multivariate curve resolution – Alternating least squares Siewert Hugelier, Sara Piqueras, Carmen Bedia, Anna de Juan, Cyril Ruckebusch PII:

S0003-2670(17)30958-3

DOI:

10.1016/j.aca.2017.08.021

Reference:

ACA 235394

To appear in:

Analytica Chimica Acta

Received Date: 24 March 2017 Revised Date:

25 July 2017

Accepted Date: 19 August 2017

Please cite this article as: S. Hugelier, S. Piqueras, C. Bedia, A. de Juan, C. Ruckebusch, Application of a sparseness constraint in multivariate curve resolution – Alternating least squares, Analytica Chimica Acta (2017), doi: 10.1016/j.aca.2017.08.021. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT

ACCEPTED MANUSCRIPT

Application of a Sparseness Constraint in Multivariate Curve Resolution –

2

Alternating Least Squares.

3

Siewert Hugelier1,*, Sara Piqueras2,3, Carmen Bedia3, Anna de Juan2, Cyril Ruckebusch1

4

1

Université de Lille, Sciences et Technologies, LASIR, CNRS, F-59000 Lille, France

5

2

Chemometrics group, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain

6

3

Department of Environmental Chemistry, IDAEA-CSIC, Calle Jordi Girona 18-26, 08034 Barcelona, Spain

7

*

Corresponding author. Tel.: +33 320 43 66 61; E-mail address: [email protected]

8

Abstract

9

The use of sparseness in chemometrics is a concept that has increased in popularity. The advantage is,

10

above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a

11

constraint in multivariate curve resolution – alternating least squares (MCR–ALS), which aims at

12

reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the

13

mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as it is the

14

case in some instrumental responses, such as mass spectra, or in concentration profiles linked to

15

scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the

16

constrained profiles, one-dimensional and/or two-dimensional numerical arrays can be fitted using a

17

basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression

18

framework with L0-norm penalty is applied. This L0-norm penalty constrains the number of non-null

19

coefficients in the fit of the array constrained without having an a priori on the number and their

20

positions. It has been shown that the sparseness constraint induces the suppression of values linked to

21

uninformative channels and noise in MS spectra and improves the location of scattered compounds in

22

distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of

AC C

EP

TE D

M AN U

SC

RI PT

1

1

ACCEPTED MANUSCRIPT

the sparseness constraint is a lower ambiguity in the bilinear model, since the major presence of null

24

coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart

25

matrix of the MCR bilinear model.

26

Keywords: MCR–ALS, Sparseness, Hyperspectral Image Analysis, Mass Spectrometry.

AC C

EP

TE D

M AN U

SC

RI PT

23

2

ACCEPTED MANUSCRIPT

27

1. Introduction Sparseness is a mathematical property that corresponds to the presence of many null elements

29

in arrays of numerical data. Within chemometrics, sparseness is very often linked to the concept of

30

simplicity. Indeed, many classical chemometric methods aiming at exploratory, regression or

31

classification purposes, such as Principal Component Analysis (PCA) [1],[2], Partial Least Squares

32

Regression (PLS) [3],[4] or PLS-Discriminant Analysis (PLS-DA) [5],[6] have their sparse versions. In these

33

contexts, the purpose of using sparseness is to enhance relevant information and facilitate the final

34

qualitative interpretation of the results obtained [7].

SC

RI PT

28

The use of sparseness in multivariate curve resolution methods responds to a completely

36

different spirit. There are indeed situations in mixture analysis problems where pure spectral signals or

37

concentration profiles can be naturally defined as sparse. In these situations, to faithfully reproduce

38

original mixed raw data by a bilinear model of chemically meaningful profiles, sparseness can be

39

proposed as a potential constraint. For instance, mass spectra of pure compounds are sparse since few

40

m/z channels provide relevant non-null signals and the rest of small numerical readings are related to

41

noise. When thinking about concentration profiles, there may be components of the mixture with only

42

few non-null concentration values, e.g., a component is absent in most pixels of a hyperspectral image

43

(HSI), or simply scattered across the total sample surface. In this scenario, applying sparseness to these

44

profiles responds to the true nature of the measurement and is potentially useful to improve the final

45

resolution results.

TE D

EP

AC C

46

M AN U

35

In this work, we apply sparseness as a constraint in the multivariate curve resolution –

47

alternating least squares (MCR–ALS) [8],[9] algorithm. As any other constraint, the main reason to use it

48

is providing the resolved profiles with chemical / physical meaning and decreasing the rotational

49

ambiguity linked to the MCR solutions [10]. As with the implementation of other constraints in the MCR–

50

ALS framework, such as non-negativity, unimodality or closure, the sparseness constraint can be used on

3

ACCEPTED MANUSCRIPT

some or all the profiles of the different compounds in the concentration and/or spectral direction. The

52

use of the sparseness constraint does not require that the original raw data are, by nature, sparse, but

53

only some or all of the compound profiles in the underlying bilinear model. Moreover, the way the

54

constraint is implemented does not assume the number of non-null coefficients, nor their position when

55

reproducing the signal. It should be noted that the use of sparseness cannot and should not be

56

generalized and it has to be applied only when the pure compound profiles in the underlying bilinear

57

model fulfill this property. The implementation of the sparseness constraint works by fitting the sparsest

58

representation of the signal given a basis of Gaussian functions with fixed width. This can be achieved

59

through regularized linear regression. The sparse reproduction of the signal obtained at each step of the

60

ALS is then used to advance in the MCR–ALS analysis. L1 regularization [11] of the spectral profiles in

61

MCR–ALS has been previously tackled by Pomareda et al. [12]. In this work we propose solving the

62

problem by L0-norm penalized regression [13], as is used in sparse deconvolution [14],[15]. De Rooi and

63

Eilers [14] show that using an L0-norm penalty as opposed to an L1-norm penalty prevents shrinkage of

64

the main coefficients, while making adjacent coefficients zero. This results in a more sparse fit for the

65

same fitting error. An additional difference with the work from Pomareda et al. [12] is that the constraint

66

is also extended to two-dimensional signals, by considering a tensor product of Gaussians as basis. This

67

allows the application of the proposed constraint onto the distribution maps in HSI. Again, it is worth

68

stressing that this approach does not require the raw data to be sparse, since only the profiles of the

69

suitable compounds are constrained. This makes the application of sparseness fast, since few profiles of

70

pure compounds are constrained with respect to the high dimensions of the raw original data. A direct

71

positive effect of the sparseness constraint is improving the definition of the constrained profiles, since

72

sparseness is a natural property of them. As for any other constraint, this statement holds as long as

73

sparsity is a natural property of the constrained profiles. Besides, the presence of many null coefficients

74

in the constrained profiles induces low rank or selective windows in the related concentration and/or

AC C

EP

TE D

M AN U

SC

RI PT

51

4

ACCEPTED MANUSCRIPT

75

spectral matrices. As a consequence, a better recovery of all profiles in the bilinear model [16] and a

76

decrease of the rotational ambiguity is achieved [8],[10]. Within the remote sensing community, the application of sparseness in unmixing methods is not

78

an entirely new concept either [17]-[19]. Sparse non-negative matrix factorization (NMF) has attracted a

79

lot of attention recently and focuses on controlling the sparseness of the factors corresponding to the

80

abundance of the components [20],[21]. This approach generally gives more interpretable factors

81

because environmental elements are usually localized in limited spatial zones and thus it reduces

82

rotational ambiguity. Other approaches were focused either on solving a regression-based problem

83

among raw data and a library of spectra [18] or on finding the most sparse combination of compound

84

assignments to define the composition of every pixel in an image [22].

M AN U

SC

RI PT

77

The approach presented will be applied to real representative examples taken with mass

86

spectrometry and FTIR imaging to test the potential of the algorithm in the definition of the constrained

87

profiles and in the general improvement of the description of the full mixture system.

88

2. Theory

89

2.1.

TE D

85

Multivariate curve resolution – alternating least squares (MCR–ALS)

The development and application of the MCR–ALS routine has covered the resolution of data

91

sets of very diverse nature [23]-[26]. It assumes the expression of the Beer-Lambert law so that a bilinear

92

model can be used to decompose a mixed data set into a model related to the pure individual

93

contributions. Doing so allows us to write the system as in (1), in which the matrix D (m x n), containing

94

m mixed spectra collected at n spectral channels can be reproduced by a bilinear model of k

95

components, formed by concentration profiles, represented by matrix C (m x k), and their related

96

spectral profiles, represented by the matrix ST (k x n). The matrix E (m x n) represents the error

97

contribution of the model.

AC C

EP

90

5

ACCEPTED MANUSCRIPT =

+

(1)

MCR–ALS is an iterative alternating least-squares algorithm that resolves concentration and

99

spectral profiles under the action of constraints and can be used on a single data set or multiset

100

structures formed by several concatenated data matrices [9],[26]. Constraints can be very diverse (e.g.,

101

non-negativity, unimodality, closure, etc.) and can be optionally applied to all or to some of the

102

compounds in the system analyzed in the concentration and/or spectral direction. The experimental data

103

matrix D is optimally fitted by the iterative procedure and the lack of fit (LOF) is minimized ∑

% =

,

,

,

,

M AN U



SC

RI PT

98

(2)

104

where dn,j and en,j are the (n x j)-th elements of D and E, respectively. Additionally, the percentage of

105

variance explained is also used to assess the results ²=



,

,



,

−∑ ,

,

,

(3)

MCR–ALS has seen an increased use in the analysis of HSI [27]-[29]. The main characteristic of

107

images is that they are defined by two spatial directions (x-coordinate and y-coordinate for each pixel)

108

and a spectral direction, i.e., as a data cube. The spectra are collected for each pixel and contain the

109

chemical information of the constituents present at a given pixel position. Knowing this, it is indeed true

110

that the bilinear model in (1) still holds for the data pixels. This implies that the three-dimensional data

111

cube has to be transformed into a data matrix to be analyzed by MCR–ALS analysis, as shown in Figure 1

112

[27]-[29]. As a consequence of the cube unfolding operation, the original pixel neighborhood information

113

is lost in the data matrix D and this makes the use of constraints that take into consideration spatial

114

information more difficult. Until recently, spatial constraints were almost limited to local rank constraints

115

based on the analysis of pixel neighborhood areas, done with auxiliary algorithms outside the MCR–ALS

116

framework [30]. To facilitate the introduction of constraints linked to spatial information within MCR–

117

ALS, a novel approach was proposed [31] based on refolding the concentration profiles to their

AC C

EP

TE D

106

6

ACCEPTED MANUSCRIPT

corresponding distribution map at each least squares step. In this way, the pixel neighborhood and

119

spatial information is restored and the global spatial features of the two-dimensional image can be used

120

as information to further resolve the data into its individual components under the form of spatial

121

constraints. Virtually, any image processing could be implemented in this way as a constraint (e.g.

122

smoothness, etc.), as long as the characteristic used is related to the physical or chemical nature of the

123

pure component profiles. Validity of the solution obtained applying a constraint can be assessed

124

checking figures of merit related to the quality of the fit (see (2) and (3)) and the chemical meaning of

125

the recovered profiles. Additionally, by introducing these characteristics, such as sparseness, as a

126

constraint, the MCR–ALS analysis will more quickly reach a stable solution (i.e. less iterations are needed

127

for convergence).

128

M AN U

SC

RI PT

118

INSERT FIGURE 1

129

2.2.

Sparse regularization

The condition of sparseness can be applied as a constraint to model one-dimensional profiles,

131

e.g., MS spectra, or to 2-D refolded profiles, such as distribution maps from HSI, in MCR–ALS models.

132

Below, the formulation to constrain a signal profile (both one-dimensional and two-dimensional) is

133

described in detail.

EP

134

TE D

130

To set the constraint, we look for a sparse representation of the initial signal profile …

, i.e. a representation of y with the fewest non-null coefficients " = #

135



136

given basis of functions Ø (N x N), where the signal can be written as

AC C

!

= "Ø + %

… #

!

=

, for a

(4)

137

The problem of recovering the sparsest representation of the reconstruction coefficients in x is a very

138

general problem, which can be tackled by solving a regularized linear inverse problem. Although this

139

optimization problem is complex, it can be dealt with in many different ways, including the use of

140

compressed sensing [32] or Bayesian approaches [33]. In this work, another approach is applied, which 7

ACCEPTED MANUSCRIPT

has been often applied in a sparse deconvolution context [14],[15],[34]. The matrix Ø is a basis of

142

Gaussians with fixed width that translates into a matrix in which each descending diagonal from left to

143

right is constant (i.e. a Toeplitz matrix structure, see Fig. 2a). When using such a matrix, a pseudo-inverse

144

solution of x can be found " = Ø′Ø

Ø

'( )

= Ø*

RI PT

141

(5)

145

In this work, the matrix Ø is a Gaussian basis with a fixed width. To introduce sparseness in x, an L0-norm

146

penalty can be added to the least squares objective function, leading to

SC

S = min23‖ − "Ø‖ + 5|"|7 8 /∈ℝ

(6)

in which |"|7 = 1 if # ≠ 0 and 0 otherwise. The parameter λ is called the sparseness factor, which

148

defines the trade-off between the signal reconstruction error and the degree of sparseness. The range of

149

the penalty parameter is within 10-8–108, depending on the signal treated.

M AN U

147

150

INSERT FIGURE 2

In (6), the L0-norm constrains the sum of number of non-null elements of x. It forces the

152

elements of x towards 0, unless the elements have a strong contribution to the reconstruction. What

153

follows is a sparse solution of x. A characteristic property of using this type of penalty is that there is no a

154

priori information on the positions (nor number) of the coefficients to be non-null. In practice, solving

155

the equation in (6) is a complex, non-convex and non-differentiable optimization problem, and can only

156

be solved by using an approximation of the L0-norm, as described in [35],[36]. In short, the L0 penalty is a

157

member of the Lp-penalty family, with 0 ≤ = ≤ 2. When = = 2, the penalty added to the least squares

158

objective function is well known as the ridge penalty (5‖"‖ ) [37], for which a closed form solution for x

159

exists of the following form:

AC C

EP

TE D

151

" = Ø) Ø + 5?

'

Ø) ,

(7)

160

where ? = @, the identity matrix. It was shown before [35],[36] that the L0-norm penalty can be

161

approximated by a weighted form of Ridge regression 8

ACCEPTED MANUSCRIPT C "D S = min2 A‖ − "‖ + 5"′B /∈ℝ

(8)

C is a diagonal matrix with elements E The matrix B F = 1⁄G#H + I J. The constant I is added to reduce

163

the number of iterations to convergence of the solution and to avoid numerical instabilities. The explicit

164

C. The iterative process updates #H and E F solution for (6) is then given once again by (7), but now B = B

165

in an alternating way, starting from the unweighted ridge regression until convergence of the solution.

RI PT

162

This framework works well in practice, as shown in Figure 3. This figure represents an illustrative

167

example of the sparse signal fitting on a pre-defined Gaussian basis for a series of sparse signals. Figures

168

3a and 3b relate to a sparse signal formed by wide bands, whereas figure 3c shows the situation for a

169

sparse signal with very narrow features, as an MS spectrum could be. In the top panel (Fig. 3a), the signal

170

(gray line) has been fitted on a basis of Gaussians with a width equal to half of the Full Width at Half

171

Maximum (FWHM). Five non-null coefficients have been obtained. The sum of the individual

172

contributions (colored lines) is the reproduced signal (black line; an offset has been added) and it is clear

173

that uninformative channels (i.e. noise) have been removed. In the middle panel (Fig. 3b), the same

174

signal has been fitted on a basis of Gaussians with a width equal to one fifth of the FWHM. As can be

175

seen, there are more non-null coefficients (21 to be precise), but this has as advantage that the original

176

data can be reproduced better. It also shows that wider peaks making up the spectra can be modeled as

177

a linear superposition of more narrow Gaussians. This can be used as an advantage during analysis, as

178

the true width of the Gaussian-like peaks that make up the signal is not known. Additionally, using a

179

basis of narrow Gaussians allows fitting mass spectrometry-like data, in which the signals are narrow by

180

nature (lower panel; Fig. 3c). Therefore, a basis of Gaussians with narrow FWHM will be used because

181

they adapt to any kind of sparse signal, independently on the width of the non-null signal features.

182

AC C

EP

TE D

M AN U

SC

166

INSERT FIGURE 3

9

ACCEPTED MANUSCRIPT

183

When dealing with two-dimensional signals, the exact same principles as for one-dimensional

184

sparse reproduction can be used, but the model for an image Y (I x J) is slightly different as there are now

185

two dimensions to be taken into account.

186

where the matrix R = SN̅L

PQ T

= M M N̅L P

Q

PQ #PQ

+

L

RI PT

L

(9)

is a four-dimensional array that connects each element of Y to all the

elements of X. In order to solve the system, the data matrix Y is vectorized to y (i x j, 1) and the matrix R

188

is mapped to C (ij x ij). This matrix has a special structure (see Fig. 2b), reflecting the fact that the basis is

189

assumed not to vary with position (see [15] for more information). To construct this matrix, two-

190

dimensional tensor products of Gaussians are used. To solve the vectorized version of (9), the same

191

algorithms as described above (equations (6) to (8)) can be applied.

192

3. Experimental

193

3.1.

M AN U

SC

187

TE D

Mass spectrometry imaging on a green bean sample

Sample preparation and acquisition. The green bean (Phaseolus vulgaris) sample used for

195

analysis was initially humidified for 4 hours wrapped in wet cotton. Then, the bean was flash frozen in

196

liquid nitrogen and stored at -80 °C. The tissue was mounted in a cutting block using an Optimal Cutting

197

Temperature embedding medium (OTC, TissueTek), sliced at 15 µm thickness with a cryostat (Leica CM

198

3050S) and placed directly onto indium tin oxide–coated glass slides (Bruker Daltonik GmbH, Bremen,

199

Germany). The application of 2,5-dihydroxybenzoic acid (Sigma-Aldrich, Spain) matrix onto the slides was

200

performed by sublimation [38]. Spectra were acquired using an Autoflex III MALDI-TOF/TOF instrument

201

(Bruker Daltonik GmbH) equipped with a Smartbeam laser operated at 200-Hz laser repetition rate at the

202

”large focus” setting. The positive reflector ion mode was used in the 400 to 2000 m/z range, with an

203

m/z resolution of 0.001 a.m.u with a laser raster set to 150 μm along both x- and y-axis. The final

204

acquired image size was 34 by 72 pixels.

AC C

EP

194

10

ACCEPTED MANUSCRIPT

Image preprocessing. In mass spectrometry imaging (MSI), preprocessing is crucial for a

206

preliminary data reduction and removal of clearly uninformative m/z channels. An adaptation of the

207

detection of regions of interest (ROI) method by Bedia et al. [39] was applied. This selection only

208

considers m/z channels showing a significant signal in a preset number of pixel spectra for further

209

analysis. To do so, the selection of a noise threshold, a mass tolerance error and the minimum number of

210

spectra that should include a significant reading for a particular m/z value to be kept have to be defined.

211

In this study, m/z channels with an intensity over 0.1 % of the maximum signal were considered different

212

from noise, a 0.1 mass tolerance error was accepted to consider two m/z values to be the same and,

213

hence, to belong to the same spectral ROI and the presence of five significant readings associated with a

214

particular m/z channel was considered to be kept as a ROI. This leads to a data size reduction from 56789

215

m/z values per pixel to 8627. The data compression is then followed by a baseline correction by

216

asymmetric least squares [40], spectral alignment by Correlation Optimized Warping [41] and balance of

217

signal intensity variability by normalization (Euclidean norm) and scaling of the spectra. The scaling of the

218

m/z spectral channels was performed by dividing each reading by the standard deviation of the channel

219

readings plus 0.1 % of the maximum signal of the spectrum obtained by the sum of the selected purest

220

mass spectra. After proper preprocessing, it can be seen that the ROI between 1150 and 2000 do not

221

contain relevant information and, hence, this m/z range was removed from further analysis. The final

222

dataset then considers 3750 m/z values per pixel. Figure 4a shows the compression and pre-processing

223

of the data in a schematic way. The raw data is shown at the beginning of the processing pipeline, while

224

the compressed and corrected data is shown at the end.

SC

M AN U

TE D

EP

AC C

225

RI PT

205

3.2.

Infrared spectroscopy imaging on a pharmaceutical sample

226

Sample preparation and acquisition. A pharmaceutical sample containing three products (citric

227

acid, acetylsalicylic acid and caffeine) has been carefully prepared by grinding the products together with

228

a pestle and mortar. After a homogenous powder mixture is obtained, it was analyzed by using a Thermo 11

ACCEPTED MANUSCRIPT

Scientific Nicolet iN 10 MX Infrared imaging microscope. The microscope is equipped with standard

230

single element MCT detector. A spectrum was taken for every pixel of the image within the 4000 – 675

231

cm-1 range with a spectral resolution of 4 cm-1 (1725 wavenumbers). The pixel size was set at 25 µm x 25

232

µm and 32 scans were accumulated per pixel. The final acquired image size was 41 by 37 pixels.

RI PT

229

Image preprocessing. As the acquired image is quite noisy and contains an almost flat baseline

234

with an offset, some preprocessing has to be performed. In this case, it was chosen to perform a

235

smoothing of the signal by using weighted least squares, followed by a removal of the minimum value of

236

each spectrum. After proper preprocessing was performed, the fingerprint region of the acquired data

237

was selected (1800 – 675 cm-1) to facilitate analysis even further. Figure 4b shows the processing

238

pipeline, with the acquired data at the beginning and the corrected data at the end.

M AN U

SC

233

239 240

INSERT FIGURE 4 3.3.

Sparseness constraint in MCR–ALS

The sparseness constraint was implemented in the MCR–ALS command line code

242

(https://mcrals.wordpress.com/download/mcr-als-command-line/) and all calculations were performed

243

in MATLAB version R2014a (The MathWorks, Natick, MA, USA) on a computer with an Intel(R) Core(TM)

244

i7-4770 CPU @ 3.40GHz CPU and 16 GB RAM memory (64-bit Windows 7).

EP

TE D

241

The critical point of applying the constraint during the MCR–ALS analysis is the sparseness factor,

246

λ, which varies according to the degree of natural sparseness of each of the constituents analyzed, since

247

the shape of the signal (e.g. intensity, number of peaks, etc.) for each of them may be different. Different

248

ways to select the optimal λ have been proposed, as for example a cross-validation approach used in

249

[12], where the optimal λ is selected as the one around the change in root mean square error (RMSE)

250

slope. However, automatic optimization remains an open issue for sparsity with L0-norm due to

251

discontinuity and there is ongoing work to search for alternative approaches. Throughout the

252

manuscript, we do an MCR–ALS analysis with non-negativity constraint on the distribution maps and

AC C

245

12

ACCEPTED MANUSCRIPT

spectral profiles and assess the results (check the lack of fit and explained variance obtained). We then

254

fit the profiles of the constituents that should be sparse (albeit C or St profiles) individually by applying

255

the sparse regularization routine outside the MCR–ALS analysis and optimize the sparseness factor by

256

visual inspection. After completing this step, MCR–ALS analysis with non-negativity and sparseness

257

constraint is performed, for which, at each iteration, the constrained profiles are fitted with a fixed

258

sparseness factor (obtained from this previous step).

4. Results

260

4.1.

SC

259

RI PT

253

M AN U

Mass spectrometry imaging on a green bean sample

MCR–ALS analysis was performed on the compressed and corrected MSI image. First, the

262

number of components of the preprocessed MSI images was set at five, chosen by Singular Value

263

Decomposition (SVD) and the initial estimates were determined on the mass spectra profiles by a

264

SIMPLISMA-based method [42]. Preliminary MCR–ALS with a non-negativity constraint on the

265

concentration distribution maps and the spectral profiles was performed on the green bean data.

266

Normalization of spectra in ST matrix (using the Euclidean norm) was also added. The LOF obtained was

267

15.3 % (explained variance 97.6 %). Figure 5a shows the distribution maps and spectral profiles of the

268

constituents of the bean sample. The spectral profiles of the constituents reveal that the main mass

269

peaks are found together with many noisy channels. Bearing in mind the nature of MS signals, we know

270

that the resolved spectral profiles of the constituents must be sparse signals with only a few significant

271

sharp peaks. Therefore, the use of the sparseness constraint should here improve the definition of m/z

272

spectral signatures and, as a consequence, the overall description of the data set.

AC C

EP

TE D

261

273

INSERT FIGURE 5

274

After the preliminary MCR–ALS analysis of the bean MSI under non-negativity constraints, the analysis

275

was carried out applying the sparseness constraint to the spectral profiles. Optimal sparseness factors

13

ACCEPTED MANUSCRIPT

are obtained as described in Section 3.3. For a better understanding of the results, the sparseness factor

277

chosen for each component (see caption Fig. 5) has been ‘translated’ to a sparseness ratio for the

278

respective components, indicating the number of non-null values in each spectral channel divided by the

279

total number of spectral channels (in %). After applying MCR–ALS with non-negativity in C and ST and

280

sparseness constraints on the spectral direction, the sparseness ratios for the resolved MS profiles of all

281

constituents were 10.2 % – 1.2 % – 7.1 % – 3.4 % – 17.6 % , as opposed to ~ 100 % for all the constituents

282

during the preliminary MCR–ALS analysis under only non-negativity constraints. The results are shown in

283

Figure 5b. It should be clear that introducing the sparseness constraint on the spectral profiles during

284

analysis leads to a better definition of resolved spectral profiles and, consequently, a sharper definition

285

of the distribution maps as well. The definition of the spatial structure has improved especially in the first

286

and the fifth image constituents, whereas the spectral profiles of the other components have been

287

cleaned up considerably. For instance, the distribution map of component 5 in Figure 5b showed a clear

288

accumulation of its constituents in the seed coat. The spectral profile revealed two representative mass

289

values, 710.1 m/z and 724.9 m/z, which were tentatively assigned to the flavonoids Kaempferide 3-

290

rhamnoside-7-(6''-succinylglucoside) and Kaempferol 3-(2''-p-coumaryl-rhamnoside)-7-rhamnoside,

291

respectively. In seeds, flavonoids have been seen to accumulate in the coats where they have important

292

functions in the induction of seed coat-imposed dormancy, as well as in seed longevity and quality of

293

seeds [43]. The distribution maps of components 2 to 4 were left relatively unchanged. The LOF obtained

294

for this analysis, 23.7 % (explained variance: 94.4 %), has increased as compared to the previous analysis.

295

This can be explained because the sparseness constraint removes many values related to non-

296

informative m/z channels, which contribute in a noticeable way to the total variance measured.

297

Nevertheless, the final variance explained is satisfactory given the natural S/N ratio of MS

298

measurements.

AC C

EP

TE D

M AN U

SC

RI PT

276

14

ACCEPTED MANUSCRIPT

299

4.2.

Infrared spectroscopy imaging on a pharmaceutical sample

For this data set, the number of components is known to be three and initial estimates were

301

determined by using a SIMPLISMA-based method. An MCR–ALS analysis on the pre-processed data was

302

first performed by using non-negativity as a constraint, on both the C and ST matrix and normalization of

303

the ST profiles. The resolved distribution maps and spectra can be seen in Figure 6a (LOF: 7.9 %;

304

explained variance: 99.4 %). The sparseness ratio for the component distribution maps is defined here as

305

the ratio of the non-null pixels over the total number of pixels and is 55.4 % – 76.3 % – 86.2 % for the

306

respective components. When looking at the spectral profiles of the resolved components, the profiles

307

of citric acid and acetyl salicylic acid (components 1 and 2) can be clearly distinguished, since the peak

308

positions correspond with the IR spectra found in the database of the National Institute of Standards and

309

Technology (NIST) [44]. On the other hand, the spectrum of caffeine cannot be clearly distinguished in

310

component 3. What can be seen from the component distribution maps is that their contributions are

311

scattered all across the map, as is expected for unstructured powdered samples. This means that the

312

pattern of the components is expected to have localized zones of presence and absence and, therefore,

313

the sparseness constraint can be appropriately applied.

SC

M AN U

TE D

INSERT FIGURE 6

EP

314

RI PT

300

In order to enhance the zones of presence of the model compounds, and to improve the

316

spectrum obtained for the third component, MCR–ALS is performed by applying the sparseness

317

constraint to the distribution maps (sparseness factors, tuned by visual inspection as described in Section

318

3.3: see caption Fig. 6). The results are shown in Figure 6b (LOF: 8.9 %; explained variance: 99.2 %). In

319

this case, since IR measurements have a better S/N ratio than MS spectra and the reduction of non-null

320

measurements as compared with the initial solution is not as high as in the MS example, the LOF is

321

practically not modified by the inclusion of the sparseness constraint. This fact supports the suitability of

322

applying this constraint using the sparseness factor adopted in the analysis. Thus, when calculating the

AC C

315

15

ACCEPTED MANUSCRIPT

sparseness ratio for the solution using the sparseness constraint, values of 33.1 % – 33.3 % – 86.0 % are

324

obtained for the different compounds. This shows that the component distribution maps are sparser

325

than before, with the exception of the third component. The removal of the very low concentration

326

values in the citric acid and acetylsalicylic acid distribution maps has significantly reduced the zones of

327

rank overlap among compounds and, particularly, with caffeine. As a consequence of the presence of

328

zones with lower rank or even being selective, an improvement of the spectral profiles recovered can be

329

noticed. A closer look at the spectral profiles of citric acid and acetylsalicylic acid would indeed reveal an

330

improvement in the intensity ratios of the different peaks with respect to the reference found in the

331

NIST database [44]. When focusing on the third component, an improvement can also be noticed. The

332

resolution of the peaks in the spectrum has increased, clearly revealing two peaks (at ~ 1690 cm-1 and ~

333

1725 cm-1) that can be assigned to caffeine. Additionally, it is worth noting that when the distribution

334

map of this third component is not constrained by sparseness, the sparseness ratio increases to 93.2 %

335

and the spectrum does not reveal the two caffeine peaks anymore (results not shown). This led to the

336

conclusion that it is necessary to constrain all three component distribution maps in this data set.

SC

M AN U

TE D

337

RI PT

323

5. Concluding remarks

A constraint that induces sparseness on the signals has been implemented in MCR–ALS for the

339

resolution of complex multicomponent systems. It can be used on both spectral profiles and component

340

distribution maps and is thus fully applicable to hyperspectral images. To induce sparseness in the signal

341

fitted, a penalized least squares regression framework is used that constrains the number of non-null

342

coefficients. The sparseness factor, λ, is the only parameter to regulate within this framework, making it

343

a flexible and powerful tool to adapt the degree of sparseness to the nature of the different signals and

344

compounds within a data set.

AC C

EP

338

345

Sparseness can be used in combination with other appropriate constraints. The results that are

346

shown correspond to cases where sparsity is desirable in either the resolved concentration profiles or 16

ACCEPTED MANUSCRIPT

the resolved spectra. The real examples tested have shown the usefulness of this constraint in two clear

348

sparse scenarios, i.e., MS signals and scattered compounds in distribution maps, and have covered

349

examples linked to pure compound profiles associated with 1D and 2D numerical arrays. Whether the

350

resolved spectra or contributions should be sparse depend on the analytical situations and on the prior

351

knowledge available regarding the physical nature of the component profiles. It should be clear that the

352

solution is valid only at the hypothesis that sparseness is present in the profiles (but not necessarily in

353

the data). However, the current constraint can be envisioned as potentially applicable to other

354

interesting problems, such as environmental source apportionment analysis, where the composition

355

profiles of pollution sources may lack certain contaminants and the related apportionment profiles may

356

have absent contributions in certain geographical points or seasons.

M AN U

SC

RI PT

347

A relevant asset of the sparseness constraint is the direct effect on the constrained profiles and

358

the indirect positive effect on the rest of the resolved profiles of the system. Thus, to provide a more

359

appropriate and simpler description of naturally sparse signals or concentration profiles, many elements

360

of the profiles are set to have null values. Compound absence in resolution implies lowering the rank of

361

certain concentration and signal windows. As a consequence of the decrease of rank overlap among

362

compounds, the chances to recover unique profiles both in the constrained profile direction and in the

363

profiles of the counterpart matrix of the bilinear model are greatly increased.

EP

6. Acknowledgements

AC C

364

TE D

357

365

The authors thank Paul Eilers for his involvement in the development of the algorithms and appreciate

366

the fruitful discussions on the topic. C. R. thanks the Agence National de la Recherche (ANR-15-CE09-

367

0020-01 Blink). C.B. and A.J have received funding from the European Research Council under the

368

European Union's Seventh Framework Program (FP/2007-2013) / ERC Grant Agreement n. 32073

369

(CHEMAGEB project). S.P. acknowledges the support of the Spanish government through project

370

CTQ2015-66254-C2-2-P. 17

ACCEPTED MANUSCRIPT

377 378 379 380 381 382 383 384 385 386 387 388 389

RI PT

376

semidefinite programming, Siam. Rev. 49 (2007) 434–448.

[3] H. Chun, S. Keles, Sparse partial least squares regression for simultaneous dimension reduction and variable selection, J. Roy. Stat. Soc. B. Met. 72 (2010) 3–25.

SC

375

[2] A. d’Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet, A direct formulation for sparse PCA using

[4] D. Lee, W. Lee, Y. Lee, Y. Pawitan, Sparse partial least squares regression and its applications to highthroughput data analysis, Chemom. Intell. Lab. Syst. 109 (2011) 1–8.

M AN U

374

265–286.

[5] K.-A. Le Cao, S. Boitard, P. Besse, Sparse PLS discriminant analysis : biologically relevant feature selection and graphical displays for multiclass problems, BMC Bioinformatics 12 (2011) 253–268. [6] D. Chung, S. Keles, Sparse partial least squares classification for high dimensional data, Stat. Appl. Genet. Mol. 9 (2010) article 17.

TE D

373

[1] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graph Stat. 15 (2006)

[7] I. Jolliffe, N. Trendafilov, M. Uddin, A modified principal component technique based on the LASSO, J. Comput. Graph. Stat. 12 (2003) 531–547.

[8] R. Tauler, A. Smilde, B. Kowalski, Selectivity, local rank, three-way data analysis and ambiguity in

EP

372

7. References

multivariate curve resolution, J. Chemometrics 9 (1995) 31–58. [9] R. Tauler, Multivariate curve resolution applied to second order data, Chemom. Intell. Lab. Sys. 30

AC C

371

(1995) 133–146.

390

[10]A. Golshan, H. Abdollahi, S. Beyramysoltan, M. Maeder, K. Neymeyr, R. Rajkó, M. Sawall, R. Tauler, A

391

review of recent methods for the determination of ranges of feasible solutions resulting from soft

392

modelling analyses of multivariate data, Anal. Chim. Acta 911 (2016) 1 – 13.

393 394

[11]R. Tibshirani, Regression shrinkage and selection via the Lasso, J. Roy. Stat. Soc. B Met. 58 (1996) 267– 288.

18

ACCEPTED MANUSCRIPT

395 396

[12]V. Pomareda, D. Calvo, A. Pardo, S. Marco, Hard modeling multivariate curve resolution using LASSO: application to ion mobility spectra, Chemom. Intell. Lab. Syst. 104 (2010) 318–332. [13]A. Tikhonov, V. Arsenin, Solution of Ill-posed Problems, Winston & Sons, Washington, 1977.

398

[14]J. De Rooi, P. Eilers, Deconvolution of pulse trains with the L0 penalty, Anal. Chim. Acta 705 (2011) 218

401 402 403 404 405

[15]J. De Rooi, C. Ruckebusch, P. Eilers, Sparse deconvolution in one and two dimensions : applications in endocrinology and single-molecule fluorescence imaging, Anal. Chem. 86 (2014) 6291–6298.

SC

400

– 226.

[16]R. Manne, On the resolution problem in hyphenated chromatography, Chemom. Intell. Lab. Sys. 27 (1995) 89–94.

M AN U

399

RI PT

397

[17]M.-D. Iordache, J. Bioucas-Dias, A. Plaza, Sparse unmixing of hyperspectral data, IEEE T. Geosci. Remote 49 (2011) 2014–2039.

[18]J. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, J. Chanussot, Hyperspectral

407

unmixing overview : geometrical, statistical, and sparse regression-based approaches, IEEE J. Sel. Top.

408

Appl. 5 (2012) 354–379.

410

[19]X.-F. Zhu, B. Li, J.-D. Wang, L1-norm sparse learning and its application, Appl. Mech. Mater. 88–89 (2011) 379–385.

EP

409

TE D

406

[20]N. Gillis, R. Plemmons, Sparse nonnegative matrix underapproximation and its application to

412

hyperspectral image analysis, Proc. Third IEEE Worskshop on Hyperspectral Image and Signal

413

Processing: Evolution in Remote Sensing (WHISPERS) 1 (2011) 11–14.

414 415

AC C

411

[21]S. Jia, Y. Qian, Constrained nonnegative matrix factorization for hyperspectral unmixing, IEEE T. Geosci. Remote 47 (2009) 161–173.

416

[22]M.-D. Iordache, A. Plaza, J. Bioucas-Dias, On the use of spectral libraries to perform sparse unmixing

417

of hyperspectral data, Proc. Second IEEE Worskshop on Hyperspectral Image and Signal Processing:

418

Evolution in Remote Sensing (WHISPERS) 1 (2010) 1–4.

19

ACCEPTED MANUSCRIPT

420 421 422 423 424

[23]C. Ruckebusch, L. Blanchet, Multivariate curve resolution: A review of advanced and tailored applications and challenges, Anal. Chim. Acta. 765 (2007) 28–36. [24]J. Jaumot, A. de Juan, R. Tauler, MCR–ALS GUI 2.0 : new features and applications, Chemom. Intell. Lab. Syst. 140 (2015) 1–12.

RI PT

419

[25]A. de Juan, R. Tauler, Multivariate Curve Resolution (MCR) from 2000: Progress in Concepts and Applications, Crit. Rev. Anal. Chem. 36 (2006) 163–176.

[26]A. de Juan, S. Rutan, M. Maeder, R. Tauler, Multivariate Curve Resolution Chapters, in: D. Brown, R.

426

Tauler, B. Walczak (Eds.), Comprehensive Chemometrics, vol. 2, Elsevier, Amsterdam, 2009, pp. 207–

427

558.

M AN U

SC

425

428

[27]A. de Juan, S. Piqueras, M. Maeder, T. Hancewicz, L. Duponchel, R. Tauler, Chemometric tools for

429

image analysis, in: R. Salzer, H. Siesler (Eds.), Infrared and Raman Spectroscopic Imaging, Wiley-VCH

430

Verlag GmbH & Co. KGaA, Weinheim, Germany, 2014, pp. 57–110.

[28]L. Duponchel, W. Elmi-Rayaleh, C. Ruckebush, J.-P. Huvenne, Multivariate curve resolution methods in

432

imaging spectroscopy: Influence of extraction methods and instrumental perturbations, J. Chem. Inf.

433

Comput. Sci. 43 (2003) 2057 – 2067.

TE D

431

[29]A. de Juan, R. Tauler, R. Dyson, C. Marcolli, M. Rault, M. Maeder, Spectroscopic imaging and

435

chemometrics: a powerful combination for global and local sample analysis, TrAC – Trend. Anal.

436

Chem. 23 (2004) 70–79.

438

AC C

437

EP

434

[30]A. de Juan, M. Maeder, T. Hancewicz, R. Tauler, Use of local rank-based spatial information for resolution of spectroscopic images, J. Chemometrics 22 (2008), 291–298.

439

[31]S. Hugelier, O. Devos, C. Ruckebusch, On the implementation of spatial constraints in multivariate

440

curve resolution alternating least squares for hyperspectral image analysis, J. Chemometrics 29 (2015)

441

557–561.

20

ACCEPTED MANUSCRIPT

445 446 447 448 449 450 451 452 453

[33]C. Soussen, J. Idier, D. Brie, J. Duan, From Bernoulli-Gaussian deconvolution to sparse signal restoration, IEEE Trans. Signal Process. 59 (2011) 4572–4584.

RI PT

444

299.

[34]S. Hugelier, P. Eilers, O. Devos, C. Ruckebusch, Improved super-resolution microscopy imaging by sparse deconvolution with an inter-frame penalty, J. Chemom. 2017. 31: e2847.

[35]M. Osborne, B. Presnell, B. Turlach, On the lasso and its dual, J. Comput. Graph. Stat. 9 (2000) 319–

SC

443

[32]D. Ge, X. Jiang, Y. Ye, A note on the complexity of Lp minimization, Math. Program. 129 (2011) 285–

337.

[36]F. Frommlet, G. Nuel, An adaptative ridge procedure for L0 regularization, PLoS ONE 11 (2016) e0148620–e0148642.

M AN U

442

[37]A. Hoerl, R. Kennard, Ridge regression: biased estimation for non-orthogonal problems, Technometrics 12 (1970) 55–67.

[38]A. Ly, C. Schöne, M. Becker, J. Rattke, S. Meding, M. Aichler, D. Suckau, A. Walch, S.M. Hauck, M.

455

Ueffing, High-resolution MALDI mass spectrometric imaging of lipids in the mammalian retina,

456

Histochem. Cell Biol. 143 (2015) 453–462.

[39]C. Bedia, R. Tauler, J. Jaumot, Compression strategies for the chemometric analysis of mass

EP

457

TE D

454

spectrometry imaging data, J. Chemom. 30 (2016) 575–588.

459

[40]P. Eilers, A perfect smoother, Anal. Chem. 75 (2003) 3631–3636.

460

[41]G. Tomasi, F. van den Berg, C. Andersson, Correlation optimized warping and dynamic time warping

461

AC C

458

as preprocessing methods for chromatographic data, J. Chem. 18 (2004) 231–241.

462

[42]W. Windig, J. Guilment, Interactive self-modeling mixture analysis, Anal. Chem. 63 (1991) 1425–1432.

463

[43]I. Debeaujon, K. Léon-Kloosterziel, M. Koornneef, Influence of the testa on seed dormancy,

464

germination and longevity in Arabidopsis, Plant Physiol. 122 (2000), 403–413.

21

ACCEPTED MANUSCRIPT

465 466

[44]National Institute of Standards and Technology (NIST), http://webbook.nist.gov/chemistry/ [last accessed: 8 February 2017].

AC C

EP

TE D

M AN U

SC

RI PT

467

22

ACCEPTED MANUSCRIPT

8. Figures

469

Figure 1

M AN U

SC

RI PT

468

470

Figure 1 Schematic representation of a multivariate curve resolution – alternating least squares (MCR–

472

ALS) analysis for hyperspectral images. The refolding/unfolding step during each least squares step

473

allows the input of global spatial features as constraints.

EP AC C

474

TE D

471

23

ACCEPTED MANUSCRIPT

Figure 2

SC

RI PT

475

M AN U

476 477

Figure 2 Schematic representation of the Gaussian bases (fixed width) used for (sparse) signal fitting. In

478

(a) a Toeplitz matrix (zero end condition) for one-dimensional signals and in (b) the banded matrix for

479

two-dimensional signals.

AC C

EP

TE D

480

24

ACCEPTED MANUSCRIPT

Figure 3

M AN U

SC

RI PT

481

482

Figure 3 Illustrative example of sparse signal processing. A series of sparse signals is solved by (a) sparse

484

fitting with a broad Gaussian basis (σ = FWHM/2) (5 non-null coefficients) and (b) sparse fitting with a

485

narrow Gaussian basis (σ = FWHM/5) (21 non-null coefficients). In (c) a sparse signal for which only a

486

narrow Gaussian basis can be used (6 non-null coefficients). Gray: noisy data, black: reproduced data,

487

colors: individual contributions.

EP

AC C

488

TE D

483

25

ACCEPTED MANUSCRIPT

Figure 4

M AN U

SC

RI PT

489

490

Figure 4 Schematic representation of the pre-processing pipeline for (a) the mass spectrometry imaging

492

of a bean sample and (b) the infrared spectroscopy imaging of a pharmaceutical powdered sample. The

493

raw data is shown on the left, while the final compressed and corrected data is shown on the right.

EP AC C

494

TE D

491

26

ACCEPTED MANUSCRIPT

495

M AN U

SC

RI PT

Figure 5

496

Figure 5 MCR–ALS analysis of green bean data, obtained by mass spectrometry imaging with (a) non-

498

negativity as a constraint (calculation time: 22.6 seconds (19 iterations); LOF: 15.3 %) and (b) non-

499

negativity and sparseness of the spectral profiles as a constraint (Sparseness factors: 0.001, 0.006, 0.002,

500

0.004 and 0.001 for the respective components) (calculation time: 160.6 seconds (9 iterations); LOF: 23.7

501

%).

EP AC C

502

TE D

497

27

ACCEPTED MANUSCRIPT

Figure 6

AC C

EP

TE D

M AN U

SC

RI PT

503

504 505

Figure 6 MCR–ALS analysis of a pharmaceutical powdered sample, obtained by infrared imaging with (a)

506

non-negativity as a constraint (calculation time: 11.2 seconds (41 iterations); LOF: 7.9 %) and (b) non-

28

ACCEPTED MANUSCRIPT

negativity and sparseness of the component distribution maps as a constraint (Sparseness factor: 0.005

508

for all components) (calculation time: 93.2 seconds (22 iterations); LOF: 8.9 %).

AC C

EP

TE D

M AN U

SC

RI PT

507

29

ACCEPTED MANUSCRIPT

Highlights 1) A constraint has been developed for MCR-ALS that uses the natural sparseness of the decomposition profiles to help resolving complex multicomponent systems.

dimensional and/or two-dimensional numerical arrays.

RI PT

2) The constraint uses an L0-penalized least squares framework and can be applied on one-

3) The sparseness constraint has a direct effect on the constrained profiles and an indirect

AC C

EP

TE D

M AN U

SC

positive effect on the rest of the resolved profiles of the system.