Fluorescence spectroscopy as tool for the geographical discrimination of coffees produced in different regions of Minas Gerais State in Brazil


Bruno G. Botelho, Leandro S. Oliveira, Adriana S. Franca

PII: S0956-7135(17)30030-0
DOI: 10.1016/j.foodcont.2017.01.020
Reference: JFCO 5427
To appear in: Food Control
Received Date: 10 November 2016
Revised Date: 24 January 2017
Accepted Date: 25 January 2017

Please cite this article as: Bruno G. Botelho, Leandro S. Oliveira, Adriana S. Franca, Fluorescence spectroscopy as tool for the geographical discrimination of coffees produced in different regions of Minas Gerais State in Brazil, Food Control (2017), doi: 10.1016/j.foodcont.2017.01.020

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

- Fluorescence spectroscopy for geographical discrimination of green coffees.
- Discrimination among coffees from the major producing regions in MG, Brazil.
- PARAFAC, NPLS-DA and UPLS-DA were tested for model development.
- UPLS-DA provided the most accurate models, with 100% precision for MM samples.

Bruno G. Botelho^a, Leandro S. Oliveira^a,b, Adriana S. Franca^a,b,*

^a PPGCA, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, 31270-901, Belo Horizonte, MG, Brazil.

^b DEMEC, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, 31270-901, Belo Horizonte, MG, Brazil.

* Corresponding author. DEMEC, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, 31270-901, Belo Horizonte, MG, Brazil. Tel.: +55 31 34093512. Fax: +55 31 34433783. E-mail: [email protected]

ABSTRACT

The designation of origin of high-value agricultural and food products has become increasingly relevant for producers, since it allows consumers to relate the singular characteristics of their preferred product to its provenance. Thus, coffee producers are pursuing ways to certify the authenticity of their products with respect to provenance. Fluorescence spectroscopy was applied to develop a geographical discrimination model for coffees produced in Minas Gerais State, Brazil. PARAFAC, NPLS-DA and UPLS-DA were used to discriminate samples produced in four major production areas in Minas Gerais, namely Cerrado Mineiro (CM), Matas de Minas (MM), Norte de Minas (NM) and Sul de Minas (SM). UPLS-DA presented the best results, with f-scores for CM and SM higher than 0.8 for both training and test sets, which indicates good classification. The MM model presented a good f-score for the training set (1.000), but a poor result for the test set (0.250), mainly due to false negative samples. The NM models presented an intermediate result, with an f-score of 0.913 for the training set and 0.625 for the test set. The proposed method requires only a simple sample pre-treatment, is fast, and can be used for the determination of the geographical origin of coffees produced in Minas Gerais State.

Keywords: Fluorescence spectroscopy; PARAFAC; NPLS-DA; UPLS-DA; Coffee; Geographical origin

1. Introduction

According to the Brazilian Ministry of Agriculture, Livestock and Supply (MAPA), the geographical indication (GI) is a denomination given to products or services that have singular characteristics related to their provenance, and those characteristics bring a good reputation, high added value and a distinction from similar products. These products present unique quality due to natural conditions, such as climate, soil, vegetation or local know-how. The Brazilian agency responsible for providing the GI designation, INPI (Instituto Nacional de Propriedade Industrial) (MAPA, 2015), adopted two categories of GIs: Indication of Provenance (IP) and Denomination of Origin (DO). The latter, in addition to evidence of reputation associated with the name of the place of production, requires a percentage of the production in question to be carried out in situ.

Until 2013, 46 GIs had been registered in Brazil, 38 of them national products/services, and of these, 30 IPs and 8 DOs. There were also eight international products registered, all DOs (Wilkinson, Cerdan, & Dorigon, 2015). Of the 30 products registered as IPs, only 3 are coffees: one from Paraná State (Norte Pioneiro Coffee) and two from Minas Gerais (MG) State (Cerrado Mineiro Coffee and Serra da Mantiqueira Coffee). Minas Gerais State is the largest coffee producer in Brazil, accounting for approximately 50% of the total production. The State is officially divided into four major producing regions: Sul de Minas (SM), in the southern part of the State (encompassing areas between 21° 13' to 22° 10' S and 44° 20' to 47° 20' W, with altitudes from 700 to 1080 m, and climate of the B2 and B3 types, Köppen System); Cerrado Mineiro (CM), in the western part (encompassing areas between 16° 37' to 20° 13' S and 45° 20' to 49° 48' W, with altitudes from 820 to 1110 m, and climate of the B1 type); Matas de Minas (MM), in the southeastern part (encompassing areas between 18° 35' to 21° 26' S and 40° 50' to 43° 36' W, with altitudes from 400 to 700 m, and climate of the B1, B2, B4, C1 and C2 types); and Norte de Minas (NM), in the northern part (encompassing areas between 17° 05' to 18° 09' S and 40° 50' to 42° 40' W, with altitudes around 1099 m, and climate of the C1 and D types) (Figure 1) (Barbosa et al., 2010; CONAB, 2015).

According to the Brazilian Coffee Industry Association (ABIC), after acquiring the IP seal, the market prices of Cerrado Mineiro Coffee increased by 30 to 40% (ABIC, 2016). As the consumer is willing to pay more for a product from a specific area, expecting to acquire a product of higher quality, some attempts have been made to create special labels for some specialty coffee production regions, such as those by the Cerrado Mineiro Organization (Cerrado Mineiro, 2016). However, more than labelling is needed to ensure coffee origin certification. Analytical methodologies using different techniques, such as near (Link, Lemes, Marquetti, dos Santos Scholz, & Bona, 2014a) and mid infrared spectroscopies (Link, Lemes, Marquetti, dos Santos Scholz, & Bona, 2014b), gas chromatography (Carrera, Leon-Camacho, Pablos, & Gonzalez, 1998; Costa Freitas & Mosca, 1999; Risticevic, Carasek, & Pawliszyn, 2008; Taveira et al., 2014; Toledo et al., 2016), ultra high pressure liquid chromatography coupled to mass spectrometry (Mehari et al., 2016), isotope ratio mass spectrometry (Weckerle, Richling, Heinrich, & Schreier, 2002; Rodrigues et al., 2009), multi-collector inductively coupled plasma mass spectrometry (Liu, You, Chen, Liu, & Chung, 2014) and inductively coupled plasma optical emission spectrometry (Muniz-Valencia, Jurado, Ceballos-Magana, Alcazar, & Hernandez-Diaz, 2014), have been developed in order to determine unequivocally the provenance of coffee samples. These techniques have been used to discriminate coffees from different continents, or even coffees produced in the same country, but no work has been done on the discrimination of coffees produced in different microregions of Minas Gerais State. Also, although fluorescence spectroscopy has been extensively used for the classification of food (Sádecká & Tóthová, 2007), no record has been found in the literature of its application to the discrimination of coffees.

In view of the aforementioned, the main objective of this paper was to develop a supervised classification method using chemometric tools capable of discriminating coffees produced in Minas Gerais according to their origin, using fluorescence spectroscopy.

4. Materials and methods

4.1. Sample description

One hundred and ten samples of green Arabica coffee from the four regions of Minas Gerais previously cited (2012-2013, 2013-2014 and 2014-2015 crop years) were provided by the producers themselves or by cooperatives or associations of producers. Of the total, 34 samples were from the Cerrado Mineiro region (CM), 21 from the Matas de Minas region (MM), 20 from the Norte de Minas region (NM) and 35 from the Sul de Minas region (SM).

Approximately 150 g of coffee were ground using an MCF55 rotating disk mill (Arbel, Brazil). After grinding, the powder obtained was sieved through a 20 mesh sieve (d < 1 mm). The samples were then stored in hermetically closed plastic bags until analysis.

The green coffee powder was submitted to aqueous extraction to allow the acquisition of fluorescence spectra. Three grams of each sample (previously ground and sieved) were placed in a 50 mL Falcon tube containing 20 mL of distilled water. The tube was then mixed in a vortex mixer for 30 seconds and heated for 15 minutes at 80 °C using a water bath. After the 15-minute period, the samples were placed in an ice bath to lower their temperature and then centrifuged for 5 minutes at 3500 rpm. After centrifugation, the samples were filtered and the aqueous extract obtained was stored at -18 °C until analysis.

On the day of the analysis, the frozen extracts were thawed naturally until they reached thermal equilibrium with the surrounding environment (approximately 20 °C). Subsequently, the extracts were diluted ten times with distilled water and submitted to fluorescence spectroscopy analysis.

4.2. Instrumentation

Fluorescence spectra were obtained in a Varian Cary Eclipse spectrofluorimeter, using a 10 mm quartz cuvette. All the excitation-emission matrices (EEM) were obtained in the excitation range from 250 to 500 nm (20 nm steps) and in the emission range from 350 to 600 nm (2 nm steps). The slit widths of the excitation and emission monochromators were both 10.0 nm and the scanning rate was 9600 nm/min.
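As an illustration of the data layout implied by these acquisition settings, the following minimal sketch builds the two wavelength axes and an empty three-way array. The variable names and the samples x excitation x emission ordering are assumptions made for this example, not details given in the text. Note that, starting at 250 nm, a 20 nm step does not land on 500 nm, so this sketch ends the excitation axis at 490 nm.

```python
import numpy as np

# Wavelength axes following the ranges and steps stated in Section 4.2.
# Assumption: the grids start at the lower limit and never exceed the upper one.
excitation_nm = np.arange(250, 500 + 1, 20)  # 250, 270, ..., 490 (13 values)
emission_nm = np.arange(350, 600 + 1, 2)     # 350, 352, ..., 600 (126 values)

n_samples = 110  # number of coffee extracts in the study

# Placeholder three-way array (samples x excitation x emission); real
# intensities would be loaded from the instrument export instead of zeros.
eem_cube = np.zeros((n_samples, excitation_nm.size, emission_nm.size))
```

Under these assumptions each EEM holds 13 x 126 = 1638 intensity values per sample.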

4.3. Statistical Analysis

PARAFAC (PARAllel FACtor analysis), NPLS-DA and UPLS-DA were employed for construction of the discrimination models. PARAFAC is a decomposition method that can be considered a generalization of PCA to higher order data. It decomposes data into triads or trilinear components. Each component in a PARAFAC model is formed of one score vector (information related to the samples) and two loading vectors. In this paper, the loading vectors represent the excitation and the emission spectral data. The advantage of using PARAFAC when compared to bilinear methods is the uniqueness of the solution, which allows the extraction of the pure spectra of the analyzed species (Bro, 1996). NPLS-DA is the combination of NPLS (N-way Partial Least Squares, also called higher-order or Multilinear PLS) and discriminant analysis. NPLS is an extension of the two-dimensional PLS algorithm that handles independent data sets of orders higher than two (cubic or 4-dimension arrays, for example). The combination of NPLS and discriminant analysis allows the classification of samples using these high order data, which usually increases sensitivity (Bro, 1996).
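For reference, the trilinear decomposition described above is commonly written as follows. The symbols are chosen here for illustration and do not appear in the original text: $x_{ijk}$ is the intensity of sample $i$ at excitation wavelength $j$ and emission wavelength $k$, $F$ is the number of components, $a_{if}$ is the score of sample $i$ on component $f$, $b_{jf}$ and $c_{kf}$ are the excitation and emission loadings, and $e_{ijk}$ is the residual.

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}
```

Each term of the sum is one "triad": a score vector combined with two loading vectors.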

UPLS-DA (Unfolded Partial Least Squares Discriminant Analysis) is a variation of PLS-DA in which high order data are unfolded. Unfolding consists of reducing the dimensionality of the data, transforming a cubic array (three-way data) into a matrix (two-way data). Although the reduction causes some loss of information about the samples and of model interpretability, UPLS-DA gains in simplicity and ease of use, because it relies on the basic PLS algorithm (Olivieri & Escandar, 2014).
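The unfolding step can be sketched with a simple array reshape. This is only an illustration on random numbers; the 13 x 126 grid and the row-major ordering of the unfolded columns are assumptions of this example, not choices documented by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy three-way array: 5 samples x 13 excitation x 126 emission wavelengths.
cube = rng.random((5, 13, 126))

# Unfolding: each sample's EEM becomes a single row of 13 * 126 = 1638 values.
unfolded = cube.reshape(cube.shape[0], -1)

# Refolding restores the original cube, e.g. to display results as EEM maps.
refolded = unfolded.reshape(cube.shape)
```

The reshape itself is lossless; what an unfolded model no longer uses is the knowledge that neighbouring columns belong to the same excitation or emission mode.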

Data were handled using MATLAB software, version 7.13 (The MathWorks, Natick, MA, USA). The PLSDA routine used for the UPLS-DA models, the NPLS and the PARAFAC routines came from the PLS Toolbox, version 6.5 (Eigenvector Technologies, Manson, WA, USA).

5. Results and discussion

5.1. PARAFAC

Figure 2a shows a mean contour map of the EEM for all 110 green coffee samples. Colors represent variations in signal intensity ranging from blue (low intensity) to yellow (high intensity). A high intensity area clearly crosses the whole contour map diagonally. Such signals are not related to the samples and are caused by light scattering effects (Rayleigh scatter), a physical phenomenon that occurs naturally in fluorescence analysis. Given that such signals can overlap with signals from the sample, scattering removal algorithms should be employed. In this study, we used the one proposed by Bahram et al. (2006). The same mean contour map obtained after the scattering removal can be seen in Figure 2b. In this map, one can observe four high intensity regions corresponding to the following excitation/emission pairs: 370/440 nm, 400/500 nm, 390/540 nm and 420/440 nm. However, it is not possible to infer directly the relevance of such regions or to obtain a precise spectral attribution, given the high interference region observed in the center of the contour map, probably associated with the spectral overlap of several fluorophore compounds in the coffee extract.
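The idea of scatter removal can be illustrated with the simplified sketch below. It is not the algorithm of Bahram et al. (2006): here the first-order (emission = excitation) and second-order (emission = 2 x excitation) Rayleigh bands are simply masked and the gaps are filled by linear interpolation along each emission spectrum. The function names and the 15 nm half-width are hypothetical choices made for this example.

```python
import numpy as np

def mask_rayleigh(eem, ex_nm, em_nm, half_width_nm=15.0):
    """Return a copy of `eem` (excitation x emission) with the first- and
    second-order Rayleigh bands replaced by NaN."""
    ex = np.asarray(ex_nm, dtype=float)[:, None]
    em = np.asarray(em_nm, dtype=float)[None, :]
    first_order = np.abs(em - ex) <= half_width_nm
    second_order = np.abs(em - 2.0 * ex) <= half_width_nm
    masked = np.array(eem, dtype=float, copy=True)
    masked[first_order | second_order] = np.nan
    return masked

def interpolate_rows(masked, em_nm):
    """Fill NaN gaps in each emission spectrum by linear interpolation."""
    em = np.asarray(em_nm, dtype=float)
    filled = masked.copy()
    for row in filled:  # each row is a view, so edits modify `filled`
        gaps = np.isnan(row)
        if gaps.any() and (~gaps).any():
            row[gaps] = np.interp(em[gaps], em[~gaps], row[~gaps])
    return filled
```

A typical call would be `interpolate_rows(mask_rayleigh(eem, ex_nm, em_nm), em_nm)` for each sample, with the band width tuned to the slit widths actually used.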

In order to facilitate spectral interpretation, the PARAFAC curve resolution technique was employed. A three-component model, accounting for 97% of the X (spectral data) variance and presenting a core consistency value of 78, was created. No visible trends were found among the different groups when the scores of the three components were plotted against each other.

Even though PARAFAC was not able to indicate any tendency of sample grouping, its use is still relevant, because it performs spectral deconvolution and separates overlapping signals. Thus, pure spectra are obtained for each of the model components, indicating which fluorophores are present in the samples and improving spectral interpretation. Figure 3 shows the loadings obtained for the three components. Belay et al. (2015) studied the interaction between caffeic acid, chlorogenic acid (5-CQA) and caffeine and, by experimental methods, estimated the excitation and emission peaks for caffeic acid as 370 and 460 nm, respectively. These values are similar to the ones presented by the first component (Figure 3a), which indicates that this component might be due to the presence of caffeic acid in the extracts, a phenolic compound present in coffee that is usually esterified to quinic acid, thus comprising the chlorogenic acids (Farah & Donangelo, 2006).

The second component presented an excitation peak near 410 nm and an emission peak around 540 nm (Figure 3b), which is similar to the excitation/emission peaks of quercetin in a PBS solution (Nifli et al., 2007). Quercetin is a flavonoid present in green coffees (Mullen et al., 2013) and, together with caffeic acid, represents the major antioxidants in beverages containing caffeine (Woodward, 2008). The third component presented excitation/emission peaks that are consistent with the lipid fraction of coffee (Figure 3c), sometimes specifically attributed to the tocopherol present in it (Guzmán et al., 2015; Tanajura da Silva et al., 2015). Tocopherol is a major component of the unsaponifiable fraction of coffee oil (Speer & Kölling-Speer, 2006) and has been used as a marker for the detection and identification of adulteration of roasted and ground coffee with corn (Jham et al., 2007).

Since no natural clustering was observed regarding the different origins of the coffees in the PARAFAC results, an N-way supervised classification method, NPLS-DA, was used to develop a discrimination method for the classification of the coffee samples.

5.2. NPLS-DA

An NPLS-DA model was built using the same 110 samples of green coffee described previously in section 4.1. Samples from each region were separated into training (two thirds of the total samples) and test (remaining one third) sets, using the Kennard-Stone (KS) algorithm. Data were unfolded into a two-dimensional array, since the KS algorithm is not suitable for cubic arrays. After the separation process, the data were refolded to their original shape.
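A minimal sketch of the Kennard-Stone selection on unfolded data is given below. It follows the usual textbook formulation with Euclidean distances; the function name and the distance metric are assumptions of this example, and the routine actually used by the authors may differ in detail.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return the indices of `n_select` rows of X chosen by the
    Kennard-Stone rule (maximin Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two most distant samples.
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(X.shape[0]) if i not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each candidate to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the candidate that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected[:n_select]
```

Applied class by class to the unfolded EEMs, the selected indices would form the training set and the remaining samples the test set. Because the rule favors samples at the edges of the data cloud, the training set tends to span the full variability of each class.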

At the end of the process, the training set was composed of 75 samples (23 from CM, 14 from MM, 14 from NM and 24 from SM) and the test set was composed of 35 samples (11 from CM, 7 from MM, 6 from NM and 11 from SM). The number of latent variables (LV) was selected using random subsets cross validation, and the LV number that presented the smallest cross validation classification error was chosen. The best models were built with 2 LVs that accounted for 84.5% and 12.5% of the X and Y variance, respectively.

As can be seen in Figure 4, the majority of the CM samples were correctly classified in both the training and the test sets, but there was also a high number of false positive samples from every class, mostly from SM. The MM classification model was able to differentiate the MM samples from CM and SM, but not from NM, with a high number of false positives and false negatives between these two regions. The NM and SM models presented a similar behavior, with a high number of misclassifications from all the different regions. In the SM model, for example, all of the CM samples were misclassified as false positives.

Table 1 summarizes all the classification results, based on the most probable class classification (Wise et al., 2006), organized in such a way as to facilitate the identification of the sources of misclassification among regions. The most noticeable misclassifications concern the SM samples. More than 50% of them were incorrectly classified as CM in the training set. A significant number of misclassifications (more than 60% of the samples) also occurred in the test set.

Table 1 also shows some qualitative Figures of Merit (FoM) that were estimated based on the model results. The False Negative Rate (FNR) and False Positive Rate (FPR) give an idea of how the classification errors are distributed, and of whether a model is able to efficiently classify its positive samples (FNR) and correctly discriminate the samples that do not belong to it (FPR) (Christin et al., 2012):

$$\mathrm{FPR} = \frac{\text{false positive samples}}{\text{false positive samples} + \text{true negative samples}} \tag{1}$$

$$\mathrm{FNR} = \frac{\text{false negative samples}}{\text{false negative samples} + \text{true positive samples}} \tag{2}$$

Table 1 - Confusion matrix and qualitative figures of merit estimated for the developed NPLS-DA model.

Confusion matrix (rows: actual class; columns: predicted as)

Set       Actual class   CM   MM   NM   SM
Training  CM             20    0    0    3
Training  MM              2    9    0    3
Training  NM              2    3    8    1
Training  SM             14    1    1    8
Test      CM             10    0    0    1
Test      MM              1    5    0    1
Test      NM              1    2    2    1
Test      SM              7    1    0    3

Figures of merit

Set       Class   FPR     FNR     NMC     Precision   Recall   F1 score
Training  CM      0.296   0.115   0.411   0.488       0.870    0.625
Training  MM      0.092   0.263   0.355   0.600       0.643    0.621
Training  NM      0.240   0.429   0.667   0.333       0.574    0.421
Training  SM      0.123   0.410   0.533   0.500       0.304    0.378
Test      CM      0.273   0.083   0.356   0.526       0.909    0.667
Test      MM      0.097   0.222   0.319   0.625       0.714    0.667
Test      NM      0.000   0.400   0.400   1.000       0.333    0.500
Test      SM      0.111   0.421   0.532   0.500       0.273    0.353

Precision is directly related to FPR and indicates the percentage of correctly classified positive samples among all samples classified as positive (true positive and false positive):

$$\mathrm{Precision} = \frac{\text{true positive samples}}{\text{true positive samples} + \text{false positive samples}} \tag{3}$$

This FoM indicates how capable the method is of separating the samples belonging to one class from the ones that do not belong to it (Christin et al., 2012). In the training set, precision values ranged from 0.333 to 0.600, and in the test set they ranged from 0.500 to 1.000. The CM and SM models presented the smallest precision values. Although the CM model classified almost all of its samples correctly, it also presented a great number of false positive samples, which lowered its precision. The opposite happened with the SM model, which was not able to classify its own samples properly (only 8 out of 20). However, when compared with CM, it presented only a few false positive samples (7 in SM against 21 in CM).

The recall (also called sensitivity or true positive rate) represents the capacity of the model to provide a correct classification of its positive samples, considering only the positive samples (Christin et al., 2012):

$$\mathrm{Recall} = \frac{\text{true positive samples}}{\text{true positive samples} + \text{false negative samples}} \tag{4}$$

It is interesting to notice that CM and MM models obtained recall values distinguishably

285

higher than NM and SM models, for both training and test sets. This indicates that CM and

286

MM models have a higher capacity of classifying correctly its positive samples. The Number

287

of Misclassification (NMC) and the f-score provide an overall evaluation of model quality:

288 289

290

𝑁𝑀𝐶 = 𝐹𝑃𝑅 + 𝐹𝑁𝑅 (5)

𝑓 ‒ 𝑠𝑐𝑜𝑟𝑒 =

(

(𝛽2 + 1) 𝑥 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝑥 𝑟𝑒𝑐𝑎𝑙𝑙 𝛽2𝑥 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝑥 𝑟𝑒𝑐𝑎𝑙𝑙

)

(6)

NMC estimates the total of misclassified samples (both false positive and false negative) in a model, and the f-score provides a harmonic mean of the precision and recall, which is more focused on how good a model is at correctly classifying its positive samples. Recall and precision are balanced in the f-score when the β parameter is set to 1, and recall is favored when β > 1. In this paper, β was set to 1, because the main goal was to correctly classify all samples, which requires both precision and recall to be taken into account to the same extent (Christin et al., 2012). NMC values ranged from 0.355 (MM) to 0.667 (NM) for the training set and from 0.319 (MM) to 0.532 (SM) for the test set. SM had the worst NMC for the test set and the second worst for the training set, which reinforces the model's difficulty in correctly classifying the samples from this specific region. CM and MM had the smallest NMC values, but these can still be considered high when compared to others published in the literature. Taveira et al. (2014) developed a model capable of classifying samples from three different locations in Minas Gerais using the metabolomic profile obtained by GC-Q/MS, obtaining 100% correct classification of the samples. Mehari et al. (2016) took a similar approach when developing a chemometric method to determine the geographical origin of Ethiopian coffees using UPLC-MS. The profile of phenolic compounds was used, and a 90% correct classification rate was obtained. These higher classification rates may be associated with the use of more sophisticated analytical methods, capable of detecting substances at much lower concentrations than fluorescence spectroscopy.
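As a worked illustration of Eqs. (3), (4) and (6), the sketch below recomputes precision, recall and the f-score (β = 1) from the test-set confusion matrix of Table 1, taking rows as the actual class and columns as the predicted class. The function and variable names are hypothetical and introduced only for this example.

```python
import numpy as np

classes = ["CM", "MM", "NM", "SM"]

# Test-set confusion matrix from Table 1 (rows: actual, columns: predicted).
confusion = np.array([
    [10, 0, 0, 1],
    [1, 5, 0, 1],
    [1, 2, 2, 1],
    [7, 1, 0, 3],
])

def f_score(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (b2 + 1.0) * precision * recall / denom if denom > 0 else 0.0

def per_class_metrics(cm):
    """Return {class: (precision, recall, f1)} from a confusion matrix."""
    results = {}
    for k, name in enumerate(classes):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp  # predicted as class k, but from another class
        fn = cm[k, :].sum() - tp  # class k samples assigned to another class
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        results[name] = (precision, recall, f_score(precision, recall))
    return results

metrics = per_class_metrics(confusion)
```

For the CM class this gives a precision of 10/19 (about 0.526), a recall of 10/11 (about 0.909) and an f-score of about 0.667, matching the corresponding test-set entries of Table 1.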

Link et al. (2014b) used mid-infrared spectroscopy to determine the geographical origin of Brazilian coffees produced in four different locations of Paraná State. The results presented by the authors were similar to the ones obtained by Taveira et al. (2014) and Mehari et al. (2016), with almost 90% correct classification. The factor that might have contributed the most to this good performance was the small sample variance. Samples from only three locations and harvested in the same year were used. In the present work, the samples came from more than 80 different locations, with harvesting years ranging from 2012 to 2015. The natural variance of the samples due to both different locations and different harvest years represents a more realistic situation, because commercialized coffees may come from different crop years: producers and their cooperatives and associations usually store coffees from different crops in order to get better prices for the bags as market prices fluctuate. As this factor does not affect the Designation of Origin (DO), variations in the crop years of the samples, ranging from 2012 to 2015, were taken into consideration to obtain a representative and robust multivariate model.

The two components used in the NPLS-DA model are depicted in Figure 5. It is possible to see that these components are consistent with the second and third components of the PARAFAC model. This indicates that, even though caffeic acid was detected by fluorescence spectroscopy, it does not contribute to the classification of the samples.

Similar results were found by Taveira et al. (2014), where caffeic acid was not a significant metabolite in the classification of coffee samples from different locations of Minas Gerais State. As the NPLS-DA model performance was still considered unsatisfactory, another strategy was used, UPLS-DA, in which the trilinear data are unfolded into a two-dimensional array and a conventional PLS-DA algorithm is applied (Olivieri & Escandar, 2014).

5.3. UPLS-DA

The same 110 samples used in the PARAFAC and NPLS-DA models were used to build the UPLS-DA models. The samples were divided into training and test sets in the same way as described for NPLS-DA, except that they were not refolded for model construction. A few pre-processing algorithms were tested in order to improve the model classification, and the best results were obtained by the combination of MSC (Multiplicative Scatter Correction) and OSC (Orthogonal Signal Correction). The number of latent variables (LV) was selected using random subset cross validation, and the LV number that presented the smallest cross validation classification error was chosen. The best models were built with 8 LVs that accounted for 99% and 51% of the X and Y variance, respectively.
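For readers unfamiliar with MSC, the following minimal sketch shows the usual formulation: each unfolded spectrum is regressed on a reference spectrum (by default the mean of the data set) and corrected with the fitted intercept and slope. The function name is hypothetical, and the pre-processing routine actually used by the authors may differ in its options. OSC is not covered here.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction of the rows of X.

    Each row is fitted as intercept + slope * reference and returned as
    (row - intercept) / slope."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        # np.polyfit with deg=1 returns [slope, intercept].
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected
```

When a test set is corrected, the reference spectrum computed from the training set should be passed explicitly through `reference`, so that both sets are corrected against the same spectrum.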

As can be seen in Figure 6, the models built using UPLS-DA presented an overall better performance than the ones built with NPLS-DA. The MM model was able to correctly classify all of its samples in the training set, followed by the NM model (only one misclassification), CM (two misclassifications) and SM (five misclassifications). The SM misclassifications were similar to the ones in NPLS-DA, with all five samples being classified as CM. This confirms that there really is a similarity between the fluorophore compositions of these two regions.

Table 2 shows the confusion matrix and the qualitative FoM estimated for the UPLS-DA models. All the FoM indicate an improvement in the classification capacity of the models. The precision values range from 0.808 (CM) to 1.000 (NM and MM) for the training set and from 0.500 (NM) to 1.000 (MM) for the test set. The recall values were also improved by the unfolding process, ranging from 0.792 (SM) to 1.000 (MM) for the training set and from 0.143 (MM) to 0.909 (CM and SM) for the test set. Although the MM model was able to correctly classify all of its samples in the training set, it was only capable of correctly classifying 1 out of 7 samples in the test set, which resulted in poor recall and F1 scores. The precision was not affected, because the model did not generate any false positive samples. The improvement of the NMC parameter is also noticeable for all the models. The NMC ranged from 0% (MM) to 22.8% (SM) for the training set and from 16% (CM and SM) to 46.2% (MM) for the test set. If the MM model is not considered, the NMC ranges from 16 to 29%, which is not far from the performance obtained by other classification methods employing more sophisticated techniques, such as UPLC-MS or GC-MS (Taveira et al., 2014; Mehari et al., 2016).

Table 2 - Confusion matrix and qualitative figures of merit estimated for the developed UPLS2-DA model

Confusion matrix (rows: actual class; columns: predicted as)

Set       Actual class   CM   MM   NM   SM
Training  CM             21    0    0    2
Training  MM              0   14    0    0
Training  NM              0    0   13    1
Training  SM              5    0    0   19
Test      CM             10    0    0    1
Test      MM              0    1    5    1
Test      NM              1    0    5    0
Test      SM              1    0    0   10

Figures of merit

Set       Class   FPR     FNR     NMC     Precision   Recall   F1 score
Training  CM      0.088   0.080   0.168   0.808       0.913    0.857
Training  MM      0.000   0.000   0.000   1.000       1.000    1.000
Training  NM      0.000   0.067   0.067   1.000       0.929    0.963
Training  SM      0.056   0.172   0.228   0.864       0.792    0.826
Test      CM      0.077   0.083   0.160   0.833       0.909    0.870
Test      MM      0.000   0.462   0.462   1.000       0.143    0.250
Test      NM      0.147   0.143   0.290   0.500       0.833    0.625
Test      SM      0.077   0.083   0.160   0.833       0.909    0.870

Table 3 shows a comparison between the F1 scores from UPLS-DA and NPLS-DA. With this comparison, it is clear that the unfolding process improved the performance of classification of coffees from different regions of Minas Gerais State, except for the MM model in the test set, which presented a significant reduction in the F1 score (62.5%).

Table 3 – Comparison between NPLS-DA and UPLS-DA f-scores

Set       Class   NPLS-DA   UPLS-DA
Training  CM      0.625     0.857
Training  MM      0.621     1.000
Training  NM      0.250     0.963
Training  SM      0.378     0.826
Test      CM      0.667     0.870
Test      MM      0.667     0.250
Test      NM      0.500     0.625
Test      SM      0.353     0.870

The refolded VIP scores for the UPLS-DA models can be seen in Figure 7. The VIP scores from the CM and MM models clearly resemble the second and the third components of the PARAFAC model (quercetin and tocopherol, respectively), and both NM and SM presented a very distinguishable peak that can be related to the first component of PARAFAC, associated with caffeic acid. Caffeic acid did not contribute to the classifications when using the NPLS-DA models, so the improvement in performance obtained with the UPLS-DA models may be attributed to the presence of this component in the VIP scores.

395 396
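The unfolding and refolding steps mentioned above can be illustrated with a short Python sketch. It is not the authors' implementation: the toy array sizes, the numeric values and the function names `unfold` and `refold` are all hypothetical, and the ordering (excitation as the slow index, emission as the fast index) is an assumption made only for the example.

```python
def unfold(cube):
    """Turn a samples x excitation x emission array (nested lists)
    into a samples x (excitation * emission) matrix, one row per sample."""
    return [[value for ex_row in sample for value in ex_row] for sample in cube]


def refold(vector, n_ex, n_em):
    """Reshape one vector of length n_ex * n_em back into an n_ex x n_em map."""
    if len(vector) != n_ex * n_em:
        raise ValueError("vector length must equal n_ex * n_em")
    return [list(vector[i * n_em:(i + 1) * n_em]) for i in range(n_ex)]


# Toy data (made-up values): 2 samples, 2 excitation x 3 emission wavelengths.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
X = unfold(cube)  # matrix that a two-way PLS-DA model would be fitted on

# Hypothetical VIP vector, one value per unfolded variable.
vip = [0.2, 1.4, 0.9, 0.1, 2.0, 0.5]
vip_map = refold(vip, n_ex=2, n_em=3)  # excitation x emission map, as plotted in a contour figure
```

With NumPy arrays the same two operations would be `X.reshape(n_samples, -1)` and `vip.reshape(n_ex, n_em)`; plain lists are used here only to keep the sketch dependency-free.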

6. Conclusions

The models developed from fluorescence spectroscopy data coupled with unfolded partial least squares discriminant analysis (UPLS-DA) performed well in the classification of coffees produced in different regions of Minas Gerais State. Model performance was similar to that reported in other published works that employed more sophisticated and time-consuming analytical techniques, with the exception of the MM model. The poor performance of this model might be due to the selection of extreme samples in this class by the Kennard-Stone algorithm. Nonetheless, the presented technique provides a good alternative for assuring the origin of coffee produced in the IP areas of Minas Gerais.
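To illustrate why the Kennard-Stone algorithm tends to pick extreme samples, the following minimal Python sketch implements its standard selection rule: start with the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. This is an assumption-laden illustration, not the authors' implementation; the function name `kennard_stone` and the one-dimensional toy data are hypothetical.

```python
import math


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Step 1: start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Step 2: add the candidate whose closest selected sample is farthest away.
    while len(selected) < n_select:
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made-up): five samples described by a single score.
points = [[0.0], [1.0], [1.1], [0.9], [5.0]]
train_idx = kennard_stone(points, 3)
test_idx = [k for k in range(len(points)) if k not in train_idx]
```

In this toy case the two extremes (indices 0 and 4) are selected first, and the samples left over for the other subset are interior ones, which shows how the two subsets of a small class can end up covering different regions of the data space.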

Acknowledgements

The authors acknowledge financial support from the following Brazilian Government Agencies: CNPq (Grant # 475746/2013-9; 306139/2013-8) and FAPEMIG (Grant # PPM-00619-15; BPD-00670-14).

References

CONAB. Companhia Nacional de Abastecimento. Acompanhamento da safra brasileira de café – Terceiro levantamento Setembro/2015. (2015). http://www.conab.gov.br/OlalaCMS/uploads/arquivos/15_09_29_09_01_35_boletim_cafe_setembro_2015.pdf Accessed 20.04.16.

ABIC. Café do Cerrado está mais valorizado. Associação Brasileira da Industria do Café. (2016). http://abic.com.br/publique/cgi/cgilua.exe/sys/start.htm?sid=59&infoid=3662/ Accessed 20.04.16.

Cerrado Mineiro. Denominação de origem. (2016). http://www.cerradomineiro.org/index.php?pg=denominacaodeorigem#group1/ Accessed 20.04.16.

Bahram, M., Bro, R., Stedmon, C., & Afkhami, A. (2006). Handling of Rayleigh and Raman scatter for PARAFAC modeling of fluorescence data using interpolation. Journal of Chemometrics, 20, 99–105.

Barbosa, J. N., Borém, F. M., Alves, H. M. R., Volpato, M. M. L., Vieira, T. G. C., & Souza, V. C. O. (2010). Spatial distribution of coffees from Minas Gerais State and their relation with quality. Coffee Science, 5, 237–250.

Belay, A., Kim, H. K., & Hwang, Y.-H. (2016). Binding of caffeine with caffeic acid and chlorogenic acid using fluorescence quenching, UV/vis and FTIR spectroscopic techniques. Luminescence, 31, 565–572.

Bro, R. (1996). Multiway calibration. Multilinear PLS. Journal of Chemometrics, 10, 47–61.

Bro, R. (1997). PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems, 38, 149–171. http://www.sciencedirect.com/science/article/pii/S0169743997000324

Carrera, F., Leon-Camacho, M., Pablos, F., & Gonzalez, A. G. (1998). Authentication of green coffee varieties according to their sterolic profile. Analytica Chimica Acta, 370, 131–139.

Christin, C., Hoefsloot, H. C. J., Smilde, A. K., Hoekman, B., Suits, F., Bischoff, R., & Horvatovich, P. (2013). A critical assessment of feature selection methods for biomarker discovery in clinical proteomics. Molecular & Cellular Proteomics, 12, 263–276.

Costa Freitas, A. M., & Mosca, A. I. (1999). Coffee geographic origin - An aid to coffee differentiation. Food Research International, 32, 565–573.

Farah, A., & Donangelo, C. M. (2006). Phenolic compounds in coffee. Brazilian Journal of Plant Physiology, 18, 23–36.

Guzmán, E., Baeten, V., Pierna, J. A. F., & García-Mesa, J. A. (2015). Evaluation of the overall quality of olive oil using fluorescence spectroscopy. Food Chemistry, 173, 927–934.

Jham, G. N., Winkler, J. K., Berhow, M. A., & Vaughn, S. F. (2007). γ-Tocopherol as a Marker of Brazilian Coffee (Coffea arabica L.) Adulteration by Corn. Journal of Agricultural and Food Chemistry, 55, 5995–5999.

Link, J. V., Lemes, A. L. G., Marquetti, I., dos Santos Scholz, M. B., & Bona, E. (2014a). Geographical and genotypic segmentation of arabica coffee using self-organizing maps. Food Research International, 59, 1–7.

Link, J. V., Lemes, A. L. G., Marquetti, I., dos Santos Scholz, M. B., & Bona, E. (2014b). Geographical and genotypic classification of arabica coffee using Fourier transform infrared spectroscopy and radial-basis function networks. Chemometrics and Intelligent Laboratory Systems, 135, 150–156.

Liu, H. C., You, C. F., Chen, C. Y., Liu, Y. C., & Chung, M. T. (2014). Geographic determination of coffee beans using multi-element analysis and isotope ratios of boron and strontium. Food Chemistry, 142, 439–445.

MAPA. Ministério da Agricultura, Pecuária e Abastecimento. Indicação Geográfica – IG. (2015). http://www.agricultura.gov.br/desenvolvimento-sustentavel/indicacao-geografica/ Accessed 20.04.16.

Mehari, B., Redi-Abshiro, M., Chandravanshi, B. S., Combrinck, S., Atlabachew, M., & McCrindle, R. (2016). Profiling of phenolic compounds using UPLC-MS for determining the geographical origin of green coffee beans from Ethiopia. Journal of Food Composition and Analysis, 45, 16–25.

Mullen, W., Nemzer, B., Stalmach, A., Ali, S., & Combet, E. (2013). Polyphenolic and Hydroxycinnamate Contents of Whole Coffee Fruits from China, India, and Mexico. Journal of Agricultural and Food Chemistry, 61, 5298–5309.

Muniz-Valencia, R., Jurado, J. M., Ceballos-Magana, S. G., Alcazar, A., & Hernandez-Diaz, J. (2014). Characterization of Mexican coffee according to mineral contents by means of multilayer perceptrons artificial neural networks. Journal of Food Composition and Analysis, 34, 7–11.

Nifli, A., Theodoropoulos, P., Munier, S., Castagnino, C., Roussakis, E., Katerinopoulos, H. E., Castanas, E. (2007). Quercetin Exhibits a Specific Fluorescence in Cellular Milieu: A Valuable Tool for the Study of Its Intracellular Distribution. Journal of Agricultural and Food Chemistry, 55, 2873–2878.

Olivieri, A., & Escandar, G. (2014). Practical Three Way Calibration. Philadelphia: Elsevier (p. 330).

Risticevic, S., Carasek, E., & Pawliszyn, J. (2008). Headspace solid-phase microextraction-gas chromatographic-time-of-flight mass spectrometric methodology for geographical origin verification of coffee. Analytica Chimica Acta, 617, 72–84.

Rodrigues, C. I., Maia, R., Miranda, M., Ribeirinho, M., Nogueira, J. M. F., & Máguas, C. (2009). Stable isotope analysis for green coffee bean: A possible method for geographic origin discrimination. Journal of Food Composition and Analysis, 22, 463–471.

Sádecká, J., & Tóthová, J. (2007). Fluorescence Spectroscopy and Chemometrics in the Food Classification: a Review. Czech Journal of Food Science, 25, 159–173.

Speer, K., & Kölling-Speer, I. (2006). The lipid fraction of the coffee bean. Brazilian Journal of Plant Physiology, 18, 201–216.

Tanajura da Silva, C. E., Filardi, V. L., Pepe, I. M., Chaves, M. A., & Santos, C. M. S. (2015). Classification of food vegetable oils by fluorimetry and artificial neural networks. Food Control, 47, 86–91.

Taveira, J. H. D. S., Borém, F. M., Figueiredo, L. P., Reis, N., Franca, A. S., Harding, S. A., & Tsai, C. J. (2014). Potential markers of coffee genotypes grown in different Brazilian regions: A metabolomics approach. Food Research International, 61, 75–82.

Toledo, P. R. A. B., Melo, M. M. R., Pezza, H. R., Toci, A. T., Pezza, L., & Silva, C. M. (2016). Discriminant analysis for unveiling the origin of roasted coffee samples: A tool for quality control of coffee related products. Food Control, In press, corrected proof.

Weckerle, B., Richling, E., Heinrich, S., & Schreier, P. (2002). Origin assessment of green coffee (Coffea arabica) by multi-element stable isotope analysis of caffeine. Analytical and Bioanalytical Chemistry, 374, 886–890.

Wilkinson, J., Cerdan, C., & Dorigon, C. (2015). Geographical Indications and “Origin” Products in Brazil - The Interplay of Institutions and Networks. World Development, http://dx.doi.org/10.1016/j.worlddev.2015.05.003.

Wise, B. W., Gallagher, N. B., Bro, R., Shaver, J. M., Windig, W., & Koch, R. S. (2006). PLS-Toolbox 4.0 for use with Matlab™ (Manual). Manson: EigenVector Research Inc.

Woodward, G. M. (2008). The potential effect of excessive coffee consumption on nicotine metabolism: CYP2A6 inhibition by caffeic acid and quercetin. Bioscience Horizons, 1, 98–103.

Figure Captions:

Figure 1 - Identification of major coffee producing regions in Minas Gerais State map.

Figure 2 – Mean contour map of EEM matrix from coffee extracts (a) before and (b) after the scattering removal.

Figure 3 – First (a), second (b) and third (c) PARAFAC components (Blue lines – excitation; Red lines – emission).

Figure 4 – Classification plots for the NPLS-DA models developed (legend: CM, MM, NM, SM)

Figure 5 – First (a) and second (b) components from NPLS-DA model.

Figure 6 – Classification plots for the NPLS-DA models developed (legend: CM, MM, NM, SM)

Figure 7 – VIP Scores from UPLS-DA models - (a) CM (b) MM (c) NM (d) SM