Measurement of non-sugar solids content in Chinese rice wine using near infrared spectroscopy combined with an efficient characteristic variables selection algorithm


Accepted Manuscript

Authors: Qin Ouyang, Jiewen Zhao, Quansheng Chen

PII: S1386-1425(15)30018-4
DOI: http://dx.doi.org/10.1016/j.saa.2015.06.071
Reference: SAA 13841

To appear in: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy

Received Date: 14 January 2015
Revised Date: 21 June 2015
Accepted Date: 23 June 2015

Please cite this article as: Q. Ouyang, J. Zhao, Q. Chen, Measurement of Non-sugar Solids Content in Chinese Rice Wine using Near Infrared Spectroscopy Combined with an Efficient Characteristic Variables Selection Algorithm, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.06.071

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Qin Ouyang, Jiewen Zhao, and Quansheng Chen∗

School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China

∗ Corresponding author. Tel.: +86-511-88790318; fax: +86-511-88780201. E-mail address: [email protected] (Q. Chen)

Abstract: The non-sugar solids (NSS) content is one of the most important nutrition indicators of Chinese rice wine. This study proposed a rapid method for measuring the NSS content of Chinese rice wine using near infrared (NIR) spectroscopy. We also systematically studied efficient spectral variable selection algorithms for use in modeling. A new algorithm, synergy interval partial least squares combined with competitive adaptive reweighted sampling (Si-CARS-PLS), was proposed for modeling. The performance of the final model was back-evaluated using the root mean square error of calibration (RMSEC) and the correlation coefficient (Rc) in the calibration set, and tested using the root mean square error of prediction (RMSEP) and the correlation coefficient (Rp) in the prediction set. The optimum Si-CARS-PLS model was achieved with 7 PLS factors and 18 variables, giving Rc = 0.95 and RMSEC = 1.12 in the calibration set and Rp = 0.95 and RMSEP = 1.22 in the prediction set. In addition, the Si-CARS-PLS algorithm outperformed the algorithms commonly used in multivariate calibration. This work demonstrated that NIR spectroscopy combined with a suitable multivariate calibration algorithm has high potential for the rapid measurement of NSS content in Chinese rice wine.

Keywords: Chinese rice wine; Non-sugar solids; Near infrared spectroscopy; Synergy interval partial least square; Competitive adaptive reweighted sampling

Introduction

Chinese rice wine, also known as yellow wine, is fermented directly from glutinous rice with wheat Qu (made from raw wheat inoculated with moulds, bacteria, and yeast) and is one of the three most ancient alcoholic beverages in the world [1]. Chinese rice wine is classified into four categories according to total sugar content: semi-dry, dry, semi-sweet and sweet [2], which are enjoyed by different consumers. Because of its high content of amino acids, proteins, oligosaccharides, vitamins, and mineral elements, Chinese rice wine is known as a health beverage [3]. Non-sugar solids (NSS) mainly include dextrin, protein and its decomposition products, glycerin, non-volatile acids and so on. The NSS content is an important nutrition indicator for assessing the quality grade of Chinese rice wine.

Currently, industries and agencies in China commonly employ the traditional analytical method specified in GB/T 13662-2008 to determine the NSS content of Chinese rice wine. Researchers have also tried some new methods to measure NSS content in Chinese rice wine [4]. Although these methods show good precision, accuracy and reliability, they are time-consuming and tedious, require chemicals that are sometimes harmful to the environment, and demand skilled personnel. Thus, a simple, rapid and comparatively accurate method for determining NSS content in Chinese rice wine is needed for quality monitoring in the food industry and by quality control agencies.

Near infrared (NIR) spectroscopy is a fast, easy, economical and non-destructive technique that can be a suitable substitute for traditional analytical methods. It has been widely used in food analysis and detection [5, 6]. Since 2006, Yu, Ying and co-workers [7-9] have used NIR spectroscopy to determine enological parameters (alcoholic degree, pH value, total acid, amino acid nitrogen, degrees Brix and amino acids) in Chinese rice wine; their group also applied NIR spectroscopy to the classification and identification of Chinese rice wine [10]. However, studies on the prediction of NSS content in Chinese rice wine using NIR spectroscopy remain scarce.

NIR spectra arise mainly from the overtone and combination absorptions of functional groups in the sample, such as C-H (aliphatic), C-H (aromatic), C-O (carboxyl), O-H (hydroxyl) and N-H (amine and amide) [11]. It is now well known that the amount of information contained in spectral data requires multivariate calibration models to extract the maximum interpretable information from the multivariate data set [12]. Previous studies using NIR spectroscopy to analyze Chinese rice wine mainly focused on models based on the full spectrum or on manually selected spectral regions [7, 8, 13]. The stability and predictive ability of full-spectrum models may be weakened because they include the water absorption peak and other unrelated or collinear spectral variables. Researchers have therefore long sought mathematical models with better performance and stability [14, 15]. Variable selection methods are a priority because they can select useful information and/or eliminate variables that mostly contain noise, thereby improving model performance [16, 17]. Examples include interval PLS (iPLS) [18], synergy interval partial least squares (Si-PLS) [19], genetic algorithms (GA) [20] and competitive adaptive reweighted sampling (CARS) [21]. Different approaches, and combinations of them, differ in accuracy. CARS has recently been adopted as an optimization tool for variable selection in spectroscopic multivariate calibration [22-24]. Nevertheless, the published works mainly applied CARS to select variables from the full spectrum [25-27], where the large number of variables may prevent CARS from finding the optimal ones. Si-PLS can select efficient spectral intervals to achieve a good model; however, even a small subinterval still contains some collinear variables. To combine the advantages of the two variable selection methods, a new algorithm, called Si-CARS-PLS, was proposed to improve model performance. This algorithm includes two steps: the first selects efficient spectral intervals by Si-PLS, and the second selects optimal variables from these intervals by CARS. Up to now, few studies on the use of NIR spectroscopy with Si-CARS-PLS have been reported, and this algorithm has not yet been applied to predicting the quality of Chinese rice wine.

Therefore, the aims of this work were to provide a variable selection method, namely the Si-CARS-PLS algorithm, that can further improve the predictive ability of models while simplifying them, and to apply NIR spectroscopy coupled with Si-CARS-PLS to the rapid and accurate prediction of NSS content in Chinese rice wine.

Materials and methods

Samples

A total of 120 samples of Chinese rice wine, all from the semi-sweet category, were obtained from the “Danyang” brand of Jiangsu Danyang Winery Co., Ltd. A single category and producer were used to keep the experimental conditions consistent and to obtain results that were as good as possible. In addition, semi-sweet Chinese rice wine is the more popular category among consumers in Jiangsu province. Chinese rice wine from the Danyang region, which is made from high-quality glutinous rice, is well known in China. The samples covered all types of semi-sweet products from this winery, with three or four samples of each product taken from different manufacturing dates.

Spectral measurement

The NIR spectra of the Chinese rice wine samples were acquired using an Antaris II near-infrared spectrophotometer (Thermo Electron Co., USA) with a transmittance module. The samples were measured in a quartz cuvette with a 1 mm optical path length, a standard accessory of this spectrophotometer. After each sample was measured, the cuvette was washed first with distilled water and then at least three times with the next sample before its spectrum was collected. Each spectrum was the average of 16 scans. The spectral range was 4000 to 10000 cm−1 and data were recorded every 3.856 cm−1, which resulted in 1557 variables. The spectral data were collected as absorbance values [log(1/T)], where T is the transmittance. Result software (Antaris II System, Thermo Electron Co., USA) was used for NIR spectral data acquisition. The room temperature was kept at around 25 °C to avoid the influence of ambient conditions on the spectrophotometer. Each sample was measured in triplicate, and the triplicate measurements were averaged to give a single spectrum per sample for subsequent analysis.
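The acquisition settings above can be checked with a few lines of arithmetic. The following is only a sketch: it assumes the wavenumber axis starts exactly at 4000 cm−1 and that the absorbance uses a base-10 logarithm, neither of which is stated in the text, and the function names and the triplicate readings are hypothetical.

```python
import math


def absorbance(transmittance):
    """Absorbance as log(1/T); base 10 is assumed here."""
    return math.log10(1.0 / transmittance)


def n_points(start, stop, step):
    """Number of grid points from start to stop (inclusive) at a fixed step."""
    return int(math.floor((stop - start) / step)) + 1


def average_spectra(replicates):
    """Point-by-point mean of replicate spectra of equal length."""
    return [sum(values) / len(values) for values in zip(*replicates)]


n_variables = n_points(4000.0, 10000.0, 3.856)

# Hypothetical triplicate readings at three wavenumbers, for illustration only.
triplicate = [[0.50, 0.60, 0.70], [0.52, 0.58, 0.70], [0.48, 0.62, 0.70]]
mean_spectrum = average_spectra(triplicate)
print(n_variables, [round(v, 3) for v in mean_spectrum])
```

Under these assumptions the grid has 1557 points, in agreement with the number of variables reported above.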

Reference analysis

Reference analysis of NSS in the samples followed the official analytical method in China (GB/T 13662-2008). The NSS content is the total solids content minus the total sugar content, expressed in g/L. Blank tests were made with distilled water. All chemicals were of analytical grade.

The total solids content was measured as follows: 5 mL of each sample in a constant-weight weighing bottle (50 mm × 30 mm) was dried in an electric oven at 103 °C ± 2 °C. After 4 h, the volatile substances (i.e., water, ethanol and volatile acids) had evaporated, and the residue was the total solids. The total solids content was the total weight (dried sample + weighing bottle) minus the weight of the weighing bottle. Weights were measured using an electronic analytical balance (BS224S, Sartorius Instrument Co., Beijing, China).

The total sugar content was measured as follows. (1) Calibration of Fehling's solutions A and B: 5 mL of Fehling's solution A and 5 mL of Fehling's solution B were poured into a 250 mL Erlenmeyer flask, and 30 mL of distilled water was added. After mixing, a volume of glucose standard solution (2.5 g/L) 1 mL less than that consumed in the pre-titration was added. The mixture was then heated to boiling on an electric furnace, two drops of methylene blue indicator solution (10 g/L) were added, and boiling was maintained for 2 min. Glucose standard solution was then titrated into the mixture until the blue color disappeared. All titration steps should be completed within 3 min. The weight of glucose equivalent to 5 mL of Fehling's solution A plus 5 mL of Fehling's solution B, m1, can be calculated according to:

$$ m_1 = \frac{m \times V_1}{1000} \tag{1} $$

where m (g) is the weight of glucose used to prepare the glucose standard solution (2.5 g), and V1 (mL) is the total volume of glucose standard solution consumed in the titration. (2) Preparation of the hydrolysate: based on preliminary experiments, 10 mL of each sample was put in a 100 mL volumetric flask, 10 mL of distilled water and 1 mL of hydrochloric acid (6 mol/L) were added, and the flask was then heated in a 68 °C-70 °C water bath for 15 min. After cooling, two drops of methyl red indicator solution (1 g/L) were added, followed by sodium hydroxide (200 g/L) until the red color disappeared. The solution was made up to a constant volume of 100 mL with distilled water to obtain the hydrolysate, which was then filtered through filter paper before use. (3) Measurement of samples: following the procedure for calibrating Fehling's solutions A and B, but using the hydrolysate instead of the glucose standard solution, the total sugar content X (g/L) can be obtained from:

$$ X = \frac{100 \times m_1 \times 1000}{V_2 \times V_3} \tag{2} $$

where m1 (g) is the weight of glucose equivalent to 5 mL of Fehling's solution A plus 5 mL of Fehling's solution B, V2 (mL) is the volume of hydrolysate consumed in the titration, and V3 (mL) is the volume of sample.
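As a worked illustration of the reference calculation, the sketch below chains Eq. (1), Eq. (2) and the definition NSS = total solids minus total sugar. The function names and all readings are hypothetical, and the conversion of the dried residue of a 5 mL sample to g/L is an assumption of this sketch rather than a step stated in the text.

```python
def glucose_equivalent(m_glucose_g, v1_ml):
    """Eq. (1): glucose (g) equivalent to 5 mL Fehling's A + 5 mL Fehling's B."""
    return m_glucose_g * v1_ml / 1000.0


def total_sugar(m1_g, v2_ml, v3_ml):
    """Eq. (2): total sugar content X in g/L."""
    return 100.0 * m1_g * 1000.0 / (v2_ml * v3_ml)


def total_solids(residue_g, sample_ml=5.0):
    """Dried residue expressed in g/L (assumed conversion, not from the text)."""
    return residue_g / sample_ml * 1000.0


def non_sugar_solids(total_solids_g_per_l, total_sugar_g_per_l):
    """NSS (g/L) = total solids minus total sugar."""
    return total_solids_g_per_l - total_sugar_g_per_l


# Hypothetical readings, for illustration only.
m1 = glucose_equivalent(2.5, 10.0)          # 10 mL of standard consumed
sugar = total_sugar(m1, 5.0, 10.0)          # 5 mL hydrolysate, 10 mL sample
solids = total_solids(0.33)                 # 0.33 g residue from 5 mL
nss = non_sugar_solids(solids, sugar)
print(round(m1, 4), round(sugar, 2), round(solids, 2), round(nss, 2))
```

With these made-up readings the sketch gives a total sugar content of 50 g/L, total solids of 66 g/L and an NSS content of 16 g/L.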

Multivariate analysis

NIR spectroscopy combined with Si-CARS-PLS was used to develop models for predicting NSS content in Chinese rice wine. First, Si-PLS was used to select efficient spectral intervals; then, CARS was used to select the optimal variables from these intervals for building PLS models. In model calibration, the combination of intervals, the variables and the number of PLS factors were optimized by cross validation, according to the lowest root mean square error of cross validation (RMSECV) [28]. The performance of the final model was back-evaluated with the samples in the calibration set and tested with the independent samples in the prediction set. The correlation coefficient (Rc) and root mean square error of calibration (RMSEC) in the calibration set, and the correlation coefficient (Rp) and root mean square error of prediction (RMSEP) in the prediction set, were used to evaluate model performance. Generally, a good model should have high Rc and Rp values and low RMSEC and RMSEP values. In addition, the differences between Rc and Rp and between RMSEC and RMSEP should be small; a minor difference between RMSEC and RMSEP indicates that the robustness of the model is satisfactory [29]. All data processing and analysis were conducted in Matlab Version 7.10.0 (Mathworks, Natick, USA) under Microsoft Windows 7.

Si-PLS. The Si-PLS algorithm is an all-possible-combinations procedure that tests PLS models built on every subset of a given number of intervals. The principle of this algorithm is to split the data set into a number of intervals (variable-wise) and to calculate PLS models for all possible combinations of two, three or four intervals. The combination of intervals with the lowest RMSECV is chosen [11].
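The exhaustive search described for Si-PLS can be sketched as follows. This is not the authors' Matlab implementation: the cross-validated PLS model is replaced by a caller-supplied `score` function, the helper names are hypothetical, and the rule that leftover variables go to the first intervals is an assumption.

```python
from itertools import combinations


def split_intervals(n_variables, n_intervals):
    """Split indices 0..n_variables-1 into contiguous, near-equal intervals.

    Assumption: when the split is uneven, the first intervals get one extra
    variable each.
    """
    base, extra = divmod(n_variables, n_intervals)
    intervals, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)
        intervals.append(range(start, start + size))
        start += size
    return intervals


def synergy_search(intervals, score, sizes=(2, 3, 4)):
    """Return (best_score, best_combo) over all combinations of 2, 3 or 4 intervals.

    `score` maps a list of variable indices to an RMSECV-like value
    (lower is better); in practice it would fit and cross-validate a PLS model.
    """
    best = None
    for r in sizes:
        for combo in combinations(range(len(intervals)), r):
            columns = [j for k in combo for j in intervals[k]]
            value = score(columns)
            if best is None or value < best[0]:
                best = (value, combo)
    return best


def placeholder_score(columns):
    """Stand-in for RMSECV, used only to exercise the search loop."""
    return len(columns)


intervals = split_intervals(1490, 11)
n_models = sum(1 for r in (2, 3, 4) for _ in combinations(range(11), r))
best_value, best_combo = synergy_search(intervals, placeholder_score)
print(n_models, len(intervals[3]) + len(intervals[8]))
```

For 11 intervals the search evaluates 550 candidate interval combinations per number of PLS factors, which is why Si-PLS is comparatively expensive to run.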

CARS. CARS was proposed by Liang and Li et al. [30] and employs the simple but effective principle of “survival of the fittest” from Darwin's theory of evolution. The absolute values of the regression coefficients of a PLS model are used as an index of the importance of each variable. Based on this importance, CARS sequentially selects N subsets of variables from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (usually 80-90%) of samples is first randomly selected to establish a calibration model. Next, based on the regression coefficients, a two-step procedure consisting of an exponentially decreasing function (EDF) and adaptive reweighted sampling (ARS) is used to select the key variables. In the first step, the EDF forcibly removes the variables with relatively small absolute regression coefficients. In the second step, ARS further eliminates variables in a competitive way. Finally, the subset of variables with the lowest RMSECV is taken as the best variable subset [31].
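The forced-elimination step can be illustrated with a short sketch of the EDF schedule. It assumes the commonly described form r_i = a·exp(−k·i), with the constants chosen so that all p variables are kept in the first run and two in the last; the text above does not give these constants, so they should be read as an assumption. The ARS step and the PLS fitting are deliberately left out, and the function names are hypothetical.

```python
import math


def edf_ratio(run, n_runs, n_variables):
    """Fraction of variables retained at sampling run `run` (1-based).

    Assumed form: r_i = a * exp(-k * i), with r_1 = 1 and r_N = 2 / p.
    """
    k = math.log(n_variables / 2.0) / (n_runs - 1)
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    return a * math.exp(-k * run)


def n_kept(run, n_runs, n_variables):
    """Number of variables the EDF step alone would retain at a given run."""
    return round(n_variables * edf_ratio(run, n_runs, n_variables))


# Schedule for 271 candidate variables and 50 sampling runs.
schedule = [n_kept(i, 50, 271) for i in range(1, 51)]
print(schedule[0], schedule[27], schedule[-1])
```

Under these assumptions the schedule drops quickly in the early runs and slowly later on, and it retains about 18 of 271 variables at run 28 of 50. The number actually surviving in CARS also depends on the ARS step, which this sketch does not model.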

The RMSECV, RMSEC, Rc, RMSEP and Rp were calculated according to Eqs. (3), (4), (5), (6) and (7), respectively; more details can be found in our previous study [32].

$$ \mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_{c\backslash i} - y_{ci})^2}{n}} \tag{3} $$

where n is the number of samples in the calibration set, y_ci is the reference measurement value of sample i, and ŷ_c\i is the value estimated for sample i by the model constructed when sample i is left out.

$$ \mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{n_c} (y_{ci} - \hat{y}_{ci})^2}{n_c}} \tag{4} $$

$$ R_c = \sqrt{1 - \frac{\sum_{i=1}^{n_c} (\hat{y}_{ci} - y_{ci})^2}{\sum_{i=1}^{n_c} (\hat{y}_{ci} - \bar{y}_c)^2}} \tag{5} $$

where n_c is the number of samples in the calibration set, y_ci is the reference measurement value of the ith sample, ŷ_ci is the estimated value of the ith sample, and ȳ_c is the average of all reference measurement values in the calibration set.

$$ \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n_p} (y_{pi} - \hat{y}_{pi})^2}{n_p}} \tag{6} $$

$$ R_p = \sqrt{1 - \frac{\sum_{i=1}^{n_p} (\hat{y}_{pi} - y_{pi})^2}{\sum_{i=1}^{n_p} (\hat{y}_{pi} - \bar{y}_p)^2}} \tag{7} $$

where n_p is the number of samples in the prediction set, y_pi is the reference measurement value of the ith sample, ŷ_pi is the estimated value of the ith sample, and ȳ_p is the average of all reference measurement values in the prediction set.
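The error and correlation measures of Eqs. (4) to (7) can be computed with a few lines of code. The sketch below is illustrative only: the function names and the four data points are hypothetical, and the correlation coefficient follows the form of Eqs. (5) and (7) with predicted values in the denominator, whereas other texts use the reference values there.

```python
import math


def rmse(reference, predicted):
    """Root mean square error, as in Eqs. (4) and (6)."""
    n = len(reference)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(reference, predicted)) / n)


def corr_coef(reference, predicted):
    """Correlation coefficient in the form of Eqs. (5) and (7).

    Raises ValueError if the residual sum of squares exceeds the spread of
    the predictions around the reference mean (very poor model).
    """
    y_mean = sum(reference) / len(reference)
    sse = sum((yh - y) ** 2 for y, yh in zip(reference, predicted))
    spread = sum((yh - y_mean) ** 2 for yh in predicted)
    return math.sqrt(1.0 - sse / spread)


# Hypothetical reference and predicted NSS values (g/L), for illustration only.
y_ref = [10.0, 14.0, 18.0, 22.0]
y_hat = [10.5, 13.5, 18.5, 21.5]
error = rmse(y_ref, y_hat)
r_value = corr_coef(y_ref, y_hat)
print(round(error, 3), round(r_value, 3))
```

The same two functions apply to the calibration set (RMSEC, Rc) and the prediction set (RMSEP, Rp); RMSECV uses the same error formula with predictions obtained from cross validation.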

For model calibration, all 120 samples were divided into two subsets, a calibration set and a prediction set. Samples in the calibration set were used to establish the model, while samples in the prediction set were used to test the robustness of the established model. To avoid bias in the subset division, all samples were first sorted according to their y-values (viz. the reference values of NSS content). Then, one sample out of every three was assigned to the prediction set, and the other two entered the calibration set. Thus, the calibration set contained 80 samples and the prediction set contained 40 samples. Table 1 summarizes the reference data for NSS content in the calibration and prediction sets. As shown in this table, the range of y-values in the calibration set covered that in the prediction set. Moreover, the y-values were similarly distributed in the calibration and prediction sets.
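The sorted one-in-three division can be sketched as below. The text does not say which sample of each triplet goes to the prediction set; this sketch assumes the middle one, which keeps the smallest and largest y-values in the calibration set. The function name and the y-values are hypothetical.

```python
def split_every_third(y_values):
    """Sort samples by y and send the middle one of every three to prediction.

    Returns (calibration_indices, prediction_indices) into y_values.
    Assumption: the middle sample of each sorted triplet is the prediction one.
    """
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    calibration, prediction = [], []
    for rank, idx in enumerate(order):
        if rank % 3 == 1:
            prediction.append(idx)
        else:
            calibration.append(idx)
    return calibration, prediction


# Hypothetical y-values: a fixed permutation of 0..119, for illustration only.
y = [(37 * i) % 120 for i in range(120)]
cal_idx, pred_idx = split_every_third(y)
cal_y = [y[i] for i in cal_idx]
pred_y = [y[i] for i in pred_idx]
print(len(cal_idx), len(pred_idx))
```

With 120 samples this yields 80 calibration and 40 prediction samples, and the calibration range encloses the prediction range.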

Results and discussion

Spectral data preprocessing

Fig. 1A presents the raw spectra of all the samples. The absorption at around 5200 cm−1 was saturated (off scale) and very noisy. Thus, the segment from 5025 to 5280 cm−1 was removed because of the saturation caused by the strong combination band of -OH from water [33], leaving 1490 spectral variables. Additionally, raw spectra acquired from an NIR spectrometer contain background information and noise besides sample information, and the few tiny particles or bubbles in the samples cause light scattering. Before calibration, the spectral data should therefore be preprocessed to build reliable, accurate and stable models. In this study, the standard normal variate (SNV) transformation was used to preprocess the raw spectral data, in order to eliminate differences between samples due to baseline shift, noise and scatter effects. SNV was applied to each spectrum individually by subtracting the mean of the spectrum and scaling by the standard deviation of the spectrum, as in the following equation:

$$ x_{i,\mathrm{SNV}} = \frac{x_i - \bar{x}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}} \tag{8} $$

where x_i,SNV is the SNV-transformed spectral value for the ith variable, x_i is the ith variable in the raw spectrum, x̄ is the mean of the raw spectrum, and n is the number of variables in the spectrum. The spectra after SNV preprocessing are presented in Fig. 1B.
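A minimal sketch of the SNV transformation of Eq. (8) is given here, using only the Python standard library. The function name and the toy spectra are hypothetical; the second spectrum is the first one with an offset and a multiplicative factor, mimicking a baseline shift and a scatter effect.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum on its own.

    Uses the sample standard deviation (divisor n - 1), as in Eq. (8).
    """
    centre = mean(spectrum)
    scale = stdev(spectrum)
    return [(x - centre) / scale for x in spectrum]


# Toy spectra, for illustration only.
raw = [0.20, 0.40, 0.60, 0.30]
shifted = [2.0 * x + 0.5 for x in raw]

snv_raw = snv(raw)
snv_shifted = snv(shifted)
print([round(v, 3) for v in snv_raw])
print([round(v, 3) for v in snv_shifted])
```

Both printed lists are the same up to rounding, which shows why SNV removes additive and multiplicative differences between spectra before calibration.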

Efficient intervals selected by Si-PLS

In this paper, the number of intervals was optimized by cross validation. The 1490 spectral variables of Chinese rice wine were divided into 10, 11, 12, …, 25 intervals, and combinations of two, three or four subintervals were tested. The number of PLS factors was also optimized by cross validation.

The best Si-PLS model was achieved when the spectrum was split into 11 intervals and intervals 4 and 9 were combined. These efficient spectral intervals corresponded to 5831-6352 and 8442-8959 cm−1, as shown in Fig. 2. In total, 271 variables were selected by Si-PLS.

Optimal variables selected by CARS

For the implementation of CARS in this work, after optimization, 90% of the calibration samples (72 samples) were randomly selected for building the model in each run, the number of MC sampling runs was set to 50, and models were optimized by 5-fold cross validation. Fig. 3A shows the RMSECV values as the number of sampling runs increased. The RMSECV values first decreased, which can be ascribed to the elimination of uninformative variables, and then increased rapidly because of the loss of some useful information. The lowest RMSECV was obtained at 28 sampling runs, marked by the asterisks in Fig. 3A. Fig. 3B shows the regression coefficient path of each variable as the number of sampling runs increased. At first, the absolute value of the regression coefficient of each variable was very small. As the number of sampling runs increased, the coefficients of some variables became larger while others became smaller. The coefficients of some variables even dropped to zero when these variables were eliminated by CARS. Thus, the larger the absolute coefficient, the more likely the corresponding variable is to survive. The best variable subset, with the lowest RMSECV, was obtained at 28 sampling runs, marked by the asterisk in Fig. 3B. Fig. 3C shows the trend in the number of sampled variables, which decreased quickly at first and then very slowly, showing that variable selection goes through two phases, i.e. fast selection and refined selection. Eventually, 18 variables were selected at 28 sampling runs, marked by the asterisk in Fig. 3C. The 18 variables corresponded to 5847, 5851, 5854, 5920, 5939, 5978, 5982, 5989, 6094, 6109, 6113, 6317, 8566, 8616, 8824, 8855, 8859 and 8959 cm−1, which are also marked with the blue line in Fig. 4A.

The 18 selected variables were used to build a PLS model, namely the Si-CARS-PLS model. This model achieved Rc = 0.95, RMSEC = 1.12, Rp = 0.95 and RMSEP = 1.22 using 7 PLS factors. Fig. 4B is the scatter plot showing the correlation between the NSS content obtained by the reference method and that predicted by NIR in the calibration and prediction sets of the optimal Si-CARS-PLS model.

Discussion of the results

In order to highlight the superiority of the Si-CARS-PLS model, it was compared with PLS, iPLS, Si-PLS and CARS-PLS. The results of the different PLS models for predicting NSS content in Chinese rice wine are presented in Table 2. As seen from Table 2, the variable selection methods were clearly superior to full-spectrum PLS: they dramatically reduced the number of variables and largely improved model performance. Si-CARS-PLS provided the best result, with the best predictive ability and stability and the fewest variables. The main reasons can be summarized as follows:

For the classical PLS algorithm, although the obviously uninformative variables related to water had been removed in preprocessing, the remaining 1490 variables were all used to develop the calibration model. Among these 1490 variables, many were collinear or irrelevant to NSS in Chinese rice wine. Such unwanted information inevitably weakened the performance of the PLS model.

For the iPLS model, the best model was achieved when the spectrum was split into 10 equidistant intervals and the model was built on the 3rd spectral interval. The optimal interval was 5407-5978 cm−1, comprising 149 variables. This interval corresponds to the first overtone of -CH3, -CH2 and -CH, and NSS contains many substances with hydrocarbon groups. iPLS selected useful information and removed much uninformative information, so it improved model performance and gave a better result than PLS. However, a single interval (i.e. 149 variables) cannot provide sufficient information about NSS: while many “uninformative variables” and “redundant variables” were eliminated, some useful variables were discarded as well.

Compared with PLS and iPLS, Si-PLS showed clear advantages, because it not only removes some “uninformative variables” and “redundant variables” but also retains more valuable information by combining several subintervals from the whole spectrum. The optimal spectral intervals for NSS were 5831-6352 and 8442-8959 cm−1, totaling 271 variables. NSS is a complex mixture that includes dextrin, protein and its decomposition products, glycerin and non-volatile acids, as well as many aliphatic and aromatic substances. These substances contain many hydrocarbon groups and aromatic rings. The region of 5831-6352 cm−1 contains the absorption of the first overtone of -CH3 and also that of the first overtone of -ArCH. The region of 8442-8959 cm−1 contains the absorption of the second overtone of -CH3, -CH2 and -CH. Therefore, the Si-PLS model captured more useful information than the iPLS model, leading to better calibration results (Table 2). However, collinear variables from adjacent wavebands remained, even within a small subinterval.

CARS-PLS, as a variable optimization tool, also gave a better result than PLS. Nevertheless, the large number of original variables (i.e., 1490) makes the search more difficult for CARS. In addition, there is randomness in the selection of variables. Therefore, first selecting useful spectral intervals to reduce the number of variables, and then applying CARS to search for valuable variables within the selected intervals, may be a good choice.

Indeed, compared with the commonly used algorithms (i.e. PLS, iPLS, Si-PLS and CARS-PLS), the Si-CARS-PLS algorithm performed best in modeling. The Si-CARS-PLS model was constructed in two successive steps in this work: (1) two efficient spectral intervals were selected from 11 intervals, giving 271 variables; (2) 18 optimal variables were selected from the 271 variables. Although the computation time for building the Si-CARS-PLS model was longer than for the other models, the final model was the simplest because it included the fewest variables, so the time needed to predict the quality of Chinese rice wine with the final Si-CARS-PLS model is reduced.
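The comparison can be made concrete by applying the criteria from the multivariate analysis section (low RMSEP, small gap between RMSEC and RMSEP, few variables) to the values in Table 2. The sketch below only rearranges the reported numbers; the variable names are hypothetical.

```python
# Table 2: method -> (variables, PLS factors, Rc, RMSEC, Rp, RMSEP)
table2 = {
    "PLS": (1490, 4, 0.83, 2.08, 0.78, 2.33),
    "iPLS": (149, 11, 0.94, 1.21, 0.94, 1.31),
    "Si-PLS": (271, 9, 0.95, 1.20, 0.94, 1.33),
    "CARS-PLS": (23, 8, 0.94, 1.23, 0.93, 1.41),
    "Si-CARS-PLS": (18, 7, 0.95, 1.12, 0.95, 1.22),
}

rmsep = {name: row[5] for name, row in table2.items()}
# Gap between prediction and calibration error, a simple robustness indicator.
gap = {name: round(row[5] - row[3], 2) for name, row in table2.items()}

best_by_rmsep = min(rmsep, key=rmsep.get)
fewest_variables = min(table2, key=lambda name: table2[name][0])
# Relative reduction in RMSEP of the best model versus full-spectrum PLS (%).
improvement = 100.0 * (rmsep["PLS"] - rmsep[best_by_rmsep]) / rmsep["PLS"]

for name in table2:
    print(f"{name:12s} RMSEP={rmsep[name]:.2f} gap={gap[name]:.2f}")
print(best_by_rmsep, fewest_variables, round(improvement, 1))
```

On the reported values, Si-CARS-PLS has both the lowest RMSEP and the fewest variables, and its RMSEP is roughly 48% lower than that of full-spectrum PLS.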

Conclusions

This work demonstrated the suitability of NIR spectroscopy for the determination of NSS content in Chinese rice wine. It proposed the Si-CARS-PLS algorithm for data processing, which combines the advantages of Si-PLS and CARS. The Si-CARS-PLS algorithm can improve model performance when NIR spectroscopy is used for real-time measurement of the active ingredients in beverages, and is of great significance for practical use.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (31271875) and the China Postdoctoral Science Foundation (2015M571698). We are also grateful to Jiangsu Danyang Winery Co., Ltd. for providing the Chinese rice wine samples.

References

[1] H.Y. Cheng, J.H. Liu, Z.G. Xu, X.F. Yin, Sens. Actuators B Chem., 73 (2012) 55-61.
[2] J. Zhong, X. Ye, Z. Fang, G. Xie, N. Liao, J. Shu, D. Liu, Food Control, 28 (2012) 151-156.
[3] H.Y. Li, Z.Y. Jin, X.M. Xu, Food Control, 32 (2013) 563-568.
[4] Z.Q. Huang, S. Jin, S. Huang, C. Zheng, L. Ni, J. Chin. Inst. Food Sci. Technol., 13 (2013) 148-152.
[5] Q. Ouyang, Q.S. Chen, J.W. Zhao, H. Lin, Food Bioprocess Technol., 6 (2013) 2486-2493.
[6] J.W. Zhao, Q. Ouyang, Q.S. Chen, H. Lin, Food Sci Technol Int, 19 (2013) 305-314.
[7] H.Y. Yu, Y.B. Ying, X.P. Fu, H.S. Lu, J. Near Infrared Spectrosc., 14 (2006) 37-44.
[8] H.Y. Yu, H.R. Lin, H.R. Xu, Y.B. Ying, B.B. Li, X.X. Pan, J. Agric. Food. Chem., 56 (2008) 307-313.
[9] F. Shen, X.Y. Niu, D.T. Yang, Y.Y. Ying, B.B. Li, G.Q. Zhu, J.A. Wu, J. Agric. Food. Chem., 58 (2010) 9809-9816.
[10] F. Shen, D.T. Yang, Y.B. Ying, B.B. Li, Y.F. Zheng, T. Jiang, Food Bioprocess Technol., 5 (2012) 786-795.
[11] X.B. Zou, J.W. Zhao, M.J.W. Povey, M. Holmes, H.P. Mao, Anal. Chim. Acta, 667 (2010) 14-32.
[12] E. Teye, X. Huang, H. Dai, Q. Chen, Spectrochim. Acta, Part A, 114 (2013) 183-189.
[13] X.Y. Niu, F. Shen, Y.F. Yu, Z. Yan, K. Xu, H.Y. Yu, Y.B. Ying, J. Agric. Food. Chem., 56 (2008) 7271-7278.
[14] G. Wang, M. Ma, Z. Zhang, Y. Xiang, P.d.B. Harrington, Talanta, 112 (2013) 136-142.
[15] J. Li, C. Zhao, W. Huang, C. Zhang, Y. Peng, Anal Methods-Uk, 6 (2014) 2170-2180.
[16] D. Jie, L. Xie, X. Fu, X. Rao, Y. Ying, J. Food Eng., 118 (2013) 387-392.
[17] H. Xiaowei, Z. Xiaobo, Z. Jiewen, S. Jiyong, Z. Xiaolei, M. Holmes, Food Chem., 164 (2014) 536-543.
[18] L. Norgaard, A. Saudland, J. Wagner, J.P. Nielsen, L. Munck, S.B. Engelsen, Appl. Spectrosc., 54 (2000) 413-419.
[19] H. Jiang, G. Liu, C. Mei, S. Yu, X. Xiao, Y. Ding, Spectrochim. Acta, Part A, 97 (2012) 277-283.
[20] L. Cséfalvayová, M. Pelikan, I. Kralj Cigić, J. Kolar, M. Strlič, Talanta, 82 (2010) 1784-1790.
[21] P. Nie, D. Wu, D.-W. Sun, F. Cao, Y. Bao, Y. He, Sens., 13 (2013) 13820-13834.
[22] K.Y. Zheng, Q.Q. Li, J.J. Wang, J.P. Geng, P. Cao, T. Sui, X. Wang, Y.P. Du, Chemom. Intell. Lab. Syst., 112 (2012) 48-54.
[23] X. Zhang, W. Li, B. Yin, W.Z. Chen, D.P. Kelly, X.X. Wang, K.Y. Zheng, Y.P. Du, Spectrochim. Acta, Part A, 114 (2013) 350-356.
[24] A.d.A. Gomes, R.K. Harrop Galvao, M.C. Ugulino de Araujo, G. Veras, E.C. da Silva, Microchem. J., 110 (2013) 202-208.
[25] D. Xu, W. Fan, H. Lv, Y. Liang, Y. Shan, G. Li, Z. Yang, L. Yu, Spectrochim. Acta, Part A, 123 (2014) 430-435.
[26] D. Wu, D.-W. Sun, Talanta, 111 (2013) 39-46.
[27] X. Wei, N. Xu, D. Wu, Y. He, Food Bioprocess Technol., 7 (2014) 184-190.
[28] J.R. Cai, Q.S. Chen, X.M. Wan, J.W. Zhao, Food Chem., 126 (2011) 1354-1360.
[29] K.S. Chia, H.A. Rahim, R.A. Rahim, Chinese J. Zhejiang Uni. Sci. B 13 (2012) 145-151.
[30] H. Li, Y. Liang, Q. Xu, D. Cao, Anal. Chim. Acta, 648 (2009) 77-84.
[31] K. Zheng, Q. Li, J. Wang, J. Geng, P. Cao, T. Sui, X. Wang, Y. Du, Chemom. Intell. Lab. Syst., 112 (2012) 48-54.
[32] Q.S. Chen, P. Jiang, J.W. Zhao, Spectrochim. Acta Part A, 76 (2010) 50-55.
[33] M. Casale, M.-J. Sáiz Abajo, J.-M. González Sáiz, C. Pizarro, M. Forina, Anal. Chim. Acta, 557 (2006) 360-366.

Figure Captions

Fig. 1. The raw NIR spectra (A) and the preprocessed spectra (B) of Chinese rice wine samples.

Fig. 2. The efficient spectral intervals selected by Si-PLS for predicting NSS content in Chinese rice wine.

Fig. 3. RMSECV values (A), the regression coefficient path of each variable (B), and the changing trend of the number of sampled variables (C) with increasing sampling runs of CARS.

Fig. 4. The 18 variables selected by Si-CARS-PLS (A), and the reference values versus NIR-predicted values of NSS content in the calibration and prediction sets of the Si-CARS-PLS model (B).

Table 1 Reference values of NSS content in the calibration and prediction sets.

| Subsets | Unit | S.N. (a) | Range | Mean | S.D. (b) |
|---|---|---|---|---|---|
| Calibration set | g/L | 80 | 8.70-24.4 | 16.6 | 3.70 |
| Prediction set | g/L | 40 | 8.80-24.2 | 16.6 | 3.72 |

(a) S.N., the number of samples.
(b) S.D., standard deviation.

Table 2 Results of different PLS models for predicting NSS content in rice wine.

| Methods | Variables | PLS factors | Calibration set Rc | Calibration set RMSEC | Prediction set Rp | Prediction set RMSEP |
|---|---|---|---|---|---|---|
| PLS | 1490 | 4 | 0.83 | 2.08 | 0.78 | 2.33 |
| iPLS | 149 | 11 | 0.94 | 1.21 | 0.94 | 1.31 |
| Si-PLS | 271 | 9 | 0.95 | 1.20 | 0.94 | 1.33 |
| CARS-PLS | 23 | 8 | 0.94 | 1.23 | 0.93 | 1.41 |
| Si-CARS-PLS | 18 | 7 | 0.95 | 1.12 | 0.95 | 1.22 |

Highlights

► NIR spectroscopy was used for measuring non-sugar solids in Chinese rice wine.
► A new algorithm of Si-CARS-PLS was proposed for modeling.
► Si-CARS-PLS showed superiority in modeling when compared with other algorithms.