Accepted Manuscript
Measurement of Non-sugar Solids Content in Chinese Rice Wine using Near Infrared Spectroscopy Combined with an Efficient Characteristic Variables Selection Algorithm
Qin Ouyang, Jiewen Zhao, Quansheng Chen
PII: S1386-1425(15)30018-4
DOI: http://dx.doi.org/10.1016/j.saa.2015.06.071
Reference: SAA 13841
To appear in: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
Received Date: 14 January 2015
Revised Date: 21 June 2015
Accepted Date: 23 June 2015
Please cite this article as: Q. Ouyang, J. Zhao, Q. Chen, Measurement of Non-sugar Solids Content in Chinese Rice Wine using Near Infrared Spectroscopy Combined with an Efficient Characteristic Variables Selection Algorithm, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2015), doi: http://dx.doi.org/10.1016/j.saa.2015.06.071
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Manuscript
Measurement of Non-sugar Solids Content in Chinese Rice Wine using Near Infrared Spectroscopy Combined with an Efficient Characteristic Variables Selection Algorithm
Qin Ouyang, Jiewen Zhao, and Quansheng Chen∗
School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
∗ Corresponding author. Tel.: +86-511-88790318; fax: +86-511-88780201. E-mail address: [email protected] (Q. Chen)
Abstract: The non-sugar solids (NSS) content is one of the most important nutrition indicators of Chinese rice wine. This study proposes a rapid method for measuring NSS content in Chinese rice wine using near infrared (NIR) spectroscopy. We also systematically studied efficient spectral variable selection algorithms for use in modeling. A new algorithm, synergy interval partial least squares combined with competitive adaptive reweighted sampling (Si-CARS-PLS), was proposed for modeling. The performance of the final model was back-evaluated using the root mean square error of calibration (RMSEC) and the correlation coefficient (Rc) in the calibration set, and similarly tested using the root mean square error of prediction (RMSEP) and the correlation coefficient (Rp) in the prediction set. The optimum Si-CARS-PLS model was achieved with 7 PLS factors and 18 variables, giving Rc = 0.95 and RMSEC = 1.12 in the calibration set, and Rp = 0.95 and RMSEP = 1.22 in the prediction set. In addition, the Si-CARS-PLS algorithm outperformed the algorithms commonly used in multivariate calibration. This work demonstrates that NIR spectroscopy combined with a suitable multivariate calibration algorithm has high potential for the rapid measurement of NSS content in Chinese rice wine.
Keywords: Chinese rice wine; Non-sugar solids; Near infrared spectroscopy; Synergy interval partial least squares; Competitive adaptive reweighted sampling
Introduction
Chinese rice wine, also known as yellow wine, is fermented directly from glutinous rice with wheat Qu (made from raw wheat inoculated with moulds, bacteria, and yeast) and is one of the three most ancient alcoholic beverages in the world [1]. Chinese rice wine is classified into four categories according to total sugar content: semi-dry, dry, semi-sweet and sweet [2], which are enjoyed by different consumers. Because of its high content of amino acids, proteins, oligosaccharides, vitamins, and mineral elements, Chinese rice wine is known as a health beverage [3]. Non-sugar solids (NSS) mainly include dextrin, protein and its decomposition products, glycerin, non-volatile acids and so on. The NSS content is an important nutrition indicator for assessing the quality grade of Chinese rice wine.
Currently, industries and agencies in China commonly use the traditional analytical method specified in GB/T 13662-2008 to determine NSS content in Chinese rice wine. Researchers have also tried some new methods to measure NSS content in Chinese rice wine [4]. Although these methods show good precision, accuracy and reliability, they are time-consuming and tedious, use chemicals that are sometimes harmful to the environment, and demand skilled personnel. Thus, a simple, rapid and comparatively accurate method for determining NSS content in Chinese rice wine is needed for quality monitoring by the food industry and quality control agencies.
Near infrared (NIR) spectroscopy is a fast, easy, economical and non-destructive technique that can be a suitable substitute for traditional analytical methods. It has been widely used in food analysis and detection [5, 6]. Since 2006, Yu, Ying and co-workers [7-9] have used NIR spectroscopy to determine enological parameters (alcoholic degree, pH value, total acid, amino acid nitrogen, degrees Brix and amino acids) in Chinese rice wine; their group also applied NIR spectroscopy to the classification and identification of Chinese rice wine [10]. However, reports on the prediction of NSS content in Chinese rice wine using NIR spectroscopy remain scarce.
NIR spectra mainly arise from the absorption of overtones and combinations of functional groups in samples, such as C-H (aliphatic), C-H (aromatic), C-O (carboxyl), O-H (hydroxyl) and N-H (amine and amide) [11]. It is now well known that the amount of information contained in spectral data requires multivariate calibration models to extract the maximum interpretable information from the multivariate data set [12]. Previous studies using NIR spectroscopy to analyze Chinese rice wine mainly focused on models based on the full spectra or on manually selected spectral regions [7, 8, 13]. The stability and prediction ability of full-spectrum models may be weakened because they involve the water absorption peak and other unrelated and collinear spectral variables. Researchers have long sought mathematical models with better performance and stability [14, 15]. Variable selection methods are often the first choice, since they can select useful information and/or eliminate variables that mostly contain noise, thereby improving model performance [16, 17]; examples include interval PLS (iPLS) [18], synergy interval partial least squares (Si-PLS) [19], genetic algorithms (GA) [20] and competitive adaptive reweighted sampling (CARS) [21]. Different approaches and their possible combinations differ in accuracy. CARS has recently been adopted as an optimization tool for variable selection in spectroscopic multivariate calibration [22-24]. Nevertheless, the published works mainly focused on selecting efficient variables by CARS from the full spectra [25-27], where the large number of variables may prevent CARS from finding the optimal variables. Si-PLS can help select efficient spectral intervals to achieve a good model; however, even a small subinterval still contains some collinear variables. Combining the advantages of the two variable selection methods, a new algorithm, called Si-CARS-PLS, is proposed to improve model performance. This algorithm includes two steps: the first selects efficient spectral intervals by Si-PLS, and the second selects optimal variables from these efficient spectral intervals by CARS. Up to now, few studies on the use of NIR spectroscopy with Si-CARS-PLS have been reported, and this algorithm has not yet been applied to predicting the quality of Chinese rice wine.
Therefore, the aims of this work were to provide a variable selection method, namely the Si-CARS-PLS algorithm, which can further improve the predictive ability of models and simplify them, and to apply NIR spectroscopy coupled with Si-CARS-PLS for the rapid and accurate prediction of NSS content in Chinese rice wine.
Materials and methods
Samples
A total of 120 samples of Chinese rice wine, all from the semi-sweet category, were obtained from the “Danyang” brand of Jiangsu Danyang Winery Co., Ltd., in order to keep the experimental conditions consistent and obtain results that are as reliable as possible. In addition, semi-sweet Chinese rice wine is more popular with consumers in Jiangsu province. Chinese rice wine from the Danyang region, which is made from high-quality glutinous rice, is well known in China. These samples covered all types of semi-sweet products of this winery; each product was represented by three or four samples from different manufacturing dates.
Spectral measurement
The NIR spectra of the Chinese rice wine samples were acquired using an Antaris II near-infrared spectrophotometer (Thermo Electron Co., USA) with a transmittance module. The samples were measured in a quartz cuvette with a 1 mm optical path length, which is a standard accessory of this spectrophotometer. After each sample was finished, the cuvette was first washed with distilled water and then rinsed at least three times with the next sample to be measured before spectra collection. Each spectrum was the average of 16 scans. The spectral range was 4000 to 10000 cm−1 and the data were recorded every 3.856 cm−1, which resulted in 1557 variables. The spectral data were collected as absorbance values [log(1/T)], where T is the transmittance. Result software (Antaris II System, Thermo Electron Co., USA) was used for NIR spectral data acquisition. The room temperature was kept at around 25 °C to avoid the influence of ambient conditions on the spectrophotometer. Each sample was measured in triplicate, and the triplicate measurements were averaged to generate a single spectrum per sample for the subsequent analysis.
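The variable count quoted above can be checked with a line of arithmetic. The sketch below assumes that both end points of the 4000 to 10000 cm−1 range are recorded and that the step of 3.856 cm−1 is a rounded nominal value; the variable names are illustrative only.

```python
# Number of spectral points on a regular wavenumber grid (illustrative names).
start_cm = 4000.0   # lower limit of the range, cm^-1
stop_cm = 10000.0   # upper limit of the range, cm^-1
step_cm = 3.856     # nominal data spacing, cm^-1 (rounded value from the text)

# Whole number of steps between the limits, plus one for the first point.
n_variables = round((stop_cm - start_cm) / step_cm) + 1
print(n_variables)
```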
Reference analysis
Reference analysis of NSS in the samples was carried out in accordance with the official analytical method in China (GB/T 13662-2008). The NSS content is the total solids content minus the total sugar content and is expressed in g/L. Blank tests were made with distilled water. All chemicals were of analytical grade.
The total solids content was measured as follows: 5 mL of each sample in a constant-weight weighing bottle (50 mm × 30 mm) was dried in an electric oven at 103 °C ± 2 °C. After 4 h, the volatile substances (i.e., water, ethanol and volatile acid) had evaporated, and the residue was the total solids. The total solids content was obtained as the total weight (dried sample + weighing bottle) minus the weight of the weighing bottle. Weights were measured using an electronic analytical balance (BS224S, Sartorius Instrument Co., Beijing, China).
The total sugar content was measured as follows. (1) Calibration of Fehling's solutions A and B: 5 mL of Fehling's solution A and 5 mL of Fehling's solution B were poured into a 250 mL Erlenmeyer flask, and 30 mL of distilled water was added. After mixing, glucose standard solution (2.5 g/L) was added in a volume 1 mL less than that consumed in the pre-titration. The mixture was then heated to boiling on an electric furnace; next, two drops of methylene blue indicator solution (10 g/L) were added and the mixture was kept boiling for 2 min. Glucose standard solution was then titrated into the mixture until the blue color disappeared. All titration steps should be completed within 3 min. The weight of glucose equivalent to 5 mL of Fehling's solution A plus 5 mL of Fehling's solution B can be calculated according to:
$$ m_1 = \frac{m \times V_1}{1000} \qquad (1) $$
where m (g) is the weight of glucose used in preparing the glucose standard solution (2.5 g), and V1 (mL) is the total volume of glucose standard solution consumed in the titration. (2) Preparation of the hydrolysate: according to preliminary experiments, 10 mL of each sample was put in a 100 mL volumetric flask, 10 mL of distilled water and 1 mL of hydrochloric acid (6 mol/L) were added, and the mixture was then heated in a 68-70 °C water bath for 15 min. After cooling, two drops of methyl red indicator solution (1 g/L) were added, followed by sodium hydroxide (200 g/L) until the red color disappeared. The mixed solution was made up to a constant volume of 100 mL with distilled water to obtain the hydrolysate, which was then filtered through filter paper before use. (3) Measurement of samples: following the calibration procedure for Fehling's solutions A and B, but using the hydrolysate instead of the glucose standard solution, the total sugar content X (g/L) can be obtained according to the following equation:
$$ X = \frac{100 \times m_1 \times 1000}{V_2 \times V_3} \qquad (2) $$
where m1 (g) is the weight of glucose equivalent to 5 mL of Fehling's solution A plus 5 mL of Fehling's solution B, V2 (mL) is the volume of hydrolysate consumed in the titration, and V3 (mL) is the volume of sample.
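The reference calculation can be sketched in a few lines of Python. Eqs. (1) and (2) are taken from the text; the conversion of the dried residue from 5 mL of sample to g/L is an assumption made here for illustration, and all function names and numerical inputs are hypothetical rather than measured values.

```python
def glucose_equivalent(m_glucose_g, v1_ml):
    """Eq. (1): glucose (g) equivalent to 5 mL Fehling's A + 5 mL Fehling's B."""
    return m_glucose_g * v1_ml / 1000.0


def total_sugar(m1_g, v2_ml, v3_ml):
    """Eq. (2): total sugar content in g/L."""
    return 100.0 * m1_g * 1000.0 / (v2_ml * v3_ml)


def total_solids(bottle_plus_residue_g, bottle_g, sample_ml=5.0):
    """Dried residue scaled to g/L (the scaling step is an assumption)."""
    return (bottle_plus_residue_g - bottle_g) / sample_ml * 1000.0


def non_sugar_solids(total_solids_g_per_l, total_sugar_g_per_l):
    """NSS content: total solids minus total sugar, in g/L."""
    return total_solids_g_per_l - total_sugar_g_per_l


# Hypothetical titration and weighing results, for illustration only.
m1 = glucose_equivalent(m_glucose_g=2.5, v1_ml=10.0)
sugar = total_sugar(m1_g=m1, v2_ml=5.0, v3_ml=10.0)
solids = total_solids(bottle_plus_residue_g=30.333, bottle_g=30.000)
nss = non_sugar_solids(solids, sugar)
print(round(sugar, 1), round(solids, 1), round(nss, 1))
```

With these made-up inputs the total sugar is about 50 g/L and the NSS content about 16.6 g/L, which falls within the range reported in Table 1.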
Multivariate analysis
NIR spectroscopy combined with Si-CARS-PLS was used to develop models for predicting NSS content in Chinese rice wine. First, Si-PLS was used to select efficient spectral intervals; then, CARS was used to select the optimal variables from these efficient intervals for building PLS models. In model calibration, the combination of intervals, the variables and the number of PLS factors were optimized by cross validation, according to the lowest root mean square error of cross validation (RMSECV) [28]. The performance of the final model was back-evaluated with the samples in the calibration set and tested with the independent samples in the prediction set. The correlation coefficient (Rc) and root mean square error of calibration (RMSEC) in the calibration set, and the correlation coefficient (Rp) and root mean square error of prediction (RMSEP) in the prediction set, were used to evaluate model performance. Generally, good models should have high Rc and Rp values and low RMSEC and RMSEP values. In addition, the differences between Rc and Rp and between RMSEC and RMSEP should be small; a minor difference between RMSEC and RMSEP indicates that the robustness of the model is satisfactory [29]. All data processing and analysis were conducted in Matlab Version 7.10.0 (Mathworks, Natick, USA) under Microsoft Windows 7.
Si-PLS. The Si-PLS algorithm is an all-possible-interval-combinations procedure that tests PLS models built on all possible subsets of intervals. The principle of this algorithm is to split the data set into a number of intervals (variable-wise) and to calculate all possible PLS model combinations of two, three or four intervals. The combination of intervals with the lowest RMSECV is chosen [11].
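The exhaustive search over interval combinations can be sketched as follows. This is a minimal illustration, not the authors' Matlab implementation: the rule for splitting variables into near-equal intervals is an assumption, and `toy_score` is a hypothetical stand-in for the RMSECV of a PLS model built on the chosen columns.

```python
from itertools import combinations


def split_into_intervals(n_variables, n_intervals):
    """Split indices 0..n_variables-1 into contiguous, near-equal intervals."""
    base, extra = divmod(n_variables, n_intervals)
    intervals, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)  # first `extra` intervals get one more
        intervals.append(range(start, start + size))
        start += size
    return intervals


def best_interval_combination(n_variables, n_intervals, score, sizes=(2, 3, 4)):
    """Try every combination of 2, 3 or 4 intervals and keep the lowest score."""
    intervals = split_into_intervals(n_variables, n_intervals)
    best = None
    for size in sizes:
        for combo in combinations(range(n_intervals), size):
            columns = [j for k in combo for j in intervals[k]]
            value = score(columns)  # in practice: RMSECV of a PLS model
            if best is None or value < best[0]:
                best = (value, combo, columns)
    return best


# Toy score: pretend two small bands carry all the information, with a mild
# penalty on the number of variables. This is NOT a PLS cross validation.
informative = set(range(410, 420)) | set(range(1090, 1100))


def toy_score(columns):
    chosen = set(columns)
    return len(informative - chosen) + 0.001 * len(chosen)


value, combo, columns = best_interval_combination(1490, 11, toy_score)
print(combo, len(columns))
```

In a real application the search would be repeated for each candidate number of intervals and each number of PLS factors, with `score` replaced by a cross-validated PLS fit.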
CARS. CARS was proposed by Liang, Li and co-workers [30] and employs the simple but effective principle of “survival of the fittest” from Darwin's theory of evolution. The absolute values of the regression coefficients of the PLS model are used as an index of the importance of each variable. Then, based on the importance of each variable, CARS sequentially selects N subsets of variables from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (usually 80-90%) of samples is first randomly selected to establish a calibration model. Next, based on the regression coefficients, a two-step procedure comprising an exponentially decreasing function (EDF) and adaptive reweighted sampling (ARS) is adopted to select the key variables. In the first step, the EDF is used to forcibly remove the variables with relatively small absolute regression coefficients. In the second step, ARS is further employed to eliminate variables in a competitive way. Finally, the subset of variables with the lowest RMSECV is considered the best variable subset [31].
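The forced-elimination step can be illustrated with a short sketch of the EDF schedule. The text does not give the functional form; the code assumes the form commonly quoted from the original CARS description [30], r_i = a·exp(−k·i), with a and k chosen so that all variables are kept in the first run and two in the last. Function names are illustrative, and the ARS step and the PLS fits are not shown.

```python
import math


def edf_ratios(n_variables, n_runs):
    """Fraction of variables retained by the EDF in runs 1..n_runs (assumed form)."""
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


def variables_kept(n_variables, n_runs):
    """Number of variables retained by the EDF in each sampling run."""
    return [round(r * n_variables) for r in edf_ratios(n_variables, n_runs)]


# Settings matching the scale of this study: 271 variables, 50 MC sampling runs.
kept = variables_kept(271, 50)
print(kept[0], kept[27], kept[-1])
```

Under these assumptions the schedule starts at 271 variables, ends at 2, and drops quickly at first and slowly later, which is the two-phase behaviour (fast selection, then refined selection) that CARS is designed to produce.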
The RMSECV, RMSEC, Rc, RMSEP and Rp were calculated using Eqs. (3), (4), (5), (6) and (7), respectively; more details can be found in our previous study [32].
$$ \mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_{c\backslash i} - y_{ci}\right)^2}{n}} \qquad (3) $$
where $n$ was the number of samples in the calibration set, $y_{ci}$ was the reference measurement value of sample i, and $\hat{y}_{c\backslash i}$ was the estimated value for sample i by the model constructed when leaving out sample i.
$$ \mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{n_c} \left(y_{ci} - \hat{y}_{ci}\right)^2}{n_c}} \qquad (4) $$
$$ R_c = \sqrt{1 - \frac{\sum_{i=1}^{n_c} \left(\hat{y}_{ci} - y_{ci}\right)^2}{\sum_{i=1}^{n_c} \left(\hat{y}_{ci} - \bar{y}_c\right)^2}} \qquad (5) $$
where $n_c$ was the number of samples in the calibration set, $y_{ci}$ was the reference measurement value of the ith sample, $\hat{y}_{ci}$ was the estimated value of the ith sample, and $\bar{y}_c$ was the average of all reference measurement values in the calibration set.
$$ \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n_p} \left(y_{pi} - \hat{y}_{pi}\right)^2}{n_p}} \qquad (6) $$
$$ R_p = \sqrt{1 - \frac{\sum_{i=1}^{n_p} \left(\hat{y}_{pi} - y_{pi}\right)^2}{\sum_{i=1}^{n_p} \left(\hat{y}_{pi} - \bar{y}_p\right)^2}} \qquad (7) $$
where $n_p$ was the number of samples in the prediction set, $y_{pi}$ was the reference measurement value of the ith sample, $\hat{y}_{pi}$ was the estimated value of the ith sample, and $\bar{y}_p$ was the average of all reference measurement values in the prediction set.
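The figures of merit in Eqs. (3)-(7) can be computed with a few lines of Python. This is a sketch under assumptions: the correlation coefficient is taken in square-root form with the denominator written as in Eqs. (5) and (7), and the sample values are invented for illustration. The same `rmse` function gives RMSEC, RMSEP, or RMSECV depending on whether the estimates come from the fitted model, an independent set, or leave-one-out models.

```python
import math


def rmse(reference, estimated):
    """Root mean square error between reference and estimated values."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def correlation_coefficient(reference, estimated):
    """R in the form of Eqs. (5) and (7); the square root is an assumption."""
    y_bar = sum(reference) / len(reference)  # mean of the reference values
    sse = sum((y_hat - y) ** 2 for y, y_hat in zip(reference, estimated))
    spread = sum((y_hat - y_bar) ** 2 for y_hat in estimated)
    value = 1.0 - sse / spread
    # The expression is not defined when the error exceeds the spread.
    return math.sqrt(value) if value >= 0 else float("nan")


# Invented reference and estimated NSS values (g/L), for illustration only.
reference = [10.0, 12.0, 14.0, 16.0]
estimated = [10.5, 11.5, 14.5, 15.5]
error = rmse(reference, estimated)
r_value = correlation_coefficient(reference, estimated)
print(round(error, 3), round(r_value, 3))
```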
For model calibration, all 120 samples were divided into two subsets, namely a calibration set and a prediction set. Samples in the calibration set were used to establish the model, while samples in the prediction set were used to test the robustness of the established model. To avoid bias in subset division, the division was made as follows: all samples were sorted according to their y-values (viz. the reference values of NSS content); then one sample out of every three was assigned to the prediction set, and the other two entered the calibration set. Thus, the calibration set contained 80 samples and the prediction set contained 40 samples. Table 1 summarizes the reference data for NSS content in the calibration and prediction sets. As shown in this table, the range of y-values in the calibration set covered that in the prediction set. Moreover, the distributions of y-values in the calibration and prediction sets were comparable, with similar means and standard deviations.
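The division rule can be sketched as below. The text does not say which sample of each group of three goes to the prediction set; the sketch assumes the middle one, so that the smallest and largest y-values stay in the calibration set. The function name and the synthetic y-values are illustrative only.

```python
def split_calibration_prediction(y_values, pick=1):
    """Sort by y and send one sample of every three to the prediction set.

    `pick` is the position (0, 1 or 2) inside each sorted triple that goes to
    the prediction set; the default of 1 (the middle sample) is an assumption.
    Returns two lists of original sample indices.
    """
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    calibration, prediction = [], []
    for rank, index in enumerate(order):
        if rank % 3 == pick:
            prediction.append(index)
        else:
            calibration.append(index)
    return calibration, prediction


# 120 synthetic, unsorted reference values (not the measured NSS data).
y = [8.7 + 0.13 * ((i * 37) % 120) for i in range(120)]
calibration, prediction = split_calibration_prediction(y)
print(len(calibration), len(prediction))
```

Because the split follows the sorted order, both subsets span nearly the same range and have similar means, which is the property the paragraph above relies on.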
Results and discussion
Spectral data preprocessing
Fig.1A presents the raw spectra of all the samples. The absorptions at around 5200 cm−1 were saturated (off scale) and very noisy. Thus, one segment of the spectrum, from 5025 to 5280 cm−1, was removed because of the saturation caused by the strong combination band of -OH from water [33], leaving 1490 spectral variables. Additionally, raw spectra acquired from the NIR spectrometer contain background information and noise besides sample information, and the few tiny particles or bubbles in the samples cause light scattering. Before the calibration stage, the spectral data should therefore be preprocessed to build reliable, accurate and stable models. In this study, standard normal variate (SNV) transformation was used to preprocess the raw spectral data, in order to eliminate the differences between samples due to baseline shift, noise and scatter effects. SNV transformation was performed on each spectrum individually, by subtracting the mean of the spectrum and scaling with the standard deviation of the spectrum, as illustrated in the following equation:
$$ x_{i,\mathrm{SNV}} = \frac{x_i - \bar{x}}{\sqrt{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 / (n-1)}} \qquad (8) $$
where $x_{i,\mathrm{SNV}}$ is the SNV-transformed spectral value for the ith variable, $x_i$ is the ith variable in the raw spectrum, $\bar{x}$ is the mean of the raw spectrum, and $n$ is the number of variables in the spectrum. The spectra after SNV preprocessing are presented in Fig.1B.
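Eq. (8) translates directly into code. The following is a minimal sketch using only the Python standard library; the function name and the four-point example spectrum are illustrative, not taken from the study.

```python
import statistics


def snv(spectrum):
    """Standard normal variate transform of one spectrum, as in Eq. (8)."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in spectrum]


# A made-up spectrum and a copy with an added offset and multiplicative gain,
# mimicking a baseline shift and a scatter effect.
raw = [0.20, 0.40, 0.90, 1.30]
shifted = [2.0 * x + 0.5 for x in raw]

raw_snv = snv(raw)
shifted_snv = snv(shifted)
print([round(v, 3) for v in raw_snv])
```

Both versions of the spectrum give the same transformed values, which is why SNV is used to remove offset and scaling differences between samples.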
Efficient intervals selected by Si-PLS
In this paper, the number of intervals was optimized by cross validation. The 1490 spectral variables of Chinese rice wine were divided into 10, 11, 12, …, 25 intervals, and combinations of two, three or four subintervals were tested. Meanwhile, the number of PLS factors was also optimized by cross validation.
The best Si-PLS model was achieved when the spectra were split into 11 intervals and intervals 4 and 9 were combined. The efficient spectral intervals corresponded to 5831-6352 and 8442-8959 cm−1, as shown in Fig.2. In total, 271 variables were selected by Si-PLS.
Optimal variables selected by CARS
For the implementation of CARS in this work, after optimization, 90% of the calibration samples (72 samples) were randomly selected for building the model in each run; the number of MC sampling runs was set to 50; and models were optimized by 5-fold cross validation. Fig.3A shows the RMSECV values as the number of sampling runs increases. The RMSECV values first decreased, which can be ascribed to the elimination of uninformative variables, and then increased rapidly because of the loss of some useful information. The lowest RMSECV was obtained at sampling run 28, which is marked by an asterisk in Fig.3A. Fig.3B shows the regression coefficient path of each variable as the number of sampling runs increases. At first, the absolute value of the regression coefficient of each variable was very small. As the number of sampling runs increased, the coefficients of some variables became larger and larger while others became smaller and smaller. The coefficients of some variables even dropped to zero when these variables were eliminated by CARS as uncompetitive. Thus, the larger the absolute coefficient, the more likely the corresponding variable is to survive. The best variable subset, with the lowest RMSECV, was obtained at sampling run 28, which is marked by the asterisk in Fig.3B. Fig.3C shows the trend in the number of sampled variables: it decreased quickly at first and then very slowly, showing that variable selection undergoes two phases, i.e. fast selection and refined selection. Eventually, 18 variables were selected at sampling run 28, which is marked by the asterisk in Fig.3C. The 18 variables corresponded to 5847, 5851, 5854, 5920, 5939, 5978, 5982, 5989, 6094, 6109, 6113, 6317, 8566, 8616, 8824, 8855, 8859 and 8959 cm−1, and are marked with the blue lines in Fig.4A.
The selected 18 variables were used to build a PLS model, namely the Si-CARS-PLS model. This model achieved Rc = 0.95, RMSEC = 1.12, Rp = 0.95 and RMSEP = 1.22 using 7 PLS factors. Fig.4B is the scatter plot showing the correlation between the NSS content obtained from the reference method and that predicted by NIR in the calibration and prediction sets of the optimal Si-CARS-PLS model.
Discussion of the results
In order to highlight the superiority of the Si-CARS-PLS model, it was compared with PLS, iPLS, Si-PLS and CARS-PLS. The results of the different PLS models for predicting NSS content in Chinese rice wine are presented in Table 2. As seen from Table 2, the variable selection methods were clearly superior to full-spectrum PLS: they dramatically reduced the number of variables and greatly improved model performance. Si-CARS-PLS provided the best result, with the best predictive ability and stability and the fewest variables. The main reasons can be summarized as follows:
For the classical PLS algorithm, although the obviously uninformative variables related to water had been removed in preprocessing, the remaining 1490 variables were all used to develop the calibration model. Among these 1490 variables, many were collinear or irrelevant to NSS in Chinese rice wine. Too much unwanted information inevitably weakened the performance of the PLS model.
For iPLS, the best model was achieved when the spectrum was split into 10 equidistant intervals and the model was constructed on the 3rd spectral interval. The optimal interval was 5407-5978 cm−1, comprising 149 variables. The selected spectral interval corresponds to the first overtone of -CH3, -CH2 and -CH, and NSS contains many substances that include hydrocarbon groups. iPLS selected useful information and removed a large amount of uninformative information, so it improved model performance and gave a better result than PLS. However, a single interval (i.e. 149 variables) cannot provide sufficient information about NSS: while many “uninformative variables” and “redundant variables” were eliminated, some useful variables were discarded as well.
Compared with PLS and iPLS, Si-PLS showed clear advantages: not only can Si-PLS remove some “uninformative variables” and “redundant variables”, it also retains more valuable information by combining several subintervals from the whole spectrum. The optimal spectral intervals for NSS were 5831-6352 and 8442-8959 cm−1, totaling 271 variables. NSS is a complex mixture that includes dextrin, protein and its decomposition products, glycerin, non-volatile acids, and many aliphatic and aromatic substances. These substances contain many hydrocarbon groups and aromatic rings. The region of 5831-6352 cm−1 contains not only the absorption of the first overtone of -CH3 but also that of the first overtone of -ArCH. The region of 8442-8959 cm−1 contains the absorption of the second overtone of -CH3, -CH2 and -CH. Therefore, the Si-PLS model captured more useful information than the iPLS model, leading to better calibration results with fewer PLS factors (Table 2). However, there were still collinear variables from adjacent wavebands, even within a small subinterval.
CARS-PLS, as a variable optimization tool, also gave a better result than PLS. Nevertheless, the large number of original variables (i.e., 1490 variables) makes the search more difficult for CARS. In addition, there is randomness in the selection of variables. Therefore, first selecting useful spectral intervals to reduce the number of variables, and then applying CARS to search for valuable variables within the selected intervals, may be a good choice.
Indeed, when compared with the commonly used algorithms (i.e. PLS, iPLS, Si-PLS and CARS-PLS), the Si-CARS-PLS algorithm gave the best model. The Si-CARS-PLS model was constructed in two successive steps in this work: (1) two efficient spectral intervals were selected from 11 intervals, giving 271 variables; (2) 18 optimal variables were selected from the 271 variables. Although the computation time for building the Si-CARS-PLS model was longer than for the other models, the final model was the simplest because it included the fewest variables, so the time needed to predict the quality of Chinese rice wine with the final Si-CARS-PLS model is reduced.
Conclusions
This work modeled the suitability of NIR spectroscopy for the determination of NSS content in
294
Chinese rice wine. This work proposed Si-CARS-PLS algorithm in processing data, which combined
295
the superiority of Si-PLS and CARS. Si-CARS-PLS algorithm can improve the performance of
296
model when NIR spectroscopy technique is used for real-time measurement of the active ingredients
297
in beverage food, and is of great significance for the practical usage.
Acknowledgements
This work has been financially supported by the National Natural Science Foundation of China (31271875) and the China Postdoctoral Science Foundation (2015M571698). We are also grateful to Jiangsu Danyang Winery Co., Ltd. for providing us with the Chinese rice wine samples.
References
[1] H.Y. Cheng, J.H. Liu, Z.G. Xu, X.F. Yin, Sens. Actuators B Chem., 73 (2012) 55-61.
[2] J. Zhong, X. Ye, Z. Fang, G. Xie, N. Liao, J. Shu, D. Liu, Food Control, 28 (2012) 151-156.
[3] H.Y. Li, Z.Y. Jin, X.M. Xu, Food Control, 32 (2013) 563-568.
[4] Z.Q. Huang, S. Jin, S. Huang, C. Zheng, L. Ni, J. Chin. Inst. Food Sci. Technol., 13 (2013) 148-152.
[5] Q. Ouyang, Q.S. Chen, J.W. Zhao, H. Lin, Food Bioprocess Technol., 6 (2013) 2486-2493.
[6] J.W. Zhao, Q. Ouyang, Q.S. Chen, H. Lin, Food Sci Technol Int, 19 (2013) 305-314.
[7] H.Y. Yu, Y.B. Ying, X.P. Fu, H.S. Lu, J. Near Infrared Spectrosc., 14 (2006) 37-44.
[8] H.Y. Yu, H.R. Lin, H.R. Xu, Y.B. Ying, B.B. Li, X.X. Pan, J. Agric. Food. Chem., 56 (2008) 307-313.
[9] F. Shen, X.Y. Niu, D.T. Yang, Y.Y. Ying, B.B. Li, G.Q. Zhu, J.A. Wu, J. Agric. Food. Chem., 58 (2010) 9809-9816.
[10] F. Shen, D.T. Yang, Y.B. Ying, B.B. Li, Y.F. Zheng, T. Jiang, Food Bioprocess Technol., 5 (2012) 786-795.
[11] X.B. Zou, J.W. Zhao, M.J.W. Povey, M. Holmes, H.P. Mao, Anal. Chim. Acta, 667 (2010) 14-32.
[12] E. Teye, X. Huang, H. Dai, Q. Chen, Spectrochim. Acta, Part A, 114 (2013) 183-189.
[13] X.Y. Niu, F. Shen, Y.F. Yu, Z. Yan, K. Xu, H.Y. Yu, Y.B. Ying, J. Agric. Food. Chem., 56 (2008) 7271-7278.
[14] G. Wang, M. Ma, Z. Zhang, Y. Xiang, P.d.B. Harrington, Talanta, 112 (2013) 136-142.
[15] J. Li, C. Zhao, W. Huang, C. Zhang, Y. Peng, Anal Methods-Uk, 6 (2014) 2170-2180.
[16] D. Jie, L. Xie, X. Fu, X. Rao, Y. Ying, J. Food Eng., 118 (2013) 387-392.
[17] H. Xiaowei, Z. Xiaobo, Z. Jiewen, S. Jiyong, Z. Xiaolei, M. Holmes, Food Chem., 164 (2014) 536-543.
[18] L. Norgaard, A. Saudland, J. Wagner, J.P. Nielsen, L. Munck, S.B. Engelsen, Appl. Spectrosc., 54 (2000) 413-419.
[19] H. Jiang, G. Liu, C. Mei, S. Yu, X. Xiao, Y. Ding, Spectrochim. Acta, Part A, 97 (2012) 277-283.
[20] L. Cséfalvayová, M. Pelikan, I. Kralj Cigić, J. Kolar, M. Strlič, Talanta, 82 (2010) 1784-1790.
[21] P. Nie, D. Wu, D.-W. Sun, F. Cao, Y. Bao, Y. He, Sens., 13 (2013) 13820-13834.
[22] K.Y. Zheng, Q.Q. Li, J.J. Wang, J.P. Geng, P. Cao, T. Sui, X. Wang, Y.P. Du, Chemom. Intell. Lab. Syst., 112 (2012) 48-54.
[23] X. Zhang, W. Li, B. Yin, W.Z. Chen, D.P. Kelly, X.X. Wang, K.Y. Zheng, Y.P. Du, Spectrochim. Acta, Part A, 114 (2013) 350-356.
[24] A.d.A. Gomes, R.K. Harrop Galvao, M.C. Ugulino de Araujo, G. Veras, E.C. da Silva, Microchem. J., 110 (2013) 202-208.
[25] D. Xu, W. Fan, H. Lv, Y. Liang, Y. Shan, G. Li, Z. Yang, L. Yu, Spectrochim. Acta, Part A, 123 (2014) 430-435.
[26] D. Wu, D.-W. Sun, Talanta, 111 (2013) 39-46.
[27] X. Wei, N. Xu, D. Wu, Y. He, Food Bioprocess Technol., 7 (2014) 184-190.
[28] J.R. Cai, Q.S. Chen, X.M. Wan, J.W. Zhao, Food Chem., 126 (2011) 1354-1360.
[29] K.S. Chia, H.A. Rahim, R.A. Rahim, Chinese J. Zhejiang Uni. Sci. B 13 (2012) 145-151.
[30] H. Li, Y. Liang, Q. Xu, D. Cao, Anal. Chim. Acta, 648 (2009) 77-84.
[31] K. Zheng, Q. Li, J. Wang, J. Geng, P. Cao, T. Sui, X. Wang, Y. Du, Chemom. Intell. Lab. Syst., 112 (2012) 48-54.
[32] Q.S. Chen, P. Jiang, J.W. Zhao, Spectrochim. Acta Part A, 76 (2010) 50-55.
[33] M. Casale, M.-J. Sáiz Abajo, J.-M. González Sáiz, C. Pizarro, M. Forina, Anal. Chim. Acta, 557 (2006) 360-366.
Figure Captions
Fig.1. The raw NIR spectra (A) and the preprocessed spectra (B) of Chinese rice wine samples.
Fig.2. The efficient spectral intervals selected by Si-PLS for predicting NSS content in Chinese rice wine.
Fig.3. RMSECV values (A), the regression coefficient path of each variable (B), and the changing trend of the number of sampled variables (C) with increasing sampling runs of CARS.
Fig.4. The 18 variables selected by Si-CARS-PLS (A), and the reference values versus NIR-predicted values of NSS content in the calibration and prediction sets of the Si-CARS-PLS model (B).
Figure 1
Figure 2
Figure 3
Figure 4
Table 1 Reference values of NSS content in the calibration and prediction sets.
Subsets           Unit   S.N.a   Range        Mean   S.D.b
Calibration set   g/L    80      8.70-24.4    16.6   3.70
Prediction set    g/L    40      8.80-24.2    16.6   3.72
a S.N., the number of samples.
b S.D., standard deviation.
Table 2 Results of different PLS models for predicting NSS content in rice wine.
Methods        Variables   PLS factors   Calibration set    Prediction set
                                         Rc      RMSEC      Rp      RMSEP
PLS            1490        4             0.83    2.08       0.78    2.33
iPLS           149         11            0.94    1.21       0.94    1.31
Si-PLS         271         9             0.95    1.20       0.94    1.33
CARS-PLS       23          8             0.94    1.23       0.93    1.41
Si-CARS-PLS    18          7             0.95    1.12       0.95    1.22
Highlights
► NIR spectroscopy was used for measuring non-sugar solids in Chinese rice wine.
► A new algorithm, Si-CARS-PLS, was proposed for modeling.
► Si-CARS-PLS showed superiority in modeling when compared with other algorithms.