Accepted Manuscript Prediction of egg storage time and yolk index based on electronic nose combined with chemometric methods Jiating Li, Susu Zhu, Shui Jiang, Jun Wang PII:
S0023-6438(17)30297-9
DOI:
10.1016/j.lwt.2017.04.070
Reference:
YFSTL 6208
To appear in:
LWT - Food Science and Technology
Received Date: 1 January 2017 Revised Date:
22 April 2017
Accepted Date: 22 April 2017
Please cite this article as: Li, J., Zhu, S., Jiang, S., Wang, J., Prediction of egg storage time and yolk index based on electronic nose combined with chemometric methods, LWT - Food Science and Technology (2017), doi: 10.1016/j.lwt.2017.04.070. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Prediction of egg storage time and yolk index based on electronic
2
nose combined with chemometric methods
3
Jiating Li, Susu Zhu, Shui Jiang, Jun Wang*
4
Department of Biosystems Engineering, Zhejiang University, 886 Yuhangtang Road, Hangzhou 310058, China
5
Abstract: Egg storage time and yolk index, two descriptors of egg freshness, were evaluated by an electronic nose combined with
6
chemometric methods. To obtain more useful information from collected data, the wavelet energy was extracted as feature signal
7
by the wavelet transform method for qualitative and quantitative analysis. For qualitative analysis, linear discriminant analysis
8
(LDA) was applied to evaluate the feature signals, and the result indicated that these feature signals had good classification
9
performance with the first two scores explaining 82.50% of total variance. Moreover, probabilistic neural network (PNN) was
10
performed to classify eggs with different storage times, and 92.86% of samples in testing set were classified correctly. For
11
quantitative analysis, back propagation neural networks (BPNN) and support vector machine (SVM) were applied to build
12
prediction models of yolk index, indicating that SVM models (R2 = 0.9641 in training set and R2 = 0.8339 in testing set) were
13
better than BPNN (R2 = 0.8629 in training set and R2 = 0.7863 in testing set). To further improve the performance of SVM models,
14
independent component analysis (ICA) and local linear embedding (LLE) were used to reduce dimension of feature data, and the
15
results showed that ICA-SVM model had satisfying prediction performance (R2 > 0.97).
16
Key words: Electronic nose; Storage time; Yolk index; Support vector machine; Dimension reduction
AC C
EP
TE D
M AN U
SC
RI PT
1
17
*
Corresponding author. E-mail address:
[email protected] (J. Wang)
1
ACCEPTED MANUSCRIPT
1. Introduction
19
Eggs have always been one of the most important food in our daily life. So far, there is much research
20
conducted on detection of egg quality, concerning interior changes of eggs over storage. Typically, reduction
21
of egg quality can be explained as the result of an enhanced interaction between lysozyme and ovomucin as
22
pH increases during storage (Soltani & Omid, 2015). Although a method called ‘candling’ could offer to
23
examine eggs by checking internal characteristics of egg on top of bright light shining (Wang, Jiang, & Yu,
24
2004; Wang, & Jiang, 2005; Zhang, Pan, Tu, Zhan, & Tu, 2015), it would be so arduous that mistakes occur
25
easily. Therefore, new detecting techniques are in great need to detect egg quality more efficiently.
26
Recently, many studies have been carried out to develop techniques for nondestructive detection of egg quality.
27
Most of them focused on spectroscopic and optical methods. Near infrared reflection (NIR) spectroscopy is a
28
fast and accurate technique for nondestructive detection, and it has been conducted to measure egg freshness
29
by means of an FT-NIR spectrometer and a fiber optic probe (Giunchi, Berardinelli, Ragni, Fabbri, & Silaghi,
30
2008). Others combined the NIR technique with different data analysis methods like multivariate analysis
31
(Lin, Zhao, Sun, Chen, & Zhou, 2011) and support vector data description (Zhao et al., 2010), all of which
32
demonstrated the feasibility of detecting egg freshness by the NIR method. As for the optical method, Liu,
33
Ying, Ouyang and Li (2007) had investigated the potential of applying the ultraviolet and visible (UV/VIS 200
34
- 800 nm) transmittance method to inspect the internal quality of intact chicken egg, and reached the
35
conclusion that the nondestructive inspection of egg freshness by transmittance properties is feasible in the
36
range of 400 - 600 nm. Besides these major methods, dielectric properties of eggs have also been studied in
37
determining egg quality. For instance, Soltani, Omid and Alimardani (2015) developed an egg qualifying
38
system based on dielectric technology. In evaluation of the qualifying system, the mean absolute percent errors
AC C
EP
TE D
M AN U
SC
RI PT
18
ACCEPTED MANUSCRIPT obtained from testing sets were 5.41, 6.84, 8.79, and 4.24% for the Haugh unit, yolk index, yolk/albumen, and
40
yolk weight, respectively. Evaluation results showed the designed device which was fabricated based on
41
dielectric measurement and the machine vision technique could be confidently used in predicting egg quality
42
indices.
43
Undeniably, these approaches present potential solutions for the nondestructive detection of egg quality. Yet
44
there are two critical problems: first, eggshell may affect detection precision of these optical and spectroscopic
45
methods; second, applicability of dielectric properties in detecting egg quality remains to be improved. So it is
46
necessary to search for more efficient and economical ways to detect egg quality. Given that a change of egg
47
quality will give rise to changes in its volatile gas components, an electronic nose system may be a potential
48
alternative strategy for detecting egg quality by sensing its volatile profile. Actually, some researches have
49
already proved the possibility of detecting egg quality with an electronic nose. Dutta, Hines, Gardner, Udrea
50
and Boilot (2003) employed an array of four tin oxide sensors to predict egg freshness, and suggested that
51
eggs can be categorized into one of three states with up to 95% accuracy. Yongwei, Wang, Zhou and Lu
52
(2009) demonstrated the potential of monitoring internal quality of eggs during storage and established
53
prediction models for quality indices. These studies provide references for determining egg quality by
54
electronic nose.
55
Both Dutta et al. (2003) and Yongwei et al. (2009) focused mainly on the feasibility of detecting egg quality
56
by an electronic nose combined with certain frequently used data analysis methods. However, limited detailed
57
information is available on analyzing collected egg data by data preprocessing methods, as well as on tha
58
comparison among the adopted chemometric methods. Therefore, combined with data preprocessing
59
approaches and chemometric methods, this research aimed to study the feasibility of using electronic nose
60
system to predict storage time and yolk index, which are both simple but representative indicators of egg
AC C
EP
TE D
M AN U
SC
RI PT
39
3
ACCEPTED MANUSCRIPT 61
quality.
62
2. Materials and methods
64
2.1. Sample preparation
65
All 160 eggs, bought at local supermarkets, were freshly laid and collected in Hongxing village, China. Once
66
arrived in the laboratory, these eggs were cleaned and then stored in a chamber with condition of 20 °C and
67
relative humidity of 70%. 20 eggs were used as spare samples in case of any broken ones, the other 140 eggs
68
were divided into seven groups and each group contained 20 eggs that were numbered from 1 to 20. Each new
69
group of eggs was analyzed weekly, the data-collecting experiment lasted for 6 weeks.
70
2.2. Electronic nose system and sample procedure
71
In this study, an Electronic nose (PEN2, Airsense Company, German) equipped with an array of metal oxide
72
semiconductor (MOS) sensors was adopted to detect sample gas. The name and performance of each sensor
73
are showed in Table 1. Sample gas is inhaled into the sensor channel from the air inlet by a built-in pump, then
74
flows through the sensor array at a certain rate and finally is out from the outlet. The reference gas is the clean
75
air filtered by activated carbon, and is inhaled at a certain rate by another pump, flows through and cleans the
76
sensor array to make the responsive signal return to zero. Meanwhile, the reference gas also helps to prevent
77
remnant gas from impacting the next process by cleaning the sensor array. The responsive signal is the ratio
78
between the conductivity G when sensors get in touch with the sample gas and the conductivity G0 when
79
reference gas flows through the sensors (G/G0).
80
Static head space sampling system was adopted for sensing volatile profile out of egg shell. Determined by a
AC C
EP
TE D
M AN U
SC
RI PT
63
ACCEPTED MANUSCRIPT preliminary experiment, the mass of each egg was 60 ± 3 g and the most suitable sealing time was one hour.
82
The first step was to place each egg in a 500 mL beaker which was then sealed by preservative film for an
83
hour and maintained at room temperature (25 - 27 oC). Then, the inlet tube was inserted into the beaker by
84
using a syringe needle and the gas transmitted into the electronic nose. The electronic nose sampled and
85
recorded data at the frequency of 1 Hz. Each sample was detected for 70 s. Finally, the detected eggs were
86
broken to take out the complete yolk, and to measure the yolk index.
87
2.3. Measurement of yolk index
88
During storage, one significant change of egg is the decrease of vitelline membrane elasticity, allowing easier
89
migration of water from the albumin through the weaker vitelline membrane (Jones & Musgrove, 2005). The
90
result of this process is yolk flattening, which can be indicated by yolk index (YI). The procedure is: gently
91
break an egg and pull apart the shell; then pour the egg liquid onto a big clean watch glass; finally measure the
92
thickness and the diameter of yolk by using a vernier caliper. The YI was defined as follows (Funk, 1948):
93
YI =
94
where h denotes the thickness of yolk, and d denotes the diameter of yolk.
95
According to Funk (1948), YI indicates the viscosity of yolk, and the higher YI is, the better egg quality is. In
96
this research, YI for each egg group was determined by averaging the value of 20 egg samples each time.
97
2.4. Data processing
98
2.4.1. Feature extraction
99
Generally, the maximum value or mean value is used as feature signal in analysis of electronic nose data. Yet
100
the response originated in electronic nose is non-static, these static features (maximum value and mean value)
101
are likely to be exclusive of some significant characteristics of original response. Therefore, to acquire more
TE D
M AN U
SC
RI PT
81
(1)
AC C
EP
h × 100% d
5
ACCEPTED MANUSCRIPT representative information, wavelet energy, a dynamic feature, was extracted by wavelet transform (WT)
103
method and used as feature signal in this study.
104
WT was developed for the analysis of non-static signals. By WT, a family of functions called wavelets could
105
be generated by translating and dilating a single base function called mother wavelet (Moreno-Barón et al.,
106
2006). That is, the original responsive signal could be decomposed into its component elements with an
107
applicable mother wavelet. These elements contain a series of cAj set and a series of cDj set where j represents
108
the decomposition level. The cAj set and cDj set retain the low-frequency and high-frequency content of the
109
signal respectively, as shown in Fig. 1. Among these sets, coefficients in cA3 set encompasses a large
110
proportion of energy, which means it accounts for the majority of original information. Therefore, the feature
111
signal, wavelet energy, was calculated by all coefficients in cA3. The computational formula (Yin, Yu, &
112
Zhang, 2008) is as follows:
113
E=
∑ (a k =1
3k
)
2
(2)
TE D
n
M AN U
SC
RI PT
102
where E is wavelet energy value of each sensor, corresponding to the third frequency band; n is the number of
115
coefficients in cA3 set; a3k is the k-th coefficient in cA3 set.
116
In this study, the fifth-order wavelet transform of the Daubechies’ family (db5) and three-scale decomposition
117
were adopted to decompose the original signal. To illustrate, the responsive signal of S2 of a certain sample, as
118
shown in Fig. 2a, was decomposed by the fifth-order wavelet. A series of coefficients sets could be obtained
119
and used to reconstruct new corresponding signal. In this research, new signal was reconstructed from
120
coefficients in cA3 set, as shown in Fig. 2b. Fig. 2c depicts the numerical difference between original and
121
reconstructed signal, with a crest value of 5.56 × 10-13. This crest value is small enough to prove that the
122
original signal was well represented by the reconstructed signal. In other words, coefficients in cA3 set are
123
feasible to represent the major original information.
AC C
EP
114
ACCEPTED MANUSCRIPT 2.4.2 Qualitative classification analysis
125
Qualitative classification for egg storage time was performed by linear discriminant analysis (LDA) and
126
probabilistic neural network (PNN). LDA explicitly models the difference between the classes of data, and
127
tries to maximize the variance between categories and minimize the variance within categories. It provides a
128
classification model, characterized by a linear dependence of the classification scores with respect to the
129
descriptors, and the eigenvalues of LDA were determined to get more information on the relation of the factors
130
in the model analyses (Qiu, Wang, & Gao, 2015). PNN, introduced by Specht (1990) in the early 1990s, is a
131
feed-forward neural network, which is derived from the Bayesian network and a statistical algorithm called
132
Kernel Fisher Discriminant Analysis. The performance of PNN is decided by several factors including
133
smoothing parameters and the number of hidden layers. By PNN, the operations are organized into a
134
multilayered feed-forward network with four layers: input layer, pattern layer, summation layer, and
135
decision-making layer.
136
2.4.3. Quantitative prediction analysis
137
Quantitative calibration with respect to yolk index was performed using back propagation neural networks
138
(BPNN) and support vector machine (SVM), as well as optimization of SVM model by dimension reduction
139
algorithms (independent component analysis and local linear embedding).
140
During the building of BPNN, the connection weights are amended according to the gradient descent,
141
diminishing the global error, which is also a state of network convergence. A typical BPNN consists of three
142
layers: input layer, hidden layer and output layer. By BPNN, input information firstly moves forward towards
143
the nodes in hidden layer to be processed by a certain function; then, the processed signal spreads to the output
144
layer as final result. SVM is based on the principle of structural minimization in the Statistical Learning
145
Theory and was firstly put forward by Vapnik (1998) and his partners. SVM is powerful in handling the
AC C
EP
TE D
M AN U
SC
RI PT
124
7
ACCEPTED MANUSCRIPT problem with small samples, non linear and high-dimensional data sets (Wu, 2009), just as the electronic nose
147
data in this study (Yu, Wan, Zhou, & Yang, 2015). To obtain a good performance, the penalty parameter C and
148
kernel parameter V in SVM model should be optimized (Liu, Wang, Wang, & Li, 2013).
149
ICA is a highly efficient blind signal separation method. The basic restriction is that the independent
150
components must be non-Gaussian in nature (Di Natale, Martinelli, & D’Amico, 2002). ICA can be used to
151
extract independent components from the observed data which, in sensing application, is basically a mixed
152
information from various unknown sources. Manifold learning, a newly developed nonlinear dimension
153
reduction approach, was proposed by Bregler in 1995. Its ability to learn the intrinsic essence and distribution
154
of the complex and high-dimensional nonlinear data makes manifold learning a new tool in data analysis.
155
Manifold learning techniques can be broadly categorized relative to global and local techniques. LLE (Roweis
156
& Saul, 2000) is methods that employ local manifold learning techniques. In this study, LLE was adopted to
157
reduce the dimension of feature signal for SVM model.
158
2.4.4. Distribution of data sets
159
As mentioned previously, the wavelet energy extracted by WT was selected as feature signal. Meanwhile,
160
distributions of data sets were the same in qualitative and quantitative prediction. That is, the data set of
161
feature signal was divided into two subsets: samples numbered from 1 to 16 in each group were selected as
162
training set which had a total of 112 samples, and the remaining with a total of 28 samples was considered as
163
the testing set.
164
For qualitative classification, discriminating efficiency of LDA was estimated by the percent of variance, an
165
index for discriminating power; performance of PNN model was measured by correct classification rate. The
166
higher percent of variance or higher correct rate, the more successful the classification is. For quantitative
167
prediction, predicting performances were estimated using parameters calculated from predicted and
AC C
EP
TE D
M AN U
SC
RI PT
146
ACCEPTED MANUSCRIPT experimental values: root mean square error (RMSE), square correlation coefficient (R2) and mean relative
169
error (MRE). The lower RMSE or MRE and the larger R2 indicate a better predicting model. Eqs. (3), (4) and
170
(5) represent the RMSE (Soltani & Omid, 2015), MRE (Zhang, Chang, Wang & Ye, 2008), and R2 formula
171
respectively.
RMSE =
173
MRE =
1 n
n
i =1
(t i − y i )
2
(3)
n ti − yi ti
n
∑ abs i =1
× 100
(4)
SC
172
∑
RI PT
168
(n ⋅ ∑ t ⋅ y − ∑ t ⋅ ∑ y ) = n ⋅ y − (∑ y ) ⋅ n ⋅ ∑ t − (∑ t ) ∑ 2
R2
2 i
i
i
2
i
2 i
i
2
(5)
M AN U
174
i
i
where n is the number of data in a given set, ti and yi are the measured and predicted values, respectively.
176
LDA was performed in SPSS (IBM SPSS Statistics 19); PNN, BPNN, Lib-SVM (Chang & Lin, 2011), ICA
177
and LLE were performed by MATLAB R2010b (MathWorks, USA).
178
3. Results and discussion
179
3.1. Electronic nose’s response to egg samples over storage
180
The 4th sample in the fresh group, two-week group, four-week group and six-week group were selected
181
randomly to depict the electronic nose’s typical responses. For those selected samples, responses (G/G0) of
182
each sensor during the detection period are depicted in Fig. 3. The response of S2 is the most significant. Two
183
possible reasons could be used to explain this phenomenon: firstly, as shown in Table 1, since S2 is very
184
sensitive and reacts on nitrogen oxides, the significant responses of S2 indicate a larger content of nitrogen
185
oxides inside egg; secondly, the significance might result from the stronger sensitivity of the sensor itself.
AC C
EP
TE D
175
9
ACCEPTED MANUSCRIPT Besides, as shown in Fig. 3, the response value of each sensor changed over storage, but in varying degrees.
187
The varying sensitivities of the sensors demonstrated that egg’s volatile profile changed over storage. So it is
188
possible to predict storage time and yolk index of egg by electronic nose.
189
3.2. Variation of yolk index
190
Fig. 4 shows the change of YI over storage. According to the grading standards mentioned by Lv and Li (1994),
191
eggs can be grouped into four grades based on YI: AA (YI ≥ 0.42), A (0.35 < YI ≤ 0.41), B (0.17 < YI ≤ 0.34), C
192
(YI ≤ 0.17). With this criterion, eggs displayed in Fig. 4 could be divided into two levels, namely level A (eggs
193
stored before and on the 7th day) and level B (eggs stored after the 7th day). An additional conclusion is that
194
the egg freshness declined over time, but with an inconstant declining rate.
195
3.3. Qualitative classification by LDA and PNN
196
3.3.1. Results of LDA
197
These 7 groups of samples were successfully classified by LDA, with a total variance of 82.50%. It can be
198
seen in Fig. 5 that 7 sets of eggs were basically distinguished from each other, except for the minor
199
overlapping between four-week and six-week group. According to Fig. 4, eggs underwent conspicuous
200
deterioration since the fourth week. It is possible that the volatile profiles of these spoiled eggs changed less
201
significantly on component or content. Therefore, data points from four-week group to six-week group are
202
relatively concentrated; also, there are several misjudgments.
203
3.3.2. Results of PNN
204
As shown in Fig. 6, with different value of smoothing parameter σ, the vertical axis represents the number of
205
precisely predicted samples in testing set which includes total 28 samples. It could be concluded that the best
206
predicting result comes along with an approximate value of 0.1.
AC C
EP
TE D
M AN U
SC
RI PT
186
ACCEPTED MANUSCRIPT With selected value of σ, a prediction model was built to establish relationship between feature signal and
208
storage time, and was later adopted to predict storage time of both training and testing sets. Results showed
209
that the correct rates of training set and testing set are 100% (112/112) and 92.86% (26/28), respectively.
210
Among the predicting results of the testing set, two samples, originally belonged to four-week and five-week
211
group, were wrongly predicted as samples of the six-week group. The results are basically consistent with
212
those of LDA, with concentrated data points from four-week group to six-week group.
213
3.4. Quantitative prediction of yolk index
214
3.4.1. Results of BPNN
215
Before modeling of BPNN, the number of neurons in hidden layer is determined by a series of tests and
216
revisions. During the tests, we found that 13 neurons were enough for preferable performance, more neurons
217
will just increase training time. Then a BPNN model with the structure of 10-13-1 was established. The ten
218
neurons of input layer represent feature signal (wavelet energy) of 10 sensors respectively and one neuron of
219
the output layer represents predicted value of YI. Other parameters are: the target error is 0.001; learning speed
220
is 0.01; the training iteration is 500.
221
BPNN model was trained based on the aforementioned parameters. Fig. 7a visualizes a rough linear
222
relationship between the predicted and observed values of yolk index. Evaluating indexes were calculated
223
from predicted and experimental values. As shown in Table 2, both MREs are smaller than 8%, which
224
indicates that yolk index could, in a certain degree, be predicted by BPNN model. However, given that the
225
numerical difference between both sets is not small enough, the generalization ability of this model is
226
unsatisfying.
227
3.4.2. Results of SVM
228
In this study, radial basis function was employed as the kernel function of SVM. To conduct the modeling
AC C
EP
TE D
M AN U
SC
RI PT
207
11
ACCEPTED MANUSCRIPT process, some parameters need to be set: searching ranges of C and V are both from 2-8 to 28, sharing the same
230
step length of 0.6; parameter of v-fold cross-validation is 5; step length of accuracy rate is 0.06. By grid search
231
method, the selected values of C and V are 0.25 and 6.9644.
232
With the optimal parameters, a SVM model was established by the training set and then predicted the yolk
233
index of both training and testing sets. The predicted values of yolk index versus observed values are depicted
234
in Fig.7b. Evaluating parameters are listed in Table 2. Compared with BPNN, performance of the training set
235
was improved relatively, which shows that SVM model is more suitable for predicting YI. The poor numerical
236
difference of these evaluating indexes between training and testing sets leads to a conclusion that optimization
237
is necessary to ameliorate the generalization ability.
238
3.4.3. Results of SVM based on dimension reduction by LLE algorithm
239
To obtain a better predicting result, two parameters of LLE algorithm are supposed to be debugged beforehand:
240
the number of neighborhood (K) per sample point, and dimension (M) of low dimensional manifold that is
241
embedded in a high dimensional data set. The value of K has great impact on the reducing process.
242
Specifically, an excessive K value will cause the loss of local information; whereas, if K is too small, the
243
original continuous manifolds will be split into disjointed sub-manifolds, that is, a ‘hollow phenomenon’ (Li &
244
Chen, 2007). To get better performance, different K values were tested and then the value of 11 was chosen to
245
perform LLE algorithm. With the selective value of K (11), performances under different dimensions (M) were
246
compared. Since the RMSEs of training and testing sets are both the smallest when the dimension of original
247
data is reduced to seven, the original 140 × 10 matrix was reduced to 140 × 7.
248
Afterwards, a SVM model was built based on the new matrix. As shown in Fig. 7c, there were closer
249
relationship between predicted values and observed ones than those in Fig. 7a and Fig. 7b. Also, comparing
250
the evaluating parameters in Table 2 with the anterior SVM model, the new model has improved its
AC C
EP
TE D
M AN U
SC
RI PT
229
ACCEPTED MANUSCRIPT performance on testing set and maintained the predicted effect on training set. In other words, generalization
252
ability of this new SVM model had increased relatively.
253
3.4.4. Results of SVM based on dimension reduction by ICA algorithm
254
The number of independent elements (a) by ICA is no greater than dimensions of observed signal (10 in this
255
research), so there are 9 different combinations of independent components and the value of a is from 1 to 9.
256
To compare effects of these combinations, prediction model of SVM was established by each combination,
257
and RMSE was employed to assess the results. Since seven independent elements showed best performance
258
on both training and testing sets, it was selected as the optimal number.
259
Fig. 7d depicts relationship between predicted and observed yolk index. Evaluating parameters of the newly
260
established SVM model are showed in Table 2. RMSE and MRE of both training set and testing set are all
261
smaller than those in the former SVM model without dimension reduction, reaching to the conclusion that
262
dimensional reduction by ICA can increase the data modeling efficiency to a certain degree.
263
3.4.5. Comparison of different prediction models
264
The evaluating parameters of BPNN model, SVM model and the other two models based on data of
265
dimension reduction are showed in Table 2. Fig. 7 visualizes the distributions between predicted values and
266
observed values of yolk index. Regarding the first two prediction models, it can be observed from Fig. 7a and
267
Fig. 7b that there is a closer relationship between the predicted and observed values for SVM mode. Besides,
268
the RMSE values of training and testing sets in SVM model are all smaller than those in BPNN model, which
269
indicates that SVM has a better performance on predicting egg freshness.
270
Besides that, efficiency of the other two data sets with dimensional reduction can be attested by SVM model.
271
The linear relationships in Fig. 7c and Fig. 7d are much more conspicuous than those in Fig. 7a and Fig. 7b,
272
indicating success in improving the SVM’s predicting ability by ICA and LLE. Since the RMSE and MRE in
AC C
EP
TE D
M AN U
SC
RI PT
251
13
ACCEPTED MANUSCRIPT ICA-SVM model are smallest and the R2 is the biggest, the performance of this new model is better than those
274
of original SVM model and LLE-SVM model; also, the evaluation parameters of testing set are acceptable.
275
Therefore, conclusion can be drawn that the ICA algorithm plays a significant role in improving SVM model’s
276
performance, and it has a minor advantage over LLE algorithm when handling electronic nose data of eggs.
277
4. Conclusion
278
In this study, the research on predicting storage time and yolk index was conducted based on electronic nose
279
system combined with data preprocessing approaches and different chemometric methods. The main
280
conclusions are as follows:
281
(1) The sensors of electronic nose exhibited varying sensitivities towards eggs under different storage time,
282
showing that the volatile profile of egg changed with the reduction of freshness. Therefore, it is possible to
283
predict storage time and yolk index of egg by adopting electronic nose.
284
(2) The classification results of LDA and PNN indicated that egg storage time could be well distinguished.
285
(3) The predicting effect of BPNN model testified its feasibility on predicting yolk index of egg, though the
286
performance of this model was not good enough.
287
(4) Finally, a SVM model with original feature signal (wavelet energy) and two new models with data of
288
dimension reduction by LLE and ICA were established. Using the original feature signal as input data, the
289
SVM model performed better on prediction than BPNN. Regarding these two dimension reduction algorithms,
290
they all improved the efficiency of original SVM model to a certain extent. Furthermore, compared with LLE,
291
the ICA algorithm had comparative advantages in strengthening the predicting ability of original SVM model.
292
Acknowledgments
AC C
EP
TE D
M AN U
SC
RI PT
273
ACCEPTED MANUSCRIPT The authors acknowledge the financial support of the National Key Technology R&D Program 2012BAD29B02-4.
294
References
295
Chang, C.-C. & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and
298 299 300 301 302 303 304
Di Natale, C., Martinelli, E., & D’Amico, A. (2002). Counteraction of environmental disturbances of electronic nose data by independent component analysis. Sensors and Actuators B: Chemical, 82, 158-165.
Dutta, R., Hines, E. L., Gardner, J. W., Udrea, D. D., & Boilot, P. (2003). Non-destructive egg freshness determination: an
SC
297
Technology, 2, 1-27.
electronic nose based approach. Measurement Science and Technology, 14, 190-198.
Funk, E. M. (1948). The relation of the yolk index determined in natural position to the yolk index, as determined after separating
M AN U
296
RI PT
293
the yolk from the albumen. Poultry Science, 27, 367.
Giunchi, A., Berardinelli, A., Ragni, L., Fabbri, A., & Silaghi, F. A. (2008). Non-destructive freshness assessment of shell eggs using FT-NIR spectroscopy. Journal of Food Engineering, 89, 142-148.
Jones, D. R., & Musgrove, M. T. (2005). Effects of extended storage on egg quality factors. Poultry Science, 84, 1774-1777.
306
Li, X. L., & Chen, D. S. (2007). Face Recognition Based on LLE + LDA. Computer Application, 27, 85-86.
307
Lin, H., Zhao, J. W., Sun, L., Chen, Q. S., & Zhou, F. (2011). Freshness measurement of eggs using near infrared (NIR)
308
TE D
305
spectroscopy and multivariate data analysis. Innovative Food Science and Emerging Technologies, 12, 182-186. Liu, M., Wang, M. J., Wang, J., & Li, D. (2013). Comparison of random forest, support vector machine and back propagation
310
neural network for electronic tongue data classification: application to the recognition of orange beverage and Chinese
311
vinegar. Sensors and Actuators B: Chemical, 177, 970-980.
313
Liu, Y. D., Ying, Y. B., Ouyang, A. G., & Li, Y. B. (2007). Measurement of internal quality in chicken eggs using visible
AC C
312
EP
309
transmittance spectroscopy technology. Food Control, 18, 18-22.
314
Lv, J. P., & Li, Y. J. (1994). A simple method for determining yolk index and Haugh unit. Meat Hygiene, 7, 13-14.
315
Moreno-Barón, L., Cartas, R., Merkoçi, A., Alegret, S., Del Valle, M., Leija, L., … Muñoz, R. (2006). Application of the wavelet
316
transform coupled with artificial neural networks for quantification purposes in a voltammetric electronic tongue. Sensors and
317
Actuators, B: Chemical, 113, 487-499.
318 319 320
Qiu, S., Wang, J., & Gao, L. (2015). Qualification and quantisation of processed strawberry juice based on electronic nose and tongue. LWT - Food Science and Technology, 60, 115-123. Roweis, S., Saul, K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323-2326. 15
ACCEPTED MANUSCRIPT 321 322 323 324
Soltani, M., & Omid, M. (2015). Detection of poultry egg freshness by dielectric spectroscopy and machine learning techniques. LWT - Food Science and Technology, 62, 1034-1042. Soltani, M., Omid, M. & Alimardani, R. (2015). Egg quality prediction using dielectric and visual properties based on artificial neural network. Food Anal. Methods, 8, 710-717. Specht, D. F. (1990). Probabilistic Neural Network. Neural Network, 3, 109-118.
326
Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
327
Wang, J., Jiang, R. S. (2005). Eggshell crack detection by dynamic frequency analysis. European Food Research and Technology,
332 333 334 335 336 337 338 339 340 341 342 343 344 345
SC
Wu, Y. (2009). Application of support vector machine in coal and gas outburst area prediction. IEEE International Conference on
M AN U
331
Research International, 37(1): 45-50 .
Intelligent Computing and Intelligent Systems, 199-203.
Yin, Y., Yu, H., & Zhang, H. (2008). A feature extraction method based on wavelet packet analysis for discrimination of Chinese vinegars using a gas sensors array. Sensors and Actuators, B: Chemical, 134, 1005-1009. Yongwei, W., Wang, J., Zhou, B., & Lu, Q. (2009). Monitoring storage time and quality attribute of egg based on electronic nose. Analytica Chimica Acta, 650, 183-188.
TE D
330
Wang, J., Jiang, R. S., Yu, Y. (2004). Relationship between dynamic resonance frequency and egg physical properties. Food
Yu, W., Wan, D., Zhou, Y., & Yang, X. (2015). Research on electronic nose gas classification based on kernel PCA and online-SVM. Computer Application and Software, 32, 269-272. Zhang, H., Chang, M., Wang, J., Ye, S. (2008). Evaluation of peach quality indices using an electronic nose by MLR, QPST and
EP
329
221(1-2): 214-220.
BP network. Sensors and Actuators B: Chemical, 134, 332-338. Zhang, W., Pan, L., Tu, S., Zhan, G., & Tu, K. (2015). Non-destructive internal quality assessment of eggs using a synthesis of hyperspectral imaging and multivariate analysis. Journal of Food Engineering, 157, 41-48.
AC C
328
RI PT
325
Zhao, J., Lin, H., Chen, Q., Huang, X., Sun, Z., & Zhou, F. (2010). Identification of egg’s freshness using NIR and support vector data description. Journal of Food Engineering, 98, 408-414.
ACCEPTED MANUSCRIPT 346
Table 1. Electronic nose (PEN2), name and main performance of each sensor.
347
Table 2. Comparison among four chemometric methods based on performance of predicting yolk index.
348
Fig. 1. A series of approximation coefficients cAj and a series of detail coefficients cDj obtained from a
349
three-layer wavelet decomposition, where j represents the decomposition level. Fig. 2. The original responsive signal of sensor S2 of a certain sample was decomposed by the fifth-order
351
wavelet transform of the Daubechies’ family (db5) and three-scale decomposition, and then coefficients in
352
cA3 set were used to reconstruct signal: (a) original responsive signal (b) reconstructed signal by using
353
coefficients in cA3 set, and (c) numerical difference between original and reconstructed signal.
SC
RI PT
350
Fig. 3. Typical responses of ten sensors (from S1 to S10) to four egg samples were obtained by the electronic
355
nose: (a) the 4th sample in the fresh group, (b) the 4th sample in the two-week group, (c) the 4th sample in
356
the four-week group, and (d) the 4th sample in the six-week group.
357 358
M AN U
354
Fig. 4. Mean value and standard deviation of yolk index over storage, the mean values given are the average yolk index of 20 egg samples in each egg group.
Fig. 5. Two Dimensional scatter plot of egg groups over storage by using LDA scores.
360
Fig. 6. Predictive ability of PNN model when adopting different values of smoothing parameters, the
361
predictive ability was evaluated by number of accurate predicted samples. The more accurately predicted
362
samples indicate the better predictive ability.
363
TE D
359
Fig. 7. Predicted versus observed yolk index from four different models: (a) BPNN model, (b) SVM model, (c) LLE-SVM model, and (d) ICA-SVM model. The red circles stand for training data, and the blue triangles
365
stand for testing data. The black line is the line of equity (y=x).
367
AC C
366
EP
364
17
ACCEPTED MANUSCRIPT 368
Table 1. Electronic nose (PEN2), name and main performance of each sensor.
Name
Main performance
Reference
S1
W1C
Aromatic compounds
Toluene, 10 mg/kg
S2
W5S
Very sensitive, broad range sensitivity, react on nitrogen oxides, sensitive with negative signal
NO2, 1 mg/kg
S3
W3C
Ammonia, used as sensor for aromatic compounds
S4
W6S
Mainly hydrogen, selectively, (breath gases)
S5
W5C
Alkenes, aromatic compounds, less polar compounds
S6
W1S
Sensitive to methane (environment) ca. 10 mg kg-1. Broad range, similar to S8
S7
W1W
Reacts on sulfur compounds, H2S 0.1 mg kg-1. Otherwise sensitive to many Terrenes and sulfur organic
RI PT
Number
Benzene, 10 mg/kg H2, 100 mg/kg
SC
M AN U
compounds, which are important for smell, limonene, praline
Propane, 1 mg/kg CH3, 100 mg/kg H2S, 1 mg/kg
S8
W2S
Detects alcohol’s, partially aromatic compounds, broad range
CO, 100 mg/kg
S9
W2W
Aromatic compounds, sulfur organic compounds
H2S, 1 mg/kg
S10
W3S
Reacts on high concentrations > 100 mg kg-1, sometimes very selective (methane)
CH3, 10CH3, 100 mg/kg
EP AC C
370
TE D
369
ACCEPTED MANUSCRIPT 371
Table 2. Comparison among four chemometric methods based on performance of predicting yolk index.
Training set
Testing set
Chemometric method R2
MRE (%)
RMSE
R2
MRE (%)
BPNN
0.0227
0.8629
4.6881
0.0286
0.7863
7.2096
SVM
0.0123
0.9641
3.5687
0.0275
LLE-SVM
0.0122
0.9682
3.6396
0.0234
ICA-SVM
0.0112
0.9730
3.5638
0.0255
RI PT
RMSE
7.9649
0.8666
7.6106
0.9707
7.5648
SC
0.8339
Predicting performances of these four chemometric methods were estimated by the parameters: root mean square error (RMSE), square correlation
373
coefficient (R2) and mean relative error (MRE) between predicted and experimental values. The lower RMSE or MRE and the larger R2 indicate a
374
better predicting model.
M AN U
372
AC C
EP
TE D
375
19
M AN U
Fig. 1. A series of approximation coefficients cAj and a series of detail coefficients cDj obtained from a three-layer wavelet
EP
TE D
decomposition, where j represents the decomposition level.
AC C
376 377 378 379
SC
RI PT
ACCEPTED MANUSCRIPT
ACCEPTED MANUSCRIPT
RI PT
Fig. 2. The original responsive signal of sensor S2 of a certain sample was decomposed by the fifth-order wavelet transform of the Daubechies’ family (db5) and three-scale decomposition, and then coefficients in cA3 set were used to reconstruct signal: (a) original responsive signal (b) reconstructed signal by using coefficients in cA3 set, and (c) numerical difference between original
EP
TE D
M AN U
SC
and reconstructed signal.
AC C
380 381 382 383 384 385
21
Fig. 3. Typical responses of ten sensors (from S1 to S10) to four egg samples were obtained by the electronic nose: (a) the 4th
EP
sample in the six-week group.
TE D
sample in the fresh group, (b) the 4th sample in the two-week group, (c) the 4th sample in the four-week group, and (d) the 4th
AC C
386 387 388 389 390
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
Fig. 4. Mean value and standard deviation of yolk index over storage, the mean values given are the average yolk index of 20
EP
TE D
egg samples in each egg group.
AC C
391 392 393 394
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
23
EP
TE D
Fig. 5. Two Dimensional scatter plot of egg groups over storage by using LDA scores.
AC C
395 396 397 398
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
Fig. 6. Predictive ability of PNN model when adopting different values of smoothing parameters, the predictive ability was
EP
TE D
evaluated by number of accurate predicted samples. The more accurately predicted samples indicate the better predictive ability.
AC C
399 400 401 402
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
25
Fig. 7. Predicted versus observed yolk index from four different models: (a) BPNN model, (b) SVM model, (c) LLE-SVM model, and (d) ICA-SVM model. The circles stand for training data, and the triangles stand for testing data. The line is the line of equity
EP
TE D
(y=x).
AC C
403 404 405 406 407
M AN U
SC
RI PT
ACCEPTED MANUSCRIPT
ACCEPTED MANUSCRIPT
Highlights > Egg storage time and yolk index were evaluated using electronic nose system.
RI PT
> Wavelet energy was extracted as feature signal of sensors for data analysis. > LDA and PNN methods performed successful classification on egg storage time.
AC C
EP
TE D
M AN U
SC
> Yolk index was predicted by SVM model with dimension reduction methods.