Prediction of egg storage time and yolk index based on electronic nose combined with chemometric methods


Accepted Manuscript

Prediction of egg storage time and yolk index based on electronic nose combined with chemometric methods

Jiating Li, Susu Zhu, Shui Jiang, Jun Wang

PII: S0023-6438(17)30297-9
DOI: 10.1016/j.lwt.2017.04.070
Reference: YFSTL 6208
To appear in: LWT - Food Science and Technology
Received Date: 1 January 2017
Revised Date: 22 April 2017
Accepted Date: 22 April 2017

Please cite this article as: Li, J., Zhu, S., Jiang, S., Wang, J., Prediction of egg storage time and yolk index based on electronic nose combined with chemometric methods, LWT - Food Science and Technology (2017), doi: 10.1016/j.lwt.2017.04.070. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Jiating Li, Susu Zhu, Shui Jiang, Jun Wang*

Department of Biosystems Engineering, Zhejiang University, 886 Yuhangtang Road, Hangzhou 310058, China

Abstract: Egg storage time and yolk index, two descriptors of egg freshness, were evaluated by an electronic nose combined with chemometric methods. To obtain more useful information from the collected data, wavelet energy was extracted as the feature signal by the wavelet transform method for qualitative and quantitative analysis. For qualitative analysis, linear discriminant analysis (LDA) was applied to evaluate the feature signals, and the result indicated that these feature signals had good classification performance, with the first two scores explaining 82.50% of the total variance. Moreover, a probabilistic neural network (PNN) was used to classify eggs with different storage times, and 92.86% of the samples in the testing set were classified correctly. For quantitative analysis, back propagation neural networks (BPNN) and support vector machine (SVM) were applied to build prediction models of yolk index; the SVM model (R² = 0.9641 in the training set and R² = 0.8339 in the testing set) was better than the BPNN model (R² = 0.8629 in the training set and R² = 0.7863 in the testing set). To further improve the performance of the SVM model, independent component analysis (ICA) and local linear embedding (LLE) were used to reduce the dimension of the feature data, and the results showed that the ICA-SVM model had satisfactory prediction performance (R² > 0.97).

Key words: Electronic nose; Storage time; Yolk index; Support vector machine; Dimension reduction

* Corresponding author. E-mail address: [email protected] (J. Wang)

1. Introduction

Eggs have always been one of the most important foods in daily life. Much research has been conducted on the detection of egg quality, particularly on the interior changes of eggs during storage. Typically, the reduction of egg quality can be explained as the result of an enhanced interaction between lysozyme and ovomucin as pH increases during storage (Soltani & Omid, 2015). Although a method called ‘candling’ allows eggs to be examined by checking their internal characteristics against a bright light (Wang, Jiang, & Yu, 2004; Wang & Jiang, 2005; Zhang, Pan, Tu, Zhan, & Tu, 2015), it is so arduous that mistakes occur easily. Therefore, new techniques are needed to detect egg quality more efficiently.

Recently, many studies have been carried out to develop techniques for nondestructive detection of egg quality. Most of them focused on spectroscopic and optical methods. Near infrared reflectance (NIR) spectroscopy is a fast and accurate technique for nondestructive detection, and it has been used to measure egg freshness by means of an FT-NIR spectrometer and a fiber optic probe (Giunchi, Berardinelli, Ragni, Fabbri, & Silaghi, 2008). Others combined the NIR technique with different data analysis methods such as multivariate analysis (Lin, Zhao, Sun, Chen, & Zhou, 2011) and support vector data description (Zhao et al., 2010), all of which demonstrated the feasibility of detecting egg freshness by the NIR method. As for the optical method, Liu, Ying, Ouyang and Li (2007) investigated the potential of applying the ultraviolet and visible (UV/VIS, 200-800 nm) transmittance method to inspect the internal quality of intact chicken eggs, and concluded that nondestructive inspection of egg freshness by transmittance properties is feasible in the range of 400-600 nm. Besides these major methods, the dielectric properties of eggs have also been studied for determining egg quality. For instance, Soltani, Omid and Alimardani (2015) developed an egg qualifying system based on dielectric technology. In the evaluation of this system, the mean absolute percent errors obtained from the testing sets were 5.41, 6.84, 8.79, and 4.24% for the Haugh unit, yolk index, yolk/albumen, and yolk weight, respectively. These results showed that the device, which was built on dielectric measurement and the machine vision technique, could be confidently used to predict egg quality indices.

Undeniably, these approaches present potential solutions for the nondestructive detection of egg quality. Yet two critical problems remain: first, the eggshell may affect the detection precision of optical and spectroscopic methods; second, the applicability of dielectric properties to detecting egg quality remains to be improved. It is therefore necessary to search for more efficient and economical ways to detect egg quality. Given that a change in egg quality gives rise to changes in its volatile gas components, an electronic nose system may be an alternative strategy for detecting egg quality by sensing its volatile profile. Indeed, some studies have already shown that egg quality can be detected with an electronic nose. Dutta, Hines, Gardner, Udrea and Boilot (2003) employed an array of four tin oxide sensors to predict egg freshness, and suggested that eggs can be categorized into one of three states with up to 95% accuracy. Yongwei, Wang, Zhou and Lu (2009) demonstrated the potential of monitoring the internal quality of eggs during storage and established prediction models for quality indices. These studies provide references for determining egg quality by electronic nose.

Both Dutta et al. (2003) and Yongwei et al. (2009) focused mainly on the feasibility of detecting egg quality by an electronic nose combined with certain frequently used data analysis methods. However, limited detailed information is available on analyzing collected egg data by data preprocessing methods, or on the comparison among the adopted chemometric methods. Therefore, by combining data preprocessing approaches with chemometric methods, this research aimed to study the feasibility of using an electronic nose system to predict storage time and yolk index, which are both simple but representative indicators of egg quality.

2. Materials and methods

2.1. Sample preparation

All 160 eggs, bought at local supermarkets, were freshly laid and collected in Hongxing village, China. On arrival at the laboratory, the eggs were cleaned and then stored in a chamber at 20 °C and a relative humidity of 70%. Twenty eggs were kept as spare samples in case any were broken; the other 140 eggs were divided into seven groups of 20 eggs each, numbered from 1 to 20 within each group. One new group of eggs was analyzed each week, and the data-collecting experiment lasted for 6 weeks.

2.2. Electronic nose system and sample procedure

In this study, an electronic nose (PEN2, Airsense Company, Germany) equipped with an array of metal oxide semiconductor (MOS) sensors was adopted to detect the sample gas. The name and performance of each sensor are shown in Table 1. Sample gas is drawn into the sensor channel from the air inlet by a built-in pump, flows through the sensor array at a certain rate and finally leaves through the outlet. The reference gas is clean air filtered by activated carbon; it is drawn in at a certain rate by another pump and flows through and cleans the sensor array so that the response signal returns to its baseline. In this way, the reference gas also prevents remnant gas from affecting the next measurement. The response signal is the ratio G/G0 between the conductivity G when the sensors are in contact with the sample gas and the conductivity G0 when the reference gas flows through the sensors.
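As a minimal sketch of the response definition above, the following Python snippet computes G/G0 for one sensor. The function name `response_ratio` and the conductivity values are illustrative assumptions, not values from this study.

```python
def response_ratio(g_sample, g_reference):
    """Return the sensor response G/G0 for each sampled conductivity G."""
    if g_reference <= 0:
        raise ValueError("reference conductivity G0 must be positive")
    return [g / g_reference for g in g_sample]


# Hypothetical conductivities (arbitrary units) of one sensor
g0 = 2.0                # conductivity under reference gas
g = [2.0, 2.5, 3.0]     # conductivities under sample gas, one value per second
ratios = response_ratio(g, g0)
```

A ratio of 1 corresponds to a sensor that reads the same under sample gas and reference gas.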

A static headspace sampling system was adopted for sensing the volatile profile emitted through the eggshell. As determined by a preliminary experiment, the mass of each egg was 60 ± 3 g and the most suitable sealing time was one hour. First, each egg was placed in a 500 mL beaker, which was then sealed with preservative film for one hour and maintained at room temperature (25-27 °C). Then, the inlet tube was inserted into the beaker through a syringe needle and the headspace gas was drawn into the electronic nose. The electronic nose sampled and recorded data at a frequency of 1 Hz, and each sample was measured for 70 s. Finally, the measured eggs were broken to take out the intact yolk and to measure the yolk index.

2.3. Measurement of yolk index

During storage, one significant change in the egg is the decrease of vitelline membrane elasticity, which allows easier migration of water from the albumen through the weakened vitelline membrane (Jones & Musgrove, 2005). The result of this process is yolk flattening, which can be indicated by the yolk index (YI). The procedure is as follows: gently break an egg and pull apart the shell; pour the egg contents onto a large, clean watch glass; finally, measure the thickness and the diameter of the yolk with a vernier caliper. The YI was defined as follows (Funk, 1948):

$$ \mathrm{YI} = \frac{h}{d} \times 100\% \qquad (1) $$

where h denotes the thickness of the yolk, and d denotes the diameter of the yolk.

According to Funk (1948), YI indicates the viscosity of the yolk, and the higher the YI, the better the egg quality. In this research, YI for each egg group was determined by averaging the values of the 20 egg samples measured each time.

2.4. Data processing

2.4.1. Feature extraction

Generally, the maximum value or mean value is used as the feature signal in the analysis of electronic nose data. However, because the response of an electronic nose is non-stationary, these static features (maximum value and mean value) are likely to miss some significant characteristics of the original response. Therefore, to acquire more representative information, wavelet energy, a dynamic feature, was extracted by the wavelet transform (WT) method and used as the feature signal in this study.

WT was developed for the analysis of non-stationary signals. In WT, a family of functions called wavelets is generated by translating and dilating a single base function called the mother wavelet (Moreno-Barón et al., 2006). That is, the original response signal can be decomposed into its component elements with a suitable mother wavelet. These elements comprise a series of cAj sets and a series of cDj sets, where j represents the decomposition level. The cAj and cDj sets retain the low-frequency and high-frequency content of the signal, respectively, as shown in Fig. 1. Among these sets, the coefficients in the cA3 set contain a large proportion of the energy, which means that they account for the majority of the original information. Therefore, the feature signal, wavelet energy, was calculated from all coefficients in cA3. The computational formula (Yin, Yu, & Zhang, 2008) is as follows:

$$ E = \sum_{k=1}^{n} (a_{3k})^{2} \qquad (2) $$

where E is the wavelet energy value of each sensor, corresponding to the third frequency band; n is the number of coefficients in the cA3 set; and a3k is the k-th coefficient in the cA3 set.
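The energy computation in Eq. (2) can be sketched in a few lines of Python. To keep the sketch dependency-free, it uses the Haar wavelet as a stand-in for the db5 wavelet adopted in this study, so the coefficient values differ from those of the paper; the function names and the 8-point input signal are illustrative assumptions. With a wavelet library such as PyWavelets, the cA3 coefficients for db5 would instead come from a three-level decomposition call.

```python
import math


def haar_approximation(signal, levels):
    """Return the Haar approximation coefficients after `levels` decompositions."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested level")
        if len(approx) % 2:
            # Pad odd-length input by repeating the last value
            approx.append(approx[-1])
        # Low-pass step: scaled pairwise sums give the next approximation set
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2.0)
                  for i in range(0, len(approx), 2)]
    return approx


def wavelet_energy(coefficients):
    """Eq. (2): sum of the squared approximation coefficients."""
    return sum(a * a for a in coefficients)


# Hypothetical 8-point response curve of one sensor
response = [1.0] * 8
cA3 = haar_approximation(response, 3)   # level-3 approximation coefficients
energy = wavelet_energy(cA3)            # one feature value for this sensor
```

Repeating this for each of the 10 sensors would give one 10-element feature vector per egg sample.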

In this study, the fifth-order wavelet of the Daubechies family (db5) and a three-level decomposition were adopted to decompose the original signal. To illustrate, the response signal of S2 for a certain sample, shown in Fig. 2a, was decomposed by the fifth-order wavelet. The resulting coefficient sets can be used to reconstruct a corresponding new signal. In this research, the new signal was reconstructed from the coefficients in the cA3 set, as shown in Fig. 2b. Fig. 2c depicts the numerical difference between the original and reconstructed signals, with a peak value of 5.56 × 10⁻¹³. This peak value is small enough to show that the original signal was well represented by the reconstructed signal. In other words, the coefficients in the cA3 set can represent the major part of the original information.

2.4.2. Qualitative classification analysis

Qualitative classification for egg storage time was performed by linear discriminant analysis (LDA) and probabilistic neural network (PNN). LDA explicitly models the difference between the classes of data, and tries to maximize the variance between categories while minimizing the variance within categories. It provides a classification model characterized by a linear dependence of the classification scores on the descriptors, and the eigenvalues of LDA were determined to obtain more information on the relation of the factors in the model analyses (Qiu, Wang, & Gao, 2015). PNN, introduced by Specht (1990), is a feed-forward neural network derived from the Bayesian network and a statistical algorithm called Kernel Fisher Discriminant Analysis. The performance of PNN depends on several factors, including the smoothing parameter and the number of hidden layers. In a PNN, the operations are organized into a multilayered feed-forward network with four layers: input layer, pattern layer, summation layer, and decision-making layer.
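The four-layer structure described above can be illustrated with a small Python sketch. It assumes a Gaussian kernel in the pattern layer and equal class priors, which is a common textbook form of PNN and not necessarily the exact configuration used in this study; the function name `pnn_classify` and the toy data are hypothetical.

```python
import math


def pnn_classify(train_x, train_y, x, sigma):
    """Classify feature vector x with a Gaussian-kernel PNN."""
    class_sums = {}
    class_counts = {}
    for features, label in zip(train_x, train_y):
        # Pattern layer: one Gaussian activation per training sample
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: accumulate activations per class
        class_sums[label] = class_sums.get(label, 0.0) + activation
        class_counts[label] = class_counts.get(label, 0) + 1
    # Decision layer: choose the class with the largest average activation
    return max(class_sums, key=lambda c: class_sums[c] / class_counts[c])


# Toy two-feature data: labels stand for storage weeks (hypothetical values)
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
train_y = [0, 0, 6, 6]
predicted_week = pnn_classify(train_x, train_y, [0.05, 0.1], sigma=0.1)
```

The smoothing parameter `sigma` controls how quickly a training sample's influence decays with distance, which is why it is tuned before the model is used.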

2.4.3. Quantitative prediction analysis

Quantitative calibration with respect to yolk index was performed using back propagation neural networks (BPNN) and support vector machine (SVM), and the SVM model was further optimized by dimension reduction algorithms (independent component analysis and local linear embedding).

During the training of a BPNN, the connection weights are adjusted by gradient descent to reduce the global error until the network converges. A typical BPNN consists of three layers: input layer, hidden layer and output layer. In a BPNN, the input information first moves forward to the nodes in the hidden layer, where it is processed by a certain function; the processed signal then propagates to the output layer as the final result. SVM is based on the principle of structural risk minimization in statistical learning theory and was first put forward by Vapnik (1998) and his colleagues. SVM is powerful in handling problems with small samples and with nonlinear and high-dimensional data sets (Wu, 2009), such as the electronic nose data in this study (Yu, Wan, Zhou, & Yang, 2015). To obtain good performance, the penalty parameter C and the kernel parameter V in the SVM model should be optimized (Liu, Wang, Wang, & Li, 2013).

ICA is a highly efficient blind signal separation method. The basic restriction is that the independent components must be non-Gaussian in nature (Di Natale, Martinelli, & D’Amico, 2002). ICA can be used to extract independent components from the observed data, which in sensing applications are basically a mixture of information from various unknown sources. Manifold learning, a nonlinear dimension reduction approach, was proposed by Bregler in 1995. Its ability to learn the intrinsic structure and distribution of complex, high-dimensional nonlinear data makes manifold learning a useful tool in data analysis. Manifold learning techniques can be broadly categorized into global and local techniques. LLE (Roweis & Saul, 2000) is a method that employs a local manifold learning technique. In this study, LLE was adopted to reduce the dimension of the feature signal for the SVM model.

2.4.4. Distribution of data sets

As mentioned previously, the wavelet energy extracted by WT was selected as the feature signal. The distribution of data sets was the same in qualitative classification and quantitative prediction. That is, the feature data set was divided into two subsets: samples numbered from 1 to 16 in each group were selected as the training set, which had a total of 112 samples, and the remaining 28 samples formed the testing set.
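The split described above can be reproduced with a short Python sketch. The function name and the representation of each sample as a `(group, sample_number)` pair are illustrative assumptions; group indices 0 to 6 are used here to stand for the seven weekly groups.

```python
def split_by_sample_number(n_groups=7, n_per_group=20, n_train=16):
    """Return (train, test) lists of (group, sample_number) pairs."""
    train, test = [], []
    for group in range(n_groups):                  # seven weekly groups
        for number in range(1, n_per_group + 1):   # eggs numbered 1..20
            # Eggs 1..16 of each group go to training, 17..20 to testing
            target = train if number <= n_train else test
            target.append((group, number))
    return train, test


train_ids, test_ids = split_by_sample_number()
print(len(train_ids), len(test_ids))  # prints "112 28"
```

Because the split is made by sample number within each group, every storage time is represented in both subsets.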

For qualitative classification, the discriminating efficiency of LDA was estimated by the percent of variance, an index of discriminating power; the performance of the PNN model was measured by the correct classification rate. The higher the percent of variance or the correct rate, the more successful the classification. For quantitative prediction, predicting performance was estimated using parameters calculated from predicted and experimental values: root mean square error (RMSE), squared correlation coefficient (R²) and mean relative error (MRE). A lower RMSE or MRE and a larger R² indicate a better prediction model. Eqs. (3), (4) and (5) give the formulas for RMSE (Soltani & Omid, 2015), MRE (Zhang, Chang, Wang & Ye, 2008), and R², respectively.

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(t_i - y_i)^{2}} \qquad (3) $$

$$ \mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{t_i - y_i}{t_i}\right| \times 100 \qquad (4) $$

$$ R^{2} = \frac{\left(n\sum t_i y_i - \sum t_i \sum y_i\right)^{2}}{\left[n\sum y_i^{2} - \left(\sum y_i\right)^{2}\right]\left[n\sum t_i^{2} - \left(\sum t_i\right)^{2}\right]} \qquad (5) $$

where n is the number of data in a given set, and ti and yi are the measured and predicted values, respectively.
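The three evaluation indexes translate directly into code. The following Python sketch implements Eqs. (3) to (5) term by term; the function names and the example values are illustrative assumptions and are not data from this study.

```python
import math


def rmse(t, y):
    """Eq. (3): root mean square error between measured t and predicted y."""
    n = len(t)
    return math.sqrt(sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / n)


def mre(t, y):
    """Eq. (4): mean relative error in percent (measured values must be nonzero)."""
    n = len(t)
    return sum(abs((ti - yi) / ti) for ti, yi in zip(t, y)) / n * 100.0


def r_squared(t, y):
    """Eq. (5): squared correlation coefficient between t and y."""
    n = len(t)
    sum_t, sum_y = sum(t), sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_tt = sum(ti * ti for ti in t)
    sum_yy = sum(yi * yi for yi in y)
    numerator = (n * sum_ty - sum_t * sum_y) ** 2
    denominator = (n * sum_yy - sum_y ** 2) * (n * sum_tt - sum_t ** 2)
    return numerator / denominator


# Hypothetical measured and predicted yolk index values
measured = [0.40, 0.35, 0.30, 0.25]
predicted = [0.39, 0.36, 0.29, 0.26]
scores = (rmse(measured, predicted), mre(measured, predicted),
          r_squared(measured, predicted))
```

Note that R² as defined in Eq. (5) measures linear correlation only, so a model whose predictions are systematically scaled or shifted can still reach a high R²; RMSE and MRE are needed to detect such bias.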

LDA was performed in SPSS (IBM SPSS Statistics 19); PNN, BPNN, LIBSVM (Chang & Lin, 2011), ICA and LLE were performed in MATLAB R2010b (MathWorks, USA).

3. Results and discussion

3.1. Electronic nose’s response to egg samples over storage

The 4th sample in each of the fresh, two-week, four-week and six-week groups was selected randomly to depict the typical responses of the electronic nose. For these selected samples, the responses (G/G0) of each sensor during the detection period are depicted in Fig. 3. The response of S2 is the most pronounced. Two possible reasons could explain this phenomenon: first, as shown in Table 1, S2 is very sensitive to nitrogen oxides, so its strong response may indicate a larger content of nitrogen oxides inside the egg; second, the strong response might result from the higher sensitivity of the sensor itself. In addition, as shown in Fig. 3, the response value of each sensor changed over storage, but to varying degrees. The varying responses of the sensors demonstrated that the egg’s volatile profile changed over storage. It is therefore possible to predict the storage time and yolk index of eggs by electronic nose.

3.2. Variation of yolk index

Fig. 4 shows the change of YI over storage. According to the grading standards mentioned by Lv and Li (1994), eggs can be grouped into four grades based on YI: AA (YI ≥ 0.42), A (0.35 < YI ≤ 0.41), B (0.17 < YI ≤ 0.34), and C (YI ≤ 0.17). By this criterion, the eggs displayed in Fig. 4 could be divided into two levels, namely level A (eggs stored for up to 7 days) and level B (eggs stored for more than 7 days). An additional conclusion is that egg freshness declined over time, but at a non-constant rate.

3.3. Qualitative classification by LDA and PNN

3.3.1. Results of LDA

The 7 groups of samples were successfully classified by LDA, with the first two scores explaining 82.50% of the total variance. It can be seen in Fig. 5 that the 7 groups of eggs were basically distinguished from each other, except for minor overlapping between the four-week and six-week groups. According to Fig. 4, eggs underwent conspicuous deterioration from the fourth week onward. It is possible that the volatile profiles of these spoiled eggs changed less significantly in composition or content. Therefore, data points from the four-week group to the six-week group are relatively concentrated, and there are several misjudgments.

3.3.2. Results of PNN

Fig. 6 shows, for different values of the smoothing parameter σ, the number of correctly predicted samples (vertical axis) in the testing set, which includes 28 samples in total. It can be concluded that the best prediction result is obtained with σ of approximately 0.1.

With the selected value of σ, a prediction model was built to establish the relationship between feature signal and storage time, and was then used to predict the storage time of both training and testing sets. The results showed that the correct rates of the training set and testing set were 100% (112/112) and 92.86% (26/28), respectively. In the testing set, two samples, which originally belonged to the four-week and five-week groups, were wrongly predicted as samples of the six-week group. The results are basically consistent with those of LDA, in which the data points from the four-week group to the six-week group were concentrated.

3.4. Quantitative prediction of yolk index

3.4.1. Results of BPNN

Before the BPNN was modeled, the number of neurons in the hidden layer was determined by a series of tests and revisions. These tests showed that 13 neurons were enough for good performance; more neurons only increased the training time. A BPNN model with the structure 10-13-1 was therefore established. The ten neurons of the input layer represent the feature signals (wavelet energy) of the 10 sensors, and the single neuron of the output layer represents the predicted value of YI. The other parameters were: target error, 0.001; learning rate, 0.01; maximum number of training iterations, 500.
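To make the 10-13-1 structure concrete, the following Python sketch runs one forward pass through such a network. The tanh hidden activation, linear output neuron, random initial weights and input values are all assumptions made for illustration; the source does not state the activation functions, and the gradient-descent training step is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


rng = random.Random(0)
hidden = init_layer(10, 13, rng)   # 10 wavelet-energy inputs -> 13 hidden neurons
output = init_layer(13, 1, rng)    # 13 hidden neurons -> 1 predicted YI
x = [0.1] * 10                     # hypothetical, already-scaled feature vector
prediction = forward(x, hidden, output)
```

With 10 inputs, 13 hidden neurons and one output, the network has 10 × 13 + 13 + 13 × 1 + 1 = 157 adjustable parameters, which is more than the 112 training samples and may contribute to the limited generalization reported for this model.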

The BPNN model was trained with the aforementioned parameters. Fig. 7a shows a roughly linear relationship between the predicted and observed values of yolk index. Evaluation indexes were calculated from the predicted and experimental values. As shown in Table 2, both MREs are smaller than 8%, which indicates that yolk index could, to a certain degree, be predicted by the BPNN model. However, given that the numerical difference between the training and testing sets is not small enough, the generalization ability of this model is unsatisfactory.

3.4.2. Results of SVM

In this study, the radial basis function was employed as the kernel function of SVM. To conduct the modeling process, some parameters need to be set: the search ranges of C and V are both from 2⁻⁸ to 2⁸, sharing the same step length of 0.6; the parameter of v-fold cross-validation is 5; the step length of accuracy rate is 0.06. By the grid search method, the selected values of C and V are 0.25 and 6.9644.
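The search grid can be enumerated with a short Python sketch. It assumes that the step length of 0.6 applies to the exponent of 2, which is an interpretation rather than something the text states; under that assumption both selected values lie on the grid (0.25 = 2⁻² and 6.9644 ≈ 2^2.8). The function name is hypothetical, and the cross-validated SVM training that scores each candidate pair is not shown.

```python
def exponent_grid(low=-8.0, high=8.0, step=0.6):
    """Return exponents low, low + step, ... that do not exceed high."""
    count = int((high - low) / step + 1e-9) + 1
    return [low + k * step for k in range(count)]


exponents = exponent_grid()
# Candidate values for C; the same list would be used for the kernel parameter V
candidates = [2.0 ** e for e in exponents]

# Check that the reported optimum values are grid points under this assumption
on_grid_c = any(abs(c - 0.25) < 1e-9 for c in candidates)
on_grid_v = any(abs(c - 6.9644) < 1e-4 for c in candidates)
```

In a full grid search, every (C, V) pair from this grid would be evaluated by 5-fold cross-validation on the training set and the pair with the lowest cross-validation error kept.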

With the optimal parameters, an SVM model was established on the training set and then used to predict the yolk index of both training and testing sets. The predicted values of yolk index versus observed values are depicted in Fig. 7b. Evaluation parameters are listed in Table 2. Compared with BPNN, the performance on the training set was improved, which suggests that the SVM model is more suitable for predicting YI. However, the large gap between the evaluation indexes of the training and testing sets leads to the conclusion that optimization is necessary to improve the generalization ability.

3.4.3. Results of SVM based on dimension reduction by LLE algorithm

To obtain a better prediction result, two parameters of the LLE algorithm need to be tuned beforehand: the number of neighbors (K) per sample point, and the dimension (M) of the low-dimensional manifold that is embedded in the high-dimensional data set. The value of K has a great impact on the reduction process. Specifically, an excessive K value will cause the loss of local information, whereas if K is too small, the original continuous manifold will be split into disjointed sub-manifolds, that is, a ‘hollow phenomenon’ (Li & Chen, 2007). To get better performance, different K values were tested and the value of 11 was chosen to perform the LLE algorithm. With the selected value of K (11), performances under different dimensions (M) were compared. Since the RMSEs of the training and testing sets were both smallest when the dimension of the original data was reduced to seven, the original 140 × 10 matrix was reduced to 140 × 7.

Afterwards, an SVM model was built on the new matrix. As shown in Fig. 7c, there was a closer relationship between predicted and observed values than in Fig. 7a and Fig. 7b. Also, comparing the evaluation parameters in Table 2 with those of the previous SVM model, the new model improved its performance on the testing set and maintained its prediction performance on the training set. In other words, the generalization ability of this new SVM model was improved.

3.4.4. Results of SVM based on dimension reduction by ICA algorithm

The number of independent components (a) extracted by ICA is no greater than the dimension of the observed signal (10 in this research), so there are 9 different combinations of independent components, with the value of a ranging from 1 to 9. To compare the effects of these combinations, an SVM prediction model was established for each combination, and RMSE was employed to assess the results. Since seven independent components showed the best performance on both training and testing sets, seven was selected as the optimal number.

Fig. 7d depicts the relationship between predicted and observed yolk index. Evaluation parameters of the newly established SVM model are shown in Table 2. The RMSE and MRE of both the training set and the testing set are smaller than those of the former SVM model without dimension reduction, leading to the conclusion that dimension reduction by ICA can increase the data modeling efficiency to a certain degree.

3.4.5. Comparison of different prediction models

The evaluation parameters of the BPNN model, the SVM model and the two models based on dimension-reduced data are shown in Table 2. Fig. 7 visualizes the relationships between predicted and observed values of yolk index. Regarding the first two prediction models, it can be observed from Fig. 7a and Fig. 7b that there is a closer relationship between the predicted and observed values for the SVM model. Besides, the RMSE values of the training and testing sets in the SVM model are both smaller than those in the BPNN model, which indicates that SVM performs better in predicting egg freshness.

In addition, the effectiveness of the two dimension-reduced data sets can be verified with the SVM model. The linear relationships in Fig. 7c and Fig. 7d are much more conspicuous than those in Fig. 7a and Fig. 7b, indicating that ICA and LLE succeeded in improving the predicting ability of SVM. Since the RMSE and MRE of the ICA-SVM model are the smallest and its R² is the largest, the performance of this model is better than those of the original SVM model and the LLE-SVM model; also, the evaluation parameters of its testing set are acceptable. Therefore, it can be concluded that the ICA algorithm plays a significant role in improving the SVM model’s performance, and that it has a minor advantage over the LLE algorithm when handling electronic nose data of eggs.

4. Conclusion

In this study, the prediction of storage time and yolk index was investigated with an electronic nose system combined with data preprocessing approaches and different chemometric methods. The main conclusions are as follows:

(1) The sensors of the electronic nose exhibited varying responses to eggs of different storage times, showing that the volatile profile of eggs changed as freshness declined. Therefore, it is possible to predict the storage time and yolk index of eggs with an electronic nose.

(2) The classification results of LDA and PNN indicated that egg storage time could be well distinguished.

(3) The prediction results of the BPNN model confirmed its feasibility for predicting the yolk index of eggs, though the performance of this model was not good enough.

(4) Finally, an SVM model with the original feature signal (wavelet energy) and two new models with data reduced in dimension by LLE and ICA were established. Using the original feature signal as input data, the SVM model performed better in prediction than BPNN. Both dimension reduction algorithms improved the performance of the original SVM model to a certain extent. Furthermore, compared with LLE, the ICA algorithm had a comparative advantage in strengthening the predicting ability of the original SVM model.

Acknowledgments

The authors acknowledge the financial support of the National Key Technology R&D Program 2012BAD29B02-4.

294

References

295

Chang, C.-C. & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and

298 299 300 301 302 303 304

Di Natale, C., Martinelli, E., & D’Amico, A. (2002). Counteraction of environmental disturbances of electronic nose data by independent component analysis. Sensors and Actuators B: Chemical, 82, 158-165.

Dutta, R., Hines, E. L., Gardner, J. W., Udrea, D. D., & Boilot, P. (2003). Non-destructive egg freshness determination: an

SC

297

Technology, 2, 1-27.

electronic nose based approach. Measurement Science and Technology, 14, 190-198.

Funk, E. M. (1948). The relation of the yolk index determined in natural position to the yolk index, as determined after separating

M AN U

296

RI PT

293

the yolk from the albumen. Poultry Science, 27, 367.

Giunchi, A., Berardinelli, A., Ragni, L., Fabbri, A., & Silaghi, F. A. (2008). Non-destructive freshness assessment of shell eggs using FT-NIR spectroscopy. Journal of Food Engineering, 89, 142-148.

Jones, D. R., & Musgrove, M. T. (2005). Effects of extended storage on egg quality factors. Poultry Science, 84, 1774-1777.

306

Li, X. L., & Chen, D. S. (2007). Face Recognition Based on LLE + LDA. Computer Application, 27, 85-86.

307

Lin, H., Zhao, J. W., Sun, L., Chen, Q. S., & Zhou, F. (2011). Freshness measurement of eggs using near infrared (NIR)

308

TE D

305

spectroscopy and multivariate data analysis. Innovative Food Science and Emerging Technologies, 12, 182-186. Liu, M., Wang, M. J., Wang, J., & Li, D. (2013). Comparison of random forest, support vector machine and back propagation

310

neural network for electronic tongue data classification: application to the recognition of orange beverage and Chinese

311

vinegar. Sensors and Actuators B: Chemical, 177, 970-980.

313

Liu, Y. D., Ying, Y. B., Ouyang, A. G., & Li, Y. B. (2007). Measurement of internal quality in chicken eggs using visible

AC C

312

EP

309

transmittance spectroscopy technology. Food Control, 18, 18-22.

314

Lv, J. P., & Li, Y. J. (1994). A simple method for determining yolk index and Haugh unit. Meat Hygiene, 7, 13-14.

315

Moreno-Barón, L., Cartas, R., Merkoçi, A., Alegret, S., Del Valle, M., Leija, L., … Muñoz, R. (2006). Application of the wavelet

316

transform coupled with artificial neural networks for quantification purposes in a voltammetric electronic tongue. Sensors and

317

Actuators, B: Chemical, 113, 487-499.

318 319 320

Qiu, S., Wang, J., & Gao, L. (2015). Qualification and quantisation of processed strawberry juice based on electronic nose and tongue. LWT - Food Science and Technology, 60, 115-123. Roweis, S., Saul, K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323-2326. 15

ACCEPTED MANUSCRIPT 321 322 323 324

Soltani, M., & Omid, M. (2015). Detection of poultry egg freshness by dielectric spectroscopy and machine learning techniques. LWT - Food Science and Technology, 62, 1034-1042. Soltani, M., Omid, M. & Alimardani, R. (2015). Egg quality prediction using dielectric and visual properties based on artificial neural network. Food Anal. Methods, 8, 710-717. Specht, D. F. (1990). Probabilistic Neural Network. Neural Network, 3, 109-118.

326

Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.

327

Wang, J., Jiang, R. S. (2005). Eggshell crack detection by dynamic frequency analysis. European Food Research and Technology,

332 333 334 335 336 337 338 339 340 341 342 343 344 345

SC

Wu, Y. (2009). Application of support vector machine in coal and gas outburst area prediction. IEEE International Conference on

M AN U

331

Research International, 37(1): 45-50 .

Intelligent Computing and Intelligent Systems, 199-203.

Yin, Y., Yu, H., & Zhang, H. (2008). A feature extraction method based on wavelet packet analysis for discrimination of Chinese vinegars using a gas sensors array. Sensors and Actuators, B: Chemical, 134, 1005-1009. Yongwei, W., Wang, J., Zhou, B., & Lu, Q. (2009). Monitoring storage time and quality attribute of egg based on electronic nose. Analytica Chimica Acta, 650, 183-188.

TE D

330

Wang, J., Jiang, R. S., Yu, Y. (2004). Relationship between dynamic resonance frequency and egg physical properties. Food

Yu, W., Wan, D., Zhou, Y., & Yang, X. (2015). Research on electronic nose gas classification based on kernel PCA and online-SVM. Computer Application and Software, 32, 269-272. Zhang, H., Chang, M., Wang, J., Ye, S. (2008). Evaluation of peach quality indices using an electronic nose by MLR, QPST and

EP

329

221(1-2): 214-220.

BP network. Sensors and Actuators B: Chemical, 134, 332-338. Zhang, W., Pan, L., Tu, S., Zhan, G., & Tu, K. (2015). Non-destructive internal quality assessment of eggs using a synthesis of hyperspectral imaging and multivariate analysis. Journal of Food Engineering, 157, 41-48.

AC C

328

RI PT

325

Zhao, J., Lin, H., Chen, Q., Huang, X., Sun, Z., & Zhou, F. (2010). Identification of egg’s freshness using NIR and support vector data description. Journal of Food Engineering, 98, 408-414.


Table 1. Electronic nose (PEN2): name and main performance of each sensor.

| Number | Name | Main performance | Reference |
|---|---|---|---|
| S1 | W1C | Aromatic compounds | Toluene, 10 mg/kg |
| S2 | W5S | Very sensitive, broad range sensitivity, reacts to nitrogen oxides, sensitive with negative signal | NO2, 1 mg/kg |
| S3 | W3C | Ammonia, used as sensor for aromatic compounds | Benzene, 10 mg/kg |
| S4 | W6S | Mainly hydrogen, selectively (breath gases) | H2, 100 mg/kg |
| S5 | W5C | Alkenes, aromatic compounds, less polar compounds | Propane, 1 mg/kg |
| S6 | W1S | Sensitive to methane (environment), ca. 10 mg/kg. Broad range, similar to S8 | CH4, 100 mg/kg |
| S7 | W1W | Reacts to sulfur compounds, H2S 0.1 mg/kg. Otherwise sensitive to many terpenes and sulfur organic compounds that are important for smell (limonene, pyrazine) | H2S, 1 mg/kg |
| S8 | W2S | Detects alcohols, partially aromatic compounds, broad range | CO, 100 mg/kg |
| S9 | W2W | Aromatic compounds, sulfur organic compounds | H2S, 1 mg/kg |
| S10 | W3S | Reacts to high concentrations (> 100 mg/kg), sometimes very selective (methane) | CH4, 10CH4, 100 mg/kg |

Table 2. Comparison among four chemometric methods based on performance of predicting yolk index.

| Chemometric method | Training RMSE | Training R² | Training MRE (%) | Testing RMSE | Testing R² | Testing MRE (%) |
|---|---|---|---|---|---|---|
| BPNN | 0.0227 | 0.8629 | 4.6881 | 0.0286 | 0.7863 | 7.2096 |
| SVM | 0.0123 | 0.9641 | 3.5687 | 0.0275 | | |
| LLE-SVM | 0.0122 | 0.9682 | 3.6396 | 0.0234 | | |
| ICA-SVM | 0.0112 | 0.9730 | 3.5638 | 0.0255 | | |

The row assignment of the remaining testing-set entries for SVM, LLE-SVM and ICA-SVM is not preserved in the source layout: the R² values are 0.8666, 0.9707 and 0.8339, and the MRE (%) values are 7.9649, 7.6106 and 7.5648.

The predicting performance of the four chemometric methods was estimated by the root mean square error (RMSE), the squared correlation coefficient (R²) and the mean relative error (MRE) between predicted and experimental values. A lower RMSE or MRE and a larger R² indicate a better predicting model.
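The three criteria reported in Table 2 can be illustrated with a short sketch. The exact formulas are not given in this section, so the conventional definitions are assumed here: RMSE as the root of the mean squared residual, R² as the squared Pearson correlation between observed and predicted values, and MRE as the mean absolute error relative to the observed value, in percent. The function names and the yolk index values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def squared_correlation(observed, predicted):
    """Squared Pearson correlation coefficient (assumed meaning of R2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def mean_relative_error(observed, predicted):
    """Mean relative error in percent (assumes no observed value is zero)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n

# Hypothetical observed and predicted yolk index values, for illustration only.
observed = [0.45, 0.40, 0.35, 0.30]
predicted = [0.44, 0.41, 0.33, 0.31]

print("RMSE:", rmse(observed, predicted))
print("R2:", squared_correlation(observed, predicted))
print("MRE (%):", mean_relative_error(observed, predicted))
```

Under these definitions a model can rank differently on the three criteria, which is why Table 2 reports all of them for both the training and the testing set.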

Fig. 1. A series of approximation coefficients cAj and a series of detail coefficients cDj obtained from a three-layer wavelet decomposition, where j represents the decomposition level.

Fig. 2. The original response signal of sensor S2 for one sample was decomposed by the fifth-order wavelet of the Daubechies family (db5) with a three-scale decomposition, and the coefficients in the cA3 set were then used to reconstruct the signal: (a) original response signal, (b) signal reconstructed from the coefficients in the cA3 set, and (c) numerical difference between the original and reconstructed signals.
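The three-level decomposition described for Figs. 1 and 2, and the wavelet energy used as the feature signal, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Haar wavelet is swapped in for db5 so that the example needs no external library, the energy of a coefficient set is assumed to be the sum of its squared coefficients, and the helper names and the sensor response values are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_wavedec(signal, levels=3):
    """Multi-level decomposition, returned as [cA_levels, cD_levels, ..., cD_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def energy(coeffs):
    """Energy of a coefficient set, assumed to be the sum of squared coefficients."""
    return sum(c * c for c in coeffs)

# Hypothetical response curve of one sensor (8 samples), for illustration only.
response = [1.0, 1.2, 1.5, 1.9, 2.4, 2.8, 3.0, 3.1]
cA3, cD3, cD2, cD1 = haar_wavedec(response, levels=3)

# One feature per sensor: the energy carried by the level-3 approximation.
feature = energy(cA3)
print("cA3 energy:", feature)
```

Because the Haar transform is orthonormal, the energies of cA3, cD3, cD2 and cD1 add up to the energy of the original signal when its length is a power of two, so the share carried by cA3 indicates how much of the response survives the smoothing shown in Fig. 2(b).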

Fig. 3. Typical responses of the ten sensors (S1 to S10) of the electronic nose to four egg samples: (a) the 4th sample in the fresh group, (b) the 4th sample in the two-week group, (c) the 4th sample in the four-week group, and (d) the 4th sample in the six-week group.

Fig. 4. Mean value and standard deviation of yolk index over storage. The mean values are the average yolk index of the 20 egg samples in each egg group.

Fig. 5. Two-dimensional scatter plot of the egg groups over storage using LDA scores.

Fig. 6. Predictive ability of the PNN model for different values of the smoothing parameter. Predictive ability was evaluated by the number of accurately predicted samples; more accurately predicted samples indicate better predictive ability.
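The sweep over the smoothing parameter summarised in Fig. 6 can be sketched with a small Gaussian-kernel PNN. This is a minimal sketch under assumptions: each class score is taken as the average Gaussian kernel value over that class's training samples, the function names are hypothetical, and the two-feature data with four storage groups (0, 2, 4 and 6 weeks) are invented for illustration and are not the measured sensor features.

```python
import math

def pnn_predict(x, train_x, train_y, sigma):
    """Classify x by the class with the largest average Gaussian kernel response."""
    scores = {}
    counts = {}
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))  # squared Euclidean distance
        scores[yi] = scores.get(yi, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(scores, key=lambda c: scores[c] / counts[c])

def count_correct(test_x, test_y, train_x, train_y, sigma):
    """Number of test samples whose predicted class matches the true class."""
    return sum(
        pnn_predict(x, train_x, train_y, sigma) == y
        for x, y in zip(test_x, test_y)
    )

# Hypothetical two-feature samples labelled by storage time in weeks.
train_x = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9),
           (2.0, 2.0), (2.1, 1.9), (3.0, 3.0), (2.9, 3.1)]
train_y = [0, 0, 2, 2, 4, 4, 6, 6]
test_x = [(0.1, 0.1), (1.0, 0.9), (2.0, 1.9), (3.0, 3.1)]
test_y = [0, 2, 4, 6]

# Sweep the smoothing parameter and report the number of correct predictions.
for sigma in (0.05, 0.5, 5.0):
    n_ok = count_correct(test_x, test_y, train_x, train_y, sigma)
    print("sigma =", sigma, "correct:", n_ok, "of", len(test_y))
```

The smoothing parameter sets the width of each kernel: small values make the decision depend almost entirely on the nearest training samples, while large values blur the class boundaries, which is why the number of correctly predicted samples is tracked over a range of values.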

Fig. 7. Predicted versus observed yolk index from four different models: (a) BPNN model, (b) SVM model, (c) LLE-SVM model, and (d) ICA-SVM model. The red circles stand for training data, and the blue triangles stand for testing data. The black line is the line of equality (y = x).

Highlights

> Egg storage time and yolk index were evaluated using an electronic nose system.
> Wavelet energy was extracted as the feature signal of the sensors for data analysis.
> LDA and PNN methods successfully classified egg storage time.
> Yolk index was predicted by an SVM model with dimension reduction methods.