Sensors and Actuators B 134 (2008) 1005–1009
Sensors and Actuators B: Chemical journal homepage: www.elsevier.com/locate/snb
A feature extraction method based on wavelet packet analysis for discrimination of Chinese vinegars using a gas sensors array
Yong Yin ∗, Huichun Yu, Hongshun Zhang
School of Food & Bioengineering, Henan University of Science & Technology, No. 48 Xiyuan Road, Luoyang, Henan 471003, China
Article info
Article history: Received 9 April 2008; received in revised form 6 July 2008; accepted 7 July 2008; available online 31 July 2008
Keywords: Gas sensor array; Feature extraction; Wavelet packet analysis; Vinegar discrimination
Abstract
A feature extraction method is proposed for discriminating three kinds of Chinese vinegars with a gas sensor array composed of 13 Taguchi gas sensors (TGS). The method uses three-scale wavelet packet analysis to decompose the signal of each sensor in the array into eight different frequency bands, and a feature value is obtained by taking the maximum of the relative energies of these frequency bands. With this method, 13-dimensional feature vectors were extracted from the response signals of the array. Principal component analysis (PCA) and a radial basis function neural network (RBFNN) were then used to analyze these data and verify the validity of the method. Both PCA and RBFNN correctly discriminated the three kinds of vinegars, which indicates that the feature extraction method is effective for vinegar discrimination. © 2008 Published by Elsevier B.V.
∗ Corresponding author. Tel.: +86 379 6421 9608; fax: +86 379 6423 1373. E-mail address: [email protected] (Y. Yin).
0925-4005/$ – see front matter © 2008 Published by Elsevier B.V. doi:10.1016/j.snb.2008.07.018
1. Introduction
Vinegar is a popular condiment in China: large amounts are produced and consumed every year, and the quality assessment of vinegar has attracted increasing attention. Besides chemical analysis methods (such as atomic absorption spectroscopy and gas chromatography) and sensory analysis, gas sensor arrays combined with suitable pattern recognition algorithms (known as electronic noses, or eNoses) have been used to discriminate many different kinds of vinegar [1–4], and eNoses are a very promising technique for vinegar characterization [2]. With a gas sensor array, transient or dynamic responses are often selected as test signals [5], and the global information formed by the responses of all sensors is used to discriminate different vinegars. However, because the dynamic signal of each sensor consists of numerous measured values, the responses form a large and complex dataset. A preliminary feature extraction stage is therefore frequently required to improve the performance of the subsequent pattern recognition algorithms and to facilitate the discrimination task. In the field of gas sensor arrays, the features usually extracted capture only part of the global information and cannot fully reflect the sensors' responses. In recent years, more powerful feature extraction methods have been proposed in the literature [5–10], but each of them is satisfactory only for certain applications. Because sensor types and practical applications differ, the feature extraction methods reported in the literature do not always serve the varied purposes of different investigations, nor are they always the best techniques for them. Ref. [8] also pointed out that the shape of a signal depends on the type of sensor, the type of sensitizer, the physical arrangement of the apparatus and the way the sensitizer is introduced to the sensor; we therefore have to accept that different projects need different feature extraction methods. In Ref. [9], a discrete wavelet transform was used to extract the important features from dynamic sensor responses, achieving accurate discrimination between hydrogen and carbon monoxide, as well as their binary mixtures, over a wide concentration range. This suggested a new way of analyzing dynamic responses to us, because the dataset analyzed there is the whole dynamic response. Owing to its high resolution, wavelet packet analysis is widely recognized as a fine-grained tool for time–frequency analysis: dynamic responses can be decomposed by the wavelet packet transform so that fine distinctions can be made. In this paper, we therefore present a feature extraction method for effectively discriminating three kinds of Chinese vinegars. The measured signals of the gas sensor array are decomposed at three scales with a Daubechies wavelet packet, and the feature values of the signals are then extracted from the energy of the different frequency bands. To check the performance of the method, we discriminated the three kinds of Chinese vinegars using principal component analysis (PCA) and a radial basis function neural network (RBFNN).
2. Experimental
2.1. Materials and gas sensor array
Three kinds of commercial Chinese vinegars were selected as the objects to be identified, namely Shanxilaochencu, Zhenjiangmicu and Luoyanghuaguoshanshizicu, labeled sxlcc, zjmc and szc, respectively, for convenience. The sample label, type, raw materials, total acidity, fermentation method and production area of each vinegar are listed in Table 1.
Table 1
The details of the vinegar samples used in the experiment
Sample label | Type | Raw materials | Total acidity (g/100 ml) | Fermentation method | Production area
sxlcc | Mature vinegar | Water, broomcorn, barley, pea, wheat bran, sodium benzoate | 5.0 | Solid fermentation | Shanxi province
zjmc | Rice vinegar | Water, rice, wheat bran, sugar, salt, sodium benzoate | 3.6 | Solid fermentation | Jiangsu province
szc | Fruit vinegar | Spring water, persimmon, wheat mouldy bran, sodium benzoate | 3.5 | Solid fermentation | Henan province
Tin oxide gas sensors remain a good option for constructing a gas sensor array because of their low cost, high sensitivity, good stability, easy maintenance and simple operating circuits. We therefore selected 13 TGS sensors made in Japan (Figaro Engineering Inc.) to build a gas sensor array for classifying the above vinegars: TGS-800, TGS-812, TGS-813, TGS-821, TGS-822, TGS-824, TGS-825, TGS-826, TGS-830, TGS-831, TGS-832, TGS-842 and TGS-880. The gas sensor array was placed in a stainless steel test chamber with a volume of 2 l and a diameter of 20 cm. A 16-channel, 12-bit high-precision data acquisition system (DAS) was used for the 13 TGS sensors, a humidity sensor and a temperature sensor. The heater voltage of each TGS sensor was 5 ± 0.05 V and its circuit voltage was 10 ± 0.01 V; the circuit voltages of the humidity sensor and the temperature sensor were 10 ± 0.1 V.
2.2. Measurement method
2.2.1. Sampling
An accurate sample quantity is the foundation of analysis. However, the volatiles of vinegar are compressible and diffusive gases, so it is quite difficult to sample them accurately. Following Ref. [5], we placed fixed amounts of vinegar, contained in an evaporating dish 10.0 cm in diameter, directly into the test chamber; each test sample was 5 ml.
2.2.2. Testing method and test results
We selected transient or dynamic responses as test signals, because dynamic responses are a very important part of sensor signals and contain abundant feature information [10,11]. For each kind of vinegar, 35 samples from the same batch were tested, giving 105 samples in total, and the samples were measured in rotation, i.e., in the cycle sxlcc → zjmc → szc → sxlcc. During measurement, each recorded response of a sensor was the mean of three response values captured in rapid succession by the DAS, in order to partially reduce the effect of white noise, and each sensor gave 1500 response data points per sample. The 1500 measurements were taken at 1 s intervals and took about 1500 s, which essentially covers the dynamic response process of each sensor. Fig. 1 shows the response of a sensor to sample sxlcc. The 105 samples were tested intermittently over 35 days. In addition, the sensors were allowed to recover for about 15 min before every measurement by flushing the test chamber with clean air.
To reduce the effects of temperature and humidity on the gas sensor responses, two steps were adopted. First, before a volatile sample was measured, the response of each sensor at the prevailing temperature and humidity was recorded as the baseline value. The baseline value was then subtracted from each of the 1500 response values of that sensor to the sample, giving 1500 difference values, which together form one test result of the sensor for that sample. We call this treatment "baseline-removing pretreatment". Second, the temperature and humidity were measured simultaneously so that their effects could be further compensated; the details are discussed in Section 3.3.
Fig. 1. Response curve of a sensor to sample sxlcc.
3. Feature extraction and discussion
3.1. A brief overview of wavelet packet decomposition
The wavelet packet method [12,13] is a generalization of wavelet decomposition that offers a richer signal analysis. For a given orthogonal wavelet function, a library of wavelet packet bases is generated. Each of these bases offers a particular way of coding the signal, preserving the global energy and reconstructing exact features. In orthogonal wavelet decomposition, the generic step splits the approximation coefficients into two parts, yielding a vector of approximation coefficients and a vector of detail coefficients, both at a coarser scale. The information lost between two successive approximations is captured in the detail coefficients. The next step splits the new approximation coefficient vector; successive details are never reanalyzed. In the corresponding wavelet packet case, by contrast, each detail coefficient vector is also split into two parts, using the same approach as for the approximation vector. This offers the richest analysis: in the one-dimensional case, a complete binary tree is produced. The binary tree of the three-scale wavelet packet decomposition is illustrated in Fig. 2.
Fig. 2. Three-scale wavelet packet decomposition tree.
In Fig. 2, each node of the decomposition tree (such as A1, D1, AAA3, DDA3, AAD3 and DDD3) corresponds to a set of decomposition coefficients, i.e., to a different frequency band. Fig. 3 shows how these coefficients (vectors) are obtained. There, cA1 denotes a set of approximation coefficients and cD1 a set of detail coefficients, corresponding to A1 and D1 in Fig. 2. These vectors are obtained by convolving the signal s with the low-pass filter Lo_D for the approximation and with the high-pass filter Hi_D for the detail, followed by dyadic decimation; the symbol "↓2" denotes downsampling. The length of each filter is 2N. If n is the length of s, the convolved signals F and G are of length n + 2N − 1, and the coefficient vectors cA1 and cD1 are therefore of length:
$$\left\lfloor \frac{n-1}{2} \right\rfloor + N$$
The vectors cA1 and cD1 can in turn be split into two parts with the same scheme, replacing s by cA1 or cD1, and so on. The low-pass filter Lo_D and the high-pass filter Hi_D in Fig. 3 are determined by the selected wavelet function. In this paper, features were extracted only from the coefficient sets of the three-scale wavelet packet decomposition (cAAA3, cDAA3, cADA3, cDDA3, cAAD3, cDAD3, cADD3 and cDDD3) for discriminating the three kinds of Chinese vinegars.
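The splitting scheme of Figs. 2 and 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it uses the db2 Daubechies filters (2N = 4; the paper does not state which Daubechies order was used), handles the boundaries by plain zero padding (full convolution), and keeps the odd-indexed samples when downsampling, which reproduces the coefficient length ⌊(n − 1)/2⌋ + N given above. The helper names `convolve_full`, `one_scale_split` and `wavelet_packet` are hypothetical.

```python
import math

# ASSUMPTION: db2 (2N = 4 taps, N = 2) stands in for the unspecified Daubechies order.
# Scaling-filter taps from the closed form (1 + sqrt(3), 3 + sqrt(3), ...) / (4 sqrt(2)).
_R3 = math.sqrt(3.0)
_NORM = 4 * math.sqrt(2.0)
_SCALING = [(1 + _R3) / _NORM, (3 + _R3) / _NORM, (3 - _R3) / _NORM, (1 - _R3) / _NORM]
LO_D = _SCALING[::-1]                                         # decomposition low-pass
HI_D = [(-1) ** (k + 1) * h for k, h in enumerate(_SCALING)]  # decomposition high-pass


def convolve_full(x, h):
    """Full linear convolution with zero padding: length len(x) + len(h) - 1."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            out[i + k] += xi * hk
    return out


def one_scale_split(s, lo=LO_D, hi=HI_D):
    """One step of Fig. 3: filter with Lo_D and Hi_D, then dyadic decimation."""
    f = convolve_full(s, lo)  # F, length n + 2N - 1
    g = convolve_full(s, hi)  # G, length n + 2N - 1
    # Keeping the odd-indexed samples leaves floor((n - 1) / 2) + N coefficients.
    return f[1::2], g[1::2]


def wavelet_packet(s, levels=3):
    """Split approximation AND detail nodes at every scale (complete binary tree).

    Returns (label, coefficients) pairs for the 2**levels leaves, labelled as in
    Fig. 2: the newest split is prepended, so the children of AA2 are AAA3, DAA3.
    """
    nodes = [("", list(s))]
    for _ in range(levels):
        children = []
        for label, coeffs in nodes:
            c_a, c_d = one_scale_split(coeffs)
            children.append(("A" + label, c_a))
            children.append(("D" + label, c_d))
        nodes = children
    return [(label + str(levels), coeffs) for label, coeffs in nodes]


# Usage on a synthetic 1500-point response (made-up data, not a measured signal).
response = [1.0 - math.exp(-t / 200.0) + 0.01 * math.sin(0.9 * t) for t in range(1500)]
for label, coeffs in wavelet_packet(response, levels=3):
    print(label, len(coeffs))
```

For a 1500-point response, as in Section 2.2.2, the coefficient counts shrink from 1500 to 751, 377 and then 190 per band, and the eight leaves come out in the order AAA3, DAA3, ADA3, DDA3, AAD3, DAD3, ADD3, DDD3 used in the text. A toolbox implementation may treat the boundaries differently (for example, by symmetric or periodic extension), which alters the coefficients near the edges and, for periodic extension, their number.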
3.2. Feature extraction
A key stage in wavelet packet decomposition is the selection of the wavelet basis. In our investigation, we selected the Daubechies wavelet [13] after comparing the results computed with different wavelet bases. Another reason is that it has several desirable properties: orthogonality, losslessness, power complementarity, biorthogonality, compact support, etc. According to the three-scale wavelet packet decomposition tree, when the signal of a gas sensor is decomposed at three scales, 8 coefficient sets corresponding to 8 frequency bands are obtained. The following energy is then defined for each coefficient set:
$$E_{3j} = \sum_{k=1}^{m} \left| c_{3jk} \right|^{2}, \quad j = 0, 1, \ldots, 7 \tag{1}$$
where E3j is the energy value of the j-th frequency band, m is the number of coefficients, and c3jk is the k-th coefficient of the j-th frequency band of the three-scale decomposition. After three-scale wavelet packet decomposition, eight energy values are therefore obtained for each gas sensor:
$$T = [E_{30}, E_{31}, E_{32}, E_{33}, E_{34}, E_{35}, E_{36}, E_{37}]$$
With 13 gas sensors, every test sample would yield a vector of 104 dimensions (8 × 13 = 104), and such a high dimension is a disadvantage for subsequent data processing. Moreover, when we analyzed T for all 13 gas sensors over the 105 samples, we found that E30 was always the largest value and that the other seven energy values were very small; E30 exceeded the others by a wide margin. We consider this consistent with wavelet packet analysis, since the coefficients corresponding to E30 approximate the sensor signal and represent its main trend. All E30 values were therefore extracted as features for discriminating the three kinds of Chinese vinegars. Because E30 is large, and for simple and convenient computation, a relative energy value was adopted as the feature. The relative energy was computed by:
$$RE_{30} = \frac{E_{30}}{E} \tag{2}$$
where RE30 is the relative energy and E is the total energy of T, given by:
$$E = \sum_{j=0}^{7} E_{3j} \tag{3}$$
Thus, the feature vector of one test sample consists of 13 RE30 values, one for each of the 13 gas sensors.
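Eqs. (1)–(3) reduce to a few lines of arithmetic once the eight level-3 coefficient sets of a sensor are available. The Python sketch below assumes the sets are supplied in the order j = 0, ..., 7, with j = 0 the approximation band AAA3, and uses the squared-magnitude energy of Eq. (1); the function names and the toy coefficients are hypothetical.

```python
def band_energies(bands):
    """Eq. (1): E_3j = sum over k of |c_3jk|^2, one value per coefficient set."""
    return [sum(abs(c) ** 2 for c in coeffs) for coeffs in bands]


def relative_energy_re30(bands):
    """Eqs. (2)-(3): RE_30 = E_30 / E, where E is the sum of all eight E_3j."""
    if len(bands) != 8:
        raise ValueError("expected the eight coefficient sets of a three-scale decomposition")
    energies = band_energies(bands)
    total = sum(energies)       # Eq. (3)
    return energies[0] / total  # j = 0 is the approximation band AAA3


def feature_vector(sensor_bands):
    """One RE_30 per sensor; 13 sensors give the 13-dimensional feature vector."""
    return [relative_energy_re30(bands) for bands in sensor_bands]


# Toy coefficient sets (made-up numbers): band 0 dominates, as reported for E_30.
toy_sensor = [[3.0, 4.0], [1.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]
print(relative_energy_re30(toy_sensor))  # 25 / 26, about 0.9615
print(len(feature_vector([toy_sensor] * 13)))  # 13
```

Dividing by the total energy E keeps each feature between 0 and 1 regardless of the absolute response level, which is the "simple and convenient computation" motivation given above.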
Fig. 3. One-scale decomposition for a signal.
Table 2
Comparison between test results and target results of 15 samples
No. | Testing values of network | Target values
1 | 0.9974, 0.0027, −0.0002 | 1, 0, 0
2 | 1.0005, −0.0001, −0.0004 | 1, 0, 0
3 | 1.0005, −0.0001, −0.0004 | 1, 0, 0
4 | 1.0000, 0.0001, −0.0001 | 1, 0, 0
5 | 1.0001, 0.0002, −0.0003 | 1, 0, 0
6 | −0.0009, 1.0001, 0.0008 | 0, 1, 0
7 | 0.0017, 0.9985, −0.0001 | 0, 1, 0
8 | −0.0017, 1.0017, 0.0000 | 0, 1, 0
9 | 0.0031, 0.9968, 0.0001 | 0, 1, 0
10 | 0.0012, 0.9988, 0.0000 | 0, 1, 0
11 | 0.0001, 0.0000, 1.0000 | 0, 0, 1
12 | 0.0000, 0.0000, 1.0000 | 0, 0, 1
13 | 0.0000, 0.0000, 1.0000 | 0, 0, 1
14 | 0.0000, 0.0000, 1.0000 | 0, 0, 1
15 | 0.0000, 0.0000, 1.0000 | 0, 0, 1
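Reading Table 2 amounts to assigning each test sample to the class whose network output is largest. The Python sketch below applies such an argmax rule to the fifteen rows of Table 2 and measures the largest deviation from the target vectors. The argmax decision rule and the names used are assumptions for illustration; the text itself only reports that all test samples were correctly discriminated.

```python
# Network outputs for the 15 test samples, copied from Table 2.
OUTPUTS = [
    (0.9974, 0.0027, -0.0002), (1.0005, -0.0001, -0.0004), (1.0005, -0.0001, -0.0004),
    (1.0000, 0.0001, -0.0001), (1.0001, 0.0002, -0.0003),
    (-0.0009, 1.0001, 0.0008), (0.0017, 0.9985, -0.0001), (-0.0017, 1.0017, 0.0000),
    (0.0031, 0.9968, 0.0001), (0.0012, 0.9988, 0.0000),
    (0.0001, 0.0000, 1.0000), (0.0000, 0.0000, 1.0000), (0.0000, 0.0000, 1.0000),
    (0.0000, 0.0000, 1.0000), (0.0000, 0.0000, 1.0000),
]
CLASSES = ("sxlcc", "zjmc", "szc")  # targets [1 0 0], [0 1 0], [0 0 1]
TRUE = ["sxlcc"] * 5 + ["zjmc"] * 5 + ["szc"] * 5


def predict(row):
    """Pick the class with the largest network output (argmax rule)."""
    return CLASSES[max(range(len(row)), key=lambda i: row[i])]


def one_hot(label):
    """Target vector of a class, e.g. 'zjmc' -> (0.0, 1.0, 0.0)."""
    return tuple(1.0 if c == label else 0.0 for c in CLASSES)


predicted = [predict(row) for row in OUTPUTS]
accuracy = sum(p == t for p, t in zip(predicted, TRUE)) / len(TRUE)
max_error = max(abs(o - t) for row, label in zip(OUTPUTS, TRUE)
                for o, t in zip(row, one_hot(label)))
print(f"accuracy = {accuracy:.0%}, largest |output - target| = {max_error:.4f}")
```

All 15 samples are assigned to their own class, and the largest deviation from a target value is 0.0032 (sample 9), which quantifies the "very small error" noted in Section 3.3.2.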
Fig. 4. The PC1 and PC2 for discrimination of three kinds of Chinese vinegars.
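The share of variance quoted for the first two principal components in Section 3.3.1 can be checked from the two eigenvalues reported there. The sketch below assumes that PCA was performed on the correlation matrix of standardized variables, so that the eigenvalues sum to the number of variables (15 for the 105 × 15 matrix X). The paper does not state this explicitly, but the reported figures are consistent with it; the function name is hypothetical.

```python
def explained_variance_percent(eigenvalues, n_variables):
    """Percent of total variance carried by the given principal components.

    ASSUMPTION: correlation-matrix PCA, so the eigenvalues of the p x p
    correlation matrix sum to its trace, p (here 13 features + humidity
    + temperature = 15).
    """
    return 100.0 * sum(eigenvalues) / n_variables


pct = explained_variance_percent([9.6834, 1.9406], n_variables=15)
print(f"PC1 + PC2 explain {pct:.4f}% of the variance")
```

This gives 77.4933%, which matches the reported 77.4931% to within the effect of rounding the eigenvalues to four decimals.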
3.3. Discrimination
To further compensate for the effects of temperature and humidity on the gas sensor responses, the 13 RE30 values together with the humidity and temperature values of each test sample were used jointly as the input parameters of PCA and RBFNN, which were employed as the pattern recognition algorithms for the three kinds of vinegars. The discrimination information therefore includes humidity and temperature, which allows the algorithms to compensate for their effects. We call this a compensation method based on learning humidity and temperature.
3.3.1. Discrimination based on PCA
Accordingly, let X be the 105 × 15 analysis matrix containing the features together with the humidity and temperature values; that is, X has 105 data patterns, each containing 15 parameters. The first two eigenvalues obtained from X were 9.6834 and 1.9406, and the first two principal components explained 77.4931% of the variance, which is not higher than in usual cases. A gas sensor array is characterized by cross-reactivity, and because the features were extracted from the whole dynamic response of each sensor, they may carry this cross-reactivity; we think that this percentage of variance reflects the cross-reactivity of the array. On the other hand, the first two principal components capture the maximum variance of the data set in two directions, so PC1 and PC2 were selected to analyze X. Plots of the first two principal components for the three kinds of Chinese vinegars are shown in Fig. 4. As Fig. 4 shows, the three kinds of vinegars are accurately discriminated. This indicates that the feature extraction method is effective and that the effects of temperature and humidity are well compensated.
3.3.2. Discrimination based on RBFNN
From each kind of vinegar, 30 samples were selected at random to form the training data of the RBFNN, and the other 5 samples formed the testing data, giving 90 training samples and 15 testing samples. To make training and testing easier, both the training data and the testing data were standardized. We used the newrb function to create a radial basis network. The target vectors of the network were [1 0 0], [0 1 0] and [0 0 1], denoting the three kinds of vinegars sxlcc, zjmc and szc, respectively. After the network was trained with the Neural Network Toolbox of MATLAB, the test samples were tested. Table 2 compares the test results with the target results of the 15 test samples. As Table 2 shows, all the test samples were correctly discriminated with very small errors, which again confirms the validity of the feature extraction method.
4. Conclusions
Feature extraction is a very important step in the practical application of gas sensor arrays. To discriminate three kinds of Chinese vinegars, we proposed a feature extraction method based on wavelet packet decomposition: three-scale wavelet packet decomposition yields eight coefficient sets for each sensor response, corresponding to eight frequency bands, and eight relative energy values are then defined for these coefficient sets. The maximum of the eight relative energy values was selected as the feature value. The discrimination results obtained with PCA and RBFNN showed that the feature extraction method was very effective and that the three kinds of vinegars were correctly discriminated.
Acknowledgement
This work was supported by the Outstanding Youth Science Foundation of Henan, China, under Grant No. 0612000400.
References
[1] E. Anklam, M. Lipp, B. Radovic, E. Chiavaro, G. Palla, Characterisation of Italian vinegar by pyrolysis-mass spectrometry and a sensor device (‘electronic nose’), Food Chem. 61 (1998) 243–248.
[2] Q. Zhang, S. Zhang, C. Xie, D. Zeng, C. Fan, D. Li, Z. Bai, Characterization of Chinese vinegars by electronic nose, Sens. Actuators B 119 (2006) 538–546.
[3] Q. Zhang, S. Zhang, C. Xie, C. Fan, Z. Bai, Sensory analysis of Chinese vinegars using an electronic nose, Sens. Actuators B 128 (2008) 586–593.
[4] X. Zou, J. Zhao, S. Wu, The study of gas sensor array signal processing with new genetic algorithms, Sens. Actuators B 87 (2002) 437–441.
[5] Y. Yin, X. Tian, Classification of Chinese drinks by a gas sensors array and combination of the PCA with Wilks distribution, Sens. Actuators B 124 (2007) 393–397.
[6] A. Leone, C. Distante, N. Ancona, K.C. Persaud, E. Stella, P. Siciliano, A powerful method for feature extraction and compression of electronic nose responses, Sens. Actuators B 105 (2005) 378–392.
[7] R. Haddad, L. Carmel, D. Harel, A feature extraction algorithm for multi-peak signals in electronic noses, Sens. Actuators B 120 (2007) 467–472.
[8] L. Carmel, S. Levy, D. Lancet, D. Harel, A feature extraction method for chemical sensors in electronic noses, Sens. Actuators B 93 (2003) 67–76.
[9] H. Ding, H. Ge, J. Liu, High performance of gas identification by wavelet transform-based fast feature extraction from temperature modulated semiconductor gas sensors, Sens. Actuators B 107 (2005) 749–755.
[10] M. Padilla, I. Montoliu, A. Pardo, A. Perera, S. Marco, Feature extraction on three way enose signals, Sens. Actuators B 116 (2006) 145–150.
[11] R. Gutierrez-Osuna, H.T. Nagle, S.S. Schiffman, Transient response analysis of an electronic nose using multi-exponential models, Sens. Actuators B 61 (1999) 170–182.
[12] J. Zarei, J. Poshtan, Bearing fault detection using wavelet packet transform of induction motor stator current, Tribol. Int. 40 (2007) 763–769.
[13] Z. Ge, W. Sha, Wavelet Analysis Theory and MATLAB R2007 Implementations, Publishing House of Electronics Industry, Beijing, 2007, pp. 115–126 (in Chinese).
Biographies
Yong Yin received his PhD degree in Food Science and Engineering from Jiangsu University of Science & Technology, China, in 1999. He is currently a professor at Henan University of Science & Technology, China. His major research fields are artificial olfactory systems and their applications.
Huichun Yu received her PhD degree in Food Science and Engineering from Zhejiang University, China, in 2007. Her current interests are non-destructive technologies in food engineering, including the use of electronic noses.
Hongshun Zhang is currently completing his MS degree at the School of Food & Bioengineering, Henan University of Science & Technology, China. His scientific interest is in electronic noses and their applications to food and agricultural products.