Enhancing Classification Performance of Multisensory Data through Extraction and Selection of Features


Available online at www.sciencedirect.com

Procedia Chemistry 6 (2012) 132 – 140

2nd International Conference on Bio-Sensing Technology

M.J. Masnan a,*, N.I. Mahat b, A. Zakaria a, A.Y.M. Shakaff a, A.H. Adom a, F.S.A. Sa'ad a

a Centre of Excellence for Advanced Sensor Technology, Universiti Malaysia Perlis, Perlis 01000, Malaysia
b College of Arts and Sciences, Universiti Utara Malaysia, Kedah 06010, Malaysia

Abstract

Linear discriminant analysis (LDA) has been widely used to classify fused multi sensor data. This paper examines the performance of LDA when classification is preceded by feature extraction or by feature selection, and compares the fused results with those obtained from each single sensor modality. The strategies were studied using a dataset of honey and two types of sugar concentrate, collected with two types of sensors, namely an electronic nose (e-nose) and an electronic tongue (e-tongue). Error rates were assessed using the leave-one-out procedure.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of the Institute of Bio-Sensing Technologies, UWE Bristol.

Keywords: Linear discriminant analysis; multi sensor data fusion; feature extraction; feature selection; leave-one-out error rate

* Corresponding author. Tel.: +604-9855430; fax: +604-9855432. E-mail address: [email protected].

1876-6196 © 2012 Elsevier B.V. doi:10.1016/j.proche.2012.10.139

M.J. Masnan et al. / Procedia Chemistry 6 (2012) 132 – 140

1. Introduction

Multi sensor data fusion is an evolving technique concerned with how to combine data from one or multiple (and possibly diverse) sensors in order to make inferences about a physical event, activity or situation. It is defined as the theory, techniques and tools used for combining sensor data, or data derived from sensory data, into a common representational format [1]. The definition also includes multiple measurements produced at different time instants by a single sensor, as described by [2].

One of the most widely used applications of multi sensor data fusion is in the field of pattern recognition and classification. A multi sensor data fusion system makes it possible to merge a wide spectrum of sensors, which results in a larger and more complex system than a single sensor modality. Such an attempt indirectly introduces highly redundant data, especially when it involves the fusion of sensor arrays, as in the application of the electronic nose (e-nose) and electronic tongue (e-tongue). Throughout this article, the words features, sensor arrays and variables are used interchangeably.

Theoretically, having more features implies more discriminative power in classification. However, this is not always true in practice, because not all features are important for understanding or representing the underlying phenomenon of interest [3]. Fig. 1 compares classification performance against the number of selected features. Selecting the set of parameters (i.e. features) that corresponds to the optimal position in Fig. 1, where few parameters give good performance, would increase the correct classification rate.

[Figure 1: plot of correct classification rate (%) against number of parameters, with four labelled regions: optimal position (few parameters, good performance); redundancy (many parameters, good performance); inadequate array (few parameters, poor performance); poor configuration (many parameters, poor performance).]

Fig. 1. Configuration performance plot for sensor reduction [4]

1.1. Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a well-known classical statistical technique that finds the projection maximizing the ratio of the scatter among data of different classes to the scatter within data of the same class. It is among the most widely used classification methods in practice. As no single method is best for every classification problem, LDA was preferred in this experiment for its simplicity and its good performance, especially when the normality assumption is fulfilled. Furthermore, LDA is applicable when the measurements of the independent variables of each observation are continuous quantities, which is the case for sensor data. Features obtained by LDA are useful for pattern classification since they bring data of the same class closer together and move data of different classes farther apart [5].

1.2. Feature Extraction

One of the most commonly applied feature extraction methods in the multi sensor data fusion context is principal component analysis (PCA), an unsupervised linear dimensionality reduction technique. The basic idea of PCA is to describe the dispersion of an array of n points in p-dimensional space by introducing a new set of orthogonal linear coordinates, so that the sample variances of the given points with respect to these derived coordinates are in decreasing order of magnitude [6]. Relevant components are selected based on the magnitude of their eigenvalues, keeping those that account for the highest proportion of variation in the data. Even though there is no objective way of deciding how many principal components are relevant for further analysis [7], the number of selected principal components can be chosen so that the first few components together account for a high percentage of the total variance (e.g. 80% or more), or by keeping the principal components with eigenvalues greater than or equal to one [8].

1.3. Feature Selection

Feature selection is a dimensionality reduction technique that can be used to configure small sensor arrays for specific e-nose and e-tongue measurements. The goal of feature selection is to find an optimal subset of features that maximizes the information content or the predictive accuracy. There are two strategies in feature selection: 1) sequential and 2) randomized. Sequential search algorithms are greedy strategies that reduce the number of feature subsets visited during the search by applying local search. These methods include sequential forward selection, sequential backward selection and stepwise selection. Randomized search algorithms attempt to overcome the computational cost of exponential methods. These techniques include simulated annealing and genetic algorithms.

Sequential forward selection begins with an empty set and adds one feature at a time to the model. The procedure continues until the selection criterion has reached a minimum or all features have been added to the model.
The procedure begins by considering each of the features individually and selecting the feature z1 that gives the lowest value of the selection criterion, where the selection criterion is defined as the prediction error of a classifier model. The next step is to evaluate all possible two-feature models that include z1. Again, the feature that gives the lowest selection criterion (together with z1) is added to the model. This process continues until the optimum subset is found. Backward elimination is the reverse of forward selection. In this case, all p features are included at the beginning. Features are then removed, one at a time, based on their contribution to the selection criterion.
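The forward selection steps described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this study: it assumes Python with NumPy, uses the leave-one-out error of a simple LDA rule (pooled covariance, equal priors) as the selection criterion, and stops when adding a feature no longer lowers that error. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def lda_predict(X_train, y_train, x_new):
    """Classify one sample with an LDA rule (pooled covariance, equal priors)."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance; a tiny ridge keeps it invertible.
    centered = X_train - means[np.searchsorted(classes, y_train)]
    pooled = centered.T @ centered / (len(X_train) - len(classes))
    pooled += 1e-8 * np.eye(X_train.shape[1])
    inv = np.linalg.inv(pooled)
    # Linear discriminant score of every class for the new sample.
    scores = means @ inv @ x_new - 0.5 * np.sum(means @ inv * means, axis=1)
    return classes[np.argmax(scores)]


def loo_error(X, y):
    """Leave-one-out error rate: each sample is held out once and predicted."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        wrong += int(lda_predict(X[keep], y[keep], X[i]) != y[i])
    return wrong / n


def forward_selection(X, y, max_features=None):
    """Greedily add the feature that gives the lowest leave-one-out error."""
    remaining = list(range(X.shape[1]))
    selected, best_err = [], float("inf")
    limit = max_features or X.shape[1]
    while remaining and len(selected) < limit:
        errs = [loo_error(X[:, selected + [j]], y) for j in remaining]
        k = int(np.argmin(errs))
        if errs[k] >= best_err:  # assumed stopping rule: no further improvement
            break
        best_err = errs[k]
        selected.append(remaining.pop(k))
    return selected, best_err


# Hypothetical data: 3 classes x 8 samples, 4 features, only feature 2 informative.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 8)
X = rng.normal(size=(24, 4))
X[:, 2] += 10.0 * y

selected, best_err = forward_selection(X, y)
print("selected features:", selected, "LOO error:", best_err)
```

Backward elimination follows the same pattern in reverse: start from all p features and repeatedly drop the one whose removal gives the lowest criterion value.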

1.4. Low Level Data Fusion

[Figure 2: block diagram with the labelled blocks E-Nose, E-Tongue, Association, Data Level Fusion, Feature Extraction, Identity Declaration and Joint Identity Declaration.]

Fig. 2. Framework of low level data fusion by Hall (1992) [10]

In low level data fusion, different sensors observe the target object independently, and the recorded raw sensor data are combined. In order to fuse raw data, the original sensor data must be commensurate, i.e. they must be observations of the same or similar physical quantities [9]. However, the e-nose and e-tongue observe different physical quantities with different numbers of features: volatile gases are measured by one array of sensors and taste chemicals by another. One way to make the fused data commensurate is to collect datasets with the same number of samples from both sensors. It is important to ensure that the new dataset is formed from the original non-normalized data. Low level data fusion in identity fusion usually provides the most accurate result [9], although this might not always be the case. This may be because the original information from each sensor is maintained and used in further processing. The entire process of low level data fusion is shown in Fig. 2.

2. Materials and Methods

2.1. Sample Selection

In this experiment, four samples were prepared for each of 11 different brands of honey and two different brands of syrup. In total, 52 samples (13 types x 4 samples) of honey and syrup were prepared, as summarized in Table 1.

Table 1. Description of the varieties of honey and syrup samples used in the experiment

Item               Label   Description
Leaf Honey         1       Honey from different floral origin
Durian Honey       2       Honey from different floral origin
Maluka Honey       3       Honey from different floral origin
Coconut Honey      4       Honey from different floral origin
Starfruit Honey    5       Honey from different floral origin
Wax Apple Honey    6       Honey from different floral origin
Tualang Tiga A     7       Tualang honey from different make
Tualang Tiga B     8       Tualang honey from different make
Tualang AgroMas    9       Tualang honey from different make
Tualang As-Syifa   10      Tualang honey from different make
Tualang Rosebee    11      Tualang honey from different make
Nona               12      Sugar concentrate
Bunga Raya         13      Sugar concentrate

2.2. E-Nose Measurements

The Cyranose320 (e-nose) from Smiths Detection, which uses 32 non-selective sensors made of different types of polymer matrix blended with carbon black, was employed. Combining these 32 sensors as an array allows both qualitative and quantitative assessment of complex solutions. The e-nose setup for this experiment is illustrated in Fig. 3, and the settings of the sniffing cycle are given in Table 2. Each sample was drawn from the bottle using a 10 ml syringe, kept in a 13 x 100 mm test tube and sealed with a silicone stopper. Each sample was replicated four times. Before measurement, each sample was placed in a heater block and heated for 10 minutes to generate sufficient headspace volatiles. The sample temperature was controlled at 50 °C during headspace collection.

Preliminary experiments were performed to determine the optimal durations for the purging, baseline purge and sample draw. A ten-second baseline purge with a 30-second sample draw produced an optimal result (result not shown). Baseline purge was set longer to ensure residual gases were properly removed, since all the samples are in liquid form and contain moisture.

[Figure 3: schematic of the e-nose setup with the labelled parts Cyranose320, RS232, purge inlet, purge outlet, charcoal filter, honey sample, heater block, hotplate and computer.]

Fig. 3. E-nose setup for headspace evaluation of honey and sugar concentration.

The pump was set to medium speed during sample draw. The filter is made of activated carbon granules with a large surface area, which is effective in removing a wide range of volatile organic compounds and moisture from the ambient air. The experiment was carried out using the e-nose on the honey samples first, followed by the syrup samples.

Table 2. E-nose parameter settings for honey, sugar and adulterated sample assessment

Sampling Setting

Cycle              Time (s)   Pump Speed
Baseline Purge     10         120 mL/min
Sample Draw        30         120 mL/min
Idle Time          3          -
Air Intake Purge   80         160 mL/min

2.3. E-Tongue Measurements

The chalcogenide-based potentiometric e-tongue was made up of seven distinct ion-selective sensors from Sensor Systems (St. Petersburg, Russia). The same principle explained in Section 2.2 for the e-nose was adopted for the e-tongue to discriminate the complex solutions. Table 3 describes the potentiometric sensors used in this experiment. The e-tongue system shown in Fig. 4 was implemented by arranging an array of potentiometric sensors around the reference probe. Each sensor output was connected to an analogue input of a data acquisition board (NI USB-6008) from National Instruments (Austin TX, USA).

A 5% (w/v) solution of honey in distilled water was prepared and stirred for three minutes at 1,000 rpm before any measurement was made. Each sample was replicated five times. For each measurement, all sensors of the e-tongue were immersed simultaneously and left for five minutes, and the potential readings were recorded for the whole duration. After each sampling, the e-tongue was dipped for one minute in 10% ethanol stirred at 400 rpm, and then rinsed twice in distilled water (stirred at 400 rpm for two minutes) to remove any sticky residue of the previous sample from the sensor surface and so avoid contaminating the next sample.

[Figure 4: schematic of the e-tongue setup with the labelled parts chalcogenide sensor array (with its arrangement), Ag/AgCl reference, 100 ml honey solution, NI USB 6008 (NiDaQ), Virtual Instrument (VI) interface, and pattern recognition / multivariate analysis.]

Fig. 4. E-tongue setup for headspace evaluation of honey, sugar concentration and adulterated sample

Table 3. Chalcogenide-based potentiometric electrodes used in e-tongue

Sensor Label   Description
Fe3+           Ion-selective sensor for Iron ions
Cd2+           Ion-selective sensor for Cadmium ions
Cu2+           Ion-selective sensor for Copper ions
Hg2+           Ion-selective sensor for Mercury ions
Ti+            Ion-selective sensor for Titanium ions
S2-            Ion-selective sensor for Sulfur ions
Cr (VI)        Ion-selective sensor for Chromium ions
HI 5311        Reference probe using Ag/AgCl electrode

2.4. Data Analysis

The fractional measurement method is essential when fusing sensors of multiple modalities. This technique, often known as baseline manipulation, was applied to pre-process the data of both modalities [11]. The baseline, S0, is subtracted from the maximum sensor response, St, and the difference is then divided by S0. The dimensionless, normalized response Sfrac is therefore given by:

\[ S_{frac} = \frac{S_t - S_0}{S_0} \qquad (1) \]
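As a small worked illustration of Eq. (1), the sketch below computes the fractional response for three sensors. The readings are made-up numbers chosen only to show the arithmetic, and the function name is hypothetical; plain Python is assumed.

```python
def fractional_response(s_t, s_0):
    """Eq. (1): (S_t - S_0) / S_0 for one sensor reading."""
    if s_0 == 0:
        raise ValueError("baseline S_0 must be non-zero")
    return (s_t - s_0) / s_0


# Hypothetical baseline (S_0) and maximum response (S_t) of three sensors.
baseline = [2.0, 4.0, 5.0]
peak = [2.5, 5.0, 4.0]

s_frac = [fractional_response(t, b) for t, b in zip(peak, baseline)]
print(s_frac)
```

The first two sensors have different raw magnitudes but the same relative change, so they yield the same fractional response; the third responds below its baseline and yields a negative value.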


The fractional measurement gives a unit response for each sensor array output with respect to the baseline, which compensates for sensors that have intrinsically large varying response levels. It can also further minimize the effect of temperature, humidity and temporal drifts [11]. The data from the different modalities were processed separately, and all sensors were used in this analysis.

In the case of the e-nose, S0 is the minimum value taken during the baseline purge with ambient air, and St was measured during the sample draw. Each sampling cycle was repeated three times and the average was obtained for each of the four replicated samples. For the e-tongue measurements, S0 (the baseline reading) is the average reading in distilled water, while St is the sensor reading when steeped in the solution. The steeping cycle was repeated three times for each sample and the average was obtained for each of the five replicated samples.

The Sfrac data points from the e-nose and e-tongue sensors formed the Sfrac matrix. This Sfrac matrix was processed separately and scaled using the z-score (Sfrac,1) to zero mean and unit standard deviation (taken from the MATLAB statistical toolbox). This ensures that all sensor responses are commensurate and that no particular sensor dominates the results.

Classification of each single sensor dataset and of the fusion data was performed using the LDA rule, with backward selection and forward selection as the feature selection approaches. At each iteration, the evaluation was based on a significance threshold (p-value) of 0.05. For the feature extraction approach, the principal components used for classification were those with eigenvalues greater than or equal to one. Principal components were calculated from the correlation matrix. In this case, four principal components were selected for the single sensor data and for the fusion data. The selected principal components were then classified using the LDA rule.
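The scaling and component selection described above can be sketched as follows. This is a minimal illustration assuming Python with NumPy rather than the R and MATLAB routines used in the study; the function names and the synthetic 52 x 7 matrix are hypothetical, and the z-score uses the sample standard deviation (n - 1 in the denominator).

```python
import numpy as np


def zscore(X):
    """Scale each column to zero mean and unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue >= 1."""
    Z = zscore(X)
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0                    # eigenvalue-greater-or-equal-one rule
    scores = Z @ eigvecs[:, keep]            # principal component scores
    return eigvals, scores


# Hypothetical stand-in for an Sfrac matrix: 52 samples x 7 sensors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(52, 2))
loadings = rng.normal(size=(2, 7))
X = latent @ loadings + 0.5 * rng.normal(size=(52, 7))

eigvals, scores = pca_kaiser(X)
print("eigenvalues:", np.round(eigvals, 3))
print("score matrix shape:", scores.shape)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue of one corresponds to a component that explains as much variance as a single standardized sensor, which is the rationale for the cut-off. The retained scores would then be passed to the classifier in place of the original sensor responses.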
The error rates were evaluated using the leave-one-out procedure, which gives a nearly unbiased estimate of the error rate. All related algorithms were executed in R 2.10.1, a comprehensive statistical and graphical programming language.

2.5. Results and Discussions

Results for each of the investigated strategies on the honey dataset are shown in Table 4 and Table 5. For single sensor modality, LDA with forward selection on the e-nose data outperformed the rest of the strategies with perfect classification. With backward selection, the e-nose data again performed better, with 98.08% correct classification. However, LDA with feature extraction was not convincing: the e-nose and e-tongue data performed about equally, with 58.11% and 57.70% correct classification (error rates of 41.89% and 42.30%), respectively.

Table 4. Performance measured for single sensor modality: correct classification (selected features)

Criteria                   E-Tongue                      E-Nose
Backward Selection + LDA   86.54% (T1, T2, T6, T7)       98.08% (S2, S6, S7, S14, S18, S20, S31)
Forward Selection + LDA    94.23% (T1, T4)               100.00% (S1, S6, S31)
PCA + LDA                  57.70% (PC1, PC2, PC3, PC4)   58.11% (PC1, PC2, PC3, PC4)


In the case of the fusion data, LDA with backward selection outperformed the other strategies with 98.08% correct classification, while LDA with forward selection and LDA with PCA achieved 96.15% and 57.69% correct classification, respectively.

Table 5. Performance measured for fusion approach: correct classification (selected features)

Criteria                   E-Tongue + E-Nose
Backward Selection + LDA   98.08% (T1, T2, T6, S10, S14, S20, S26)
Forward Selection + LDA    96.15% (T1, S6)
PCA + LDA                  57.69% (PC1, PC2, PC3, PC4)

3. Conclusion

For the honey dataset, LDA with forward selection on the e-nose data gave the best performance, ahead of the e-tongue data and the feature extraction approach. For the fusion data, LDA with backward selection was slightly better than LDA with forward selection. However, LDA with PCA was unable to discriminate the 13 groups of honeys and syrups using four principal components. Further analyses shall be carried out to investigate the performance of LDA with PCA when a greater number of principal components is selected for classification. Since only four components were used for classification in this research, the error rate may be reduced by using more principal components.

Acknowledgements

The equipment used in this project was provided by Universiti Malaysia Perlis (UniMAP). The writer acknowledges the financial sponsorship provided by UniMAP and MOHE, under the Academic Staff Training Scheme.

References

[1] Mitchell HB. Multi-sensor data fusion: an introduction. Berlin, Heidelberg: Springer; 2007.
[2] Smith CR, Ericks GJ. Multisensor data fusion: concepts and principles. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing 1991; 235-237.
[3] Liu H, Sun J, Liu L, Zhang H. Feature selection with dynamic mutual information. Pattern Recognition 2009; 42: 1330-339.
[4] Boilot P, Hines EL, Gongora MA, Folland RS. Electronic noses inter-comparison, data fusion and sensor selection in discrimination of standard fruit solutions. Sensors and Actuators B 2003; 88: 80-8.
[5] Kim H, Kim D, Bang SY. Extension of LDA by PCA mixture model and class-wise features. Pattern Recognition 2003; 36: 1095-105.
[6] Gnanadesikan R. Methods for statistical data analysis of multivariate observations. USA: John Wiley and Sons Inc; 1997.
[7] Chatfield C, Collins AJ. Introduction to multivariate analysis. Great Britain: Chapman and Hall; 1980.
[8] Manly BFJ. Multivariate statistical methods, a primer. USA: Chapman & Hall; 2005.
[9] Hall DL, Llinas J. An introduction to multisensor data fusion. Proceedings of the IEEE 1997; 58: 6-22.
[10] Hall DL. Mathematical techniques in multisensor data fusion. Boston: Artech House Inc.; 1992.
[11] Gardner JW, Bartlett PN. Electronic Noses: Principles and Applications. Oxford UK: Oxford University Press; 1999.