Analytica Chimica Acta 449 (2001) 69–80
Electronic nose based on metal oxide semiconductor sensors and pattern recognition techniques: characterisation of vegetable oils
Yolanda González Martín, M. Concepción Cerrato Oliveros, José Luis Pérez Pavón, Carmelo García Pinto, Bernardo Moreno Cordero∗
Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Ciencias Químicas, Universidad de Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
Received 7 May 2001; received in revised form 19 July 2001; accepted 21 August 2001
Abstract Different supervised pattern recognition treatments were applied to the signals generated by an electronic nose for the classification of vegetable oils. The system, comprising six metal oxide semiconductor sensors, was used to generate a pattern of the volatile compounds present in the samples. Feature selection techniques were employed to choose a set of optimally discriminant variables. The K-nearest neighbours (KNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), soft independent modelling of class analogy (SIMCA) and artificial neural networks (ANN) were applied to model the different classes. The results obtained indicated good classification and prediction capabilities, the neural networks being those that afforded the best results. © 2001 Elsevier Science B.V. All rights reserved. Keywords: Electronic nose; Metal oxide semiconductor gas sensors; Chemometric techniques; Vegetable oils
∗ Corresponding author. Tel.: +34-23-294-483; fax: +34-23-294-574. E-mail address: [email protected] (B. Moreno Cordero).
1. Introduction
The analysis of volatile compounds by analytical methods for the characterisation of food products in most cases involves processes of extraction and chromatographic separation [1–4]. Sensory analysis is commonly used for quality control in foods and beverages [5], but this technique has some drawbacks derived from the subjectivity of human panels. Both approaches are usually time-consuming and costly, and hence there is a large demand for rapid, cheap and effective techniques for quality control in food products. Accordingly, in recent years considerable efforts have been devoted to the development of so-called “electronic noses”. This term is used to refer to a system that mimics human olfaction by combining the response of a set of chemical sensors, with partial specificity for the measurement of volatiles, and techniques able to recognise patterns for data interpretation [6–8]. Different types of sensors, such as metal oxide semiconductors, conducting polymers and surface acoustic wave sensors, have been used [9–11]. Recently, new systems based on the utilisation of a mass detector for the collection of sample data have been developed. Chemometric techniques have been used in analytical chemistry to treat experimentally obtained information mathematically [12–14]. The set of sensors of an electronic nose affords a large amount of information and the processing of the data generated by the system is an essential part of the concept of electronic olfactometry [7]. Different statistical
0003-2670/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S 0 0 0 3 - 2 6 7 0 ( 0 1 ) 0 1 3 5 5 - 1
Y. González Martín et al. / Analytica Chimica Acta 449 (2001) 69–80
packages, both commercial and custom-designed by the device suppliers themselves, based on pattern recognition techniques have been used [15]. In this context, artificial neural networks (ANN) seem to be especially appropriate for the treatment of the responses generated by the sensors [16]. Electronic olfactometry has found applications in different fields of science, such as food and beverage quality control [17–21], environmental analysis [22–25], the follow-up of biological processes [26,27] and even the diagnosis of different diseases [28,29]. Chemometric treatment of chromatographic data [30,31], pyrolysis mass spectra [32], Fourier transform Raman spectra [33] and NMR spectra [34] has been used in the classification of vegetable oils. Also, conducting polymer sensors have been specially designed to characterise different olive oil types [35,36]. In a previous work [37], we proposed the use of an apparatus equipped with an array of metal oxide semiconductor gas sensors (FOX 2000 Electronic Nose) for the classification of a small number of vegetable oils marketed in Spain, using linear discriminant analysis (LDA) to verify the classification capacity of the proposed method. In view of the good results, in the present work we used an electronic nose with six metal oxide sensors to obtain the responses of a larger number of samples (141) in an attempt to demonstrate the capacity of the electronic nose to characterise vegetable oils. The signals obtained were subjected to different pattern recognition techniques (KNN, LDA, QDA, SIMCA and ANN) and the results are discussed.
2. Experimental
2.1. Apparatus
An Alpha M.O.S. Electronic Nose system, the FOX 3000, was used to obtain the chemical fingerprints of the samples. This system comprises the following modules.
• An air conditioning unit (ACU 500), which allows control of the humidity of the carrier gas. This control improves the stability of the baseline, which is reflected in an increase in the reproducibility of the system. The temperature and relative humidity values were set at 36 °C and 20%, respectively.
• A model 720 sample chamber with sample vials that can be heated up to 200 °C. Rigorous control of time and heating temperature afforded a reproducible headspace. This chamber permits operation with sample volumes up to 120 ml. Plastic weighing boats (44.4 mm × 44.4 mm × 9.4 mm, Sigma) were used as sample holders.
• A sensor chamber for the measurement of the odour characteristics of the samples. The basic device consists of six metal oxide gas sensors covering a broad range of molecules that can be present in the headspace of the samples. The nomenclature and characteristics of the sensors used were as follows [38]: P30/1, particularly sensitive to organic solvents and light polar molecules; P10/1, sensitive to hydrocarbons; P10/2, sensitive to methane and propane and aliphatic non-polar molecules; P40/2, sensitive to heteroatom, chloride and aldehydes; SX21, sensitive to hydrogen bonds; and T70/2, sensitive to alcoholic vapours. The operation temperature of the sensors was 300 °C. The presence in this chamber of a further two sensors permitted control of internal temperature and relative humidity.
• A PC running the FOX 4000 software was used to collect the response (resistance changes, R) of the sensors and to control the functioning of the whole system.
2.2. Oil samples
A set of 141 samples of commercially available oil of various origins was obtained from different suppliers. The distribution of the samples was as follows: 41 were virgin olive oil, 59 were non-virgin olive oil and the remaining 41 were seed oil. Samples were stored in the dark and none of them was subjected to any treatment that might alter its composition.
2.3. Procedure
Aliquots of 1.0 ml of each oil sample in the sample holder were introduced into the sample vial and the sample chamber was set at 35 °C. Ambient air was eliminated by pumping a stream of synthetic air
(relative humidity, 20%) for 1 min at a flow rate of 300 ml min−1 . After a headspace generation time of 7 min, the volatile compounds generated were pumped at a speed of 300 ml min−1 through the measurement chamber containing the array of gas sensors. During the headspace generation time, the stream of synthetic air, of controlled relative humidity, was blown into the measuring chamber to enable the gas sensor signal to return to the baseline. Upon injecting the sample, data were acquired every second over 60 s. Chemical calibration of the sensors was carried out to reduce the influence of external parameters such as variations in the relative humidity of the air, changes in temperature and the drift of the sensors over time. The calibration transfer procedure used has been reported elsewhere [37]. After this treatment, the differences between the signals obtained for replicates of the same sample (even when measured on different days) are less than 1%. As an example, Fig. 1 shows the signals corresponding to a sample of virgin olive oil (a) and to the same sample measured a week later (b). Parts (c)
and (d) correspond to the signals of a different virgin olive oil and a non-virgin olive oil.
2.4. Data analysis and chemometric procedures
A total of 360 data points were available per sample (one point per second over 60 s for each of the six sensors). However, the high correlation observed between the data corresponding to different points of the same sensor allowed a reduction in the number of data per sample, 8 points per sensor being selected at the following times: 5, 10, 15, 20, 25, 30, 45 and 60 s. In this way, more data were included from the ascending zone of the recording and many of those corresponding to the flat zone were discarded. Thus, the initial matrix used for the different chemometric treatments had a dimension of N × 48, where the rows represent the different samples of oil analysed and the columns correspond to the analytical signals of the eight times selected for each of the six sensors. A second step in data reduction, specific for each task, was carried out and
Fig. 1. Variation of signal with time upon sample injection for different samples: (a) virgin olive oil; (b) the same sample, a week later; (c) other virgin olive oil; (d) non-virgin olive oil.
variables were selected in order to reduce the number of data and to use only those containing differential information. The different classification tasks addressed were as follows: olive oil/seed oil (N = 141), virgin olive oil/other oil (N = 141), virgin olive oil/non-virgin olive oil/seed oil (N = 141), virgin olive oil/non-virgin olive oil (N = 100). In this part of the work, different pattern recognition techniques were used.
2.4.1. K-nearest neighbours (KNN)
This is a non-parametric method based on the distance between objects in a space of dimension equal to the number of variables explored. The class to which a sample is assigned is that of the training-set samples closest to it; only the K closest objects are used to make the assignment. The distance criterion used in the present work was Euclidean distance.
2.4.2. Linear discriminant analysis (LDA)
This is a classification procedure in which the classes are considered to have normal distribution and equal dispersion (covariance matrix). The discriminant functions, which are obtained by linear combinations of the initial variables, are constructed in such a way that the sample sets are differentiated as much as possible, reducing the number of dimensions with a minimum loss of information. The mathematical classification rule divides the space of dimension N into as many subspaces as classes and is therefore defined by a straight line, a plane or a hyperplane. The criterion used to calculate the discriminant functions was to maximise the ratio of variance between categories to variance within categories.
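As an illustration only, the first data reduction step of Section 2.4 and the KNN rule of Section 2.4.1 can be sketched in a few lines of Python. The sketch is not taken from the software used in this work. It assumes that each sensor curve is stored as a list of 60 readings in which the reading at t seconds sits at index t − 1, and the value k = 3 is a hypothetical choice, since the value of K is not stated here. The function names are likewise illustrative.

```python
from collections import Counter
from math import dist

# Acquisition times (s) retained for each sensor, as listed in Section 2.4.
SELECTED_TIMES_S = (5, 10, 15, 20, 25, 30, 45, 60)


def reduce_sample(curves):
    """Turn six 60-point sensor curves into one 48-element feature row.

    Assumption: curves[s][t - 1] is the reading of sensor s at t seconds.
    """
    return [curve[t - 1] for curve in curves for t in SELECTED_TIMES_S]


def knn_predict(train_X, train_y, x, k=3):
    """Assign x to the majority class among its k nearest training rows.

    Euclidean distance is used, as in the text; k = 3 is a placeholder.
    """
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

Applying `reduce_sample` to every sample and stacking the rows gives the N × 48 matrix described above; the same rows can then be passed to `knn_predict` as the training set.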
2.4.3. Quadratic discriminant analysis (QDA)
In this classification process, it is assumed that each class has normal distribution and that dispersion is different for each class and that the hypersurfaces separating the classes are therefore quadratic. For each class and its model, sensitivity is defined as the percentage of samples that, belonging to that class, are correctly recognised by the mathematical model, whereas the concept of specificity refers to the percentage of samples that, belonging to a different class, are recognised as being foreign to the model.
2.4.4. Soft independent modelling of class analogy (SIMCA)
The SIMCA method constructs a model for each class independently by principal component analysis. The number of principal components used for each class may be pre-set or may be selected in such a way that they explain a given percentage of the variance of the data. In this way, a closed space is constructed at a level of significance (95%) by a critical distance. Each object considered is assigned to one category according to its Euclidean distance from the model. The same concepts of sensitivity and specificity can be associated with this chemometric treatment.
2.4.5. Artificial neural networks (ANN)
ANN can be defined as a set of very simple calculation units (nodes) that start out from a data set and transform it into a set of response values. In chemometrics, neural networks have been used to solve problems of both supervised and unsupervised pattern recognition. For classification purposes, the network builds a model based on a set of input objects (the training set) with known outputs, adjusting the weights associated with each connection so that output values as similar as possible to the real values are generated. These weights contain information about the relationships between the input variable set (inputs) and the categories studied (outputs). Pattern recognition analysis was performed by means of the Parvus statistical software packages [39]. Neural network analysis was accomplished using the Neudesk program [40].
3. Results and discussion
3.1. K-nearest neighbours (KNN)
This supervised pattern recognition method was applied, with raw and normalised data (raw data divided by their maximum value), for the different classification tasks described above. Initially, a classification model was constructed in which all the samples were used as the training set. In a second stage, to validate the classification model thus obtained and its stability in predicting, a cross-validation step was performed with 4 cancellation groups (the samples are randomly divided into 4 groups, each of them
Table 1
Percentages of classification (A) and prediction (B) for KNN

Task                                               Procedure   Raw values        Normalised values
                                                               A       B         A       B
Olive oils/seed oils                               1a          80.1    -         87.2    -
                                                   2b          82.3    79.3      88.0    83.6
Virgin olive oils/other oils                       1a          87.2    -         87.2    -
                                                   2b          88.7    85.7      87.9    84.3
Virgin olive oils/non-virgin olive oils/seed oils  1a          72.3    -         75.9    -
                                                   2b          71.0    75.0      73.4    79.4
Virgin olive oils/non-virgin olive oils            1a          88.0    -         86.0    -
                                                   2b          88.8    88.5      86.8    84.4

a Correct classifications: all samples in training set.
b Cross-validation: 4 cancellation groups.
containing 25% of the total). In order to perform this cross-validation procedure, the same process was performed four times with four different training and prediction sets, ensuring that all the samples were included at least once in the prediction set. The results obtained with this non-parametric method are summarised in Table 1. According to these data, similar results were obtained with the two different types of data treatment. Additionally, as the complexity of the classification task increases, the percentages of correct recognition and prediction ability for KNN are seen to decrease. The similar values in classification and prediction indicate that the model is fairly stable, although the percentages of classification are not very good, above all in the case in which, with the complete set of samples, the aim was to differentiate between virgin and non-virgin and seed oil (lower than 80%). 3.2. Linear discriminant analysis (LDA) The second step of data analysis consisted of the application of LDA to the classification of the samples. As in the case of KNN, the mathematical treatment was performed for both raw and normalised data. With the 48 variables selected, LDA was implemented, using all the samples available as a training set to check the classification capability. For each of the tasks addressed, the percentages of classification obtained were above 95%. A second variable selection process was carried out using the StepLDA (stepwise linear discriminant
analysis) program from Parvus. This process allows one to use variables containing relevant information. Thus, the 15 variables that in each case produced the greatest Mahalanobis distances between the closest two classes were selected. Table 2 shows, as an example, the selected variables for the different tasks addressed. The results obtained in classification after the process of variable reduction were slightly worse than those obtained before, although in all cases they were above 90%. To validate the model, two different processes were used. In the first, a cross-validation process, the samples were randomly divided into 4 groups and the analysis was repeated four times, with each of the 4 sample groups successively used as the prediction set. In this way, each sample was classified three times and predicted once. The second one consisted of repeating LDA on the data 10 times; at the beginning of each new analysis, 20% of the samples were randomly selected and used as the test set. The results obtained in both steps for each of the tasks are summarised in Table 3. It may be seen that the results are similar with both processes. When 48 variables per sample were used, the classification capacity ranged between 97 and 100% and was greater than when the 15 variables selected by StepLDA were used (90–98%). However, in the first case the prediction capacity of the model decreased considerably, the best results being obtained when working with normalised values. The models generated with 15 variables proved to be much more stable since the prediction capacity coincided with that of classification.
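The two validation schemes just described can be sketched as index generators. This is a minimal illustration rather than the procedure implemented in Parvus: the function names, the fixed random seed and the rounding of the 20% test fraction are assumptions made for the example.

```python
import random


def cancellation_groups(n_samples, n_groups=4, seed=0):
    """Yield (train, test) index lists; each group serves once as test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_groups groups of near-equal size.
    groups = [indices[g::n_groups] for g in range(n_groups)]
    for g, test in enumerate(groups):
        train = [i for h, grp in enumerate(groups) if h != g for i in grp]
        yield train, test


def repeated_holdout(n_samples, n_repeats=10, test_fraction=0.2, seed=0):
    """Yield (train, test) index lists with a fresh random test set each time."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)  # rounding rule is an assumption
    for _ in range(n_repeats):
        test = rng.sample(range(n_samples), n_test)
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test
```

With 141 samples and four cancellation groups, every sample appears three times in a training set and once in a prediction set, which matches the statement that each sample was classified three times and predicted once.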
Table 2
Variables selected by StepLDA. For each classification task (olive oils/seed oils; virgin olive oils/other oils; virgin olive oils/non-virgin olive oils/seed oils; virgin olive oils/non-virgin olive oils), the table marks the 15 selected variables on a grid of sensor (P30/1, P10/1, P10/2, P40/2, SX21, T70/2) against time (5, 10, 15, 20, 25, 30, 45 and 60 s). [Grid not reproduced.]
Table 3
Percentages of classification (A) and prediction (B) for LDA

Task                                               Procedure   Raw values                        Normalised values
                                                               48 variables    15 variables      48 variables    15 variables
                                                               A      B        A      B          A      B        A      B
Olive oils/seed oils                               1a          97.4   84.4     90.8   85.8       98.8   98.6     95.9   92.2
                                                   2b          96.2   81.9     91.3   83.6       99.6   84.3     96.4   90.2
Virgin olive oils/other oils                       1a          99.8   87.2     97.9   96.4       98.1   88.6     95.3   91.5
                                                   2b          99.6   89.9     97.9   95.8       97.7   87.0     95.0   92.0
Virgin olive oils/non-virgin olive oils/seed oils  1a          97.6   76.6     91.0   80.9       97.9   82.3     90.8   83.7
                                                   2b          96.9   82.6     90.1   82.6       98.0   82.9     90.8   83.6
Virgin olive oils/non-virgin olive oils            1a          99.7   80.0     97.7   95.0       99.3   75.0     98.0   96.0
                                                   2b          99.6   85.3     97.7   96.3       99.3   86.6     97.6   97.2

a Cross-validation: 4 cancellation groups.
b Ten times 20% samples in the prediction set.
Fig. 2. Discriminant plot for the olive oil–seed oil classification and prediction: (䊐) olive oil samples; (䊊) seed oil samples. Empty symbols: training set; filled symbols: test set.
The worst results were obtained for the most complex task (virgin olive oil/non-virgin olive oil/seed oil), in which the percentage in prediction was between 8 and 10% lower than in classification. In the rest of the cases, the differences lay at around 2%. As an example, Fig. 2 shows the plot of the discriminant scores for one of the four cycles of the cross-validation step for the virgin olive oil/non-virgin olive oil differentiation. In classification, only one non-virgin olive oil sample was mis-classified; all samples in the prediction set were correctly classified with the exception of two corresponding to a virgin olive oil. 3.3. Quadratic discriminant analysis (QDA) QDA was also applied to the classification of the samples. In this case, the software does not allow one to work with the matrices originated when the 48 original variables are used. Accordingly, it was necessary to implement a process of variable selection by stepwise Bayesian analysis (BASTEP). The variables were selected in such a way that they would originate the least number of errors in the classification (misclassification probability) and their number was determined by pre-setting a classification error of 5%.
The number of variables selected differed for each of the tasks performed. For the differentiation between olive oil and seed oil, 14 variables were selected, with 97% of the olive oil samples and 100% of the seed oil samples being correctly classified. The sensitivity for both classes lay at around 85%. The specificity for olive oil was 91%, whereas for the seed oil samples the model was found to be almost non-specific, with a percentage as low as 7.3%. For the differentiation between virgin olive oil and other oils, nine variables were selected; the overall percentage of classification was 94.3% and sensitivity was 90%, with specificities of 83 and 86%, respectively. Likewise, for the classification of the samples into 3 different groups (virgin olive oil/non-virgin olive oil/seed oil), the overall classification with 12 variables was 96% and mean sensitivity was greater than 83%. The specificity for the virgin olive oil class was 100% both for non-virgin olive oil and for seed oil. For the non-virgin olive oil class, specificity was 76 and 68% for virgin olive oil and seed oil, respectively, and for the seed oil class, specificity was 95% for virgin and only 59% for non-virgin olive oil. The classification of samples of olive oils into virgin and non-virgin afforded classification results of 100%, a sensitivity close to 90%, and a specificity of 100% for virgin olive oil and 81% for non-virgin olive oil. As in the case of linear discriminant analysis, the process was carried out in three steps. In the first, all the samples were used as a training set to check the classification capability. The classification rules were then validated by the cross-validation process and by repeating QDA 10 times, randomly selecting 20% of the samples, different for each analysis, as the test set. Table 4 shows the results obtained upon applying the two validation processes studied.
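The sensitivity and specificity figures quoted here follow the definitions given in Section 2.4.3. As a minimal sketch of how they can be computed for one class model, assume that the model's decision for each sample is available as a Boolean flag; the function and argument names below are hypothetical and the numbers in the usage note are toy values, not data from this work.

```python
def sensitivity_specificity(accepted, labels, target):
    """Return (sensitivity, specificity) in percent for the `target` class model.

    accepted[i] is True when the class model accepts sample i as a member.
    Sensitivity: share of true members of the class that the model accepts.
    Specificity: share of samples from other classes that the model rejects.
    """
    own = [a for a, y in zip(accepted, labels) if y == target]
    foreign = [a for a, y in zip(accepted, labels) if y != target]
    sensitivity = 100.0 * sum(own) / len(own)
    specificity = 100.0 * sum(not a for a in foreign) / len(foreign)
    return sensitivity, specificity
```

For example, a model that accepts three of four olive oil samples and wrongly accepts one of four seed oil samples has a sensitivity of 75% and a specificity of 75%.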
With both procedures, percentages of classification greater than 95% were obtained, regardless of the task performed and of the previous treatment of the data. The prediction capacity, which was slightly lower, ranged between 87 and 95%, except in the case of differentiation of the samples of oil into 3 groups, in which case these percentages decreased to 77% with self-scaled data. In this case, the best overall results were obtained by normalising the values of the variables. As an example,
Table 4
Percentages of classification (A) and prediction (B) for QDA

Task                                               Procedure   Raw values        Normalised values
                                                               A       B         A       B
Olive oils/seed oils                               1a          98.1    89.4      97.9    88.6
                                                   2b          98.1    87.8      98.5    87.8
Virgin olive oils/other oils                       1a          95.3    87.9      97.9    93.6
                                                   2b          94.9    91.3      98.0    89.9
Virgin olive oils/non-virgin olive oils/seed oils  1a          97.9    82.3      96.7    87.2
                                                   2b          96.4    86.9      96.9    87.1
Virgin olive oils/non-virgin olive oils            1a          99.7    93.0      99.7    94.0
                                                   2b          99.4    95.4      99.5    95.4

a Cross-validation: 4 cancellation groups.
b Ten times 20% samples in the prediction set.
Fig. 3 shows the Coomans diagram for the differentiation of samples of virgin olive oil/non-virgin olive oil corresponding to one of the 10 cycles in which 20% of the samples were used for prediction. It may be seen that for the virgin olive oil class the percentage of correctly classified samples of the training sets and the percentage of correctly predicted samples of the test set was 100%. However, for non-virgin olive oils these percentages were respectively 88 and 77%.
Fig. 3. Coomans plot for the BAYES distances: (䊐) virgin olive oil samples; (䊊) non-virgin olive oil samples. Empty symbols: training set; filled symbols: test set.
3.4. Soft independent modelling of class analogy (SIMCA)
SIMCA with raw data afforded models based on three components for each category (variance explained >97%), normal range and 5% as the significance level for critical distance. With normalised data, the number of principal components that accounted for more than 95% of the variance was four or five, depending on the classification task. To study the predictive capability of SIMCA, the cross-validation procedure was applied in four steps. Table 5 shows the results obtained for the different classifications carried out using all the samples as the training set (1) and with the cross-validation process (2). Better results were obtained for the virgin olive oil/others and virgin olive oil/non-virgin olive oil classifications, with recognition capabilities and prediction capabilities higher than 90 and 88% respectively. For the other two tasks studied, the classification and prediction percentages were lower and never greater than 70%. Regarding the sensitivity of the models, this varied as a function of the tasks performed and in general lay between 73 and 83%. The specificity values were quite varied: thus, for example in the olive oil/seed oil classification, they were 36.6 and 53%, respectively, whereas for the virgin–non-virgin olive oil classification the values were respectively 88 and 83%. The poorer results provided by SIMCA can be explained by taking into account that this technique is a disjoint class modelling procedure and that more emphasis was placed on similarity within a class than on discrimination between classes.

Table 5
Percentages of classification (A) and prediction (B) for SIMCA

Task                                               Procedure   Raw values        Normalised values
                                                               A       B         A       B
Olive oils/seed oils                               1a          77.3    -         83.0    -
                                                   2b          82.3    79.3      81.4    80.0
Virgin olive oils/other oils                       1a          90.8    -         92.9    -
                                                   2b          90.8    92.1      91.4    88.6
Virgin olive oils/non-virgin olive oils/seed oils  1a          71.6    -         77.3    -
                                                   2b          67.8    66.2      77.6    71.3
Virgin olive oils/non-virgin olive oils            1a          92.0    -         95.0    -
                                                   2b          90.8    88.5      93.4    86.1

a Correct classifications: all samples in training set.
b Cross-validation: 4 cancellation groups.

3.5. Artificial neural networks (ANN)
ANN were employed to predict the category on the basis of inputs consisting of the raw values of the sensor response. Optimisation of the structure of the most suitable network for the different classifications studied was done by varying the number of input nodes, the number of hidden layers and their corresponding nodes and the number of output nodes. Different algorithms were studied for the training of the different networks (standard back propagation, stochastic back propagation, quick propagation and Weigend weight eliminator), the best results being obtained with the first of these. The stability of the model built for prediction was assessed as follows: all samples were randomly divided into 3 groups, 2 of them with 25% of the samples and the third one with 50% of them. In this way, the networks were trained with 25% of all the samples (first group); another 25% (second group) was used as the internal validation group to avoid over-training the network studied; the remaining 50% of the samples (third group), which did not participate either in the training step or in the validation process, were used as the test set. Assessment of the results provided by the different networks studied was done using the root mean squared error (RMSE), whose mathematical expression is as follows:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( V_{ij} - \hat{V}_{ij} \right)^{2}} \]
where m and n are respectively the number of samples and of classes, and V and V̂ represent the expected values and those calculated by the network. The expected values are 1 for the class to which the sample belongs and 0 for the classes to which the sample does not belong. The training step ends when the validation error increases in 40 consecutive cycles. The prediction capacity was evaluated using the same parameter applied to the prediction sample group (RMSEP). A sample was considered correctly classified when the probability of belonging to a group was greater than 75% for that group, and at the same time lower than 25% for the other groups. Each of the networks studied was trained six times in order to ensure the reproducibility of the results obtained. Table 6 shows the structure of the networks that provided the best results for each of the classifications made, together with the error in prediction and the overall percentage of correctly classified samples corresponding to the prediction set. The best results were obtained with different structures, depending on the classification task. The neural architecture used to model the olive oil/seed oil and virgin olive oil/other oils classifications had four layers with 3 nodes in each of the two hidden layers while for the other two
Table 6
Summary of ANN results

Task                                               Network structure   RMSEP   Correct predictions (%)
Olive oils/seed oils                               36:3:3:2            0.135   93.8
Virgin olive oils/other oils                       30:3:3:2            0.138   98.5
Virgin olive oils/non-virgin olive oils/seed oils  30:6:3              0.198   80.0
Virgin olive oils/non-virgin olive oils            24:6:2              0.151   95.4
classification tasks, the best results were obtained with 6 nodes in a single hidden layer, and hence the neural network had three layers. The overall percentages obtained in prediction can be considered highly satisfactory. In the olive oil/seed oil classification, the proposed network afforded two mis-predictions for each class, as may be seen in Fig. 4, in which the outputs for the olive oil class are represented (in the seed class, the outputs for olive oil were lower than 0.2). Fig. 5 shows the outputs for the virgin olive oil class corresponding to a neural network trained for the virgin olive oil/other oils differentiation; only one sample of olive oil was incorrectly classified (the outputs for other oils were in all cases lower than 0.10). Moreover, the network trained for the classification of the samples in 3 groups led to a classification error of 20%, corresponding to 13 mis-classified samples (only two for the virgin olive oil and seed oil classes but nine samples of non-virgin
Fig. 5. Virgin olive oil–other oils classification. Outputs of the neural network for the samples of the prediction set: (䉫) virgin olive oil samples; ( ) other oils samples.
olive oil). The percentage of correct predictions increased from 80.0 to 86.2% when the classification criterion was made less restrictive, a sample being considered correctly classified when the output for its own class was greater than 0.5 and the outputs for the other classes were lower than 0.5. Finally, in the optimum neural network for the virgin olive oil/non-virgin olive oil differentiation, three samples (one of virgin olive oil and two of non-virgin olive oil) were incorrectly classified. Accordingly, neural networks offer a chemometric technique of great potential for the treatment of the signals generated by electronic noses based on sensors that afford non-linear responses.
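The two quantities used to assess the networks in this section, the root mean squared error over m samples and n classes and the threshold criterion for a correct classification, can be written compactly. The following is a sketch under the assumption that the network outputs are available as one list of n values per sample; the function names are illustrative and do not come from the Neudesk program.

```python
from math import sqrt


def rmse(expected, predicted):
    """Root mean squared error over m samples (rows) and n classes (columns).

    expected[i][j] is 1 for the true class of sample i and 0 otherwise;
    predicted[i][j] is the corresponding network output.
    """
    m, n = len(expected), len(expected[0])
    total = sum(
        (v - p) ** 2
        for exp_row, pred_row in zip(expected, predicted)
        for v, p in zip(exp_row, pred_row)
    )
    return sqrt(total / (m * n))


def is_correct(outputs, true_class, high=0.75, low=0.25):
    """Strict criterion: own-class output > high and all other outputs < low."""
    return all(
        (o > high) if j == true_class else (o < low)
        for j, o in enumerate(outputs)
    )
```

Calling `is_correct(outputs, true_class, high=0.5, low=0.5)` gives the less restrictive criterion mentioned in the text, under which a sample with outputs such as 0.6 and 0.4 counts as correctly classified although it fails the strict 0.75/0.25 test.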
Fig. 4. Olive oil–seed oil classification. Outputs of the neural network for the samples of the prediction set: (䉫) olive oil samples; ( ) seed oil samples.
4. Conclusion
A set of 141 samples of vegetable oils was analysed by an electronic nose with a selected array of six metal oxide sensors. The signals obtained were
Table 7
Summary of results obtained with the different chemometric treatments

Task                                               KNN    LDA    QDA    SIMCA   ANN
Olive oils/seed oils                               83.6   98.6   89.4   80.0    93.8
Virgin olive oils/other oils                       85.7   96.4   93.6   92.1    98.5
Virgin olive oils/non-virgin olive oils/seed oils  79.4   83.7   87.2   71.3    80.0
Virgin olive oils/non-virgin olive oils            88.5   97.2   95.4   92.7    95.4
used to classify the samples in different categories: olive oil/seed oil, virgin olive oil/other oils, virgin olive oil/non-virgin olive oil/seed oil and virgin olive oil/non-virgin olive oil. Multivariate chemometric techniques such as KNN, LDA, QDA, SIMCA and ANN were applied to model the classes on the basis of a set of 48 variables per sample. A summary of the best results obtained in prediction with each of the chemometric techniques used is shown in Table 7. As the difficulty of the classification task increases, the prediction capacity of the model decreases slightly. Accordingly, the discrimination between the three classes (virgin olive oil/non-virgin olive oil/seed oil) would be better attained by a two-step two-category process (first, olive oil/seed oil and then virgin olive oil/non-virgin olive oil) than by a one-step three-category process. With this approach, the KNN and SIMCA procedures provide prediction results that are higher than 80%. The prediction capacity of the model generated by QDA varies between 89 and 94% and therefore provides better results than the previous two pattern recognition methods. The best results are obtained with linear discriminant analysis and ANN, with success percentages higher than 92 and 94%, respectively. The combination of chemometric treatments and electronic noses seems to be a promising technique for the characterisation and differentiation of vegetable oils.
Acknowledgements

This work was supported by the DGICYT (Project PB97-1322) and the Consejería de Cultura y Turismo of the Junta of Castilla y León (European Social Funds, Project SA19/99). Y.G.M. and M.C.C.O. acknowledge financial support by the Spanish Government.