Analytica Chimica Acta 468 (2002) 105–117
Quantitative fuzzy neural network for analytical determination

Liangyi Zhang, Lijing Wen, Yu Lu, Pengyuan Yang∗

Department of Chemistry, Fudan University, Shanghai 200433, China

Received 13 September 2001; received in revised form 24 June 2002; accepted 1 July 2002
Abstract

A quantitative fuzzy neural network (Q-FNN) for pattern recognition in analytical determination is reported in this paper. The fuzzy neural network (FNN) combines a fuzzy logic system with an artificial neural network (ANN), so that it offers both a high training speed and a strong anti-interference capability. Importantly, the analytical concept of relative error (RE) in quantitative determination has been integrated into the FNN, so that the Q-FNN provides very good quantitative capability in chemical analysis and protects the system against over-fitting. A logarithmic response-versus-concentration curve with superimposed noise is calibrated by the trained FNN, and a close approximation to the ideal noise-free curve is obtained. The Q-FNN has been applied to the determination of freon concentration in the presence of interfering gases. The prediction error for the test set is less than 10% and no qualitative mistake is observed, implying that the quantitative FNN retains the pattern-recognition capability. The results indicate that the Q-FNN has obvious advantages over the ANN, both in convergence speed and in quantitative accuracy. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy neural network; Quantification; Pattern recognition; Sensor array
1. Introduction

Pattern recognition algorithms play an important role in the performance of a chemical sensor array or electronic nose [1]. There have been a number of approaches to pattern recognition, such as nearest neighbor (NN), linear discriminant analysis (LDA), principal components analysis (PCA, e.g. soft independent modeling of class analogy, SIMCA) and the back-propagation artificial neural network (BP-ANN) [1,2]. Being based on linear systems, most of these algorithms have trouble with multi-modal and overlapping class distributions. Thus, the BP-ANN is progressively regarded as a standard for chemical pattern recognition, owing to its inherent advantages in modeling complex and non-linear data spaces. However, the artificial neural network (ANN) leaves much to be desired because of its time-consuming training and risk of over-fitting for a given task [3].

∗ Corresponding author. Tel.: +86-21-65642009; fax: +86-21-65641740. E-mail address: [email protected] (P. Yang).

During the past 20 years, fuzzy logic-based systems have emerged as a promising and powerful solution to real decision-making problems [4]. The major advantage of fuzzy logic-based systems, owing to their non-linear character, lies in the adaptation and integration of expert knowledge, which makes them efficient in solving complex and non-linear problems. Thus, the
fuzzy logic-based system can better represent the real world through class membership. However, a fuzzy model is rather difficult to train, because it lacks a proper approach for learning from experience, i.e. from data collected in advance. To provide such a training approach for fuzzy modeling, the fuzzy neural network (FNN) has been developed [4,5]. By integrating a neural network with a fuzzy logic system, the FNN displays both the great computational power of an ANN and the excellent anti-interference attributes of fuzzy logic. Thus, the FNN becomes an intelligent tool which incorporates expert knowledge in the form of IF–THEN rules, as in a fuzzy logic system, while adopting a training approach similar to that of the ANN, such as back-propagation (BP) [5]. Although the FNN has enjoyed wide application in a variety of areas [6–11], comparatively little has been reported on chemical determination [10–13].

In addition to qualitative discrimination, accurate quantification is also highly desirable for pattern recognition approaches such as the BP-ANN [14,15]. In the real world, the concentration of an analyte usually varies from a few ppm to more than a thousand ppm, i.e. 2–4 orders of magnitude over the whole range. Obviously, a conventional BP-ANN model requires some conceptual modification to integrate this quantitative feature. The ANN system should incorporate prior chemical theory into the neural network structure in order to store chemical knowledge when building a model [16]. A quantitative ANN model (Q-ANN) was described in our previous report [17]; it integrates the analytical concept of relative error (RE) for quantitative determination, so that the Q-ANN has a quantitative capability better than that of a normal ANN [18] for chemical analysis. In this paper, a quantitative fuzzy neural network (Q-FNN) is proposed and studied for quantitative chemical pattern recognition.

It is noticed that the FNN has not been fully applied to quantitative determination in the field of analytical chemistry. Combined with the RE concept, the Q-FNN is constructed and investigated, and its ability to simulate non-linear systems is tested. Its application to quantitative recognition of sensor-array data is presented through the determination of freon gas, and the two systems, Q-FNN and Q-ANN, are compared.
2. Theoretical

2.1. The fuzzy system employed for the FNN

The fuzzy system employed [18] is a set of IF–THEN rules as follows:

Rule $R_j$: IF $x_1$ is $F_1^j$ AND $x_2$ is $F_2^j$ AND $\ldots$ AND $x_n$ is $F_n^j$, THEN $y = v^j \qquad (1)$
where $x$, $y$ are the input and output of the fuzzy system, respectively, $n$ is the number of inputs of the fuzzy system, $j = 1, 2, \ldots, r$, with $r$ the number of rules, $F_i^j$ is one of the fuzzy conditions belonging to the whole fuzzy set, and $v^j$ is the corresponding fuzzy consequence. The logical operator AND means that all the fuzzy conditions act in parallel, so that the product algorithm ($\prod$) is used for rule generation. With the weighted-average algorithm acting as the defuzzification method, the mathematical formulation of the fuzzy system employed for the FNN is as follows:

$$y = \frac{\sum_{j=1}^{r} \beta_j \prod_{i=1}^{n} u_{F_i^j}(x_i)}{\sum_{j=1}^{r} \prod_{i=1}^{n} u_{F_i^j}(x_i)} \qquad (2)$$

where $\beta_j$ are the weights used in the weighted-average algorithm for defuzzification and $u_{F_i^j}$ is the membership function for the fuzzy condition $F_i^j$.

2.2. The architecture of the fuzzy neural network

The architecture of the FNN, shown in Fig. 1, is the integration of a fuzzy system and a feed-forward three-layer ANN [17]. In this model, the FNN is composed of four layers, namely the input layer (A), fuzzification layer (B), rule layer (C), and output layer (D, containing only one node in this study). The signals are fed directly into the first layer A without any adjustment. There are in total $n$ nodes in the input layer and each node $A_i$ ($i = 1, 2, \ldots, n$) corresponds to one of the $n$ sensor signals. The second layer B is the fuzzification layer, corresponding to the IF part of a fuzzy system; the input signals are fuzzified by the membership functions. The membership functions considered in this work are of Gaussian type (bell-shaped), each with a specific mean value and standard deviation.

Fig. 1. Architecture of the FNN. The input layer is the first layer, into which the signals are fed; the fuzzification layer is the second, corresponding to the IF part of a fuzzy system, where the input signals are fuzzified with the membership functions. The rule layer is the third layer, combining the THEN part of a fuzzy system with the hidden layer of an ANN. The last layer is the output layer. Signals are transferred in order from the first layer to the last, and the real output of the FNN is obtained at the output layer.

For each input node $A_i$, there are $N_i$ membership functions of Gaussian type with different mean values and standard deviations (note: $N_i$ may differ between individual input nodes $A_i$; however, $N_i$ is the same for all inputs in this work). A fuzzy mode is defined as one of the $N_i$ nodes in the fuzzification layer connected with $A_i$; thus there are in total $\sum_{i=1}^{n} N_i$ fuzzy modes, and each input node $A_i$ corresponds to its own group of $N_i$ different fuzzy modes. There are in total $\sum_{i=1}^{n} N_i$ nodes in the fuzzification layer, and each node $B_l$ ($l = 1, 2, \ldots, \sum_{i=1}^{n} N_i$) corresponds to one fuzzy mode. Thus, the output of the fuzzification node $B_l$ can be evaluated as follows:

$$\mathrm{out}_l^2 = u_l(x) = \exp\left[-\left(\frac{x_l - m_l}{\sigma_l}\right)^2\right] \qquad (3)$$

where $m_l$ and $\sigma_l$ are the mean value and standard deviation of the Gaussian-shaped membership function, respectively (note: the function $u_l(x)$ is the simplified form of $u_{F_i^j}(x_i)$ in Eq. (2), for generality).

The third layer C is the rule layer, which connects the IF part and the THEN part of the fuzzy system. Each rule node $C_j$ ($j = 1, 2, \ldots, \prod_{i=1}^{n} N_i$) is connected with $n$ different fuzzy modes, each belonging to an individual group of $N_i$ fuzzy modes; thus there are in total $\prod_{i=1}^{n} N_i$ nodes in the rule layer. Each fuzzification node $B_l$ is connected with $\prod{}'^{\,n}_{i=1} N_i$ nodes in the rule layer (note: the prime in the product symbol means that the number $N_l$ belonging to the fuzzification node $B_l$ is omitted from the product), so that there are in total $n \prod_{i=1}^{n} N_i$ inputs to the rule layer. The product algorithm is used for rule generation and all the weights between the fuzzification layer and the rule layer are constant at 1, so the output of the rule node $C_j$ can be evaluated as follows:

$$\mathrm{out}_j^3 = \prod_k u_k(x) \qquad (4)$$

where $k$ runs over the combination of $n$ different fuzzy modes corresponding to Rule $R_j$, each belonging to an individual group of $N_i$ fuzzy modes, so that $n$ different factors $u_k(x)$ appear in the product of Eq. (4).

The last layer D is the output layer, where each output node is connected with all the $r$ ($r = \prod_{i=1}^{n} N_i$) rule nodes by a specific defuzzification method.
The fuzzy system is defuzzified by means of the weighted-average in this work, as follows:

$$y = \sum_{j=1}^{r} \beta_j \,\mathrm{out}_j^3 \qquad (5)$$

2.3. The training algorithm of the FNN

The training of the FNN is performed with the well-known error BP algorithm widely used in the training of ANNs. There are two types of parameters in the Q-FNN model: structure parameters, namely the mean value ($m_l$) and standard deviation ($\sigma_l$) of the Gaussian function, and the connecting weight parameters ($\beta_j$) between the rule layer and the output layer. The gradient-descent algorithm is adopted to adjust these parameters in the training of the FNN. First the error criterion function $E$ is defined as follows:

$$E = \tfrac{1}{2}(d - y)^2 = \tfrac{1}{2}e^2 \qquad (6)$$

where $d$ is the target signal of the sample, $y$ the output of the FNN and $e$ the error of each output of the FNN. Combined with the mathematical formulation of each layer, the error signals ($\delta$) for BP are derived as follows:

$$\delta^4 = d - y = e \qquad (7)$$

$$\delta_j^3 = \delta^4 \beta_j \qquad (8)$$

$$\delta_l^2 = \sum_p (\delta_p^3 \,\mathrm{out}_p^3) \qquad (9)$$

where $p$ runs over the $\prod{}'^{\,n}_{i=1} N_i$ rule-layer nodes connected with the node $B_l$ in the fuzzification layer, so that there are in total $\prod{}'^{\,n}_{i=1} N_i$ different $\delta_p^3$ corrections involved in the summation of Eq. (9). Thus, the parameters of the FNN are adjusted as follows:

$$\Delta\beta_j = \delta^4 \,\mathrm{out}_j^3 \qquad (10)$$

$$\Delta m_l = \delta_l^2 \,\frac{2(x_l - m_l)}{\sigma_l^2} \qquad (11)$$

$$\Delta\sigma_l = \delta_l^2 \,\frac{2(x_l - m_l)^2}{\sigma_l^3} \qquad (12)$$

$$\beta_j(t+1) = \beta_j(t) + \eta_1 \,\Delta\beta_j(t) \qquad (13)$$

$$m_l(t+1) = m_l(t) + \eta_2 \,\Delta m_l(t) \qquad (14)$$

$$\sigma_l(t+1) = \sigma_l(t) + \eta_3 \,\Delta\sigma_l(t) \qquad (15)$$

where $\eta_1$, $\eta_2$, $\eta_3$ are the learning rates.

2.4. Relative error in analytical determination

Generally, whether the results of practical measurements are satisfactory can be evaluated by comparing the experimental result ($R_x$) with the ideal one ($R_i$); an analytical conclusion can then be drawn according to the value of the RE:

$$\mathrm{RE} = \frac{R_x - R_i}{R_i} \qquad (16)$$

It should be noticed that the value of RE is a concentration-related parameter. Generally speaking, low concentrations, especially those approaching the limit of detection (LOD), demand a large or relatively large RE. In contrast, higher concentrations call for a smaller RE. Clearly, the RE cannot be constant over the whole concentration range. A hyperbolic profile describes well the inverse-proportion relation between the RE requirement and the corresponding concentration [17]. This RE concept has been implemented in the FNN model mainly as the convergence criterion of the iteration, as described in Section 2.5.

2.5. The evaluation and control of the training of FNN
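One gradient-descent update of Eqs. (6)–(15) can be sketched in runnable form, reduced to a single-input network so that each rule node coincides with one fuzzification node (Eq. (4) degenerates to Eq. (3)); the centres, widths, targets and learning rate below are illustrative, not the paper's values.

```python
import numpy as np

def predict(x, m, s, beta):
    # Eqs. (3)-(5): Gaussian fuzzification, degenerate rule layer, weighted sum
    return float(np.dot(beta, np.exp(-((x - m) / s) ** 2)))

def train_step(x, d, m, s, beta, eta=0.05):
    """One BP update per Eqs. (6)-(15) for the 1-input case (n = 1)."""
    out2 = np.exp(-((x - m) / s) ** 2)   # Eq. (3)
    out3 = out2                          # Eq. (4) with n = 1
    e = d - float(np.dot(beta, out3))    # Eqs. (6)-(7): delta4 = e
    delta3 = e * beta                    # Eq. (8)
    delta2 = delta3 * out3               # Eq. (9): one connected rule per node
    d_beta = e * out3                            # Eq. (10)
    d_m = delta2 * 2 * (x - m) / s ** 2          # Eq. (11)
    d_s = delta2 * 2 * (x - m) ** 2 / s ** 3     # Eq. (12)
    beta += eta * d_beta                         # Eq. (13)
    m += eta * d_m                               # Eq. (14)
    s += eta * d_s                               # Eq. (15)

m = np.array([0.2, 0.5, 0.8]); s = np.full(3, 0.3); beta = np.zeros(3)
data = [(0.1, 0.2), (0.5, 0.9), (0.9, 0.4)]
for _ in range(2000):                    # one "epoch" per pass over the data
    for x, d in data:
        train_step(x, d, m, s, beta)
```

Note the deltas are all computed before any parameter is changed, so the $m_l$ used in Eq. (12) is the same one used in Eq. (11), as the derivation requires.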
The criteria used to evaluate and control convergence in the training of the Q-FNN are the RE value of each predicted concentration and the passed ratio for the training set. There can be some abnormal data points in data collection, which lead to an over-fitting problem in the training of the Q-FNN. The passed ratio can be used to minimize the contribution from abnormal data, and is defined below. When the RE of a predicted result is smaller than the one permitted for that concentration level, the predicted result is regarded as correct. When all the data in the training set pass this check, the FNN is considered fully trained. To apply this rule in the FNN, the permitted RE for each output must be examined in each epoch. Thus, in the numerical iterations, the convergence of the training process is judged by the RE value of the Q-FNN output as follows:

$$\left|\frac{y_c - y_k}{y_c}\right| \le \mathrm{RE}(c) \qquad (17)$$

where $y_c$ is the known output value in the training set and $y_k$ the output of the Q-FNN. At the end of each epoch, each value in the training set is checked by Eq. (17) as to whether it passes the permitted RE. The passed ratio for a training set is estimated through the error rate ($ee$) shown in Eq. (18), which is used to evaluate the convergence of a training process and its accuracy, and serves as the second criterion for the trained FNN:

$$ee = \frac{n_e}{n_t} \qquad (18)$$

where $n_e$ is the number of mis-predicted samples, i.e. those exceeding the permitted RE, and $n_t$ is the number of all predicted samples. Obviously, in an ideal situation the value of $ee$ should be close to zero after the FNN has been trained. In practice, however, an $ee$ of less than 10% can also be satisfactory in a real training.

3. Experimental

3.1. Hardware and software

All calculations and data processing were done on an IBM-compatible PC (Intel PIII 650 MHz or above). The programs to implement the Q-FNN were written in Microsoft Visual C++ 6.0 for Windows 9x or above.

3.2. Data processing

The selection of a pre-process and post-process function has been studied for quantification, to improve the training efficiency of the FNN. A suitable transformation has to be devised between the model input/output and the real-world response/concentration. The pre-process function transforms the ideal output (target signal) into the proper value range of the FNN, and the post-process function transforms the FNN output back into the real output (concentration) accordingly. In our previous paper [17], a combination of a logarithm function and a sigmoid function was successfully used as the pre-process and post-process function of the ANN. However, the FNN can deal with values in a wider range, so that the output range of the pre-process function can be enlarged to enhance the quantitative accuracy. In addition, a linear function is applied in the present Q-FNN instead of the sigmoid function of the previous Q-ANN. Thus, a new pre-process equation, composed of a logarithm function and a linear function, is given in Eq. (19); the corresponding post-process equation is given in Eq. (20):

$$x' = \begin{cases} \dfrac{\log x + m}{2m}, & x \le \mathit{lim} \\[2ex] \dfrac{\log \mathit{lim} + m}{2m} + \dfrac{x - \mathit{lim}}{\mathit{max} - \mathit{lim}}\left(4 - \dfrac{\log \mathit{lim} + m}{2m}\right), & x > \mathit{lim} \end{cases} \qquad (19)$$

where $\mathit{lim}$ is the transform point which joins the two functions together. Before training, all data in the training set were pre-processed by Eq. (19). The parameter $m$ presets the maximum concentration range with which the Q-FNN can deal; thus, for a given $m$, the concentration range is from $10^{-m}$ to $10^{m}$. For example, when $m$ is set to 3, as in this study, the Q-FNN covers a concentration range from 0.001 to 1000. Meanwhile, all outputs of the FNN were transformed back into real values through the post-process relation of Eq. (20), written here for $m = 3$:

$$x = \begin{cases} 10^{\,6x' - 3}, & x' \le \dfrac{\log \mathit{lim} + 3}{6} \\[2ex] \dfrac{(\mathit{max} - \mathit{lim})\left(x' - \dfrac{\log \mathit{lim} + 3}{6}\right)}{4 - \dfrac{\log \mathit{lim} + 3}{6}} + \mathit{lim}, & x' > \dfrac{\log \mathit{lim} + 3}{6} \end{cases} \qquad (20)$$

By these means, a better quantitative result is likely to be reached.
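Eqs. (19) and (20) can be sketched and round-trip-checked directly. Here $m = 3$ as in this study and $\mathit{lim} = 15$ as used in Section 4.2, while the upper bound $\mathit{max} = 1000$ is an assumed stand-in for the paper's $\mathit{max}$ parameter.

```python
import math

M = 3           # preset exponent: concentrations from 10**-M to 10**M
LIM = 15.0      # transform point joining the two branches (Section 4.2 uses 15)
MAX = 1000.0    # upper bound of the concentration range (assumed here)

def t(x):
    # value of the logarithmic branch, (log x + M) / (2M)
    return (math.log10(x) + M) / (2 * M)

def pre(x):
    """Eq. (19): logarithmic up to LIM, linear above, continuous at LIM."""
    if x <= LIM:
        return t(x)
    return t(LIM) + (x - LIM) / (MAX - LIM) * (4 - t(LIM))

def post(xp):
    """Eq. (20): the inverse of Eq. (19)."""
    if xp <= t(LIM):
        return 10 ** (2 * M * xp - M)
    return (MAX - LIM) * (xp - t(LIM)) / (4 - t(LIM)) + LIM

# The round trip recovers each concentration across the whole 0.001-1000 range
for x in (0.001, 0.5, 15.0, 200.0, 1000.0):
    assert abs(post(pre(x)) - x) / x < 1e-9
```

Note that `pre(MAX)` evaluates to 4, which is why the constant 4 appears in both branches: it is the upper end of the enlarged network output range.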
3.3. Quantitative determination of freon
3.3.1. Set-up and procedure of experiment

In our experiment, the FNN is used to implement quantitative detection of freon gas through a sensor array with four chemical sensors. Four metal oxide semiconductor sensors were utilized, exhibiting individual sensitivities for freon, alcohol, gasoline and combustible gas. The FNN structure is therefore composed of four inputs for the sensor responses and one output, for the freon concentration only. The set-up of the experiment is similar to that of our previous report [17]. Gas samples are quantitatively prepared and injected, pass through pipes and finally enter a sensor-array chamber. A gas chromatograph (GC-9790, Shanghai Analytical Instrument Corp.) was connected in cascade to monitor the gas concentration online. The metal oxide gas sensors used are resistive sensors; the signal ($G_{\mathrm{gas}}$) is the sensor electrical conductance in the presence of gas, while $G_{\mathrm{air}}$ is the sensor conductance in air. The signals of the sensor array were processed and gathered by a micro-controller unit (MCU). Eventually, the stable signals $G_{\mathrm{gas}}$ of the sensor array were collected by an IBM-PC under MCU control, and the responses of the four sensors were normalized as in Eq. (21) and recorded:

$$G = \frac{G_{\mathrm{gas}} - G_{\mathrm{air}}}{G_{\mathrm{max}}} \qquad (21)$$

where $G_{\mathrm{max}}$ is the maximum input signal allowed by the electronic system.

The concentrations of the freon gas samples tested in the experiments are listed in Table 1. Freon samples at 12 concentration levels were used without any interfering gases. Other freon samples, in the presence of water, gasoline or lubricant oil vapors, were also prepared. Altogether 106 samples of these concentrations were measured and taken as the training set. The training set, together with the concentrations used as target signals, was employed to train the Q-FNN. Another 32 samples, 18 measured data and 14 interpolated data selected from the response curve of freon, were taken as the test set ('unknown' samples). None of the data in the test set were trained. Data in both the training set and the test set consist of the responses of the four sensors. Eventually, the concentrations of the test set are predicted by the well-trained Q-FNN and the results of the prediction are compared with the ideal concentrations of the 32 'unknown' samples.

Table 1
Concentrations of tested samples

Number  Concentration (ppm)
1       0
2       5
3       10
4       20
5       30
6       60
7       80
8       100
9       200
10      400
11      500
12      1000
M1      0 (with 520 ppm gasoline)
M2      30 (with 520 ppm gasoline)
M3      0 (with 300 ppm lubricant oil vapor)
M4      30 (with 300 ppm lubricant oil vapor)
M5      0 (with saturated water vapor)
M6      20 (with saturated water vapor)

3.3.2. The FNN model and training mode

The FNN model used here is shown in Fig. 1. There are four nodes in the input layer, each of which represents one semiconductor sensor. In the fuzzification layer, three membership functions are connected to each input, so that there are 12 nodes in this layer. With three membership functions selected to extract the fuzzy rules, there are 81 nodes in the rule layer. Because only the freon concentration is of concern, there is one node in the output layer.

A successful training can be made as follows. All the membership functions are initialized evenly over the whole range of the sensor response, and all the defuzzification weights between the rule layer and the output layer ($\beta_j$) are randomized between −1 and 1. The initial learning rates were all set to 0.0001. At the end of each iteration, the convergence of the training process was checked by Eq. (17). Training data which passed the RE requirement were marked and were not trained in the next run, but were checked again for the validity of the marker. When the error rate fell below 0.35, the algorithm automatically reduced the learning rate on the basis of the error rate. Altogether about 7000 epochs were needed for the whole training process.
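The mark-and-recheck training mode just described can be sketched as a generic loop; `predict_fn` and `train_fn` are hypothetical stand-ins for the FNN forward pass and BP update, and `re_limit` supplies the permitted RE per target concentration.

```python
def train_with_marking(samples, predict_fn, train_fn, re_limit,
                       max_epochs=7000, target_pass=0.86, eps=1e-9):
    """Sketch of the Section 3.3.2 training mode: samples that already satisfy
    their permitted RE are marked and skipped, but every marker is re-validated
    at the end of each epoch."""
    passed = [False] * len(samples)
    for epoch in range(1, max_epochs + 1):
        for i, (x, d) in enumerate(samples):
            if not passed[i]:                    # marked data are not trained...
                train_fn(x, d)
        for i, (x, d) in enumerate(samples):     # ...but are checked again, Eq. (17)
            passed[i] = abs(predict_fn(x) - d) / max(abs(d), eps) <= re_limit(d)
        if sum(passed) / len(samples) >= target_pass:
            return epoch                         # enough of the training set passed
    return max_epochs

# Toy model: one stored value per sample, moved halfway toward its target each step
vals = {}
predict_fn = lambda x: vals.get(x, 0.0)
train_fn = lambda x, d: vals.__setitem__(x, vals.get(x, 0.0) + 0.5 * (d - vals.get(x, 0.0)))
epochs = train_with_marking([(1, 10.0), (2, 20.0)], predict_fn, train_fn, lambda d: 0.1)
```

With this toy model, each untrained sample halves its error per epoch, so the loop converges in a handful of epochs; the `target_pass` default of 0.86 mirrors the 86% passed ratio reported in Section 4.2.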
Fig. 2. Simulation of a sinusoid profile. The solid curve is the standard sinusoid; the dark circles show the simulation by the trained Q-FNN. In the range 0–2π, 50 points on the sinusoid were picked as the training set, and another 50 points as the test set. A total of 96% of the training set passed the RE requirement after 20,000 epochs, and a 5% error rate exists in the prediction of the test set.
4. Results and discussion

4.1. Non-linear simulation

4.1.1. Simulation of a standard sinusoid

To illustrate the ability of the Q-FNN to recognize an unknown non-linear system, the simulation of a sinusoid is chosen, as shown in Fig. 2, because the sinusoid is one of the more complex curves, with a number of inflexions. In the range from 0 to 2π, 50 points on the standard sinusoid were picked as the training set, and another 50 points as the test set. There is only one input and one output of the Q-FNN, and the number of membership functions (nMem) is set to 3; thus the Q-FNN model has a 1-3-3-1 structure, where each figure represents the number of nodes in a layer. In order to avoid negative data, 1 is added to each output of the sine wave in the computation for convenience, and 1 is subtracted from the output of the Q-FNN accordingly. The permitted RE values are set to 5% in the range 1.0–2.0, 10% in the range 0.1–1.0 and 15% in the range 0–0.1. The simulation is done with the Q-FNN under the following conditions: training data passed in the last iteration are not trained in the current run, and the error rate of the training set is assigned to be 5% for the converged Q-FNN. After about 20,000 epochs, 96% of the data in the training set had passed the RE check, which indicates that the Q-FNN had been well trained. All data in the test set were then predicted by the trained Q-FNN, with an error rate of 5%. It can be seen from Fig. 2 that over one period of the sinusoid the simulation curve approximates the standard curve quite accurately. Thus, the trained Q-FNN can simulate the desired complex function quickly and with high accuracy, illustrating a very good ability to simulate an unknown non-linear curve.

4.1.2. Simulation of a quantitative calibration curve in the presence of noise

A logarithm-type curve is simulated as a quantitative calibration curve to test the ability of the Q-FNN. The logarithm standard curve is one kind of sensor-response curve with concentration on a logarithmic scale, and it is widely used in analytical chemistry. Random noise was superimposed onto the simulated logarithm curve to imitate real cases. The noise generation is set as in Eq. (22):

$$y = \log x \pm \Delta y = \log x + c \times \mathrm{RSD} \times \mathrm{rand} \qquad (22)$$

where RSD is the RE requirement of $x$ (the concentration), $\Delta y$ happens to equal the RSD mathematically, rand is a random number between −1 and 1 which limits the varying range of the noise within $\Delta y$, and $c$ is the fraction of the noise level added to the standard curve. Clearly, the second term on the right of Eq. (22) contributes the noise amplitude.
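The noise model of Eq. (22) is easy to reproduce; the RE-requirement profile `rsd(x)` below is an illustrative stand-in for the profile of Fig. 3a, not the paper's exact values.

```python
import math
import random

def noisy_response(x, c, rsd, rng):
    # Eq. (22): y = log x + c * RSD(x) * rand, with rand uniform in [-1, 1]
    return math.log10(x) + c * rsd(x) * rng.uniform(-1.0, 1.0)

# Illustrative RE profile: larger permitted RE toward low concentration (cf. Fig. 3a)
rsd = lambda x: min(0.5, 0.1 + 0.05 / x)

rng = random.Random(0)
xs = [10 ** (-3 + 6 * i / 59) for i in range(60)]               # 60 points, 0.001-1000
train_50 = [(x, noisy_response(x, 0.5, rsd, rng)) for x in xs]  # 50% noise level
train_100 = [(x, noisy_response(x, 1.0, rsd, rng)) for x in xs] # full noise level
```

The two lists correspond to the two training sets used below, generated at $c = 50\%$ and $c = 100\%$ of the full noise level.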
Fig. 3. Simulation of a quantitative calibration curve of logarithmic type with different noise levels: (a) the RE requirement versus concentration; (b) the simulation of the logarithm curve superimposed with a 50% full noise level; and (c) the simulation of the logarithm curve with the full noise level. In (b) and (c), the concentration on the x-axis is on a logarithmic scale and the y-axis stands for the response intensity; the solid curve is the standard logarithm curve without noise. The cross signs show the logarithm curve with superimposed noise; the dark circles, the curve simulated by the trained Q-FNN.
In the range 0.001–1000, 60 points on the standard logarithm curve were picked and different noise levels were added as defined in Eq. (22). Two training sets were obtained, for $c$ parameters of 50 and 100%, respectively. Another 50 points picked from the standard curve were used as a test set. The Q-FNN was trained under the same conditions as in Section 3.3.2, and the test set was predicted by the trained Q-FNN accordingly. The permitted RE values are shown in Fig. 3a and were set according to the requirements for a normal analytical purpose [17]. Fig. 3b and c display the predictions of the trained Q-FNN for the 50 and 100% noise levels, respectively. Using the data of each training set, the Q-FNN was trained and the data of the test set were then individually predicted. In Fig. 3b and c, the concentration on the x-axis is on a logarithmic scale and the y-axis stands for the response intensity. It can be seen that the trained Q-FNN approximates the desired function quite accurately for both 50 and 100% noise over the concentration range 0.1–1000. These results illustrate that over such a large concentration range the Q-FNN still has a strong quantitative ability to calibrate the response curve versus concentration. Thus, the Q-FNN is competent for the simulation of a non-linear system with intense interference over a large range. There are still some biased predictions, mainly in the concentration range from 0.01 to 10. The RE concept in the analytical field [17] shows that a larger RE is expected for low concentrations, especially for a concentration near the LOD. Thus, a prediction with a larger RE in the lower concentration range is reasonable and can still meet the RE requirement in the calibration curve of response versus concentration.

4.2. Quantitative determination of freon

The predicted results for the test set under the optimized conditions for the Q-FNN are shown in Table 2.
The samples numbered with asterisks are interpolated data and the others are tested samples. The interpolated data were estimated from the response curves of the sensors towards the gas samples. The permitted RE values (listed in Table 2) were assigned according to the error requirement for each concentration interval, by the following Eq. (23):

$$\mathrm{RE}(x) = \begin{cases} 0.05/\varepsilon, & x \le \varepsilon \\ 5/x, & \varepsilon < x \le \ldots \\ 10/x, & \ldots < x \le 100 \\ 0.1, & x > 100 \end{cases} \qquad (23)$$

where $\varepsilon$ is set to 0.001. The Q-FNN was trained under the conditions mentioned in Section 3.3.2. The transform point was set to 15. After 6880 epochs, 86% of the data in the training set had passed half the value of the RE requirement, resulting in a converged Q-FNN. The concentrations of the unknown samples were predicted with this trained Q-FNN, with an error rate of only 6.25%. This error rate is considered rather satisfactory for a Q-FNN designed for the quantitative analysis of gas mixtures. For the gas mixtures with different interferences, there is no qualitative error and the quantitative result is also very good, implying an anti-interference ability of the Q-FNN. Only two interpolated points near the concentration of 100 ppm fail the RE requirement. The reason for this departure might be mainly the sharp response curve of the freon sensor in the concentration range 80–100 ppm, and partially the instability of the random excursion of the sensors. The PLS method was also tried on the training set (results not shown), using only the signal of the freon sensor and the ideal concentrations in the training set; however, the result was not fully satisfactory.

4.3. The number of membership functions in the Q-FNN model

In order to enhance the accuracy and learning speed of the Q-FNN, it is essential to select a proper nMem. The nMem determines the number of nodes in the second and third layers of the Q-FNN model, and thus the total structure of the network. For each nMem, the Q-FNN model is adjusted as follows: the fuzzification layer has 4 × nMem nodes and the rule layer has nMem to the 4th power nodes. Four values of nMem, 2–5, were examined. The data listed in Table 1 were used as the training set, and the test set was predicted after each Q-FNN model was well trained. The converging speed and predicting accuracy (error rate) are given in Table 3, showing the importance of selecting a proper nMem. It can be seen
Table 2
Results of the trained Q-FNN applied in the quantitative determination of freon

Number  Predicted concentration (ppm)  Ideal concentration (ppm)  Experimental RE  Permitted RE
I1∗     5.2     7.0     0.26     0.72
I2∗     9.3     8.1     0.13     0.61
I3∗     14      12      0.11     0.4
I4∗     21      17      0.25     0.29
I5∗     27      27      0.027    0.18
I6∗     40      37      0.070    0.13
I7∗     51      51      0.0004   0.2
I8∗     102     86      0.19     0.11
I9∗     122     107     0.13     0.10
I10∗    149     154     0.031    0.10
I11∗    230     244     0.057    0.10
I12∗    309     307     0.004    0.10
I13∗    422     415     0.014    0.10
I14∗    743     750     0.010    0.050
1       0.004   0       3.2      50
2       2.4     5       0.51     1.0
3       9.8     10      0.019    0.50
4       24      20      0.20     0.25
5       27      30      0.11     0.17
6       58      60      0.037    0.17
7       70      80      0.12     0.13
8       99      100     0.007    0.10
9       198     200     0.009    0.10
10      413     400     0.033    0.10
11      482     500     0.034    0.10
12      986     1000    0.013    0.050
M1      0.005   0       4.1      50
M2      16      20      0.20     0.25
M3      0.036   0       35       50
M4      32      30      0.074    0.17
M5      0.005   0       3.7      50
M6      28      30      0.081    0.17

Items with asterisks are interpolated data estimated from the response curves vs. concentration; the other numbers can be found in Table 1. Convergence condition: 86% of all the data in the training set passed the 50% RE requirement; converging speed: 6880 epochs.
from this table that the converging speed and predicting accuracy are best when nMem is 3. When nMem is 5 the predicting accuracy is also very satisfactory, but the training efficiency is not as good as for nMem equal to 3. Furthermore, the BP period of each epoch is also an important factor in the converging time: the higher the value of nMem, the more nodes in the second and third layers, resulting in
Table 3
Selection of the number of membership functions (nMem) in the Q-FNN model

nMem  Converging speed (epochs)  Total ee  ee1 (<10)  ee2 (between 10 and 250)  ee3 (between 250 and 500)  ee4 (gas mixture)
2     16000    0.27    0       0.38    0.33    0.33
3     7000     0.07    0       0.11    0       0
4     14000    0.27    0.13    0.38    0.17    0.33
5     12000    0.13    0       0.22    0       0

Convergence condition: 80% of all the data in the training set passed the 80% RE requirement. ee is the error rate of the predicted concentrations of the unknown samples; ee1, ee2, ee3 and ee4 are the error rates for the individual concentration ranges.
the larger the computational amount and the longer the BP period of each epoch. Thus, the lowest value of nMem is preferred for the Q-FNN when the predicting accuracy is guaranteed, and a value of 3 can be taken for nMem in this study.

4.4. Comparison of Q-FNN and Q-ANN

To compare the present model with the previous Q-ANN model, the data listed in Table 1 were used as the training set, and the untrained data for the interference gases and the interpolated data were used as the test set. The Q-ANN was trained under the following conditions [17]: the transform point was set to 100, the training order was from small to large, and data satisfying the RE requirement in the last iteration were omitted in the next. After 50,000,000 epochs, 70% of the data in the training set had passed half the value of the RE requirement. All data in the test set were predicted by the trained Q-ANN and by the Q-FNN trained in Section 4.2,
respectively. The predicted results are listed in Table 4. It can be seen that the Q-FNN converged after only 6880 epochs, while the Q-ANN needed 50,000,000 iterations. In the concentration range of 0–300 ppm, except near 100 ppm, both models approximate the ideal concentration very well, and above 300 ppm the Q-FNN also provides very good results. For the gas mixtures with different strong interferences, neither model makes a qualitative error, while the Q-FNN approximates the ideal concentration better as far as prediction accuracy is concerned. Because a fuzzy logic system was introduced, the Q-FNN is excellent in knowledge learning as well as advantageous in convergence speed [20]. From the comparison of the error rates in prediction, especially the error rate for the gas mixtures with interference, it can be seen that the Q-FNN shows a strong anti-interference capability. In addition, the Q-FNN is able to process data at high speed in the presence of the large random noise that frequently exists in
Table 4
Prediction comparison of Q-FNN and Q-ANN in quantification

Number   FNN predicted (ppm)   FNN result   ANN predicted (ppm)   ANN result   Ideal concentration (ppm)   Permitted RE
I1*      5.2                   Pass         4.2                   Pass         7                           0.72
I2*      9.3                   Pass         7.2                   Pass         8.1                         0.61
I3*      14                    Pass         13                    Pass         12                          0.4
I4*      21                    Pass         18                    Pass         17                          0.29
I5*      27                    Pass         26                    Pass         27                          0.18
I6*      40                    Pass         40                    Pass         37                          0.13
I7*      51                    Pass         52                    Pass         51                          0.2
I8*      102                   Fail         100                   Fail         86                          0.11
I9*      122                   Fail         96                    Pass         107                         0.1
I10*     149                   Pass         146                   Pass         154                         0.1
I11*     230                   Pass         230                   Pass         244                         0.1
I12*     309                   Pass         284                   Pass         307                         0.1
I13*     422                   Pass         443                   Pass         415                         0.1
I14*     743                   Pass         924                   Fail         750                         0.05
M1       0.005                 Pass         0.001                 Pass         0                           50
M2       16                    Pass         16                    Pass         20                          0.25
M3       0.036                 Pass         0.001                 Pass         0                           50
M4       32                    Pass         35                    Pass         30                          0.17
M5       0.005                 Pass         0.004                 Pass         0                           50
M6       28                    Pass         24                    Fail         30                          0.17

All items are the same as in Table 2. Q-FNN convergence condition: 86% of the data in the training set passed the 50% RE requirement; converging speed: 6880 epochs. Q-ANN convergence condition: 70% of the data in the training set passed the 50% RE requirement; converging speed: 5 × 10^7 epochs. Each predicted result is checked against the RE requirement: "Pass" indicates that the result meets the RE requirement, while "Fail" indicates that it does not.
an analytical measurement. Over the whole concentration range of 0–1000 ppm, the Q-FNN is promising. Thus, the Q-FNN has better quantitative accuracy over a wider range, such as 0.1–1000 ppm, than the ANN approaches [17,19].
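The pass/fail check applied to Table 4 can be sketched in a few lines. The handling of the zero-concentration rows, where the permitted value is treated as an absolute tolerance in ppm, is an assumption, since the paper does not spell out that case:

```python
def passes_re(predicted, ideal, permitted_re):
    """Check a predicted concentration against its permitted RE.

    For a nonzero ideal concentration the relative error is
    |predicted - ideal| / ideal.  For an ideal of 0 ppm the permitted
    value is treated as an absolute tolerance in ppm -- an assumption.
    """
    if ideal == 0:
        return abs(predicted) <= permitted_re
    return abs(predicted - ideal) / ideal <= permitted_re

# A few rows from Table 4:
print(passes_re(5.2, 7, 0.72))    # I1, FNN  -> True  ("Pass")
print(passes_re(102, 86, 0.11))   # I8, FNN  -> False ("Fail")
print(passes_re(924, 750, 0.05))  # I14, ANN -> False ("Fail")
print(passes_re(0.005, 0, 50))    # M1, FNN  -> True  ("Pass")
```

With this rule the Pass/Fail column of Table 4 is reproduced for the rows shown above.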
4.5. Cross-validation test for the Q-FNN model

To test the validity and stability of the Q-FNN model, a cross-validation test was carried out. The data listed in Table 1 were used as the training set, and 14 untrained data (without the interpolated data) were used as the test set. The cross-validation test mainly concerns the connecting weight parameters in the Q-FNN model. One or more selected connecting weights between the rule layer and the output layer are set to a constant of 1 (equivalent to leaving one or more parameters out), and a series of FNN models with different numbers of weights can thus be constructed. In the cross-validation test, the RE concept has also been integrated into the criterion function as follows:

CV = Σ_{i=1}^{n} [RE(y_i^pre − y_i^ideal)]^2    (24)

where n is the number of data in the test set, y_i^pre the prediction result, y_i^ideal the target signal, and RE the error requirement for y_i^ideal. These Q-FNN models are trained sequentially until 70% of the data in the training set meet the 50% RE requirement, and the data in the test set are then predicted by each trained model, respectively. A smaller CV value indicates a more stable model. The results of the cross-validation test show that the original Q-FNN model, with 24 structure parameters and 81 weights, has the smallest CV value of 11.9. In contrast, the CV values of the models leaving one connecting weight out lie between 12 and 21.5, and they change only slightly with the position of the selected weight parameter. For the models leaving five weights out, the CV value is 16.6 when five consecutive weights are left out, and 12.9 when five dispersed weights are left out. The CV values of the models with 10, 20 and 40 dispersed weights left out are 18.2, 24.7 and 45.3, respectively, which are much greater than that of the original Q-FNN model. It can therefore be seen that the CV value of the cross-validation test increases with the number of weights left out. Thus, the original Q-FNN model is the most valid of all the models considered in this study.
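The criterion of Eq. (24) can be sketched as follows. The exact form of RE(·) is not fully specified in the text, so the residual is expressed here relative to the ideal value (an assumption), and zero targets are skipped to avoid division by zero:

```python
def cv_score(y_pred, y_ideal):
    """Sketch of Eq. (24): CV = sum_i [RE(y_i^pre - y_i^ideal)]^2.

    RE(.) is taken here as the residual divided by the ideal value,
    an assumption about its exact form; zero targets are skipped
    to avoid division by zero.
    """
    return sum(((p, t)[0] - t) ** 2 / t ** 2
               for p, t in zip(y_pred, y_ideal) if t != 0)

# Two predictions, each 10% off its target:
print(cv_score([110.0, 90.0], [100.0, 100.0]))  # -> approximately 0.02
```

Under this reading, a model whose predictions stay within their permitted relative errors accumulates a small CV, matching the paper's observation that the full 81-weight model gives the smallest value.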
5. Conclusion

Structurally, the quantitative Q-FNN integrates the concept of RE for error treatment into the FNN model, together with a fuzzy logic inference system suited to chemical pattern recognition. Thus, high learning efficiency and good prediction accuracy are obtained in the training process for analytical quantification. In addition, through the RE concept introduced from analytical chemistry and the control of training by a proper error rate (ee) of the training set, the over-fitting problem has been fairly well solved. Owing to its high converging speed, the Q-FNN can meet the needs of online training and determination in analytical applications such as electronic noses, and might be competent for an online implementation of the electronic nose. Apart from gas-mixture identification and quantification, the Q-FNN may also be applicable to other analytical issues concerning spectral data.
Acknowledgements

The research is supported by the Ministry of Science and Technology of China (Contract # 96-A23-03-07), and partially by the NSF of China.

References

[1] R.E. Shaffer, S.L. Rose-Pehrsson, R.A. McGill, Anal. Chim. Acta 384 (1999) 305–317.
[2] G. Barkó, J. Abonyi, J. Hlavay, Anal. Chim. Acta 398 (1999) 219–226.
[3] J. Savkovic-Stevanovic, Comp. Chem. Eng. 18 (11/12) (1994) 1149–1155.
[4] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems, Prentice-Hall, NJ, 1996.
[5] Y.P. Yang, X. Xu, W.Y. Zhang, Fuzzy Sets Syst. 114 (2000) 325–328.
[6] R.J. Kuo, K.C. Xue, Fuzzy Sets Syst. 108 (1999) 123–143.
[7] B. Novakovic, D. Scap, D. Novakovic, Eng. Appl. Artif. Intell. 13 (2000) 71–83.
[8] Y.C. Huang, X.Z. Wang, Chem. Eng. Sci. 54 (1999) 2731–2738.
[9] X.H. Song, P.K. Hopke, M.A. Bruns, D.A. Bossio, Chemometrics Intell. Lab. Syst. 41 (1998) 161–170.
[10] R. Li, P. Wang, W.L. Hu, Sens. Actuators B 66 (2000) 246–250.
[11] B. Yea, R. Konishi, T. Osaki, K. Sugahara, Sens. Actuators A 45 (1994) 159–165.
[12] B. Yea, T. Osaki, K. Sugahara, R. Konishi, Sens. Actuators B 41 (1997) 121–129.
[13] B. Yea, T. Osaki, K. Sugahara, R. Konishi, Sens. Actuators B 56 (1999) 181–188.
[14] E. Llobet, J. Brezmes, X. Vilanova, J.E. Sueiras, X. Correig, Sens. Actuators B 41 (1997) 13–21.
[15] J.W. Gardner, P.N. Bartlett, Sens. Actuators B 211 (1994) 18–19.
[16] Z. Wang, B.R. Kowalski, Anal. Chem. 67 (1995) 1497–1504.
[17] Y. Lu, L.P. Bian, P.Y. Yang, Anal. Chim. Acta 417 (2000) 101–110.
[18] T. Takagi, M. Sugeno, IEEE Trans. Syst. Man Cybernet. 15 (1985) 116–132.
[19] A. Szczurek, P.M. Szecowka, B.W. Licznerski, Sens. Actuators B 58 (1999) 427–432.
[20] R.J. Kuo, P.H. Cohen, Neural Networks 12 (1999) 355–370.