Fuzzy Sets and Systems 132 (2002) 49 – 62

www.elsevier.com/locate/fss

Incremental learning in Fuzzy Pattern Matching

Moamar Sayed Mouchaweh*, Arnaud Devillez, Gerard Villermain Lecolier, Patrice Billaudel

Laboratoire d'Automatique et de Microélectronique, IFTS, 7, Boulevard Jean Delautre, 08000 Charleville-Mézières, France

Received 16 January 2001; received in revised form 5 October 2001; accepted 14 February 2002

Abstract

We use learning methods to build classifiers from a set of training samples. A classifier is capable of assigning a new sample to one of the different learnt classes. In a non-stationary work environment, a classifier must be retrained every time a new sample is classified, in order to obtain new knowledge from it. This is an impractical solution, since it requires the storage of all available samples and a considerable computation time. The development of a classifier that is capable of acquiring new knowledge from each newly classified sample while preserving the current knowledge is known as incremental learning. Our research team "Diagnosis of Industrial Processes" works on diagnosis using fuzzy classification methods for data coming from the industrial and medical sectors. We use Fuzzy Pattern Matching (FPM) as our classification method and the probability–possibility transformation of Dubois and Prade to construct densities of possibilities. These densities are used to assign each new sample to its suitable class. When FPM works in a non-stationary environment, it must update its possibility densities after the classification of each new sample, so as to adapt the classifier to possible changes in the work environment. When the number of samples increases, the update time also increases, and so does the memory size required to store all the samples. In the literature, there is no published paper that integrates incremental learning in FPM. Thus, in this paper, we propose a method to integrate it. We then show that the update time and the memory size are constant and independent of the number of samples. Finally, we illustrate the advantages of this method on different examples. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Diagnosis by pattern recognition; Fuzzy Pattern Matching; Possibility theory; Incremental learning

1. Introduction

An industrial system works in one of two operating states: a normal or an abnormal one. Sensing devices feed a set of observations about the operating state of this system. These observations are divided into subsets of points, called classes or patterns,

Corresponding author. E-mail addresses: [email protected] (M.S. Mouchaweh), [email protected] (A. Devillez), [email protected] (G.V. Lecolier), [email protected] (P. Billaudel).

using an unsupervised learning method or human experience. These observations with their class assignments form the training set. A supervised learning method uses the training set to build a classifier that best separates the different learnt classes. The task of the classifier is to assign a new input, which represents the actual operating state of the system, to one of these classes. Each class is associated with an operating state by an expert. Then, by classifying a new point, diagnosis by pattern recognition determines the operating state of the system at each instant [8].



In a non-stationary work environment, the classifier must be able to increase its knowledge by extracting information from each newly classified point. This new knowledge can improve the accuracy of the classification and adapt the classifier to possible changes in its work environment. To accomplish this task, the classical solution is to retrain the classifier every time a new sample is classified. This is an impractical solution, since it requires the storage of all available samples and a considerable computation time [2]. Thus incremental learning is the practical solution, since it preserves the current knowledge and acquires the new knowledge from the new sample. Only a few papers have been published that deal with incremental learning, which is still an open issue in the pattern recognition literature [2]. Our team uses Fuzzy Pattern Matching (FPM) as its classification method for its simplicity and its low calculation time [1]. FPM is a multicriteria supervised classification method [9]. It has been developed in the framework of fuzzy set and possibility theory in order to take into account the imprecision and the uncertainty of the data [7]. It uses a probability–possibility transformation to construct densities of possibilities. These densities are used as membership functions to assign each new sample to its suitable class. When FPM works in a non-stationary environment, it must update its possibility densities after the classification of each new sample. When the number of samples increases, the update time also increases, and so does the memory size required to store all the samples. In the literature, there is no published paper that integrates incremental learning in FPM. Thus, in this paper, we propose a method to integrate it. In Section 2 we review the basics of possibility theory; in Sections 3 and 4 we explain the functioning principles of FPM on a simulated example. Section 5 discusses incremental learning in general. Then, in Section 6, we present our method to integrate incremental learning in FPM and we demonstrate that the update time and memory size are constant and independent of the number of samples in the training set. Section 7 shows the experimental results of the comparison between the classical and incremental learning on several examples.

2. Possibility theory

The importance of possibility theory, originally introduced by Zadeh, stems from the fact that most of the information on which human decisions are based is possibilistic rather than probabilistic in nature [11]. Let X denote a universal set assumed to be finite. A possibility measure $\Pi$ must satisfy the following conditions [5]:

$$\Pi(X) = 1 \quad \text{and} \quad \Pi(\emptyset) = 0, \tag{1}$$

$$\forall A, B \subset X,\quad \Pi(A \cup B) = \max(\Pi(A), \Pi(B)). \tag{2}$$

Eq. (2) defines the basic feature of possibility theory: it is ordinal, in contrast to the additive feature of probability theory. For this reason, a possibility measure needs less information than a probability one. Each possibility measure is associated with a necessity measure N, which must satisfy:

$$N(X) = 1 \quad \text{and} \quad N(\emptyset) = 0, \tag{3}$$

$$\forall A, B \subset X,\quad N(A \cap B) = \min(N(A), N(B)). \tag{4}$$

Contrary to probability, an event and its contrary can both be completely possible, as indicated in (5):

$$\Pi(A) + \Pi(\bar{A}) \geq 1. \tag{5}$$

The relation between the possibility and necessity measures is defined by

$$N(A) = 1 - \Pi(\bar{A}). \tag{6}$$

To obtain possibility measures, we can use a probability–possibility transformation. This transformation must keep a connection between probability and possibility measures, called the probability–possibility consistency principle [11]. The consistency principle of Dubois and Prade is defined as follows: going from a probabilistic representation to a possibilistic one, some information is lost, because we go from point-valued probabilities to interval-valued ones [6]. A possibility measure can thus be viewed as an upper bound, and this fact can be used to change probability into possibility or conversely [6]. The transformation is defined by (7) [6]:

$$\pi_k = \sum_{l=1}^{h} \min(p_k, p_l). \tag{7}$$
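As an illustration, the transformation (7) can be computed in a few lines. The following Python sketch is ours, not from the paper (which reports MATLAB experiments); it takes the probability distribution of the h bar centres and returns the corresponding possibility distribution:

```python
import numpy as np

def dubois_prade(p):
    """Dubois-Prade probability-possibility transformation, Eq. (7):
    pi_k = sum_l min(p_k, p_l)."""
    p = np.asarray(p, dtype=float)
    # Element (k, l) of the outer minimum is min(p_k, p_l);
    # summing over l yields pi_k for every bar at once.
    return np.minimum.outer(p, p).sum(axis=1)
```

For example, dubois_prade([0.5, 0.3, 0.2]) returns (1.0, 0.8, 0.6): the most probable bar always receives possibility 1, and the possibility of each bar dominates its probability, in accordance with the consistency principle.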


3. Fuzzy Pattern Matching (FPM)

This method, presented in [9], uses the definition of c prototypes for the c classes, which are described by n attributes. Each prototype is a collection of n fuzzy sets $P_{i1}, \dots, P_{in}$ expressing the set of typical values of each attribute for the class $C_i$.

3.1. Estimation of the densities of possibilities

The prototypes are the densities of possibilities estimated for each attribute of each class. They are constructed using possibility theory in order to take into account the imprecision and uncertainty of the data. The histograms of the data are transformed into probability distributions p, which are in turn transformed into possibility distributions $\pi$ using the transformation of Dubois and Prade defined in (7). Finally, these distributions are turned into densities by linear interpolation. The histograms are constructed from the learning set. The lower and upper bounds of the histogram for an attribute of a class are usually the minimal and maximal values observed in the learning set, but they can also be set by the user. The number of bars h of the histogram determines the width of each bar; this parameter has a very important influence on the performance of the FPM method [3,4].

3.2. Classification of a new sample

The classification of a new sample x, whose values for the different attributes are $x_1, \dots, x_j, \dots, x_n$, is made in two steps [3,4], as shown in the sketch after this list:

• determination of the matching degree between each value $x_j$ and the corresponding prototype $P_{ij}$; we obtain the matching degree $\pi_{ji}$ between the sample and the prototype of attribute j for the class $C_i$,
• fusion of all the matching degrees concerning the class $C_i$, $\pi_{1i}, \dots, \pi_{ni}$, into a single one by an operator H:

$$\pi_i = H(\pi_{1i}, \dots, \pi_{ni}). \tag{8}$$

The result $\pi_i$ of this fusion corresponds to the matching degree between the new sample and the prototype $P_i$, in other words the possibility for this sample to belong to the class $C_i$. The matching operator H can be a multiplication, a minimum, an average, or a fuzzy integral [9].
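For illustration, the two steps can be sketched in Python under assumed data structures (`centres[i][j]` and `densities[i][j]` hold the bar centres and possibility values of attribute j for class i; these container names are ours), using the minimum as operator H and the maximum membership rule of Fig. 1:

```python
import numpy as np

def classify(sample, centres, densities):
    """Assign a sample (one value per attribute) to one of the learnt classes."""
    memberships = []
    for class_centres, class_dens in zip(centres, densities):
        # Step 1: possibility of the sample for each attribute of this class,
        # read off the possibility density by linear interpolation.
        pi = [np.interp(x, c, d, left=0.0, right=0.0)
              for x, c, d in zip(sample, class_centres, class_dens)]
        # Step 2: fusion of the attribute possibilities, Eq. (8), with H = min.
        memberships.append(min(pi))
    # Maximum membership rule: the most possible class wins.
    return int(np.argmax(memberships)), memberships
```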

4. Functionality of FPM using the theory of possibility

Fig. 1 shows the different steps of the functionality of FPM.

[Figure: flow diagram. Learning phase, from the learning set Ω = {x1, x2, …, xN}: construction of the histograms for each attribute of each class; calculation of the probability distributions from the histograms; turning the probability distributions into possibility distributions using the transformation of Dubois and Prade. Classification phase, for a new point: determination of the possibility of this point for each attribute of each class by linear interpolation; determination of the possibility membership value for each class by fusing the possibilities of all the attributes with the minimum operator; determination of the class of the new point by the maximum membership rule.]

Fig. 1. Different steps of the functionality of FPM.


Fig. 2. Example of a set of training data divided into two classes.

The following example illustrates the functionality of FPM on a set of training points divided into two classes, as shown in Fig. 2. The learning phase starts by constructing the histograms of the data, one for each attribute of each class (Fig. 3). The number of bars h of a histogram is determined using a heuristic criterion; this number has a very important influence on the performance of FPM [3].

Fig. 3. Histograms of the data for each attribute of each class.

The height of each bar is the number of learning points located in this bar. The probabilities $\{p_l,\ l = 1, \dots, h\}$ of the centres of the bars are calculated by dividing the height of each bar by the total number of learning points in the corresponding class. The possibility distribution $\{\pi_l,\ l = 1, \dots, h\}$ is calculated by the transformation of Dubois and Prade defined in (7). Linear interpolation turns the possibility distribution into a density. Fig. 4 shows the possibility densities of each attribute for the two classes.

In the classification phase, a new point receives a possibility membership value for each class and each attribute. These membership values are calculated from the possibility densities by linear interpolation. For each class, the fusion of the possibility values of the different attributes gives the global membership value, and the point is assigned to the class with the maximum membership value.
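The learning phase just described can be condensed into a short sketch for one attribute of one class (again an illustrative Python fragment under our own naming; the transformation is the one of Eq. (7)):

```python
import numpy as np

def learn_density(values, h):
    """Possibility density of one attribute of one class.

    values: training values of this attribute for this class.
    h: number of histogram bars.
    Returns the bar centres and the possibility at each centre; linear
    interpolation between the centres gives the density.
    """
    counts, edges = np.histogram(values, bins=h)   # histogram of the data
    p = counts / counts.sum()                      # probability of each bar centre
    pi = np.minimum.outer(p, p).sum(axis=1)        # Dubois-Prade transform, Eq. (7)
    centres = (edges[:-1] + edges[1:]) / 2.0
    return centres, pi
```

The classification phase then interpolates these (centre, possibility) pairs for a new point, as in the sketch of Section 3.2.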

5. Incremental learning

Learning machines must be adaptive if they are to operate effectively on complex, real-world problems. In many real-world applications, the knowledge environment is non-stationary. The main assumption of a dynamic work environment is that not all information is available a priori in the training set. The environment changes over time, and the learning process must track this change and include it in its knowledge base. Building a successful learning machine for a real-world problem may only be possible by incorporating the flexibility to acquire knowledge from each new observation. It is possible to accomplish this task by entirely rebuilding the knowledge base from the new observation plus all previous ones. This solution is impractical, since it requires the storage of the whole training set and a considerable computation time. Thus the system must learn from each observation when it becomes available, while preserving its current knowledge. This task addresses the problem of incremental learning. Incremental learning is a way to control the cost of knowledge updates:


Fig. 4. Possibility densities of each class for the two attributes.

Fig. 5. Possibility densities learnt with half of the training set of the example of Fig. 2.

it learns from the new observation in order to adapt the previous knowledge to the changes in the work environment. Thus incremental learning has the following advantages:
• it enables learning from new observations,
• it adapts the knowledge base to changes in the work environment,
• it reinforces the current knowledge.
In the next section, we present our method to integrate incremental learning in FPM.

6. Incremental learning in FPM

When FPM works in a dynamic environment, it must acquire new knowledge from each newly classified point. The importance of this new knowledge can be seen in classification by pattern recognition when the number of training points is insufficient to represent the real shape of the classes. To give an example of this case, the learning phase of the previous example is repeated using half of the training set. Fig. 5 shows the possibility densities learnt with half of the training set. By comparing Figs. 4 and 5, we can see that the possibility


Fig. 6. The training points of class 1 are not sufficient to give the real shape of this class.

densities learnt with half of the training set are different from those learnt with the whole training set. Fig. 6 shows another example of this case, where the training points of class 1 are not enough to give the real shape of this class. If FPM does not acquire the new knowledge obtained from the first newly classified points, all the following ones will be rejected (Fig. 7), and the learnt shape of class 1 will not be the real one. Fig. 8 shows that when FPM learns from each newly classified point, it changes the form of class 1 to match the real one. Additionally, it has been shown that increasing the number of training points per class improves the classification rate of FPM [10]. Fig. 9 shows the relationship between the classification rate and the number of learning points per class for different values of h. To accomplish this task, the classical solution is to recalculate the histograms and then to transform them into new possibility densities. The increase of the number of points in the training set entails an increase of both the computation time and the memory size. The development of a classifier that is capable of acquiring new knowledge from each newly classified sample while preserving the current one is known as incremental learning. The integration of incremen-

Fig. 7. FPM does not acquire any new knowledge of its working environment, since it rejects all the new points that should be assigned to class 1 and would eventually reveal the real shape of this class.

Fig. 8. FPM acquires new knowledge from each classified point and thus learns the real shape of class 1.

tal learning in FPM spares it from recomputing the probability histograms over all the points in the training set. It preserves the current knowledge and acquires the new knowledge from the new point. Thus we need a method to update the possibility densities with a computation time and a memory size that are independent


[Figure: classification rate in % (70–100) versus number of training points per class (0–40), one curve per value of h = 5, 10, 15, 20, 25, 30, 35, 40.]

Fig. 9. Relationship between the classification rate and the number of training points per class for different values of h.

of the number of points in the training set. In the literature, there is no method to integrate incremental learning in FPM. For this reason, we propose the following method. Let $N_i$ denote the number of learning points in the class $C_i$, represented by n attributes, and let the number of bars h be the same for all the histograms, for all the classes and all the attributes. We give the demonstration for one histogram; it is identical for all the others. If the number of points in the bar k is $b_k$, the probability $p_k$ of the centre of this bar is given by (9):

$$p_k = \frac{b_k}{N_i}. \tag{9}$$

The possibility $\pi_k$ of the bar k is obtained by the transformation of Dubois and Prade. When a new point is classified in the class $C_i$, the number of points becomes $N_i + 1$, so the probability of each bar changes. If this point is situated in the bar k, the new probabilities of the bar centres are:

$$p'_1 = \frac{b_1}{N_i + 1},\quad p'_2 = \frac{b_2}{N_i + 1},\ \dots,\quad p'_k = \frac{b_k + 1}{N_i + 1},\quad p'_{k+1} = \frac{b_{k+1}}{N_i + 1},\ \dots,\quad p'_h = \frac{b_h}{N_i + 1}. \tag{10}$$

We rewrite the set of equations (10) as (11):

$$p'_1 = \frac{b_1}{N_i}\cdot\frac{N_i}{N_i + 1} = p_1\cdot\frac{N_i}{N_i + 1},\qquad p'_2 = p_2\cdot\frac{N_i}{N_i + 1},\ \dots,$$
$$p'_k = \frac{b_k}{N_i}\cdot\frac{N_i}{N_i + 1} + \frac{1}{N_i + 1} = p_k\cdot\frac{N_i}{N_i + 1} + \frac{1}{N_i + 1},$$
$$p'_{k+1} = p_{k+1}\cdot\frac{N_i}{N_i + 1},\ \dots,\qquad p'_h = p_h\cdot\frac{N_i}{N_i + 1}. \tag{11}$$
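To make the update concrete, here is a small worked instance with numbers chosen purely for illustration (they are not taken from the paper): take $h = 3$ bars with counts $b = (2, 3, 5)$, so $N_i = 10$ and $p = (0.2, 0.3, 0.5)$, and let a new point fall in bar $k = 2$. Then (11) gives

$$p'_1 = 0.2\cdot\tfrac{10}{11} = \tfrac{2}{11},\qquad p'_2 = 0.3\cdot\tfrac{10}{11} + \tfrac{1}{11} = \tfrac{4}{11},\qquad p'_3 = 0.5\cdot\tfrac{10}{11} = \tfrac{5}{11},$$

which is exactly the histogram probability $(b_1,\ b_2{+}1,\ b_3)/(N_i{+}1)$ obtained by recounting from scratch.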


Let us define the following matrices:

$$P_i = (p_1\ p_2\ \dots\ p_h),\qquad P'_i = (p'_1\ p'_2\ \dots\ p'_h),$$
$$NPR_i = \frac{N_i}{N_i + 1},\qquad EN_i = \Big(0\ \dots\ 0\ \frac{1}{N_i + 1}\ 0\ \dots\ 0\Big). \tag{12}$$

$P_i$ is the matrix of the current probabilities and $P'_i$ the matrix of the new probabilities; $NPR_i$ and $EN_i$ represent the acquisition of the new knowledge. The matrix $EN_i$ is built by putting the term $1/(N_i + 1)$ in the column corresponding to the bar of the new point; all the other columns are equal to 0. We can now write the incremental learning equation:

$$P'_i = NPR_i \cdot P_i + EN_i. \tag{13}$$

Using (13), we do not need to recalculate the histogram of probabilities after the classification of a new point; we only need to know in which bar this new point is situated. To construct the possibility distribution, we first sort the new probabilities $(p'_1\ p'_2\ \dots\ p'_h)$ in increasing order:

$$p'_1 < p'_2 < \dots < p'_h; \tag{14}$$

secondly, let $\pi'_1, \pi'_2, \dots, \pi'_h$ be the corresponding new possibilities. Applying the transformation of Dubois and Prade (7) to the sorted probabilities gives the set of equations (15):

$$\pi'_1 = h\,p'_1,\quad \pi'_2 = p'_1 + (h-1)\,p'_2,\quad \pi'_3 = p'_1 + p'_2 + (h-2)\,p'_3,\ \dots,\quad \pi'_h = p'_1 + p'_2 + \dots + p'_h. \tag{15}$$

We can rewrite (15) in matrix form as (16):

$$\begin{pmatrix} \pi'_1 \\ \pi'_2 \\ \pi'_3 \\ \vdots \\ \pi'_h \end{pmatrix} = \begin{pmatrix} h & 0 & 0 & \dots & 0 \\ 1 & h-1 & 0 & \dots & 0 \\ 1 & 1 & h-2 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \dots & 1 \end{pmatrix} \times \begin{pmatrix} p'_1 \\ p'_2 \\ p'_3 \\ \vdots \\ p'_h \end{pmatrix}. \tag{16}$$

Finally, we reorganise the new possibilities to restore the original order of the bars:

$$(\pi'_1\ \pi'_2\ \dots\ \pi'_h). \tag{17}$$
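The whole update of Eqs. (13)–(17) fits in a few lines. The following Python sketch is ours (the function name `incremental_update` is not from the paper, which used MATLAB); it updates one histogram when a new point falls in bar k:

```python
import numpy as np

def incremental_update(p, k, n_i):
    """One incremental learning step for one histogram, Eqs. (13)-(17).

    p:   current probabilities of the h bar centres.
    k:   index (0-based) of the bar in which the new point falls.
    n_i: number of points currently in the class.
    Returns the updated probabilities and updated possibilities.
    """
    p = np.asarray(p, dtype=float)
    h = p.size
    # Eq. (13): P'_i = NPR_i * P_i + EN_i, without touching the raw histogram.
    en = np.zeros(h)
    en[k] = 1.0 / (n_i + 1)
    p_new = p * (n_i / (n_i + 1.0)) + en
    # Eqs. (14)-(16): sort, then apply the Dubois-Prade transform; on sorted
    # values pi'_m = sum_{l<=m} p'_l + (h-m) * p'_m.
    order = np.argsort(p_new)
    ps = p_new[order]
    pi_sorted = np.cumsum(ps) + ps * np.arange(h - 1, -1, -1)
    # Eq. (17): put the possibilities back in the original bar order.
    pi = np.empty(h)
    pi[order] = pi_sorted
    return p_new, pi
```

Every array involved has length h, so both the work and the storage per update are independent of $N_i$, which is precisely the property demonstrated below and measured in Section 7.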

This method preserves the current knowledge, represented by the probability histogram $P_i = (p_1\ p_2\ \dots\ p_h)$, and acquires the new knowledge obtained from the new point. This new knowledge, represented by the two terms $N_i/(N_i + 1)$ and $1/(N_i + 1)$, is added to the current knowledge as shown in (13). Eq. (18) gives the limits of these two terms when the number of training points tends to infinity:

$$\lim_{N_i \to \infty} \frac{N_i}{N_i + 1} = 1,\qquad \lim_{N_i \to \infty} \frac{1}{N_i + 1} = 0. \tag{18}$$

The size of the matrices in (13) is equal to the number of bars h of the histogram, which is constant regardless of the number of training points. So the computation time needed to update the possibility densities is constant and independent of the size of the training set. Additionally, Eq. (13) requires storing only h values for each attribute of each class to represent its current knowledge, and when the new knowledge is added, the number of stored values remains h. Therefore the memory size needed for incremental learning is constant and independent of the size of the training set. In usual incremental learning methods, if the posterior learning result is exactly the same as the prior one, the additional training datum is regarded as redundant. From (13) and (18) we see that when the number of training points tends to infinity, the new points become redundant, and incremental learning no longer extracts any information from them.


Table 1
Comparison of the update time after the classification of a new point using the classical and incremental learning methods

Number of learning     Update time (s),     Update time (s),
points in the class    classical method     incremental method
320                    0.44                 0.06
400                    0.55                 0.06
480                    0.71                 0.06
560                    0.83                 0.06
640                    0.93                 0.06
720                    1.04                 0.06
800                    1.1                  0.06
880                    1.2                  0.06
960                    1.37                 0.06
1040                   1.48                 0.06
1120                   1.59                 0.06
1200                   1.7                  0.06
1280                   1.81                 0.06
1360                   1.92                 0.06
1440                   1.98                 0.06

In the next section, we present several experimental results from different application areas to confirm the efficiency of our method.

7. Experimental results

In order to show the advantages of the incremental method, we compare it with the classical method with respect to the update time and the memory size on different examples. For the example of Fig. 2, Table 1 shows the update time after the classification of a new point for different sizes of the initial learning set with the two methods. The update time is calculated as the average time over ten points situated in different places in the class. The times were computed with MATLAB 5.2, and the parameter h was set to 23 using a heuristic criterion. Fig. 10 shows the relationship between the number of learning points and the update time after the classification of a new point for the classical and incremental learning methods. The second example corresponds to the classification of plastic bottles into one of the following three classes of polymers: PVC (class 1), PET (class 2), and PEHD (class 3) [1]. The classification can be made

Fig. 10. Relationship between the number of training points and the update time after the classification of a new point for the classical and incremental learning methods.

in a feature space of two infrared wavelengths. The learning set contains 120 transmission values for the two wavelengths, distributed in three classes of 40 points as shown in Fig. 11. Table 2 shows the update time after the classification of a new point for different numbers of training points with the two methods, and Fig. 12 shows the


Table 2
Comparison of the update time after the classification of a new point using the classical and incremental learning methods for the plastic bottles example

Number of learning     Update time (s),     Update time (s),
points in the class    classical method     incremental method
40                     0.07                 0.06
80                     0.13                 0.06
120                    0.18                 0.06
160                    0.23                 0.06
200                    0.27                 0.06
240                    0.33                 0.06
280                    0.39                 0.06
320                    0.44                 0.06
360                    0.51                 0.06
400                    0.55                 0.06

Fig. 11. Representation of the 3 classes of plastic materials: (1) PVC, (2) PET, (3) PEHD.

Fig. 12. Relationship between the number of training points and the update time after the classification of a new point for the classical and incremental learning methods, for the plastic bottles example.

relationship between the update time and the number of learning points for the two methods. We have also chosen h = 23 for this example. The third example corresponds to the detection of unbalance failures in a washing machine. The lateral and frontal amplitudes of the movements of the machine define a classification space. The unbalance failures make four classes appear in this space. One of these classes corresponds to the good functioning and

the three other ones correspond to different types of unbalance failures [1]. Fig. 13 shows these classes in the representation space. Table 3 shows the update time after the classification of a new point for different numbers of training points with the two methods, and Fig. 14 shows the relationship between the update time and the number of learning points for the two methods. Here we have also chosen h = 23.


Table 3
Comparison of the update time after the classification of a new point using the classical and incremental learning methods for the washing machine example

Number of learning     Update time (s),     Update time (s),
points in the class    classical method     incremental method
51                     0.08                 0.06
102                    0.14                 0.06
153                    0.2                  0.06
204                    0.27                 0.06
255                    0.33                 0.06
306                    0.38                 0.06
357                    0.45                 0.06
408                    0.52                 0.06
459                    0.58                 0.06
510                    0.63                 0.06

Fig. 13. Representation of the four classes of the washing machine example.

In the fourth example, we consider the detection of metallic codes. These codes are a succession of fine metallic bands separated by insulating zones. They are identified using an eddy current sensor [1]. Fig. 15 shows the 9 classes of the 9 metallic codes. Table 4 shows the update time after the classification of a new point for different numbers of training points with the two methods, and Fig. 16 shows the relationship between the update time and the number of learning points for the two methods. As in the previous examples, h is equal to 23.

Fig. 14. Relationship between the number of training points and the update time after the classification of a new point for the classical and incremental learning methods, for the washing machine example.

The last example corresponds to the identification of the functioning modes of a plastic injection moulding process [4]. The training set is divided into 5 classes in a feature space of 3 parameters, as shown in Fig. 17. Classes 1 and 2 refer to the normal functioning and the other classes to abnormal ones. These classes correspond to:
• class 3: incomplete products,
• class 4: products with shrink holes,
• class 5: flash products.


Table 4
Comparison of the update time after the classification of a new point using the classical and incremental learning methods for the metallic codes example

Number of learning     Update time (s),     Update time (s),
points in the class    classical method     incremental method
45                     0.06                 0.06
90                     0.09                 0.06
135                    0.14                 0.06
180                    0.18                 0.06
225                    0.21                 0.06
270                    0.26                 0.06
315                    0.29                 0.06
360                    0.34                 0.06
405                    0.38                 0.06
450                    0.41                 0.06

Fig. 15. Representation of the 9 classes of the metallic code data.

Fig. 16. Relationship between the number of training points and the update time after the classification of a new point for the classical and incremental learning methods, for the metallic codes example.

Table 5 shows the update time after the classification of a new sample for different numbers of training points with the two methods. Fig. 18 shows the relationship between the update time and the number of learning points. To obtain a good classification rate, h is chosen equal to 54. We can conclude that:

• In the classical learning method, the update time after the classification of a new point increases linearly with the number of training points, while it remains constant in the incremental learning method. So the update time of the incremental method is independent of the size of the training set.
• In the classical method, the memory size increases with the number of training points, since this method needs to store all the training points to calculate the probability histograms. In contrast, the memory size is constant and independent of the number of training points in the incremental learning method. Indeed, the integration of incremental learning spares FPM from storing the training points after the initial learning phase.


Table 5
Comparison of the update time after the classification of a new point using the classical and incremental learning methods for the plastic injection moulding process example

Number of learning     Update time (s),     Update time (s),
points in the class    classical method     incremental method
219                    0.87                 0.25
438                    1.68                 0.25
657                    2.47                 0.25
876                    3.23                 0.25
1095                   4.01                 0.25
1314                   4.84                 0.25
1533                   5.57                 0.25
1752                   6.35                 0.25
1971                   7.23                 0.25
2190                   8.61                 0.25

Fig. 17. Representation of 5 classes of the plastic injection moulding process.

Since the number of attributes and the number of bars h are the same in the first four examples, the update time with the incremental learning method is also the same. This reflects the key advantage of incremental learning: the update time is independent of the size of the training set and depends only on the number of bars of the histograms.

8. Conclusion

The state of an industrial system generally varies between normal and abnormal ones. Diagnosis by pattern recognition is used to determine this state at every instant. A set of sensors is used to measure the system parameters. These measures with their class

Fig. 18. Relationship between the number of training points and the update time after the classification of a new point for the classical and incremental learning methods, for the plastic injection moulding process.

assignments constitute the training set, which is divided into classes using an unsupervised method. Each class indicates an operating state. We use Fuzzy Pattern Matching (FPM) to build the classifier. The class assigned to a new point indicates the actual state of the system. In a non-stationary work environment, the classifier must be able to increase its knowledge by learning from each newly classified point. This new knowledge can


improve the accuracy of the classification and adapt the classifier to possible changes in its work environment. To accomplish this task, the classical solution is to retrain the classifier every time a newly classified sample is available. This is an impractical solution, since it requires the storage of all available samples and a considerable computation time. Thus incremental learning is the practical solution, since it preserves the current knowledge and acquires the new knowledge from the new sample. Only a few papers have been published that deal with the incremental learning problem, which is still an open issue in the pattern recognition literature. Our team uses Fuzzy Pattern Matching (FPM) as its classification method. It is a multicriteria supervised classification method, developed in the framework of fuzzy set and possibility theory in order to take into account the imprecision and the uncertainty of the data. The learning phase uses a probability–possibility transformation to construct the densities of possibilities; we have chosen the transformation of Dubois and Prade. The densities are used as membership functions to assign each new sample to its suitable class. When FPM works in a non-stationary environment, it must update its possibility densities after the classification of each new sample. When the number of samples increases, the update time also increases, and so does the memory size required to store all the samples. In the literature, there is no published paper that integrates incremental learning in FPM. Thus, in this paper, we have proposed a method to integrate it. We have demonstrated that the update time and the memory size are constant and independent of the number of training samples. Finally, we have pointed out the advantages of this method by comparing it with the classical one on different examples.

Our future work is to determine automatically the number of bars h of the histograms. This number has a very important influence on the performance of FPM.

References

[1] P. Billaudel, A. Devillez, G. Villermain Lecolier, Performance evaluation of fuzzy classification methods designed for real time application, Internat. J. Approx. Reason. 20 (1999) 1–20.
[2] L. Bruzzone, D. Fernández Prieto, An incremental-learning neural network for the classification of remote-sensing images, Pattern Recognition Lett. 20 (1999) 1241–1248.
[3] A. Devillez, P. Billaudel, G. Villermain Lecolier, Use of the Fuzzy Pattern Matching in a diagnosis module based on the pattern recognition, CESA'98 IEEE Systems, Man Cybernet. 4 (1998) 902–907.
[4] A. Devillez, P. Billaudel, G. Villermain Lecolier, Use of the Fuzzy Pattern Matching for diagnosis of a plastics injection moulding process, European Control Conference ECC'99, Karlsruhe, Germany, 1999.
[5] D. Dubois, H. Prade, Théorie des possibilités. Application à la représentation des connaissances en informatique, Masson, Paris, 1997.
[6] D. Dubois, H. Prade, S. Sandri, On possibility/probability transformations, in: Fuzzy Logic, Kluwer Academic Publishers, Dordrecht, 1993, pp. 103–112.
[7] D. Dubois, H. Prade, C. Testemale, Weighted fuzzy pattern matching, Fuzzy Sets and Systems 28 (1988) 313–331.
[8] B. Dubuisson, Diagnostic et reconnaissance des formes, Traité des nouvelles technologies, série Diagnostic et Maintenance, Hermès, 1990.
[9] M. Grabisch, M. Sugeno, Multi-attribute classification using fuzzy integral, Proceedings of Fuzzy IEEE, 1992, pp. 47–54.
[10] F. Klein, Etude comparative de méthodes de classification floues, Thèse de doctorat, Université de Reims Champagne-Ardenne, France, 1998, pp. 4–38.
[11] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1) (1978) 3–28.