Interactive Interfaces to Detect Conceptual Difference for Group Knowledge Acquisition


Copyright © IFAC Artificial Intelligence in Real-Time Control, Kuala Lumpur, Malaysia, 1997

Shogo Nishida, Tetsuya Yoshida and Teruyuki Kondo
Dept. of Systems and Human Science, Osaka University, 1-3, Machikaneyama-cho, Toyonaka, Osaka 560 JAPAN
[email protected]

Abstract

Conceptual difference is a serious problem in group knowledge acquisition systems, especially when people with different backgrounds participate in a group. This paper deals with conceptual difference and proposes a method to detect it in the cases where different symbols are used with the same meaning and/or the same symbols are used with different meanings. In section 2, conceptual difference is defined and the system architecture for detection is described. In section 3, the detection algorithm is designed, and a prototype system and its evaluation are discussed in section 4.

Copyright © 1998 IFAC

1. Introduction

Knowledge acquisition techniques have been studied since the 1970s to solve the problems of building expert systems [1]. Although research initially focused on knowledge acquisition from a single person, much research on group knowledge acquisition has been conducted recently [2][3], especially in relation to CSCW (Computer-Supported Cooperative Work). Conceptual difference is one of the new and serious problems in group knowledge acquisition. Concepts about some object or in some field depend on each person, and they differ in many cases. We believe that detecting conceptual difference is a key function for building group knowledge acquisition systems. Furthermore, this function is expected to be useful as a module in creative thinking support systems, since different viewpoints from other people may offer a chance to hit upon new ideas. Based on this belief, this paper proposes interactive interfaces to detect conceptual difference by using decision trees. In section 2, the design concept of our system is discussed, and the concrete algorithm to detect conceptual difference is described in section 3. Then a prototype system and its evaluation result are described in section 4.

2. Design Concept

First of all, we define the conceptual difference dealt with in this paper. Although we think there are many types of conceptual difference, the following two types are considered here:

Type(a) conceptual difference: Different symbols are used with the same meaning.
Type(b) conceptual difference: The same symbols are used with different meanings.

A concrete example of conceptual difference may be observed when an electric motor is diagnosed by two engineers, an electrical engineer and a mechanical engineer. It is probable that the electrical engineer uses the term "voltage frequency" and the mechanical engineer uses the term "number of rotations" to express the same state, because the vocabularies of electrical engineering and mechanical engineering are different and these terms have the same physical meaning. On the other hand, the same words may have different meanings in different fields. Therefore, it is very important to detect Type(a) and Type(b) conceptual differences before various types of knowledge are acquired from multiple users.

Fig. 1 shows the system architecture of interactive interfaces to detect the above two types of conceptual difference. Though this figure shows the case with two users, the architecture can easily be extended to more users. Each user provides the knowledge they have about some object in the form of examples. It is assumed that examples are expressed by three parameters: class, attributes and values. The input data from each person is stored in a separate input file, and then a mixture file is created by adding all the input data together, as shown in Fig. 1. Based on these data, decision trees are constructed by applying the ID3 algorithm [4]. These decision trees are named self-trees and a mixture-tree, respectively, as shown in the figure. By using these decision trees, Type(a) and Type(b) conceptual differences are detected.
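As a rough illustration of the example format and the mixture file described in the Design Concept section, the following Python sketch stores each example as a class symbol plus attribute/value pairs, merges two input files into a mixture, and computes the information gain that ID3 [4] uses to choose a splitting attribute. The example contents, the owner tag and all function names are assumptions made for illustration; the prototype itself is written in C and its data layout is not given in the paper.

```python
import math
from collections import Counter

# One example = (class symbol, {attribute: value}). Contents are hypothetical.
input_a = [
    ("Bearing", {"Noise": "Yes", "Temperature": "Normal"}),
    ("Overload", {"Noise": "No", "Temperature": "inc"}),
]
input_b = [
    ("Bearing", {"Noise": "Yes", "Temperature": "inc"}),
    ("Overload", {"Noise": "No", "Temperature": "Normal"}),
]

# Mixture file: all examples together, tagged with the person who gave them
# (the tag is an assumption; it is what later lets a leaf be traced to A or B).
mixture = [("A", c, attrs) for c, attrs in input_a] + \
          [("B", c, attrs) for c, attrs in input_b]


def entropy(classes):
    """Shannon entropy (in bits) of a list of class symbols."""
    total = len(classes)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(classes).values())


def information_gain(examples, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    classes = [c for _, c, _ in examples]
    remainder = 0.0
    for value in {attrs[attribute] for _, _, attrs in examples}:
        subset = [c for _, c, attrs in examples if attrs[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(classes) - remainder


# ID3 splits on the attribute with the largest information gain.
best = max(["Noise", "Temperature"],
           key=lambda a: information_gain(mixture, a))
```

In this toy mixture, "Noise" separates the two classes perfectly while "Temperature" does not separate them at all, so ID3 would split on "Noise" first.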


same class. In this case, the different symbols appear in the same leaf of the mixture-tree. In that leaf, one symbol comes only from A's examples and the other comes only from B's. Fig. 2 shows an example of a decision tree for Type(a) conceptual difference. The following detection algorithm is adopted in this case.


for Type(a)
[Step 1] Search the mixture-tree for inseparable leaves, i.e. leaves which contain more than one class symbol.
[Step 2] Perform the noise cut operation on them (Appendix A, operation-1).
[Step 3] Search for a leaf that is still inseparable and in which one symbol comes only from A's examples and the other comes only from B's examples.
[Step 4] If such a leaf is found, select the pair of symbols as a candidate for conceptual difference.
[Step 5] Repeat [Step 1] to [Step 4] and count how many times each candidate is detected. Present the candidates to the users in order of the count.
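The Type(a) steps for classes can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: each leaf of the mixture-tree is represented as a list of (class symbol, owner) pairs, the noise cut of [Step 2] is omitted, and the function name and the sample leaves are hypothetical rather than taken from the prototype.

```python
from collections import Counter, defaultdict
from itertools import combinations


def detect_type_a_classes(leaves):
    """Count candidate (A's symbol, B's symbol) pairs over mixture-tree leaves.

    Each leaf is a list of (class_symbol, owner) pairs, owner being "A" or "B".
    The noise cut operation is assumed to have been applied already.
    """
    counts = Counter()
    for leaf in leaves:
        owners = defaultdict(set)  # class symbol -> owners that used it
        for symbol, owner in leaf:
            owners[symbol].add(owner)
        if len(owners) < 2:
            continue  # only one class symbol: the leaf is separable
        for s1, s2 in combinations(sorted(owners), 2):
            if owners[s1] == {"A"} and owners[s2] == {"B"}:
                counts[(s1, s2)] += 1
            elif owners[s1] == {"B"} and owners[s2] == {"A"}:
                counts[(s2, s1)] += 1
    # Candidates in descending order of detection count.
    return counts.most_common()


# Hypothetical leaves, using class symbols from the motor diagnosis case.
leaves = [
    [("PowerSupply", "A"), ("Insulation", "B"), ("PowerSupply", "A")],
    [("Bearing", "A"), ("Bearing", "B")],
    [("PowerSupply", "A"), ("Insulation", "B")],
    [("Rigidity", "A"), ("Bearing", "B")],
]
candidates = detect_type_a_classes(leaves)
```

Here the pair (PowerSupply, Insulation) is found in two leaves and (Rigidity, Bearing) in one, so the former would be presented first.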

Fig. 1 System Architecture

Type(b) conceptual difference in classes means that the same symbol is used to express different classes. In this case, the same symbol appears in different leaves of the mixture-tree: the symbol in one leaf comes only from A's examples, and the same symbol in the other leaf comes only from B's. The detection algorithm for this case is as follows.


for Type(b)
[Step 1] Search the mixture-tree for a pair of leaves which have the same class symbol.
[Step 2] Perform the noise cut operation on them (Appendix A, operation-1).
[Step 3] Check whether the symbol in one leaf comes only from A's examples and the same symbol in the other leaf comes only from B's examples.
[Step 4] If such a pair of leaves is found, select the symbol as a candidate for conceptual difference.
[Step 5] Repeat [Step 1] to [Step 4] and count how many times each candidate is detected. Present the candidates to the users in order of the count.
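A corresponding sketch for the Type(b) steps is given below. As before, the leaf representation (a list of (class symbol, owner) pairs), the omission of the noise cut, and all names and sample data are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def detect_type_b_classes(leaves):
    """Count class symbols that appear in one leaf only from A and in another only from B."""
    per_leaf = []
    for leaf in leaves:
        owners = {}  # class symbol -> owners that used it in this leaf
        for symbol, owner in leaf:
            owners.setdefault(symbol, set()).add(owner)
        per_leaf.append(owners)

    counts = Counter()
    for left, right in combinations(per_leaf, 2):
        # Only symbols present in both leaves can form a Type(b) pair.
        for symbol in left.keys() & right.keys():
            a, b = left[symbol], right[symbol]
            if (a == {"A"} and b == {"B"}) or (a == {"B"} and b == {"A"}):
                counts[symbol] += 1
    return counts.most_common()


# Hypothetical leaves: "Imbalance" is used by A and by B in different leaves.
leaves_b = [
    [("Imbalance", "A"), ("Imbalance", "A")],
    [("Imbalance", "B")],
    [("Imbalance", "B"), ("Imbalance", "B")],
    [("Bearing", "A"), ("Bearing", "B")],
    [("Bearing", "B")],
]
candidates_b = detect_type_b_classes(leaves_b)
```

"Bearing" is not reported, because in one of its leaves it is used by both A and B, which suggests the two persons agree on that symbol.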


Fig. 2 An example of decision tree

Once conceptual differences are detected, the result is shown to the users as candidates for conceptual difference, and the users correct their input knowledge themselves. By repeating this cycle, it is expected that conceptual differences are removed and improved knowledge is acquired.

3. Detecting Algorithm of Conceptual Difference

Here the concrete algorithm for detecting conceptual difference is described. Two persons, A and B, are assumed as shown in Fig. 1.

3.1. Conceptual Difference in Classes

Type(a) conceptual difference in classes means that the different symbols are used to express the

3.2. Conceptual Difference in Attributes

Type(a) conceptual difference in attributes means that different symbols are used to express the same attribute. Here it is assumed that the different symbols "X" and "Y" indicate the same attribute. In this case, the class distribution under the node with attribute symbol "X" in one self-tree becomes similar to the class distribution under the node with


value come only from A's examples and classes under the other value come only from B's examples.
[Step 3] If such a pair is found, select the pair of symbols as a candidate for conceptual difference.
[Step 4] Repeat [Step 2] and [Step 3] for all attributes found in [Step 1] and count how many times each candidate is detected. Present the candidates to the users in order of the count.
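The core check of the Type(a) algorithm for values (one value symbol used only by A, another only by B, under the same attribute) can be sketched as follows. This is a simplification made for illustration: it inspects the raw examples of the mixture file rather than the branches of the mixture-tree, it skips the [Step 1] filter and the counting, and the data and names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def detect_type_a_values(examples):
    """Return (attribute, A's value symbol, B's value symbol) candidates.

    `examples` is a list of (owner, {attribute: value}) pairs from the mixture.
    """
    owners = defaultdict(set)    # (attribute, value) -> owners using that value
    by_attr = defaultdict(set)   # attribute -> value symbols seen
    for owner, attrs in examples:
        for attribute, value in attrs.items():
            owners[(attribute, value)].add(owner)
            by_attr[attribute].add(value)

    candidates = []
    for attribute in sorted(by_attr):
        for v1, v2 in combinations(sorted(by_attr[attribute]), 2):
            o1, o2 = owners[(attribute, v1)], owners[(attribute, v2)]
            if o1 == {"A"} and o2 == {"B"}:
                candidates.append((attribute, v1, v2))
            elif o1 == {"B"} and o2 == {"A"}:
                candidates.append((attribute, v2, v1))
    return candidates


# Hypothetical data: A says "hot" where B says "warm".
value_examples = [
    ("A", {"Temperature": "hot", "Noise": "Yes"}),
    ("A", {"Temperature": "normal", "Noise": "No"}),
    ("B", {"Temperature": "warm", "Noise": "Yes"}),
    ("B", {"Temperature": "normal", "Noise": "No"}),
]
value_candidates = detect_type_a_values(value_examples)
```

Only the pair "hot" / "warm" is reported, since "normal", "Yes" and "No" are used by both persons.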

attribute symbol "Y" in the other self-tree. The detection algorithm for this case is as follows.

for Type(a)
[Step 1] Search the attribute fields of both input files for pairs of different attribute symbols with the same set of values.
[Step 2] Search for a pair of nodes in the self-trees which have either attribute found in [Step 1].
[Step 3] Calculate the similarity value (Appendix B, operation-2) for the pair of nodes. If the similarity value is smaller than a certain threshold level, the pair of attribute symbols is selected as a candidate for conceptual difference.
[Step 4] Repeat [Step 2] and [Step 3] for each pair of nodes, and count how many times the candidates are detected.
[Step 5] Repeat [Step 1] to [Step 4] for all pairs of attribute symbols and present the candidates to the users in order of the count.

Type(b) conceptual difference in values means that the same symbols are used to express different values. This type is also detected in the same way.

4. Development of A Prototype System and Its Evaluation

A prototype system was developed on a UNIX workstation. Each component of the system in Fig. 1 is written in the C language. As an example, a motor diagnosis case is evaluated. In this case, two persons each give their knowledge as thirty examples, which are composed of six attributes, each with two or three values, and five classes. Fig. 3 shows the concrete data. The following artificial data, which include conceptual differences, are given to the prototype system to evaluate its ability to detect conceptual difference.

Type(b) conceptual difference in attributes means that the same symbols are used to express different attributes. In this case, the class distribution pattern for the same attribute differs between the two self-trees. Therefore, it suffices to replace "different attribute symbols" in [Step 1] with "same attribute symbols" and "smaller than" in [Step 3] with "larger than" in the above algorithm.
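The attribute algorithm and its Type(b) variant can be sketched together in Python. The sketch makes several simplifying assumptions that are not in the paper: each self-tree is reduced to a dictionary with one node per attribute symbol, holding the attribute's value set and the per-class example counts under that node; the detection counts of [Step 4] are omitted; and the threshold, the data and the function names are hypothetical. The similarity value follows operation-2 of Appendix B.

```python
import math


def similarity_value(x, y):
    """Distance between the normalised class-count vectors (operation-2).

    Both vectors are assumed to be non-zero and of equal length.
    """
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    return math.sqrt(sum((a / nx - b / ny) ** 2 for a, b in zip(x, y)))


def attribute_candidates(tree_a, tree_b, threshold, type_b=False):
    """Return attribute symbol pairs that are candidates for conceptual difference.

    tree_a / tree_b map attribute symbol -> (set of values, class counts).
    """
    found = []
    for sym_a, (values_a, counts_a) in tree_a.items():
        for sym_b, (values_b, counts_b) in tree_b.items():
            if values_a != values_b:
                continue  # [Step 1]: the value sets must match
            s = similarity_value(counts_a, counts_b)
            if not type_b and sym_a != sym_b and s < threshold:
                found.append((sym_a, sym_b))  # different symbols, similar pattern
            elif type_b and sym_a == sym_b and s > threshold:
                found.append((sym_a, sym_b))  # same symbol, different pattern
    return found


# Hypothetical self-trees with three classes.
tree_a = {"Noise": ({"Yes", "No"}, [8, 2, 0]),
          "Vibration": ({"Yes", "No"}, [1, 1, 9])}
tree_b = {"Stench": ({"Yes", "No"}, [4, 1, 0]),
          "Vibration": ({"Yes", "No"}, [5, 5, 0])}

type_a_pairs = attribute_candidates(tree_a, tree_b, threshold=0.2)
type_b_pairs = attribute_candidates(tree_a, tree_b, threshold=0.2, type_b=True)
```

With these numbers, "Noise" and "Stench" have proportional class counts and are reported as a Type(a) candidate, while "Vibration" has very different class distributions in the two trees and is reported as a Type(b) candidate.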

3.3. Conceptual Difference in Values

Conceptual difference in values appears, for example, when person A feels "hot" at some temperature and person B feels "warm" at the same temperature. In this case, there exists a conceptual difference between person A and person B in the values of the attribute "Temperature". Type(a) and Type(b) conceptual differences are also possible here.

Type(a) conceptual difference in values means that different symbols are used to express the same value. Here it is assumed that the different value symbols are "X" and "Y". Then, in the mixture-tree, the classes under the value "X" come only from A's examples and the classes under the value "Y" come only from B's examples. Therefore the following algorithm is derived.

for Type(a)
[Step 1] Search for an attribute which has the same set of value symbols in the input files.
[Step 2] Search for a pair of value symbols in the mixture-tree such that classes under one

Input File A (Examples: 30)
Attribute : Value
Temperature : Normal, inc, INC
Current : Stable, inc/dec, Fluctuate
Noise : Yes, No
Vibration : Yes, No
Amplitude : Normal, inc, INC
Frequency : Normal, Low, High
Class : Imbalance, PowerSupply, Bearing, Rigidity, Overload

Input File B (Examples: 30)
Attribute : Value
Temperature : Normal, inc, INC
Current : Stable, inc/dec, Fluctuate
Noise : Yes, No
Voltage : Yes, No
Amplitude : Normal, inc, INC
Frequency : Normal, Low, High
Class : Wiring, PowerSupply, Bearing, Rigidity, Overload

Fig. 3 Simulation Data

(A) Type(a) Conceptual Difference in Classes
A,B: PowerSupply --> B: Insulation (B's class symbol "PowerSupply" is replaced by "Insulation".)

(B) Type(b) Conceptual Difference in Classes
A: Imbalance, B: Wiring --> A,B: Imbalance (B's class symbol "Wiring" is replaced by "Imbalance".)

(C) Type(a) Conceptual Difference in Attributes
A,B: Noise --> B: Stench (B's attribute symbol "Noise" is replaced by "Stench".)

(D) Type(b) Conceptual Difference in Attributes
A: Vibration, B: Voltage --> A,B: Vibration (B's attribute symbol "Voltage" is replaced by "Vibration".)

(E) Type(a) Conceptual Difference in Values
A,B: inc(Amplitude) --> B: Normal (B's value symbol "inc" in Amplitude is replaced by "Normal".)

(F) Type(b) Conceptual Difference in Values
A: Normal(Amplitude), B: inc(Amplitude) --> A,B: Normal (B's value symbol "inc" in Amplitude is replaced by "Normal".)

In the case where each conceptual difference exists alone, it is confirmed that the system can detect it in all cases. Fig. 4 shows the output of the prototype system for case (A). The number in this figure shows the count of detections, and the priority of each candidate is decided based on the count. In this case, "PowerSupply" used by A and "Insulation" used by B are picked up as the first candidate for Type(a) conceptual difference in classes. This is the correct answer for this case.

(A) Type(a) in Classes
1) A: PowerSupply B: Insulation (2)
2) A: Rigidity B: Bearing (1)
3) A: Bearing B: Rigidity (1)
4) A: Imbalance B: Bearing (1)
5) A: Imbalance B: Rigidity (1)

Fig. 4 Simulation Result 1

Furthermore, data in which all six conceptual differences from case (A) to case (F) exist at the same time are given to the system. Fig. 5 shows the output for this case. This result indicates that the system succeeds in detecting five of the six conceptual differences in the first trial, the exception being Type(a) conceptual difference in values. Even in this case, the system can detect the Type(a) conceptual difference in values after the detected conceptual differences are corrected. This result confirms that a conceptual difference which has not yet been detected can be found through the feedback loop.

(A) Type(a) in Classes
1) A: Unbalance B: Rigidity (2)
2) A: PowerSupply B: Insulation (1)
2) A: Unbalance B: Bearing (1)

(B) Type(b) in Classes
1) Unbalance (24)
2) Rigidity (12)
3) Bearing (3)

(C) Type(a) in Attributes
1) A: Noise B: Stench (1)

(D) Type(b) in Attributes
1) Temperature (4)
2) Amplitude (2)
3) Vibration (1)
3) Frequency (1)
3) Current (1)

(E) Type(a) in Values
Nothing

(F) Type(b) in Values
1) attribute: Amplitude -> value: Normal (1)
1) attribute: Current -> value: Stable (1)

Fig. 5 Simulation Result 2

The priority of the candidates is decided by the count here. The relative frequency, i.e. the count divided by the number of examples, should be used instead of the raw count, because the number of examples in each class may differ. In this simulation, however, the results do not change even if the relative frequency is adopted.

Operation-1 is for noise reduction and operation-2 is for the calculation of the similarity. Each operation has a threshold level, and the levels affect the number of candidates presented by the prototype system. In this case, the threshold levels are determined so that the number of candidates is 5 or less.
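The effect of ranking by relative frequency instead of the raw detection count, as suggested in the evaluation above, can be illustrated with a short Python sketch. The counts, the numbers of examples per class and the function name are hypothetical; they are not taken from Fig. 4 or Fig. 5.

```python
def rank_by_relative_frequency(counts, examples_per_class):
    """Order candidates by detection count divided by the number of examples."""
    scored = {symbol: counts[symbol] / examples_per_class[symbol]
              for symbol in counts}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical detection counts and class sizes.
detection_counts = {"Rigidity": 6, "Bearing": 4}
class_sizes = {"Rigidity": 12, "Bearing": 5}

ranked = rank_by_relative_frequency(detection_counts, class_sizes)
```

By raw count "Rigidity" (6) would come first, but relative to the number of examples "Bearing" (4/5 = 0.8) outranks "Rigidity" (6/12 = 0.5), which is the kind of reordering the relative frequency is meant to allow.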

5. Conclusions

Interactive interfaces to detect conceptual difference were dealt with in this paper. First we proposed a detection algorithm, then developed a prototype system and evaluated it on a motor diagnosis case. The experiments confirmed that the system worked well for the motor diagnosis case. We plan to continue experiments with various data and to improve the algorithm as the next step.

Acknowledgments

This work was partially supported by the Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Science and Culture (No. 082233212, 09450159).

References

[1] K. Morik, S. Wrobel, J-U. Kietz and W. Emde, "Knowledge Acquisition and Machine Learning," Academic Press (1993)
[2] T. R. Gruber, J. M. Tenenbaum and J. C. Weber, "Toward a knowledge medium for collaborative product development," Proc. of 2nd Int. Conf. on Artif. Intell. in Design, pp. 413-432 (1992)
[3] P. N. Huyn, M. R. Genesereth and R. Letsinger, "Automated concurrent engineering in design world," IEEE Computer, Jan. 1993, pp. 74-76 (1993)
[4] J. R. Quinlan, "Induction of decision trees," Machine Learning, Vol. 1, No. 1, pp. 81-106 (1986)
[5] J. R. Quinlan, "Learning Logical Definitions from Relations," Machine Learning, Vol. 5, No. 3, pp. 239-266 (1990)
[6] V. G. Dabija, K. Tsujino and S. Nishida, "Learning to Learn Decision Trees," Proc. of AAAI-92, pp. 88-95 (1992)
[7] K. Tsujino and S. Nishida, "Implementation and Refinement of Decision Trees Using Neural Networks for Hybrid Knowledge Acquisition," Artif. Intell. in Engineering, Vol. 9, pp. 265-275 (1995)
[8] K. Tsujino, V. G. Dabija and S. Nishida, "Interactive improvement of decision trees through flaw analysis and interpretation," International Journal on Human-Computer Studies, No. 45, pp. 499-526 (1996)

Appendices

A. Operation-1 (noise cut operation)

The number of classes is assumed to be $n$, and the number of examples in each class is assumed to be $X_i$ ($i = 1, \ldots, n$). Then $Y_i$ ($i = 1, \ldots, n$) is calculated for each class. The classes whose value of $Y_i$ is less than some threshold level are cut off as noisy data.

B. Operation-2 (calculation of similarity value)

A pair of nodes is named X and Y, and the number of classes is assumed to be $n$. The number of examples of each class under node X is assumed to be $X_i$ ($i = 1, \ldots, n$), and that under node Y is assumed to be $Y_i$ ($i = 1, \ldots, n$). Then the vectors $X$ and $Y$, whose components are $X_i$ and $Y_i$ respectively, are defined in $n$-dimensional space. Next, each vector is normalized into $X_{norm}$ and $Y_{norm}$ by the following equations.

$$X_{norm} = (X_1/|X|,\ X_2/|X|,\ \ldots,\ X_n/|X|)$$

$$Y_{norm} = (Y_1/|Y|,\ Y_2/|Y|,\ \ldots,\ Y_n/|Y|)$$

where $|X|$ and $|Y|$ indicate the norms of the vectors $X$ and $Y$, respectively. The distance between two vectors $X$ and $Y$ is defined by

$$\mathrm{dist}(X, Y) = \{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + \cdots + (X_n - Y_n)^2\}^{1/2}$$

and the similarity value is calculated by the following equation.

$$\text{similarity value} = \mathrm{dist}(X_{norm}, Y_{norm})$$
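As a small worked instance of operation-2, consider two nodes with $n = 2$ classes and hypothetical class counts $X = (3, 4)$ and $Y = (4, 3)$. These numbers are chosen only to keep the arithmetic simple and do not come from the simulation data.

```latex
\begin{align*}
|X| &= \sqrt{3^2 + 4^2} = 5, & |Y| &= \sqrt{4^2 + 3^2} = 5 \\
X_{norm} &= (3/5,\ 4/5) = (0.6,\ 0.8), & Y_{norm} &= (4/5,\ 3/5) = (0.8,\ 0.6) \\
\mathrm{dist}(X_{norm}, Y_{norm}) &= \{(0.6 - 0.8)^2 + (0.8 - 0.6)^2\}^{1/2} = \sqrt{0.08} \approx 0.283
\end{align*}
```

Because both vectors are normalized first, two nodes whose class counts are proportional, for example $(8, 2, 0)$ and $(4, 1, 0)$, give a similarity value of 0 regardless of how many examples each node holds. For non-negative counts the value is at most $\sqrt{2}$, reached when the two nodes share no class.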