Partially constructed knowledge for semantic query

Expert Systems with Applications 36 (2009) 10168–10179

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Tzone I. Wang a, Tung Cheng Hsieh a,*, Kun Hua Tsai a, Ti Kai Chiu a, Ming Che Lee b

a Department of Engineering Science, National Cheng Kung University, No. 1, Ta-Hsueh Road, Tainan 701, Taiwan, ROC
b Department of Computer and Communication Engineering, Ming Chuan University, 5 De Ming Road, Gui Shan District, Taoyuan County 333, Taiwan, ROC

Article info

Keywords: Ontology; Formal concept analysis; Knowledge acquisition; Search engine

Abstract

Among the developments of information technology, the most popular tools for seeking knowledge today are keyword-based Internet search engines such as Google and Yahoo. Users can easily obtain the information they need, but they still have to read and organize the retrieved documents by themselves. As a result, users spend most of their time browsing and skipping through the documents they have found. To facilitate this process, this paper proposes a query-based ontology knowledge acquisition system that dynamically constructs a query-based partial ontology to provide proficient answers to users' queries. To construct the relationships and hierarchy of concepts in such an ontology, the formal concept analysis approach is adopted. After the ontology is built, the system can deduce a specific answer from the relationships and hierarchy of the ontology without asking users to read the whole document set. To evaluate the precision of the system, we collected three kinds of sports news pages as source documents, covering the NBA, CPBL and MLB. The experimental results show that the proposed approach works effectively.

© 2008 Elsevier Ltd. All rights reserved.

* Corresponding author: T.C. Hsieh.
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.12.030

1. Introduction

During the last decade, the rapid advance of information technology, especially the World Wide Web, has made it easy for people to share and acquire information over the Internet. However, because the Internet contains an enormous number of web pages, users spend most of their time browsing and skipping through the documents they have found. Even an easy question can cost a user a lot of time to answer. Several solutions have been proposed, particularly ones based on the concept of ontology. Ontologies are now widely applied in areas such as expert systems, medical sciences, e-learning, and document processing.

Ontology construction is difficult and tedious because the development process demands a great deal of work, cost, and effort from many experts. The framework is supposed to be built by domain experts according to the current information contents and attributes. However, the information content may change as time passes. If an ontology structure can be adjusted to different information contents and attributes, it can be mapped more easily onto suitable knowledge domains and its reasoning capability improves. Therefore, there has been extensive interest in methods for automatic ontology construction, whose advantage is a more flexible skeleton.

In this paper, we utilize the formal concept analysis (FCA) approach to develop a query-based ontology for users searching for specific information. The purpose is to help users extract the information they need without having to read all the documents. An example is used to illustrate the effectiveness of the proposed approach. The results show that the approach performs effectively when searching for information.

The remainder of this paper is organized as follows. Section 2 briefly reviews the literature on ontology construction and application and on formal concept analysis (FCA). Section 3 describes the architecture of the query-based ontology knowledge acquisition system for semantic search. Section 4 explains, through an example, how the proposed approach is used to construct a query-based ontology. The experimental results are presented in Section 5, and Section 6 gives the conclusion.

2. Literature review

2.1. Ontology

The term “ontology” is generally thought to denote a philosophical theory originated by early Greek philosophers such as Plato and Aristotle, which seeks to explain or posit the basic categories and relationships of the nature of being or existence in order to define entities and types of entities within its structure. In the last decade, it has been widely discussed and applied in all kinds of fields (Brewster & O’Hara, 2004). An ontology, regarded as a basic knowledge structure for information on the Semantic Web, often captures a semantic network and represents it by a graph whose nodes denote concepts or individual objects and whose arcs represent the relationships or associations among concepts (Huhns & Stephens, 1999). With an ontology as the common understanding of knowledge in a domain, users and a system can communicate with each other without error (Yan, Jiang, Zheng, Peng, & Li, 2006).

Several methodologies for automatic ontology construction have been proposed. For example, Kim, Choi, Shin, and Kim (2007) proposed a methodology similar to the computer software development process for building a philosophy ontology. Chi (2007) proposed an approach for developing an ontology building process that extracts conceptual tags and hierarchies from textual documents. Lee, Kao, Kuo, and Wang (2007) presented an episode-based fuzzy inference mechanism to synthesize a domain ontology from unstructured Chinese text documents.

In addition, there have been many ontological applications in various domains. For example, Tsai, Lee, and Wang (2006) proposed an adaptive model for recommending suitable learning objects to a learner; according to a specific ontology, the model automatically infers which learning objects the system should look for and which the learner should study. Lee, Jiang, and Hsieh (2006), using an ontology model, presented a Genetic Fuzzy Agent that infers suitable time slots to help a scheduling host select an appropriate meeting time for all invitees, in what is known as the meeting scheduling system (MSS). Lee, Tsai, and Wang (2008) proposed an ontological model of semantic-aware learning object retrieval for any LOM-based search mechanism. Zhou et al. (2004) used Protege 2000 to develop the concepts and relationships of a traditional Chinese medicine (TCM) knowledge ontology. Chau (2007) used a three-stage life cycle for ontology design and proposed an ontology-based Knowledge Management system with a Java/XML-based scheme for automatically generating knowledge-search components.

2.2. Formal concept analysis (FCA)

The formal concept analysis applied in this research is a mathematical approach that transforms structured data into formal concepts and forms a concept lattice (also referred to as a Galois graph), ordered by a concept-sub-concept relation (Formica, 2006). Wille (1982) introduced the theory of formal concept analysis, and it has been applied in many areas, such as ontology construction (Tho, Hui, Fong, & Cao, 2006; Weng, Tsai, Liu, & Hsu, 2006), recommendation systems (Boucher-Ryan & Bridge, 2006), information retrieval (Jun, Du, & Shen, 2005; Myat & Hla, 2005), and the clinical domain (Jiang, Ogasawara, Endoh, & Sakurai, 2003). The definitions of formal concept analysis are introduced below (Davey & Priestley, 2002; Tsai, Chiu, Lee, & Wang, 2007).

Definition 1. (Context). A context is a triple $(G, H, X)$, where $G$ and $H$ are the set of objects and the set of attributes, respectively, and $X$ is a binary relation between $G$ and $H$. As a rule, $(g, h) \in X$ is written as $gXh$; namely, the object $g$ has the attribute $h$.

Definition 2. (Intent and extent). For a set $A \subseteq G$, $A'$ is called the Intent of $A$ when $A' = \{h \in H \mid (\forall g \in A)\; gXh\}$, and for a set $B \subseteq H$, $B'$ is called the Extent of $B$ when $B' = \{g \in G \mid (\forall h \in B)\; gXh\}$.

Definition 3. (Concept). A concept $C$ in a context $(G, H, X)$ is a pair $(A, B)$, where $A \subseteq G$, $B \subseteq H$, $A' = B$ and $B' = A$. Set $A$ is called the Extent and set $B$ the Intent of concept $C$.

Definition 4. (Concept lattice). For the set $L$ of concepts of a context $(G, H, X)$ and two concepts $C_0 = (A_0, B_0)$ and $C_1 = (A_1, B_1)$, a hierarchical relation $(A_0, B_0) \le (A_1, B_1)$ holds where $A_0 \subseteq A_1$ and $B_0 \supseteq B_1$; namely, $(A_0, B_0) \le (A_1, B_1) \iff A_0 \subseteq A_1 \iff B_0 \supseteq B_1$. The set of all concepts in $L$, ordered by this hierarchical relation, is called the Concept lattice of the context $(G, H, X)$.

3. The system architecture

The components of the proposed system include Retrieval Agents, a Documents Preprocessing Subsystem, a Query-based Ontology Construction Subsystem and an Ontology-based Knowledge Management Subsystem. Fig. 1 shows the architecture of the system for semantic search.

Fig. 1. The architecture of the proposed system.

3.1. System architecture description

Retrieval Agents automatically search the Internet for news documents; sports news serves as the main illustration here. The Documents Preprocessing Subsystem then scans the documents for meaningful term sets and builds co-occurrence matrices. When a user issues a question about some sports information, the Ontology-based Knowledge Management Subsystem parses the query into question segments and question focuses and stores the meaningful segment terms in a repository. Next, the Query-based Ontology Construction Subsystem applies the formal concept analysis approach to construct the hierarchy and relationships of a query-based partial ontology. Finally, the Ontology-based Knowledge Management Subsystem offers the user an appropriate answer to the question, together with some related documents, based on the query-based partial ontology. Each process is described in detail below.

3.2. Documents Preprocessing Subsystem

The Documents Preprocessing Subsystem performs natural language processing on the source document set in two steps: the term analysis step and the content analysis step. Fig. 2 shows the process of the subsystem for documents preprocessing.

Fig. 2. Process of Documents preprocessing subsystem.

3.2.1. Term analysis step

This step uses a Part-Of-Speech (POS) tagging system supplied by the Chinese Knowledge Information Processing (CKIP) Group (CKIP AutoTag, 1998) of Taiwan and a self-built terms filter to collect a meaningful term set containing nouns (Na, Nb, Nc, and Nd) and verbs (VA, VC, VHC, VH, VJ, and VD). However, a single word and a compound word in a Chinese sentence can have different meanings. Take the Chinese term for “bull pen” (Na) as an example: the POS tagging system segments it into two words, “cattle” (Na) and “shed” (Na). Because our study takes sports news documents as the main subject, the compound word “bull pen” carries more meaning than the single word “cattle” or “shed”. We therefore merge two single words into a compound word by following the rules in Table 2. For instance, according to rule 2, Na + Na → Na, “cattle” (Na) and “shed” (Na) are merged into “bull pen” (Na). Table 1 displays some examples of a meaningful term set, and Table 2 presents the rules of Part-Of-Speech composition.

Table 1
Examples of meaningful term set.

3.2.2. Content analysis step

In general, the more documents there are, the longer the execution time. Hence, the content analysis step builds a co-occurrence matrix to reduce the execution time of finding the sentences related to each meaningful term during the formal concept analysis process. There are two kinds of sets: a set of documents $D_i$ collected by the agents from sports news pages and a set of meaningful terms $T_j$ selected by the term filter. The document set is expressed as $\{D_1, D_2, D_3, \ldots, D_n\}$ and the set of meaningful terms as $\{T_1, T_2, T_3, \ldots, T_m\}$. Formula (1) is then employed to create the set of co-occurrences between meaningful terms and documents.

\[
X_{ij} =
\begin{cases}
1, & D_i \text{ co-occurs with } T_j, \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
\]

After the co-occurrence set is built, we can generate an $n \times m$ matrix, represented as Eq. (2), where $n$ represents the number of source documents and $m$ represents the number of meaningful terms.

\[
M_{ij} =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1m} \\
X_{21} & X_{22} & \cdots & X_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nm}
\end{bmatrix}
\tag{2}
\]

According to the matrix in Eq. (2), the Query-based Ontology Construction Subsystem discussed below can be utilized to find related sentences of each meaningful term easily.
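As an illustration of formula (1) and Eq. (2), the following minimal sketch builds the document-by-term co-occurrence matrix and uses it to look up the documents that contain a given term. The function names and the toy data are assumptions made for this example; they are not part of the system described in the paper.

```python
from typing import List, Set


def build_cooccurrence_matrix(documents: List[Set[str]], terms: List[str]) -> List[List[int]]:
    """Return the n x m matrix M with M[i][j] = 1 if document i contains term j, else 0."""
    return [[1 if term in doc else 0 for term in terms] for doc in documents]


def documents_containing(matrix: List[List[int]], terms: List[str], term: str) -> List[int]:
    """Return the row indices (documents) whose entry for `term` is 1."""
    j = terms.index(term)
    return [i for i, row in enumerate(matrix) if row[j] == 1]


# Hypothetical toy data: each document is reduced to the set of its meaningful terms.
docs = [{"ROCKETS", "Coach"}, {"HEAT"}, {"Coach", "HEAT"}]
terms = ["ROCKETS", "Coach", "HEAT"]

M = build_cooccurrence_matrix(docs, terms)
coach_docs = documents_containing(M, terms, "Coach")
```

With the matrix in hand, only the documents returned by the lookup need to be split into sentences for the later concept analysis, which is the saving in execution time that the content analysis step aims at.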

Table 2
Rules table of Part-Of-Speech composition.

Rule #   Left term                 Right term   Part-Of-Speech (POS) composition result
1        VH13                      Na, Nd       Na
2        Na                        Na           Na
3        A, VH, Neu, Nes, VH13     Na, Nd       Na
4        VJ                        VH           Na
5        VC, VD                    Na           Na
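The composition rules of Table 2 can be sketched as a single left-to-right pass over the tagged tokens. This is only an illustrative sketch: the function name, the token representation and the English placeholder words are assumptions, and the result tag Na is taken to apply to every rule, which the text confirms explicitly only for rule 2.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# (left tags, right tags, result tag), transcribed from Table 2.
# Assumption: the result tag "Na" applies to every rule.
RULES = [
    ({"VH13"}, {"Na", "Nd"}, "Na"),
    ({"Na"}, {"Na"}, "Na"),
    ({"A", "VH", "Neu", "Nes", "VH13"}, {"Na", "Nd"}, "Na"),
    ({"VJ"}, {"VH"}, "Na"),
    ({"VC", "VD"}, {"Na"}, "Na"),
]


def merge_compounds(tokens: List[Token]) -> List[Token]:
    """Merge each adjacent pair of tokens that matches a rule into one compound token."""
    merged: List[Token] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            (left_word, left_tag), (right_word, right_tag) = tokens[i], tokens[i + 1]
            result = next(
                (res for left, right, res in RULES if left_tag in left and right_tag in right),
                None,
            )
            if result is not None:
                merged.append((left_word + right_word, result))
                i += 2  # both tokens are consumed by the compound
                continue
        merged.append(tokens[i])
        i += 1
    return merged


# English placeholders stand in for the Chinese words of the "bull pen" example.
example = merge_compounds([("bull", "Na"), ("pen", "Na"), ("is", "SHI")])
```

A single pass merges only pairs; whether the original system applies the rules repeatedly to build longer compounds is not stated in the text.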

3.3. Query-based Ontology Construction Subsystem

The purpose of this subsystem is to construct a query-based ontology by applying the formal concept analysis approach. A methodology derived from Weng et al. (2006) is used to build the relationships and the hierarchy of the query-based partial ontology. Fig. 3 shows the process of the Query-based Ontology Construction Subsystem, which involves three main steps, described as follows.

3.3.1. Binary relation matrix generator and concept set builder

This step generates a binary relation matrix and a concept set C. The co-occurrence matrix is first used to find the documents in which each meaningful segment term appears, and those documents are divided into sentences at punctuation marks. The notation $(O, A, X)$ denotes a binary relation $X$ between a set of objects $O$ and a set of attributes $A$. In this study, $O$ denotes the sentences, $A$ denotes the meaningful terms of the sentences, and $X$ represents the binary relation, i.e. $X \subseteq O \times A$. To generate the concept set C, let $S$ be a subset of $O$ and $Q$ a subset of $A$, i.e. $S \subseteq O$ and $Q \subseteq A$. The set of meaningful terms shared by all sentences in $S$ is

\[
r(S) = \{a \in A \mid \forall o \in S : (o, a) \in X\},
\]

and the set of sentences that contain every meaningful term in $Q$ is

\[
s(Q) = \{o \in O \mid \forall a \in Q : (o, a) \in X\}.
\]

A concept is then a pair of sets $(S, Q)$, i.e. a set of sentences (objects) and a set of meaningful terms of those sentences (attributes), where $Q = r(S)$ and $S = s(Q)$; the concept set C consists of all such pairs.

3.3.2. Hierarchy relationships generator

After the concept set is generated, the hierarchy relationships among concepts can be established. A concept $(A_0, B_0)$ that is a sub-concept of another concept $(A_1, B_1)$ is denoted as $(A_0, B_0) \le (A_1, B_1)$. In the concept set, this means that $c_0 = (A_0, B_0)$ is a sub-concept of $c_1 = (A_1, B_1)$. Given two elements $(I_1, J_1)$ and $(I_2, J_2)$ in the concept hierarchy, their infimum is defined as below (Buchli, 2003):

\[
(I_1, J_1) \cap (I_2, J_2) = (I_1 \cap I_2,\; r(I_1 \cap I_2)),
\]

and their supremum is defined as

\[
(I_1, J_1) \cup (I_2, J_2) = (s(J_1 \cap J_2),\; J_1 \cap J_2).
\]

The whole hierarchy of concepts is constructed from these infimum and supremum relationships.

3.3.3. Internal relationships generator

This step identifies the internal relationships among concepts through the method of Weng et al. (2006). Assume there are two concepts $c_0 = (A_0, B_0)$ and $c_1 = (A_1, B_1)$. If there exist $Set_x \subset B_0$ and $Set_y \subset B_1$ such that $Set_x = Set_y$, then an internal relationship between $c_0$ and $c_1$ can be explored.

3.4. Ontology-based Knowledge Management Subsystem

The Ontology-based Knowledge Management Subsystem plays two major roles: the first is to acquire questions from users, and the second is to find and provide the answers to those questions. The accuracy of the answers can be validated by domain experts with the tools built for such a query-based partial ontology. Fig. 4 shows the process of the subsystem for ontology-based knowledge management, which consists of two steps.

Fig. 3. Process of query-based ontology constructor subsystem.
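The operators r and s of Section 3.3.1 and the infimum and supremum of Section 3.3.2 can be sketched as follows. This is a brute-force illustration that closes every subset of sentences, so it is only practical for very small contexts; it is not the construction algorithm used by the authors. All function names, the dictionary representation of the context and the toy sentences (including the placeholder term "Player") are assumptions of this example.

```python
from itertools import combinations
from typing import AbstractSet, Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]  # sentence id -> meaningful terms in that sentence
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (sentences S, terms Q)


def r(context: Context, sentences: AbstractSet[str]) -> FrozenSet[str]:
    """r(S): the terms shared by every sentence in S (all terms when S is empty)."""
    shared = set().union(*context.values())
    for sentence in sentences:
        shared &= context[sentence]
    return frozenset(shared)


def s(context: Context, terms: AbstractSet[str]) -> FrozenSet[str]:
    """s(Q): the sentences that contain every term in Q."""
    return frozenset(o for o, attrs in context.items() if terms <= attrs)


def build_concepts(context: Context) -> Set[Concept]:
    """Close every subset S of sentences into the concept (s(r(S)), r(S))."""
    found: Set[Concept] = set()
    sentences = list(context)
    for size in range(len(sentences) + 1):
        for subset in combinations(sentences, size):
            terms = r(context, set(subset))
            found.add((s(context, terms), terms))
    return found


def is_subconcept(c0: Concept, c1: Concept) -> bool:
    """c0 <= c1 when the sentence set of c0 is contained in that of c1."""
    return c0[0] <= c1[0]


def infimum(context: Context, c1: Concept, c2: Concept) -> Concept:
    """(I1 ∩ I2, r(I1 ∩ I2))."""
    common = c1[0] & c2[0]
    return (common, r(context, common))


def supremum(context: Context, c1: Concept, c2: Concept) -> Concept:
    """(s(J1 ∩ J2), J1 ∩ J2)."""
    common = c1[1] & c2[1]
    return (s(context, common), common)


# Hypothetical toy context with three sentences.
toy = {
    "s1": {"ROCKETS", "Coach", "Van Gundy"},
    "s2": {"ROCKETS", "Player"},
    "s3": {"Coach", "Van Gundy"},
}
concepts = build_concepts(toy)
```

In this toy context, the sentences s1 and s3 share the terms "Coach" and "Van Gundy", so the pair ({s1, s3}, {Coach, Van Gundy}) is one of the concepts, and its infimum with ({s1, s2}, {ROCKETS}) is the concept whose only sentence is s1.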

Fig. 4. Process of ontology-based knowledge management subsystem.

3.4.1. Question analysis step

The first question-answering step is to infer a question's segments and focuses. The question segmentation phase processes the question issued by the user. Take the Chinese question meaning “Who is the coach of the HEAT?” as an example; the segments of the result carry the following tags:

(VE) (Nb) (Na) (SHI) (Nh) ? (QUESTIONCATEGORY)

The self-built terms filter phase then collects the meaningful segment terms and stores them in the meaningful segments repository. In the above example, we obtain two meaningful segment terms, “HEAT” and “Coach”, and store them in the repository. Next, the question focus selector phase marks <QFD> as the question description and <QF> as the question focus based on a set of patterns (Lee et al., 2005). To find the patterns of question focus, we collected and analyzed a large number of questions, as shown in Table 3. Based on Table 4, example questions can be marked with QFD, the specific question description, and QF, the question focus, for the system to explore.

Table 3
Example of question patterns.

Table 4
Example of marked question type.

3.4.2. Answer analysis step

At this step, an answer is deduced from the constructed query-based partial ontology. The system transfers the ontology into a structured result set that captures the relations among the concepts, as in the example of a query-based partial ontology shown in Fig. 5. Each node stands for a concept, and the hierarchy has two kinds of relationships: hierarchical relations and internal relations. The pattern classifier phase demarcates the scope of the answer based on Table 4. Because question types differ, the result set produces several candidates after each ontology is constructed. Hence, we compute the relation scores among concepts with formula (3), where w represents the weight of the relation from one concept to another. In this paper, w is 0.7 for a hierarchy relation and 0.3 for an internal relation, because the hierarchy relation has more relevancy strength than the internal relation. Finally, the answer selector phase selects a suitable answer, namely the candidate whose relations have the highest scores.

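\[
\text{Score} = \frac{\text{Number of sentences in which } Term_x \text{ co-occurs with } Term_y}{\text{Number of sentences in which } Term_x \text{ appears}} \times w,
\qquad
w =
\begin{cases}
0.7, & \text{hierarchy relation}, \\
0.3, & \text{internal relation}.
\end{cases}
\tag{3}
\]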
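Formula (3) can be sketched in a few lines. The function name, the relation labels and the example counts below are assumptions chosen for illustration only.

```python
# Weights from formula (3): hierarchy relations count more than internal relations.
WEIGHTS = {"hierarchy": 0.7, "internal": 0.3}


def relation_score(cooccurring_sentences: int, term_sentences: int, relation: str) -> float:
    """Share of the sentences containing Term_x that also contain Term_y, scaled by w."""
    if term_sentences <= 0:
        raise ValueError("Term_x must appear in at least one sentence")
    if relation not in WEIGHTS:
        raise ValueError(f"unknown relation type: {relation!r}")
    return cooccurring_sentences / term_sentences * WEIGHTS[relation]


# Hypothetical counts: Term_x appears in 10 sentences and co-occurs with Term_y in 5.
hierarchy_score = relation_score(5, 10, "hierarchy")
internal_score = relation_score(5, 10, "internal")
```

For these counts the co-occurrence ratio is 5/10 = 0.5, so the score is 0.5 × 0.7 = 0.35 for a hierarchy relation and 0.5 × 0.3 = 0.15 for an internal relation; the same pair of terms therefore ranks higher when it is connected hierarchically.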


Fig. 5. Example of query-based partial ontology.

4. An example

To evaluate the proposed approach, we implemented a user interface, called the ontology-based knowledge acquisition system, through which a user can freely ask questions in Chinese about sports news and receive the answers, as shown in Fig. 6. The interface processes these questions transparently. In addition, domain experts constantly validate the accuracy of the answers as well as the query-based partial ontology.

Fig. 6. Homepage of ontology-based knowledge acquisition system.

We now take an example to describe the proposed approach in detail and show how it works. Assume a user issues the question “Who is the coach of the ROCKETS?”. The first task of the system is to segment the question. After the segmentation, the self-built terms filter phase collects the meaningful segment terms, “ROCKETS” and “Coach”, and stores them in the meaningful segments repository. Then, the question is marked <QT VALUE=“WHO”/> by mapping all of the pattern types based on Table 3. Fig. 7 shows the resulting question type together with the segments of the question for the subject analysis.

Fig. 7. Results of question type in question analysis.

The ontology construction step is described in Hsieh et al. (2007). In this paper, when a meaningful term such as “Coach” appears in a sentence (#60), the corresponding entry of the binary relation matrix is labeled “X”, where X denotes a binary relation between A and O. Here O indicates the set of objects, i.e. the sentences into which the documents are divided, as shown in Table 5, and A indicates the set of attributes, i.e. the meaningful terms of each sentence, as shown in Table 6. Fig. 8 shows the query-based partial ontology that the approach finds.

Table 5
Meaningful segment terms and all of sentence set.

Meaningful segment terms: ROCKETS & Coach

Appears in document   Sentences included in the document
Document 7            #50–#66
Document 14           #120–#136
Document 34           #298–#312
Document 127          #1185–#1197
Document 131          #1231–#1245
Document 157          #1485–#1502
Document 163          #1542–#1548

Table 6
Part of meaningful terms.

To select an answer for the user, we evaluate the relation scores among the concepts. In this case, the term “ROCKETS” appears in 25 sentences, the term “Coach” appears in 8 sentences, the term “ROCKETS” co-occurs with “Van Gundy” in 6 sentences, and the term “Coach” co-occurs with “Van Gundy” in 4 sentences. Based on these counts, we compute the relation scores among the concepts by formula (3), as shown in Table 7, where the choice of the system, the candidate with the highest score (colored in gray), is “Van Gundy”. Fig. 9 shows the search results and related contents.

5. Experimental results

In this paper, the formal concept analysis approach is extended to develop a query-based ontology knowledge acquisition system for users who ask questions in Chinese, their native language. After such an ontology is built, the system can automatically offer the user an appropriate answer. We carried out two experiments to validate the efficiency and accuracy of the system. The first experiment compares the accuracy of the query-based partial ontology with that of the whole ontology and of a keyword-based search engine. The second experiment applies the proposed approach to other sports source documents in Chinese, namely Major League Baseball (MLB) and Chinese Professional Baseball League (CPBL) news.

5.1. Experiment design

The well-known Google search engine indexes more than 800 million web pages, so users can find the information they need by keyword search on the Internet, yet they still spend considerable time selecting among the results. In the first experiment, we input the questions into the Google search engine and compared the answers found in the top ten entries of its result list with the validated answers of our system, in order to verify the superiority of our system. Domain experts evaluated the accuracy of the answers produced by the query-based partial ontology and by the whole ontology. We divided a total of 210 questions sampled from a sports forum into five types, namely WHERE, WHETHER, WHOSE, WHICH_ONE, and WHO, as shown in Table 8, and adopted formula (4) to evaluate the precision of the proposed approach. In the second experiment, we again sampled questions from the same source, the sports forum. Table 9 shows the number of questions of each type for baseball. Fig. 10 shows the screen of the expert viewer, with which domain experts can browse all users' queries and check each answer against the related part of the news documents.

\[
\text{Precision} = \frac{\text{The total of correct answers}}{\text{The total of input questions}}
\tag{4}
\]
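As a worked check of formula (4), the short sketch below recomputes the overall precision of the three approaches from the totals of correct answers reported in Tables 12–14 for the 210 basketball questions. The function and variable names are assumptions of this example.

```python
def precision(correct_answers: int, input_questions: int) -> float:
    """Formula (4): the share of input questions that were answered correctly."""
    if input_questions <= 0:
        raise ValueError("at least one input question is required")
    return correct_answers / input_questions


TOTAL_QUESTIONS = 210

# Totals of correct answers as reported in Tables 12, 13 and 14.
correct_by_approach = {
    "query-based partial ontology": 137,
    "whole ontology": 141,
    "Google search engine": 147,
}

precision_percent = {
    name: round(precision(correct, TOTAL_QUESTIONS) * 100, 2)
    for name, correct in correct_by_approach.items()
}
```

The resulting percentages are 65.24, 67.14 and 70.00, which agree with the totals given in the three tables.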


Fig. 8. A query-based partial ontology.

Table 7 Score table.

Table 8 Number of questions for basketball.

Table 9
Number of questions for baseball.

Question types   Major League Baseball (MLB)   Chinese Professional Baseball League (CPBL)
                 Number of questions           Number of questions
WHERE            15                            6
WHETHER          7                             9
WHOSE            15                            16
WHICH_ONE        23                            11
WHO              40                            58
Total            100                           100

5.2. Experiment I

In this experiment, Retrieval Agents retrieved Chinese basketball news pages about the National Basketball Association (NBA) from the udn.com website, dated from 2007/04/01 to 2007/06/30; more than 180 documents were collected. Table 10 shows the number of documents, the number of sentences, the original number of terms and the number of terms after term filtering, while Table 11 shows the results of terms analysis.

Fig. 9. The searching results and related contents.

Fig. 10. The evaluation viewer of domain experts.

Table 10
Number of documents and terms for basketball documents (NBA).

        Number of documents   Number of sentences   Number of terms before term filtering   Number of terms after term filtering
Total   180                   1728                  66,296                                  25,493

Table 11
Results of terms analysis with Chinese natural language processing.

Part-Of-Speech   POS tag   Number of terms   Number of divergent terms
Nouns            Na        8127              2286
                 Nb        5084              331
                 Nc        766               144
                 Nd        1271              93
Verbs            VA        1492              350
                 VC        3679              775
                 VHC       208               45
                 VH        3376              837
                 VJ        1204              188
                 VD        286               37
Total                      25,493            5086

The results of Experiment I can be seen in Fig. 11 and Tables 12–14. Tables 12–14 show the precision of the query-based partial ontology, the whole ontology and the Google search engine, respectively, as judged by domain experts. Fig. 11 displays the precision of the three approaches for each question type, where the horizontal axis denotes question types and the vertical axis denotes accuracy. From this histogram, we found no marked difference in precision among the three approaches.

Fig. 11. Precisions of three types of approach comparison.

Table 12
Precision on query-based partial ontology judged by domain experts.

Question types   Correct/wrong   Number of questions   Correct rate/error rate (%)
WHERE            Correct         31                    86.11
                 Wrong           5                     13.89
WHOSE            Correct         4                     57.14
                 Wrong           3                     42.86
WHETHER          Correct         10                    90.91
                 Wrong           1                     9.09
WHICH_ONE        Correct         11                    84.62
                 Wrong           2                     15.38
WHO              Correct         81                    56.64
                 Wrong           62                    43.36
Total            Correct         137                   65.24
                 Wrong           73                    34.76

Table 13
Precision on whole ontology judged by domain experts.

Question types   Correct/wrong   Number of questions   Correct rate/error rate (%)
WHERE            Correct         32                    88.89
                 Wrong           4                     11.11
WHOSE            Correct         5                     71.43
                 Wrong           2                     28.57
WHETHER          Correct         8                     72.73
                 Wrong           3                     27.27
WHICH_ONE        Correct         12                    92.31
                 Wrong           1                     7.69
WHO              Correct         84                    58.74
                 Wrong           59                    41.26
Total            Correct         141                   67.14
                 Wrong           69                    32.86

Table 14
Precision on the Google search engine judged by domain experts.

Question types   Correct/wrong   Number of questions   Correct rate/error rate (%)
WHERE            Correct         27                    75.00
                 Wrong           9                     25.00
WHOSE            Correct         1                     14.28
                 Wrong           6                     85.72
WHETHER          Correct         9                     81.82
                 Wrong           2                     18.18
WHICH_ONE        Correct         8                     61.54
                 Wrong           5                     38.46
WHO              Correct         102                   71.33
                 Wrong           41                    28.67
Total            Correct         147                   70.00
                 Wrong           63                    30.00

Furthermore, we timed each answer search on the query-based partial ontology, the whole ontology and the Google search engine, respectively. As shown in Table 15, our approach performed better than building the whole ontology. Although the Google search engine returns results in the shortest time, users still have to spend more time going through the returned documents.

Table 15
Runtime for answer searching on query-based partial ontology, whole ontology and the Google search engine.

Approach                         Average runtime of answer searching (s)
Query-based partial ontology     8.94
Whole ontology                   150.88
Google search engine             0.84

5.3. Experiment II

We collected 185 baseball news documents for Experiment II. They are divided into two sets: 135 Major League Baseball (MLB) news documents in Chinese from http://mlb.im.tv/, dated from 2007/05/01 to 2007/06/30, and 50 Chinese Professional Baseball League (CPBL) news documents from the http://www.cpbl.com.tw/ website, dated from 2007/06/01 to 2007/06/30. Table 16 shows the number of documents, the number of sentences and the number of terms before and after term filtering for the individual data sets. Table 17 shows the precision of the query-based partial ontology for the MLB and CPBL document sets, respectively.

Table 16
Number of documents and terms for baseball documents (MLB and CPBL).

        Number of documents   Number of sentences   Number of terms before term filtering   Number of terms after term filtering
MLB     100                   976                   2665                                    1413
CPBL    50                    345                   2291                                    943
Total   150                   1321                  4956                                    2356

Table 17
Precision of using query-based partial ontology judged by domain experts.

                                 MLB                                                CPBL
Question types   Correct/wrong   Number of questions   Correct rate/error rate (%)   Number of questions   Correct rate/error rate (%)
WHERE            Correct         11                    73.33                         3                     50.00
                 Wrong           4                     26.67                         3                     50.00
WHOSE            Correct         4                     57.14                         4                     44.44
                 Wrong           3                     42.86                         5                     55.56
WHETHER          Correct         12                    80.00                         14                    87.50
                 Wrong           3                     20.00                         2                     12.50
WHICH_ONE        Correct         8                     34.78                         5                     45.45
                 Wrong           15                    65.22                         6                     54.55
WHO              Correct         32                    80.00                         46                    79.31
                 Wrong           8                     20.00                         12                    20.69
Total            Correct         67                    67.00                         72                    72.00
                 Wrong           33                    33.00                         28                    28.00

6. Conclusion and future work

In this paper, we propose an approach to building a partial ontology that is constructed on the basis of a user's query in order to offer immediate, appropriate responses together with the related documents requested.
In order to verify our proposed approach, we implement a query-based ontology knowledge acquisition system, which replies the user’s questions and evaluates the accuracy by domain experts. While a user inputs a query, the system, by employing a formal concept analysis method, will construct a query-based partial ontology, where the system can present the conjecturing answers related to the user’s objective. According to the experiment, there are two major contributions in this paper: Firstly, this approach would be more available for a dy-

Number of questions

Correct rate/error rate (%)

namic knowledge base, where new documents increase in high frequency. In addition, users can obtain the information they need by our proposed query-based ontology knowledge acquisition system instead of the browsing and searching all documents. And the further research under our approach is expected to be applied to the extension of the domains concerned and the improvement of the performance on the answers searched for. Acknowledgements This work is supported by the Nation Science Council of Taiwan under the contract NSC95-2221-E-006-158-MY3. References Boucher-Ryan, P. D., & Bridge, D. (2006). Collaborative recommending using formal concept analysis. Knowledge-Based Systems, 19(5), 309–315. Brewster, C., & O’Hara, K. (2004). Knowledge representation with ontologies: The present and future. IEEE Intelligent Systems, 19(1), 72–81. Buchli, F. (2003). Detecting software patterns using formal concept analysis. Switzerland: University of Bern. Chau, K. W. (2007). An ontology-based knowledge management system for flow and water quality modeling. Advances in Engineering Software, 38(3), 172–181. Chi, Y. L. (2007). Elicitation synergy of extracting conceptual tags and hierarchies in textual document. Expert Systems with Applications, 32(2), 349–357. CKIP AutoTag (1998). Chinese knowledge information processing group. Taiwan: Academic Sinica. Davey, B. A., & Priestley, H. A. (2002). Introduction to Lattices and Order. Cambridge University Press. pp. 65–84. Formica, A. (2006). Ontology-based concept similarity in formal concept analysis. Information Sciences, 176(18), 2624–2641. Hsieh, T. C., Tsai, K. H., Chen, C. L., Lee, M. C., Chiu, T. K., & Wang, T. I., (2007). Querybased ontology approach for semantic search. In Proceedings of the 6th international conference on machine learning and cybernetics, (pp. 2970–2975). Hong Kong. Huhns, M. N., & Stephens, L. M. (1999). Personal ontologies. IEEE Internet Computing, 3(5), 85–87. 
Jiang, G., Ogasawara, K., Endoh, A., & Sakurai, T. (2003). Context-based ontology building support in clinical domains using formal concept analysis. International Journal of Medical Informatics, 71(1), 71–81.
Jun, T., Du, Y. J., & Shen, J. F. (2005). Research in concept lattice based automatic document ranking. In Proceedings of 2005 international conference on machine learning and cybernetics (pp. 5560–5565). Guangzhou.
Kim, J. M., Choi, B., Shin, H. P., & Kim, H. J. (2007). A methodology for constructing of philosophy ontology based on philosophical texts. Computer Standards & Interfaces, 29(3), 302–315.
Lee, C. W., Shih, C. W., Day, M. Y., Tsai, T. H., Jiang, T. J., Wu, C. W., et al. (2005). ASQA: Academia Sinica question answering system for NTCIR-5 CLQA. In Proceedings of the NTCIR-5 workshop meeting (pp. 202–208). Japan.
Lee, M. C., Tsai, K. H., & Wang, T. I. (2008). A practical ontology query expansion algorithm for semantic-aware learning objects retrieval. Computers & Education, 50(4), 1240–1257.
Lee, C. S., Jiang, C. C., & Hsieh, T. C. (2006). A genetic fuzzy agent using ontology model for meeting scheduling system. Information Sciences, 176(9), 1131–1155.

Lee, C. S., Kao, Y. F., Kuo, Y. H., & Wang, M. H. (2007). Automated ontology construction for unstructured text documents. Data & Knowledge Engineering, 60(3), 547–566.
Myat, N. N., & Hla, K. H. S. (2005). Organizing web document resulting from an information retrieval system using formal concept analysis. In Proceedings of the 6th Asia Pacific symposium on information and telecommunication technologies (pp. 198–203).
Tho, Q. T., Hui, S. C., Fong, A. C. M., & Cao, T. H. (2006). Automatic fuzzy ontology generation for semantic web. IEEE Transactions on Knowledge and Data Engineering, 18(6), 842–856.
Tsai, K. H., Lee, M. C., & Wang, T. I. (2006). A learning objects recommendation model based on the preference and ontological approaches. In Proceedings of the 6th IEEE international conference on advanced learning technologies (pp. 36–40). The Netherlands.


Tsai, K. H., Chiu, T. K., Lee, M. C., & Wang, T. I. (2007). Automated course composition and recommendation based on a learner intention. In Proceedings of the 7th IEEE international conference on advanced learning technologies (pp. 274–278). Japan.
Weng, S. S., Tsai, H. J., Liu, S. C., & Hsu, C. H. (2006). Ontology construction for information classification. Expert Systems with Applications, 31(1), 1–12.
Wille, R. (1982). Restructuring lattice theory: An approach based on hierarchies of concepts. In I. Rival (Ed.), Ordered sets (pp. 445–470). Dordrecht-Boston: Reidel.
Yan, H., Jiang, Y., Zheng, J., Peng, C., & Li, Q. (2006). A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Systems with Applications, 30(2), 272–281.
Zhou, X., Wu, Z., Yin, A., Wu, L., Fan, W., & Zhang, R. (2004). Ontology development for unified traditional Chinese medical language system. Artificial Intelligence in Medicine, 32(1), 15–27.