EngngApplic.Artif. lntell. Vol. 6, No. 4, pp. 307-316, 1993 Printedin Great Britain. All rights reserved
0952-1976/93$6.00+ 0.00 Copyright~ 1993PergamonPressLtd
Contributed Paper
Intelligent User Interface to SQL-based Database System G. Y. SHOW Nanyang Technological University, Singapore
HINNY KONG Nanyang Technological University, Singapore
W. K. LIN Nanyang Technological University, Singapore
(Received September 1992; in revised form January 1993) One of the areas which draws a great deal of attention from researchers in database systems is the definition of query languages characterized by both high expressive power and ease of useJ -4 In this paper, an intelligent database interface system, IDQS'is proposed. This system serves'as the interface between users and relational database systems. IDQS enables users to query the database in English, and does away with the familiarization cost. IDQS consists mainly of two big blocks: a naturallanguage processing unit and a knowledge-base. The NL processing unit and knowledge-base are two completely separate entities. This system structure provides a very good base for a transportable system. To transport the system to a new domain, one only needs to update the knowledge-base. No modification iias.to be done on the NL processing unit. The knowledge-base is designed in such a way that knowledge updating is an easy task which can be carried out by a semi-expert database user. Keywords: Natural language processing, artificial intelligence, database system.
1. INTRODUCTION
NLmenu, s Inglish, 6 Kaleidoscope, 7 Query By Diagram ( Q B D ) , 8 T E A M 9 and Mapping N L Constructs. ~° Intelligent Database Query System (IDQS) was initiated with the above-stated objective in mind. I D Q S is a user interface system which sits in between the users and the database systems. It reads in a query in English and transforms the query into Structured Query Language (SQL). 11
Recently, there has been a remarkable change of user profile of database systems. Many office workers, students and businessmen have become the majority segment of the database user population. With the upsurge of database users from different sectors, the userfriendliness of database systems is put under test. In order to use a database effectively, users have to familiarize themselves with the database system and master some kind of query language. This familiarization cost has hampered the further spread of the database user population. Although, there is an emergence of non-procedural fourth-generation languages, these do not seem friendly, enough for casual users. Therefore, researchers have focused on designing user interfaces to database system which enable users from different sectors to use a database with minimum familiarization cost. Representation of this system are
IDQS is a user interface to a relational database system. Therefore, it is vital to understand the structural features of a relational database prior to understanding the interface system itself. Relational databases are collections of relation tables which obey the following disciplines:
Correspondence should be sent to: Hinny Kong, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 2263.
• Each table row is unique. • Data cells which are unique to each row are primary keys of the table. • Rows and columns are unordered.
2. STRUCTURAL FEATURES OF RELATIONAL DATABASES
307
G.Y. SHOW et al.: SQL-BASEDDATABASESYSTEM
31)8
Table 1. Staff table Staff-name
Game
Subject
John Willy
Tennis Squash
Electrical Electronics
• Each table cell has only one data value. • Each table must contain at least one primary key. • Whenever the data required come from more than one table, multiple-table joining has to be performed to join the tables together. With the above-stated structural features of relational database, one simple sample database can be studied to provide familiarization with the terms used. The following characteristics can be drawn from Table 1: • Each and every table in the database must have a unique table name (the table name in this case is "staff-table"). • The first row of the table which labels the rest of the rows is called the field-name of the table. For example, "staff-name", "game" and "subject" are field-names of the table. • The data cells in the rows other than the first row are called field-values of the table. For example, "John", "tennis", "electrical", "Willy", "squash" and "electronics" are field-values of the table. • All the field-values in the same column belong to the same field-name. For example, both "John" and "Willy" belong to the field-name "staffname". A query can be viewed as a collection of lists of field-names and field-values. Field-names of a query are treated as the request of the SQL command and field-values are used as constraints in the SQL command to filter out the unwanted data. For example: Query: List the lecturers who like to play tennis. This query consists of a field-name "lecturer" (which is a synonym of "staff-name") and a field-value "tennis". The corresponding SQL command is as below: Select staff-name From staff-table Where game = "tennis". 3. SYSTEM STRUCTURE OF IDQS The overall system structure of IDQS is illustrated in Fig. 1. The system consists of a Natural Language Processing Unit, a Knowledge-base, an Oracle Interface Module and a System Response Generator. The knowledge-base is a collection of various type of knowledge regarding the lexicon of words, grammar rules, world-knowledge, database structure and SQL. The NL processing unit is an integration of five modules, namely Parser, Interpreter, Inference Machine, SQL Sorter and SQL Generator. The NL
processing unit analyses NL inputs with the assistance of the knowledge-base and generates the corresponding SQL command. The analysis process can be simplified into the following stages:
Stage 1 The query in English is fed to a parser, which parses the query with the aids of lexical and linguistic knowledge. The output of the parser is an internal representation of the query. Stage 2 An interpreter interprets the internal representation produced by the parser. If the result of the interpretation is inadequate to form an SQL command, the Inference Machine is invoked to infer additional information from the world knowledge (consisting of caseframes). The output of this module is a list of fieldnames and field-values. If there is any "operator", "group-by clause" and "having clause", the corresponding information is also included in the output. Stage 3 An SQL sorter, which refers to the database structural knowledge, searches for the tables and joints between the tables of the field-names and field-values. The output contains lists of requests, constraints, operator, joints and tables. Stage 4 An SQL generator then reads in the output from the SQL sorter and with the aid of SQL knowledge, generates the SQL command. The SQL command is output to a file. Stage 5 An Oracle interface module reads the SQL command from the file and accesses the relational database. The detailed description of various modules is discussed in the next section. 4. MODULE DESCRIPTIONS To understand a natural language, both linguistic knowledge of the particular language being used and world knowledge relating to the topic being discussed are required. This knowledge is coded into the knowledge-base of IDQS. The following sections give detailed descriptions of the utilization of knowledge by various modules in the NL processing unit. 4.1. Parser The parser refers to a dictionary and a grammar called Left-Associative Grammar (LA-grammar) 12'13 and generates an internal representation which is commonly known as a parse-tree. The parser analyses a sentence from left to right, always combining a sentence start and a next word into a new sentence start.
G. Y. SHOW et al.: SQL-BASED DATABASE SYSTEM
309
KNOWLEDGE BASE
USER'S NL QUERY
Parser
•
Interpreter
Inference Machine
SQL
Generator
SYSTEM'S RESPONSE
<2
•
I Oracle ' Interface
SYSTEM RESPONSE GENERATOR
IDQS
RELATIONAL
DATABASE Fig. 1. IDQS system overview.
In LA-grammar, words are defined as ordered pairs consisting of a surface and a category, e.g. (teach (N D A V)) and a set of rules is defined to operate on the categories of the words. A rule r~ is an ordered pair (coi rp~), where co~ is a categorial operation and rp~ is the rule package of r~ containing the rules which can possibly apply after successful application of r~. Left-Associative rules are expressed in the following form:
r,: [cat-1 cat-2] ~ [rpi cat-3] The categorial operation coi takes cat-1 (the category of the current "sentence start") and cat-2 (the category of the current "next word") as inputs and derives cat-3 (the category of the resulting "sentence start") as output. In the next combination, the rules in the rpi are
applied to the sentence start resulting from ri and a new "next word" from the sentence. When a Left-Associative rule applies, the surface of the "next word" is appended to the surface of the "sentence start". This results in an output surface one word longer than that of the input "sentence start". The LA-grammar rules presently consist of 16 rules. An abstract formulation of rule number 2 is shown below. r - 2: [(bXc) (b)] ~ [ ( r - 2, r - 3) (Xc)] Where: r - 2 is the rule name; (bXc) is the sentence start category; (b) is the next word category; (Xc) is the resultant sentence start category; ( r - 2, r - 3) is the set of rules which can possibly apply to the resultant "sentence start" category and the new "next word" category.
310
G.Y. SHOWet al.: SQL-BASEDDATABASESYSTEM Semantic
Syntactic Derivation
(v)
Hierarchy
Sentence 2_1 1
~tennls(N-H) UW) p~Mya(S3AV)
SubJ
Verb
NP-I!I 1
plays
1A
0NS)
Det
Noun
which
student
I
Dlr-obj
I
NP-2 1 1
,I
Name
I 21
stud.nt(S H) whlch(SQ)
I
tennis
Fig. 2. Semanticstructureof sentence. The parsing process can be illustrated by a simple sentence, "Which student plays tennis?". The data structure of the dictionary is: (CAT MEANINCrPOS AT]? OFN) Where: CAT is the word category; MEANING is the meaning of word in the database domain; POS is the part of speech of the word in the sentence; ATT is the attribute of the word which indicates whether the word is a field-name or a field-value in the database domain. If the word is neither a field-name nor a field-value, this should be nil; OFN indicates the field-name of the word (if it is a field-value). If the word is not a field-value, this should be nil. The following entries can be found in the dictionary of the knowledge-base. (student (((SH)) STUDENT-NAME NOUN F-N)) (which (((WQ)) WHICH))
the set of rules possibly applicable to the new sentence start "(WS) which student" and the new next word "plays" after the rule DET + NOUN is applied successfully. This process continues until there is no further new next word. 4.1.1. Parse tree generation
The process which works in tandem with the syntactic analysis is the parse tree generation. The parse tree generator combines a linear syntax with a hierarchical semantics, as illustrated schematically in Fig. 2. Each subtree in the semantic hierarchy of Fig. 2 corresponds to a word in the surface. The semantic hierarchy is built up in the following left-associative order: 1+2~(1,2) (1,2) + 3 =:),(1,2,3) (1,2,3) + 4 ~ (1,2,3,4)
(plays ((($3 A V)) PLAY VERB)) (tennis (((N-H)) TENNIS NAME F-V GAME)) In a sentence with n words, there are n - 1 left associative combinations. For this sample sentence, there are three left associative combinations, as shown below: Combination 1 (WQ) which + (SH) student ~ (WS) which student RULE-PACKAGE = *DET + NOUN Combination 2 (WS) which student + ($3 A V) plays ~ (A V) which student plays RULE-PACKAGE = *NOUN + FVERB Combination 3 (A V) which student plays+ (NH) tennis ~ (V) which student plays tennis. Application of LA-rule DET + NOUN to "which" and "student" results in the new sentence start "(WS) which student". The rule-package * D E T + N O U N indicates
In other words, the semantic representation of word 1 is combined with the semantic representation of word 2, resulting in the subtree (1,2). Then the subtree (1,2) is combined with the semantic representation of word 3 into the subtree (1,2,3), etc. The parse tree in Fig. 2 consists of frames, slots and fillers. A filler can itself be a frame. A frame-name like SENT-I_I_I consists of a base (e.g. SENT) and an index (e.g. 1 1_1). Different instances of frames with the same base are distinguished by the different indices. The index of a frame has three digits. The first represents the number of the word from which the frame or the filler has been generated. The second is the copy number of the frame, and the third is the number of the sentence in a query section. If a sentence has several constituents of the same category, the frames with the same base are distinguished by different word numbers (first digit of the index). If a sentence start is ambiguous, the frame of
G. Y. SHOW et al.: SQL-BASED DATABASE SYSTEM Table 2. SQL comparison operators and their NL
each reading is distinguished by a different copy number (second digit of the index). The output of the parser, a parse tree, is then fed to an interpreter, which will be discussed in the next section.
representations NL
Rule I In a natural language query, all the field-names or their synonyms are handled either as variables in a "Group-by" clause or as requests in a "Select" clause. The decision of which field-name should be a variable in a "Group-by" clause or a request in a "Select" clause is governed by Rule VIII. Similarly, all the field-values are handled as the constraints which are used as search conditions in "Where" clauses. Except for the condition stated in Rule H, the number of requests always reflects the number of subqueries in an SQL command. The first request is treated as the variable in the toplevel "Select" clause; the second request is taken as the variable in the second-level "Select" clause, and so on. Consider the following query: Which staff member plays the same game as Hinny? This query consists of two field-names, namely "staff" and "game" and a field-value "Hinny". Thus, abide by Rule I, the query is formulated into the following SQL command. Select staff-name From staff-game-t Where game = (Select game From staff-game-t Where staff-name = "Hinny") Rule H The field names before and after the conjunction "and" are treated as the requests for the same "Select" clause. This is illustrated in the query below: List the student and staff who play tennis The field-names "student" and "staff" are joined by the conjunction "and" in the above query. Thus, both of them are the requests of the "Select" clause as shown in the SQL command below: Select student-name, staff-name From student-game-t, staff-game-t Where game = 'tennis'. Rule III All the comparison phrases in NL queries have their corresponding set of SQL comparison operators as listed in Table 2 (not exhaustive). Recognition of such
representation
Not equal to Equal to Greater than Less than Between * and * Higher than
4.2. Interpreter For a thorough analysis of the parse trees and how they relate to the context of database, Interpreter has a set of rules which are oriented specifically to the generation of SQL commands.
311
SOL comparison operator != = > < between * and * >
phrases calls for a search in the look-up table stored in the knowledge-base to extract the corresponding comparison operators. Rule IV In an SQL command, the group functions are used to operate on the field-names in "Select" and "Having" clauses. Table 3 lists some of the commonly used SQL functions. Such functions can usually find a corresponding NL representation. Rule V The field-name located either at the left hand side of or between an NL comparison phrase (e.g. salary is more than, higher grade than) will be placed directly into the left-hand side of the SQL comparison operator in the SQL command. Similarly, if the right-hand side of the NL comparison phrase is a number quantity, it will be placed directly into the right-hand side of the SQL comparison operator. As an example, the SQL command for the query: Which student scores a higher grade than 80? is Select student-name From student-t Where grade > 80 In many cases, the search conditions might not be expressed using number quantities. Sometimes, there might be a nested query and such decision is guided by Rule V.I: If the field-value on the right hand side of the NL comparison is not a number quantity, then a formulation of a subquery must be performed. The subquery is another independent query nested in the main query bloc. Each subquery is completed with its "Select", "From" and "Where" clauses. The variable for the inner-level "Select" clause must be the field-name on Table3. SQL group functions and their NL representations NL representation
SQL function
The The The The The
avg(*) count(*) max(*) min(*) sum(*)
average of number of maximum of minimum of total of
312
G.Y. SHOWet al.: SOL-BASEDDATABASESYSTEM
the left-hand side of (or between) the NL comparison phrase. Sometimes, the field-name on the left-hand side of a comparison phrase is omitted and a group function presents on the right hand side, such as the query below: Which student scores higher than the average grade in Lisp? In this case, the omitted field-name has to be inferred from the verb "scores" in the sentence. The inference process is carried out by referring to the syntactic information of the sentence as well as the world knowledge in the knowledge-base. The inference machine unveils that the direct object of the verb "scores" is absent in the sentence, whereas in the world knowledge, the direct object is "grade". On the other hand, the right hand side of the comparison phrase is neither a number quantity nor a field-value, but a group function "average" which operates on the field-name "grade". Thus, comply to rule IV, the right-hand side of the comparison phrase has to be a subquery whose "Select" clause variable is a group function. The resultant SOL command for this query is: Select student-name From student-t Where grade > (Select avg(grade) From student-t Where subject = 'Lisp') Rule V1
Not all NL queries explicitly expressed their requests. In the case where the requests are not explicitly stated, the requests are usually implied by the use of WH-expressions such as "Where" and "When". "Where" is referring to a location and the field-name implied is "venue". The implied request for "When" is the field-name "time". Rule VII
When a field-name (request) is not explicitly stated in an NL query, the verb in the NL query has to be analyzed to infer the intended field-name. For example: Query 1: Which lecturer teaches John ? Query 2: Who teaches John ? In query 1, both request "lecturer" and constraint "John" are expressed explicitly. In contrast, only a constraint "John" is found in query 2; there is no fieldname stated explicitly in this query. In both cases, the inference machine is invoked to infer the additional information. The inference machine searches and matches the verb "teach" with the case-frames residing in a knowledge-base, and eventually the following caseframe is found: [TEACH agent = lecturer
pupil = student location = lecture-theater] Where: "Teach" is the name of the case flame. "Agent", "pupil" and "location" are the case-slot names. "Lecturer", "student" and "lecture-theater" are fillers of the case-slots. For query 1, no additional information is inferred. But for query 2, the pupil of the "teach" action is found to be equal to "John" who is a student. Thus the agent of "teach" action must be equal to "lecturer". Sometimes, the information in the world knowledge is inadequate to perform a thorough inference. In this case, the inference machine has to prompt the user to provide additional informations. For instance: How many students fail Lisp? The inference machine searches for the case-frame of "fail" and the case-frame is found in the knowledgebase as below: [FAIL agent = student dir-object = subject if-needed = ask] In this case-frame, an "ask" function is attached to the "if-needed" slot. When activated, this function will ask the user to enter the fail mark. With this additional information and Rule IV, the SQL command for this query can be formulated as below: Select Count(student-name) From student-t Where subject = 'Lisp' And grade < fail-mark Rule VII
Sometimes, the query might operate on a specific logical sub-group of data in a table. For instance: Query: List the top score of the classes in Electronics Group. Query: List the class with the biggest registration in the EEE school. For these queries, the additional "group-by" and "having" clause have to be formulated in the final SQL command. This can be guided by the following rules: Rule VIIL1
If a request and a field-name are related to each other by a group function (e.g maximum grade of the students), the relation is to be expressed in a "Groupby" clause. The "Group-by" clause always comes after the "Where" clause, or the "From" clause if there is no "Where" clause. The variable for a "Group-by" clause is always the field-name following the request. Rule 11111.2
If the left-hand side of a comparison phrase consists of an SQL group function such as "population", "the
G. Y. SHOW et al.: SQL-BASED DATABASE SYSTEM
number of", etc., the relation is to be expressed by the group function, the comparison operator and the "Having" clause following the "Group-by" clause. Consider the following example: Query: What are the classes whose registrations are more than 40? The SQL command is: Select class From student-t Group By class Having count (student-name) > 40. Rule VIII.3
If the constraint on the right-hand side of the comparison phrase from Rule VIII.2 is not a number quantity, the constraint must be re-expressed in a subquery. The variable for the inner-level "Select" clause must be the same as the group function on the left-hand side of the comparison operator. This is exemplified in the following query: What are the classes whose registrations are more than Class-31? The SQL command is: Select class From student-t Group By class Having count(student-name) > (Select count(student-name) From student-t Where class--'Class-31'). Rule I X
As the function of a function (e.g. max(count(staffname))) is not allowed in SQL command, the system will not handle NL queries which require this function of function operation. For example: Query: What is the most popular game amongst the staff in EEE? will generate the SQL command: Select max(count(game)) From staff-game-t Group by game This SQL command will induce errors in the Oracle database. To remedy this shortcoming, the function of function can be simulated by combining Group-by, Having and subquery. The simulation of the above query is: Select game From staff-t A, game-t B Where A.staff-name = B.staff-name And A.school = 'EEE' Group by game
313
Having count(game) > = all (Select count(game) From staff-t A, game-t B Where A.staff-name = B.staff-name And A.school = 'EEE' Group by game) The output of the interpreter is lists of field-names, field-values, operators, group-by clause variables and having clause variables. These outputs are then fed to the next module, the "SQL Sorter". 4.3 SQL sorter The SQL Sorter is a module which searches for the tables containing the lists of field-names and fieldvalues produced by the Interpreter. The knowledge of the database structure is coded into the knowledgebase. The database structural knowledge is stored in the form of a table and a network. The table is illustrated in Table 4 and the network is shown graphically in Fig. 3. As can be seen in Table 4, the names of the tables in the relational database and their corresponding fieldnames are listed in a table format. To find out which table contains the field-name, one can access Table 4 using this field-name as the access key. If the records required are distributed over several tables, a common field-name will be used to join them together. The database structure network in the knowledge-base is useful for multiple-table joining. The usefulness of the network is illustrated with an example: Query: Who are the students of Hinny? By referring to Table 4, field-name "student-name" (synonym of student in database domain) is found in student-t and field-value "Hinny" (whose field-name is "staff-name") is found in staff-t. Therefore a tablejoining process has to be performed to find the join. The joining process involves a breadth-first search through the network. The following steps describe the breadth-first search strategy. 14 • The network consists of tables and field-names as its nodes, and the connections between the nodes are called arcs. • There are a starting node, many intermediate nodes and a finishing node. The problem is to find a path from the starting node through some of the intermediate nodes to the finishing node. • The strategy is to manipulate a list of partial paths, from starting node to intermediate nodes, until one of the partial paths is extended far enough to Table 4. Database structure knowledge student-t staff-t subject-t time-t
student-name staff-name subject time
game school year value
314
G.Y. SHOWet al.: SOL-BASEDDATABASESYSTEM
amej staffname staff__tsble sch~._l
game time_table :lass table
md._nW Fig. 3. Database structure network. become an acceptable complete path. The list of partial paths is called the partial paths queue. • The search process starts with the creation of an initial partial path queue consisting of just one partial path. The initial path only contains the starting node. • The first partial path on the queue is extended to all its neighbors. Then the first partial path is replaced with the newly extended partial path, adding the new partial path to the appropriate places in the partial paths queue. • The process continues until one of the partial paths becomes an acceptable complete path from the starting node to the finishing node.
SOL generator generates the final SOL command with the assistance of the SQL knowledge layout in section 4.2. The final output is written to a file.
For the sample query, the starting node is "studentname" and the finishing node is "staff-name". Following the above strategy, a complete path which is shown in Fig. 4 can be found for these two nodes. From this path, the field-name which joins student-t and staff-t can be found.
An IDQS prototype was developed on the TI Explorex Workstation. All except the Oracle interface module were developed in Common Lisp. 14''6 The interaction of these modules with the knowledge-base is shown in Fig. 5. Some of the queries that can be handled by the system include:
4.4. SQL generator The list of requests, constraints and tables is then passed to the SOL generator. If there is any table joining, "Group-by" and "Having" clause, the necessary data are also passed to the SOL generator. The
I stud-name~
4.5. Oracle interface An Oracle interface module written in C reads the SOL command file produced by SOL generator and accesses the Oracle database. ,5 This interface program is necessary because the system is written in Lisp, a language which does not interface with the Oracle relational database system.
5. IMPLEMENTATION
1. What is the registration of class F31? 2. Which staff member has the highest salary in the EEE school? 3. Who are the lecturers of the classes whose population are more than class F31?
tutorial ~ s t l f f - n a m e class Fig. 4. Completepath for student-nameand staff-name.
I
World Knowledge
--'V" Data6
315
_/ D~
Data5
Date8 Dam7
SQL & Database Knowledge
~ Datal
Data2
Lexical & SyntacticKnowledge DatalO RelationalDatabase
Datal
Which staff played tennis ?
Data2
a) Lexical Entries: (Which (((W-H) tWO_))WHICH)) (staff (((SH)) STAFF-NAME NOUN F-N)) (played (((IN A V)) PLAY VERB)) (tennis (((S-H)) TENNIS NOUN F-V GAME)) b) Grammar Rules : Sixteen Left-Associative Grammar Rules
Data3
Internal Representatioo (parse tree): (Sent-3 1 1 (SUBJ (NP-I_I_I (REF Which) (NOUN Staff-name)))
)ata 11
~ R B play) (Dm-OBJ (NP-4_I_I (NAMEtennis)))) Data4
SQL Knowledge consist of a set of rules. a) Operator table for comparisons such as more than, less than, equal to, and etc. b) Rules of SQL functions such as count(*), max(*), min(*) and etc. c) Rules of Using Group By and Having Clauses. d) Rules of subquedes formulation. e) Rules of table joining.
Data5
A list of field-names and field-values
(STAFF-NAME TENNIS) Data6
World Knowledge
Data7
Database and SQL Knowledge a) A network which illustrates the relationship of the field-names and the tables in the relationaldatabase. b) A table depicts the table-names and their ~ e c t i v e field-names.
Data8
Lists of field-names, field-values, operators and the table-names where these field-values and field-names are found. (STAFF-NAME) (STAFF-GA]VIE-T) (TENNIS STAFF-GAME-T =)
Data9
SQL command: SELECT staff-name FROM staff-game-t WHERE staff-game-t.game = 'tennis'
Datal0
Data in the relational database.
Datal I
Required data which was extracted from the relational database. Fig. 5. Dataflow diagram of IDQS.
316
G.Y. SHOW et al.: SQL-BASED DATABASE SYSTEM
4. W h a t are the n u m b e r of classes of respective subject-groups in the E E E school? 5. W h a t is the average salary of the staff in the E E E school? 6. H o w m a n y staff m e m b e r s are there in E E E having P h D s ? 7. List the n a m e of the staff m e m b e r s who are interested in playing tennis. 8. W h a t is the most p o p u l a r g a m e a m o n g s t the staff in the E E E school? 9. W h e n is the Lisp lecture c o n d u c t e d ? 10. W h a t lecture is going to be c o n d u c t e d t o m o r r o w at 12.00.00 by H i n n y ? 6. C O N C L U S I O N In this p a p e r , an intelligent interface system I D Q S was presented. This w o r k p r o p o s e d a possible system structure capable o f translating English queries into S Q L c o m m a n d s . P e r h a p s the most significant contribution of this w o r k is the establishment of a set o f rules which is o r i e n t e d specifically to the interpretation of the internal r e p r e s e n t a t i o n g e n e r a t e d by the parser. These rules play an i m p o r t a n t role in the interpreter m o d u l e : they decide what sorts of information are extracted f r o m the N L queries; they also decide w h e n to invoke the inference m a c h i n e to acquire additional inform a t i o n f r o m the k n o w l e d g e - b a s e as well as the users. Nevertheless, these rules are not perfect to the extent that they are capable of handling all sorts o f N L queries. T h e r e are still a lot of r o o m for i m p r o v e m e n t . Future w o r k is suggested to w o r k on this set o f rules to i m p r o v e the N L queries coverage of the system.
REFERENCES 1. Englels G., Gogolla M. and Hohenstein U. Conceptual modelling of database applications using an extended ER model. J. Data Knowledge Engng 9, 157-204 (1992). 2. Bertino E. and Musto D. Query optimization using knowledge about data semantics. J. Data Knowledge Engng 9, 121-155 (1992). 3. Weigand H. Assessing functional grammar for knowledge representation. J. Data Knowledge Engng 8, 191-204 (1992). 4. Gerstl P. Linking linguistic and non-linguistic information. J. Data Knowledge Engng 8, 205-222 (1992). 5. Tennant H. R. and Saenz R. M. Building usable menu-based natural language interface to databases. 9th Conference on VLDB, pp. 101-110. 6. Philips B. and Nicholl S. Inglish: a natural language interface. In Foundation for Human-Computer Communications. IFIP WG 2.6 Working Conference on the Future of Command Languages,
pp. 10-21 (1985). 7. Cha S. K. Kaleidoscope: a cooperative menu-guided query interface (SQL Version). IEEE Sixth Conference on Artificial Intelligence Application, Vol. 1, pp. 304-309, Stanford University (1990). 8. Angelaccio M., Catarci T. and Santucci G. QBD: a graphical query language with recursion. IEEE Trans. Software Engng 16, 1150-1163 (1990). 9. Grosz B. TEAM: an experiment in the design of transportable natural language interfaces. Proc. Artif. lnteU, pp. 173-243 (1987). 10. Tseng F. S. C., Chen A. L. P. and Yang W.-P. On mapping natural language constucts into relational algebra through E-R representation. J. Data Knowledge Engng 9, 97-117 (1992). 11. Sayles J. S. SQL for DB2 and SQL/DS Application Developers. QED Information Sciences Inc. (1990). 12. Hausser R. Computation of Language. Springer, Berlin (1989). 13. Kong H. and Show G. Y. Evaluation of parsing techniques for natural language processing. Proc. Int. Conf. on Information Engineering, Singapore, pp. 422-432 (1991). 14. Winston P. H. and Horn B. K. P. LISP, 3rd edn. Addison-Wesley, Reading, MA (1989). 15. Sayles J. S. Embedded SQL for DB2, Application Design and Programming. QED Information Sciences Inc. (1990). 16. Steel G. L. Jr. Common Lisp, The Language, 2nd edn. Digital Press (1990).
AUTHORS' BIOGRAPHIES Guan Ytmag Show received his B.Eng. (Honour) degree from Nanyang Technolgical University (NTU), Singapore. He is currently a Research Assistant at the School of Electrical and Electronic Engineering, NTU. Himay Pe Ilia Kong performed his graduate study in computer and information sciences at University of Massachuesetts at Amherst. He is currently a Senior Lecturer of Information Engineering, School of Electrical and Electronic Engineering, NTU, Singapore. Prior to this post, he held both technical and management positions at Bell Canada, the Hong Kong Productivity Council and Digital Equipment (Far East). He is a member of ACM and I E E E (Computer Society). His research interests are software engineering, database, distributed systems, and intelligent software. Kenneth Weate Lin received his Ph.D. and Sc.M. degrees from Brown University and is currently an Associate Professor of Information Engineering, School of Electrical and Electronic Engineering, NTU, Singapore. Prior to this post, he held both technical and management positions at Computer Corp. of America and Xerox Corp., and had taught in the California State University at Northridge. He is a member of ACM and IEEE. His research interest is in distributed database systems.