Interactive communication of sentential structure and content: an alternative approach to man-machine communication

Int. J. Man-Machine Studies (1989) 30, 121-148

R. CHANDRASEKAR AND S. RAMANI

National Centre for Software Technology, Gulmohar Cross Road No. 9, Juhu, Bombay 400049, India

(Received 22 September 1986 and in revised form 25 February 1988)

Natural language communication interfaces have usually employed linear strings of words for man-machine communication. A lot of 'intelligence', in the form of semantic, syntactic and other information, is used to analyse these strings and to puzzle out their structures. However, the use of linear strings of words, while appropriate for communication between humans, seems inappropriate for communication with a machine using video displays, keyboards and a mouse. One need not demand too much of machines in this area of analysis of natural language input; one could bypass these problems by using alternative approaches to man-machine communication. One such approach is described in this paper, for the communication of the content and structure of natural language sentences. The basic idea is that the human user of the interface should use the two-dimensional screen, mouse and keyboard to create structures for input, guided by appropriate software. Another key idea is the use of a high degree of interaction to avoid some problems usually encountered in natural language understanding. Based on this approach, a system called ScreenTalk has been implemented in Common LISP on a VAX workstation. The man-machine interface is used to interactively input both the content and the structure of sentences. Users may then ask questions, which are answered using stored information. ScreenTalk now operates on a database of brief news items. It has been designed to be fairly domain independent, and is expected to be used soon in other applications. The conceptual framework for such an approach, the design of the experimental interface used to test this framework and the authors' experience with this interface are presented.

© 1989 Academic Press Limited

1. Introduction

Four decades after the first electronic digital computer was built, most computers in use remain bound to very 'mechanical' activities. The number of words that we use to communicate with computers is no more than a few hundred, and even these are interpreted in a primitive manner. Three decades of research in artificial intelligence (AI) have produced very interesting concepts and techniques for representing knowledge in a machine and for processing this knowledge, thereby creating a basis for natural language communication. Tennant (1981), Barr & Feigenbaum (1981) and Hayes & Carbonell (1983) have surveyed relevant work. But there are very few systems, if any, in practical use with which we can communicate the way we talk to each other. One of the major reasons for this, we argue, is the absence of a good design for man-machine interfaces. The importance of such an interface has been recognized widely, and major attempts have been mounted to create 'Natural Language Understanding (NLU)' systems (e.g. Sager, 1981). However, there are many problems in understanding natural language. The standard approach to NLU, which involves analysis of strings of words (sentences or utterances), requires a lot of knowledge to be built into the system to overcome these problems. But even all this packaged intelligence does not always solve the problems inherent in NLU. We point out problems in this approach, and suggest an alternative approach based on a special-purpose man-machine communication interface: one not designed to resemble a human listener, but designed to capitalize on the strength of an interactive video terminal. This approach provides for a front-end to NLU systems allowing the communication (and internal representation) of both content and structure of natural language sentences, while bypassing some problems in language understanding. Internal representations of incoming sentences can further be processed appropriately for applications such as machine translation.

2. The objective

Our immediate objective is to have an interactive system which will take in the information contained in English sentences in a manner that is suitable both to the user and to the system, and transform it into an intermediate representation which can be unambiguously interpreted by further processing. The output of this sentence comprehension system should be in the form of

[Figure 1: a tree rooted at the predicate 'wins', with branches labelled who, what, when and where; the 'what' branch holds a TOURNAMENT node with attributes name: WORLD CUP and game: SOCCER.]

FIG. 1. Typical output from a sentence comprehension system.


structures like those shown in Fig. 1 for the sentence 'Argentina wins World Cup soccer tournament in Mexico-City on 30th June 1986'. Availability of information, at the input stage, directly in the form of such structures would vastly simplify the task of coping with information intelligently. The problem is to design a system permitting this. This means that such a system should be able to cope with the problems peculiar to natural language.

2.1. PROBLEMS WITH NATURAL LANGUAGE UNDERSTANDING

The following simple model can be said to hold when two people communicate with each other: the speaker, who wants to convey some information, has a mental representation of that information. He then converts this representation into a linear string of words, and either speaks or writes it out. The listener/reader mentally converts this string of words into a structure, and then goes about understanding the content of this structure. One basic objective in Natural Language Understanding is to build a system which, in part, emulates the listener above; that is, we want the system to be capable of taking in all sorts of (say) English sentences, and creating internal representations for their content.

The problem in understanding natural language is that the structure in speech and in writing is hidden. Puzzling out a structure from a linear string of words becomes a major problem in itself, requiring intelligent use of syntactic, semantic, pragmatic and possibly extralinguistic information. It is, therefore, not surprising when the listener has difficulty in recreating the structure that the speaker had in mind. To understand the implications of this, it is useful to list some major problems of this type.

Lexically ambiguous words
At the level of words, it may not be easy to determine whether a word like 'issue' is used as a noun or as a verb. Another form of ambiguity involves a word having two or more meanings while playing the same syntactic role: e.g. 'kid' meaning the young one of a goat, or a young human.

Syntax and comprehension
Minor changes in word order can change meaning drastically. For example, compare "They have just built a flying machine" with "They have built just a flying machine". However, some syntactic transformations leave the basic meaning unchanged. Human readers cope with syntactic and lexical errors and comprehend sentences despite such problems (seemingly incomplete or ungrammatical sentences are usually comprehensible!).
Multiple interpretations
Depending on the context, the same sentence may be interpreted in many ways. Consider: "Visiting scientists can be interesting." Are the scientists interesting, or is it the act of visiting that is of interest? Compare this with: "Visiting museums can be interesting". The problem here is to decide which word is being qualified by which qualifier.

Pronoun reference
Whom does the 'he' refer to in the sentence: "The commander told the spy that he would be shot the next day"? How do we know this?

Other problems
When we get beyond single sentences to conversational fragments or multiple sentences, other issues like cooperative responses (Kaplan, 1982), the pragmatics and context of the situation, and issues like causality and planning come into the picture.

From the above, it is clear that Natural Language Understanding systems need to use a lot of information about words, syntax and semantics, pragmatics, shared beliefs in a conversation, intentions behind a statement or query, a knowledge of idiomatic and metaphorical usage, etc. Humans seem to have all this information and are able to apply it in their everyday conversations with other humans. But how do we get a program to perform at this level of expertise?

2.2. OUR APPROACH: USING INTERACTION AND CONVEYING STRUCTURE

Advances made in language understanding by people like Schank (1973), Winograd (1972) and Woods (1970) are well known and have formed the basis for practical man-machine interfaces coping with limited subsets of English. There have also been programs like DIAGRAM (Robinson, 1982), LSP (Sager, 1981) and HAM-ANS (Hoeppner et al., 1983), which have impressive capabilities based on an extensive knowledge-base of language. The latter set of programs (especially) try to solve all problems purely through the knowledge encapsulated in them; as a result, they are huge systems. Growth in the complexity and size of such systems is unavoidable if their scope is to be extended, unlike systems which solve part of the problem through human interaction. Further advances can be expected in this area, enlarging the range of discourse permitted and speeding up the analysis of input. However, we argue in favour of not demanding too much from machines in the area of analysis of natural language; we would rather bypass problems in the input of sentences and the representation of their content by using an alternative approach. We do visualize a combination of techniques from both approaches to be particularly valuable. The approach to be presented in the sequel can be characterized by the following principles:

(1) A man-machine interface for NL communication should provide for interactive communication of structure as well as content. It should not burden the computer with the task of puzzling out a suitable structure for a string of words.
(2) As far as possible, the new system should avoid problems such as word disambiguation by the use of appropriate display structures in which words are entered (unambiguously) by the human user.
(3) The system should handle any remaining ambiguities (as in pronoun references) by interaction. The guiding principle should be: when in doubt, ask the user.

The case for these principles follows.


2.2.1. The case for communicating structure
Once it is clear that recreating sentence structure (from a linear string of words) is not too easy, besides being error-prone, it is natural to make it possible for the user (who knows full well the structure that he wants to communicate) to convey structure information along with the content. Two steps, the linearization of a mental structure at the source and the recreation of the original structure at the destination, can be avoided in the process of communication. But this does not come cheap; there is a price to pay for the advantage gained. In this case, the price lies in the additional effort to be invested in conveying structure. The user has somehow to break up his sentences into reasonable, digestible chunks and feed those chunks to the system, along with information on the relationships between these chunks. While this may be trivial for simple sentences, it could be tiresome for sentences of some complexity. As we shall see later, we can introduce mechanisms to simplify this task, without losing track of our original objective.

The exploitation of interaction for full-fledged man-machine communication requires a bold departure from a mental set we have inherited from two decades of work. We should be willing to accept a machine-like interface quite different from the interfaces for human communication.

2.2.2. The case for interaction
Use of linear strings of words, very appropriate for auditory communication between humans, seems inappropriate for communication with a machine using video displays, keyboards and pointing devices like a mouse or a joystick. Recent technical developments have made comfortably large screens (say 19") and significant computing capability (1 to 10 MIPS) widely available through workstations. Workstations also provide additional facilities, such as windowing. All this has further contributed to the feasibility of highly responsive man-machine interfaces.
In an interactive system, it is useful to have different windows to keep track of the state a user is in, how he got to be there and what he can do. It is also helpful to have some sort of choice mechanism made available either through menus, icon-selection or some other form of pointing. The exact modality of choice is not important; what is important is that there must be some mechanism to display choices, and accept a user's selection of one of these. Hence, at a conceptual level, it is as relevant to talk of menu-selection as of icon selection. Menu-selection has brought into wide use interactive communication of content. But, so far, interfaces based on menu-selection have placed severe limits on what a human can say to a machine; effectively, all he can do is to fill the blanks in a predetermined set of sentences. They do not provide for general purpose communication, that is, for communication of the structure and content of arbitrarily chosen sentences. There has been one system reported by Tennant, Ross & Thompson (1983) to overcome this limitation, but the system described there is also restricted to a set of predetermined elements for sentence construction. Section 8 contains a discussion of this and other similar systems.
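The essential contract of the choice mechanism described above, display the options and return the user's selection, is independent of the modality (menu, icons, or other pointing). A minimal sketch in Python follows (ScreenTalk itself is implemented in Common LISP; all names here are our own, and the scripted reply stands in for a real interactive session):

```python
def choose(prompt, options, read_input=input):
    """Display numbered options and return the option the user selects."""
    print(prompt)
    for i, opt in enumerate(options, start=1):
        print(f"  {i}. {opt}")
    while True:
        reply = read_input("> ").strip()
        if reply.isdigit() and 1 <= int(reply) <= len(options):
            return options[int(reply) - 1]
        print("Please enter a number from the list.")

# A scripted selection standing in for interactive use:
replies = iter(["2"])
picked = choose("Which sense of 'issue'?",
                ["to give out [verb]", "child [noun]"],
                read_input=lambda _prompt: next(replies))
```

Whether the options are rendered as a menu or as icons, only `read_input` would change; the surrounding logic is the same.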


3. An Interface for the Communication of Structure and Content

With the objectives set out in the previous section, a new type of interface was designed, satisfying the requirements of Section 2.2. One version of this interface (called ScreenTalk) has been created; its implementation and our experience with it will also be described.†

3.1. STRUCTURE REPRESENTATION

The first step towards the new interface is the recognition that associative structures can be conveniently input on an interactive video display unit (VDU), as in Fig. 2. For example, in the system to be described, the sentence shown diagrammatically in Fig. 1 could be entered as follows:

ROOT WORD (<ESC> TO EXIT): wins
WHO: Argentina
WHAT: World-Cup soccer tournament
WHEN: June 30 1986
WHERE: Mexico-City

FIG. 2. ScreenTalk representation of the structure in Fig. 1. In this, and in other figures, prompts from ScreenTalk are in CAPITALS, while the rest is user input.

How does this structure information help? Consider the following sentences: (1) Time flies like an arrow (cited by Winograd, 1983). (2) She was given more difficult books by her uncle (Robinson, 1982). The first sentence has many different interpretations depending on which word you use as the root or predicate of the sentence. In ScreenTalk, there would be only one interpretation possible, because the user explicitly defines the predicate and (as we shall see) each sense of the predicate has its own skeletons and hence different instantiations. Jane Robinson (1982) points out that the second sentence has at least six different interpretations, depending on the words that the qualifiers modify. (For example, does 'more' qualify 'books' or 'difficult'? Were the books given by her uncle, or were they written by her uncle but given to her by someone else?) Since ScreenTalk forces the user to explicitly specify the modifiers and the words they qualify, the user automatically identifies the unique interpretation through the structure he enters. Thus two interpretations of the first sentence would be input as in Fig. 3. With flies as the root (Fig. 3b), one gets the normal interpretation of time moving fast. With like as the root (Fig. 3a), the interpretation is slightly unusual: a special breed of flies (called time flies) have a weakness for arrows!

†In the interests of sharing our experience widely, the software for the interface described will be made available to readers for a small handling charge. ScreenTalk is implemented in VAX LISP (Common LISP) on a VAXstation II running MicroVMS and VAX Workstation Software, driving a VR260 19" bit-mapped screen equipped with a mouse.

Root: like
Who: time flies
What: an arrow
(a) With like as the root.

Root: flies
What: time
How: like an arrow
(b) With flies as the root.

FIG. 3. Two different structures for "Time flies like an arrow".
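The two structures of Fig. 3 can be written down directly as attribute-value records. The following Python sketch (our own rendering, not ScreenTalk code, which is in Common LISP) shows why the user's explicit choice of root leaves no ambiguity: the same word string yields two distinct, directly comparable structures.

```python
# The two readings of "Time flies like an arrow" from Fig. 3,
# as attribute-value records. The chosen root makes each explicit.
reading_a = {"root": "like", "who": "time flies", "what": "an arrow"}
reading_b = {"root": "flies", "what": "time", "how": "like an arrow"}

def describe(infit):
    """Render an attribute-value record in a canonical, comparable form."""
    return "; ".join(f"{k}={v}" for k, v in sorted(infit.items()))

# Same sentence, two unambiguous and distinguishable structures:
assert describe(reading_a) != describe(reading_b)
```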

3.2. HANDLING PRONOMINALS INTERACTIVELY

A very demanding feature of Natural Language Understanding is the assigning of interpretations to pronominals such as he/she/it. To do this successfully, the unit of analysis should be larger than a single sentence. There are few systems attempting to do this. But one need not tilt at this windmill by designing an ambitious pronominal interpreter. An interactive video terminal can respond to the use of a pronominal by displaying the equivalent of immediate memory, listing all possible entities which could conceivably be referred to by the pronominal. This means that the system has to keep track of the last so many objects referred to by the user, and treat these as the list of potential pronominal referents. A menu-selection response can identify the intended entity without error, eliminating the need for highly intelligent analysis on the part of the machine. This is a sample of what can be achieved if we intelligently query the user in case of doubt. Obviously, it helps if the system recognizes 'she' to stand for a female human and displays only the relevant items for menu selection by the human user.
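The scheme just described, a bounded recency list of referents filtered by the pronominal's gender before being offered as a menu, can be sketched as follows (a Python illustration; the gender tags and the fixed memory size are our own assumptions, not ScreenTalk's):

```python
from collections import deque

class ReferentMemory:
    """Immediate memory of recently mentioned entities (assumed size: 5)."""

    def __init__(self, size=5):
        self.recent = deque(maxlen=size)   # most recently mentioned last

    def mention(self, entity, gender):
        self.recent.append((entity, gender))

    def candidates(self, pronoun):
        """List possible referents for a pronominal, most recent first."""
        wanted = {"he": "male", "she": "female", "it": "neuter"}.get(pronoun)
        return [e for e, g in reversed(self.recent) if wanted in (None, g)]

mem = ReferentMemory()
mem.mention("the commander", "male")
mem.mention("the aircraft", "neuter")
mem.mention("the spy", "male")
# 'he' yields a short menu for the user to resolve without error:
assert mem.candidates("he") == ["the spy", "the commander"]
```

The returned candidate list is exactly what would be shown as a menu, so the user, not the machine, resolves the reference.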

4. Information units (Infits) and Skeletons

Two basic structures are necessary in the interface being described: the skeleton and the infit (a portmanteau word standing for 'information unit'). 'Skeletons' correspond to 'types', while 'infits' correspond to 'tokens'. Skeletons as well as infits are nested compositions of attribute-value pairs. Our assumption is that all information can be coded into structures built in this manner. Both attributes and values carry 'labels', which are words in the language of discourse. In the sequel, we will describe words, skeletons and infits in detail.

In ScreenTalk, each word has many skeletons. Each skeleton corresponds to a particular sense of a word. The lexical information about the word and the information about its skeleton are kept together in ScreenTalk. However, it is useful to split the discussion of these entities into two levels: the lexical level and the skeletal level. In the next two sub-sections, therefore, we will discuss these two levels as if they were separate components, even though they are two facets of the same entity.

4.1. WORDS: THE LEXICAL LEVEL

The basic unit of language that we will be handling is a word. ScreenTalk has a lexicon in which words frequently used in dialogues with the system are stored. In the ScreenTalk lexicon, there is a lexical entry for each word, which consists of one or more sense entries, one for each sense of the word. The part of speech (POS) and 'meaning' of each sense of the word are stored in each sense entry. The POS field is used in a parser for regular expressions that is built into ScreenTalk, for the limited purpose of recognizing the structure in short phrases. The 'meaning' field is provided only for the benefit of the human user of the system, for disambiguation of words. For example, a word like 'issue' has the lexical entry:

word: issue
verb: to give out [sense 1]
noun: child [sense 2]

If the word 'issue' is used in ScreenTalk, the system has to determine which sense of the word is meant. ScreenTalk will display all the sense entries for that word, and prompt the user to choose one of them. There are two mechanisms to reduce the burden on users in the definition and choice of words. First, ScreenTalk assumes all words to be part of its 'lexicon'; however, the system does not require lexical entries to be (pre-)defined for each word. ScreenTalk allows the user to defer the definition of the lexical entry until the POS is required for parsing. At that stage, ScreenTalk prompts the user to supply the requisite information. Thus, there is prompting only on the basis of a need for such information. The other is a very simple 'context' mechanism. If a word like 'issue' is used in a predicative sense ('to give out') at some point in a session, the chances are that the next use of that word is in the same context. ScreenTalk uses this information to show rudimentary intelligence when the user is expected to choose between alternatives: the user is prompted to say if the word is used in the same sense as the previous time; if not, the user can always use the choice mechanism. The 'context' is maintained even across sessions, a given user's context being stored in his file 'directory'.

4.2. CREATING SKELETONS: THE SKELETAL LEVEL
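The sense-entry lookup and the sticky 'context' mechanism can be sketched together (an illustrative Python rendering; the data layout and function names are hypothetical, and the callbacks stand in for ScreenTalk's interactive prompts):

```python
# Lexicon of sense entries, as in the 'issue' example above.
lexicon = {"issue": [("verb", "to give out"), ("noun", "child")]}
last_sense = {}   # per-word 'context'; persisted per user in ScreenTalk

def resolve(word, confirm_same, choose_index):
    """Pick a sense: reuse the previous one if confirmed, else ask via menu."""
    senses = lexicon.setdefault(word, [])  # every word is assumed in the lexicon
    if word in last_sense and confirm_same(word, senses[last_sense[word]]):
        return senses[last_sense[word]]
    idx = choose_index(word, senses)       # full choice mechanism
    last_sense[word] = idx
    return senses[idx]

# First use: the user picks a sense from the menu.
s1 = resolve("issue", lambda w, s: False, lambda w, s: 0)
# Next use: a single confirmation suffices; the menu is not shown again.
s2 = resolve("issue", lambda w, s: True, lambda w, s: 1)
assert s1 == s2 == ("verb", "to give out")
```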

Every word heard by a human listener triggers off expectations, the satisfaction of which leads to reasonable communication. For example, the verb 'killed' raises the following expectations: 'who did the killing', 'who was killed', 'why', 'when', etc. These expectations are viewed in ScreenTalk as attributes of the word. For every noun, verb and adjective, a 'skeleton' is maintained. A skeleton is a list of elements; each element is an attribute/value-expectation pair. To instantiate an occurrence of a skeleton, its attributes have to be given values by the user. The value-expectation defines the constraints on the values that each attribute may take, reflecting the semantic restrictions associated with the word. Hence constraint-satisfaction checks can be done on the value input at the time of instantiation. Some attributes in the skeleton necessarily have to be filled, while others are optional; these are marked as such in the skeleton. As the first step in creating a skeleton, the canonical form of the word is entered. The canonical form is the singular form in the case of verbs and nouns. For this word, a series of attributes and the corresponding value-expectations are entered. A typical skeleton entry would be something like Fig. 4.

Structure number: 10
10.1 word   wins
10.2 who    NIL
10.3 what   NIL
10.4 when   NIL
10.5 where  NIL

FIG. 4. The skeleton for the word 'wins'.

What this skeleton defines is this: the word 'wins' has several attributes, like who, what, when and where. There is no restriction on the values that these attributes can assume; thus any input will be acceptable for each of the expectations. For an example of something slightly more complicated, consider the skeleton for 'date' (Fig. 5). The first attribute, date, is constrained to lie within the range 1 to 31, and the month is constrained to the range 1 to 12. The value for year defaults to 1986. The day can only be one of the values given in the list. The list being defined is named, and stored; later, the user can refer to the list by its name. Though the word being described ('date') is written the same as its first attribute 'date', there is no confusion, since they are used in different contexts. For the moment, we will ignore the fact that this skeleton can be used to define illegal dates. There are other types of value-expectations and values that can be used. The complete list of value-expectations now available in ScreenTalk, and the restrictions they impose on the input, are given in the Appendix.

4.2.1. Hierarchical inheritance of properties
Many knowledge representation systems offer a property-sharing mechanism, typically as a link between a 'parent node' and a 'child node'. One such mechanism is the IS-A hierarchy of KRL (Bobrow & Winograd, 1977). ScreenTalk offers a similar facility through 'root-of' links between skeletons.

ENTER WORD (<ESC> TO EXIT): date
ATTRIBUTE (<ESC> TO EXIT): date
VALUE: RANGE 1 31
ATTRIBUTE (<ESC> TO EXIT): month
VALUE: RANGE 1 12
ATTRIBUTE (<ESC> TO EXIT): year
VALUE: DEFAULT 1986
ATTRIBUTE (<ESC> TO EXIT): day
VALUE: MEMBER-OF (Mon Tue Wed Thu Fri Sat Sun)
NAME OF THIS LIST: days-of-week

FIG. 5. A skeleton for 'date', using ranges, defaults and lists.
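The value-expectations of Fig. 5 amount to small predicates checked at instantiation time. A sketch in Python (the constructor names mirror the RANGE, DEFAULT and MEMBER-OF keywords of Fig. 5, but the code is our own illustration, not ScreenTalk's):

```python
# Value-expectations as predicate factories.
def Range(lo, hi):
    return lambda v: isinstance(v, int) and lo <= v <= hi

def MemberOf(items):
    return lambda v: v in items

def Default(value):
    return lambda v: True   # any value accepted; `value` used when none given

# The 'date' skeleton of Fig. 5, attribute -> value-expectation.
date_skeleton = {
    "date":  Range(1, 31),
    "month": Range(1, 12),
    "year":  Default(1986),
    "day":   MemberOf(("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
}

def check(skeleton, attribute, value):
    """Constraint-satisfaction check done on a value at instantiation time."""
    return skeleton[attribute](value)

assert check(date_skeleton, "date", 30) and not check(date_skeleton, "date", 32)
assert check(date_skeleton, "day", "Mon") and not check(date_skeleton, "day", "Someday")
```

Note that, as the text observes, these per-attribute checks do not rule out illegal dates such as 31 February; each attribute is validated in isolation.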


Word: man
root: human
bearded: MEMBER-OF (Yes No)

Word: woman
root: human
hair-colour: MEMBER-OF (blonde brunette)

FIG. 6. Skeletons for 'man' and 'woman', showing property sharing.

Consider the skeleton for the word 'human'. We could have something like this:

Word: human
Number-of-eyes: 2
Number-of-legs: 2
Name: NIL

Suppose we now have to define skeletons for the words 'man' and 'woman'. Most of the properties that we have defined for 'human' will be applicable to skeletons for 'man' and 'woman'. In addition, there are further attributes that we have to define for them. It does not make sense for the attributes of 'human' to be duplicated in the skeleton for 'man'. What is required, instead, is a method by which properties may be shared or inherited by objects. In ScreenTalk, sharing of properties is implemented using a parent-child inheritance hierarchy. In our example, the skeletons for 'man' and 'woman' would be as in Fig. 6. This means that 'man' has all the properties of 'human', plus special properties such as 'bearded: ...'. Thus the skeleton for 'human' acts as a parent to the child-skeleton 'man'. The case with 'woman' is similar. Note that this hierarchy may be built to any depth; thus we could have a hierarchy as in Fig. 7. Properties are shared by recursive access to parent skeletons. Note also that the same attribute may appear at different levels. For example, if you have defined

object
  animate
    mammal
      human
        man
        woman
      non-human
  inanimate

FIG. 7. A sample inheritance hierarchy.


'human', and want to define 'Cyclopean', you may say:

Word: Cyclopean
root: human
number-of-eyes: 1

The scope of an attribute and its value is defined as follows: the value associated with an attribute holds at the level in the hierarchy where the attribute is defined, and at all its descendant levels, unless and until it is redefined (i.e., until it is given a new value). This hierarchical property inheritance may be used for skeletons of verbs too: given a skeleton for 'kill', a skeleton for 'killed' may be defined simply as

Word: killed
root: kill
tense: past

Thus it is enough to define the skeleton for one 'canonical' form of the word; each of its derived forms can be defined with a root pointer to the canonical form, along with associated modifying attribute-value pairs which differentiate it from the root skeleton.
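The scope rule above, a value holds from the level where it is defined down through all descendants until redefined, is exactly a recursive lookup through 'root' links. A Python sketch (our own illustration; the dictionary layout is assumed, not ScreenTalk's internal representation):

```python
# Skeletons as attribute-value dicts; a 'root' entry links to the parent.
skeletons = {
    "human":     {"number-of-eyes": 2, "number-of-legs": 2, "name": None},
    "man":       {"root": "human", "bearded": ("Yes", "No")},
    "cyclopean": {"root": "human", "number-of-eyes": 1},
    "kill":      {"who": None, "whom": None},
    "killed":    {"root": "kill", "tense": "past"},
}

def lookup(word, attribute):
    """Return the nearest definition of `attribute`, walking root links upward."""
    skel = skeletons[word]
    if attribute in skel:
        return skel[attribute]          # defined (or redefined) at this level
    if "root" in skel:
        return lookup(skel["root"], attribute)   # inherit from the parent
    raise KeyError(attribute)

assert lookup("man", "number-of-eyes") == 2        # inherited from 'human'
assert lookup("cyclopean", "number-of-eyes") == 1  # redefined locally
assert lookup("killed", "whom") is None            # verb skeleton, via root 'kill'
```

Because the local skeleton is consulted first, a redefinition such as Cyclopean's single eye shadows the inherited value, matching the scope rule in the text.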

4.2.2. Canonical forms and variants
Every skeleton may have a 'root' indication, entered by the user. For example, 'dog' has the root 'animal', whereas 'gallop' has the root 'run'. This means that 'dog' has all the properties that 'animal' has, and more too; 'gallop' has all the expectations that 'run' has, and more (like "speed: very fast"). For every canonical form, other forms of the word, like the plural form, past tense form, etc., will be stored as separate skeletons, with the canonical form as the root. In scanning input words to get a unique word number, the system will look for a maximal substring match in the lexicon. This allows us to treat particle-carrying verbs (verbs with associated particles, like 'give up', 'give way') as a single lexical item.

4.3. CREATING INFITS

In ScreenTalk, information is stored as a list of attribute-value pairs; this composite is called an 'infit'. Each infit is created from the responses to prompts triggered off by (specification of) its keyword; in other words, each infit is an instantiated skeleton. When the user indicates a desire to create an infit, he is prompted for the 'root' of the infit. The skeleton for that 'root' word is then looked up, and the details of its attributes, values and value-expectations are obtained. All the ancestors of that skeleton are likewise accessed (using the root indication), and details are obtained from them also. For each mandatory attribute in that list, the user is prompted to supply a value that satisfies the corresponding value-expectation. After all mandatory attributes have been given values, the user is shown a list of optional attributes. He can then choose the attributes he wants to instantiate, and input the appropriate value for each one of them.
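The instantiation flow just described, mandatory attributes first, then any requested optional ones, then arbitrary extra pairs and an author stamp, can be sketched as follows (an illustrative Python rendering with hypothetical names; the callback stands in for ScreenTalk's prompts, and the user name "ramani" is our own placeholder):

```python
def instantiate(skeleton, supply_value, wanted_optional=(), extras=()):
    """Build an infit (attribute-value dict) from a skeleton description."""
    infit = {"root": skeleton["word"]}
    for attr in skeleton["mandatory"]:
        infit[attr] = supply_value(attr)          # must be given a value
    for attr in skeleton.get("optional", ()):
        if attr in wanted_optional:               # instantiated only on request
            infit[attr] = supply_value(attr)
    infit.update(extras)                          # user-added arbitrary pairs
    infit["Entered-by"] = "ramani"                # author attribute; name hypothetical
    return infit

wins = {"word": "wins", "mandatory": ["who", "what"], "optional": ["when", "where"]}
answers = {"who": "Argentina",
           "what": {"root": "tournament", "name": "World Cup", "game": "soccer"},
           "where": "Mexico-City"}
infit = instantiate(wins, answers.__getitem__, wanted_optional=["where"])
assert infit["what"]["name"] == "World Cup"   # a nested sub-infit
```

Note that the value supplied for 'what' is itself an attribute-value structure, which is exactly the multi-level nesting of sub-infits discussed below.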


The value-expectation describes the set of all specific values that can be assigned to an attribute. The value-expectation is, therefore, more general than a specific value. For example, the value-expectation could be the range 1900 to 2000; the value 1986, which satisfies this expectation, is more specific. So the process of instantiating a skeleton to form an infit may be viewed as a process of refinement of known information. But it may sometimes happen that, for some attribute, no more detail is available than is present in the value-expectation. In such a situation, the user will leave, by default, the uninterpreted value-expectation itself as the value. After all this, the user may still find that some new attributes (not listed in the skeleton) need to be added to this infit; he can then add as many such arbitrary attribute-value pairs as he wishes. When the user is through with the input, the system composes an infit, using the user's name to create and append an author attribute ("Entered-by"), indexes its components and stores it.

Attributes and values in the infit need not be atomic. They can be sub-infits, that is, they can have structure, with their own attribute-value lists. Each sub-infit will have an attribute called 'word'; this attribute will have a specific value. The other attribute-value pairs in the structure describe properties of this value. A noun group, for instance, would classify as a sub-infit; so would a verb phrase. The key noun or verb will define the value of the attribute 'word'. An infit which has one or more sub-infits as its values or attributes will be called a multi-level infit, since the tree-structure diagram of the infit may have atomic values at various levels. Note that sub-infits are input in the same way that an infit is. For example, 'the black dog' would be represented as:

word: dog
colour: black
unique: true

This sort of nesting of infits within infits can go on to any depth. Note also that it is the user who defines the attributes and value-expectations; so he can create any directed acyclic graph of relations between the skeletons. Some examples of infits are presented in the sequel. The figures are different from what is seen on the screen during an actual ScreenTalk session; for convenience, only the dialogue window is shown in these figures. The first example (Fig. 8) presents the infit corresponding to Figs 1 and 2.

ROOT WORD (<ESC> TO EXIT): wins
WHO: Argentina
WHAT: tournament/
* EXPANDING SKELETON FOR WORD: 'TOURNAMENT'.
ROOT: tournament
ATTRIBUTE (<ESC> TO EXIT): name
VALUE: World Cup
ATTRIBUTE (<ESC> TO EXIT): game
VALUE: soccer
ATTRIBUTE (<ESC> TO EXIT):
* SKELETON FOR WORD: 'TOURNAMENT' EXPANDED.
WHEN: June 30 1986
WHERE: Mexico-City

FIG. 8. An infit as the user inputs it.

COMMUNICATION OF SENTENTIAL STRUcruRE AND CONTENT

Structure number: 297 297.1 root 297.2 who 287.3 what 296.1 root 296.2 who 296.3 what

133

believe the police blew up terrorists the aircraft

FiG. 9. A multi-level infit as it is displayed.

the use of 'tournament,". This indicates to the system that the user plans to instantiate the skeleton for that word at that position, to create a (nested) sub-infit. If this skeleton does not exist, the user is allowed to input arbitrary attribute-value pairs to describe the word being opened-out. Such openings-out lead to the creation of multi-level infits. Currently, ScreenTalk does not 'learn' the new skeletal structure for the root of the sub-infit being thus defined. Clearly, implementation of this learning mechanism is a possibility worth exploring. In Fig. 9 another example of a multi-level infit is seen. Here the main infit talks of the police believing something, and that something is described in a sub-infit. The internal structure of the infit is shown here, as it would appear in print. The level of indentation on each level gives an idea of the nesting of infits involved. The numbers there refer to details of the implementation. Lists can also be used as values. Once a list is defined, it is named by the user. ScreenTalk stores such lists for future use. The list names can be used to refer to these lists. Figure 10 gives an example of an infit with an embedded list.
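The numbered, indented display of a multi-level infit, as in Fig. 9, can be sketched as follows (a Python illustration; the structure numbers and the dictionary representation are our own assumptions about the display, not the system's internal code):

```python
# Sketch of rendering a multi-level infit with structure numbers:
# each infit carries an id, and nested sub-infits are indented.

def display(infit, number, indent=0):
    lines = []
    for i, (attr, value) in enumerate(infit["pairs"], start=1):
        prefix = " " * indent + f"{number}.{i} {attr}"
        if isinstance(value, dict):            # a sub-infit: recurse, indented
            lines.append(prefix)
            lines += display(value, value["id"], indent + 4)
        else:
            lines.append(f"{prefix} : {value}")
    return lines

sub = {"id": 296, "pairs": [("root", "blew up"),
                            ("who", "terrorists"),
                            ("what", "the aircraft")]}
main = {"id": 297, "pairs": [("root", "believe"),
                             ("who", "the police"),
                             ("what", sub)]}
for line in display(main, 297):
    print(line)
```

The output reproduces the shape of Fig. 9: the sub-infit's own structure number (296) appears indented under the attribute that holds it.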

    ROOT WORD ((ESC) TO EXIT): reach
    WHO: MEMBER-OF (Argentina Belgium France Germany)
    NAME OF THIS LIST: quarterfinalists
    WHAT: quarter-finals
    OF: tournament/
    * EXPANDING SKELETON FOR WORD: 'TOURNAMENT'
    ROOT: tournament
    ATTRIBUTE ((ESC) TO EXIT): name
    VALUE: World Cup
    ATTRIBUTE ((ESC) TO EXIT): game
    VALUE: soccer
    ATTRIBUTE ((ESC) TO EXIT):
    * SKELETON FOR WORD: 'TOURNAMENT' EXPANDED.
    ATTRIBUTE ((ESC) TO EXIT): where
    VALUE: Mexico
    ATTRIBUTE ((ESC) TO EXIT): when
    VALUE: June 26 1986

FIG. 10. An infit with an embedded list.


4.3.1. A finite state grammar in ScreenTalk

The general policy in ScreenTalk is that sentences will be input along with their structure. This rule should apply to segments of sentences too. But a compromise is made in this case: ScreenTalk will take in phrases provided they fall into a limited class of sentence-segments; they have to be parsable by a small finite state grammar parser, which recognizes a few categories of 'phrases'. (See Item 4 in the Appendix for a grammar of such phrases.) The parser converts the given phrase into an appropriate structure. So the internal representation of information remains the same. This leads to some ease of use without sacrificing the fundamental ideas of the system. Of course, if the user prefers it, he can still input all information as a sequence of atomic attribute-value pairs.

Thus, in ScreenTalk, it is possible to enter whole noun groups, verb phrases etc. as input, and expect the system to form an appropriate sub-infit structure for this group. Some examples of acceptable sentence fragments are given below:

    e.g. a cute little girl
         ran tirelessly
         aimlessly wandered

4.3.2. Infit qualifiers
The root of an infit need not be atomic. The root phrase may be of the following form:

    [(Modal)/(Auxiliary)] [NOT] (Verb) [(Noun)]

This allows the qualification of an infit in several different ways. In general, a sentence in English may have qualifiers which can be classified into five broad categories: qualifiers of will, qualifiers of probability, qualifiers of necessity, qualifiers of (subjective) truth/estimation and qualifiers of absolute truth. Words in these categories include 'will', 'may', 'must' and 'know'. Such qualifiers are essential in any language, and hence must find a place in any scheme for the representation of language. In ScreenTalk, all these qualifiers are represented with two attributes, 'Modality' and 'Truth', which are defined for each infit. They are defaulted to NIL and T respectively, meaning that unless the modal is otherwise specified, an infit would read "It is asserted by 'the person entering the infit' that ...". If a root phrase (including a modal and/or negation) is input, rather than a root word, ScreenTalk fills up the values of Truth and Modality appropriately. Thus the expressiveness of the interface language is enhanced, without demanding that the user should attend to grammatical detail.
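A minimal sketch of how a root phrase could be split into root, Modality and Truth (the modal word list, the function name and the use of Python in place of the system's LISP are all our own illustrative assumptions):

```python
# Sketch: parse a root phrase of the form [(Modal)] [NOT] (Verb)
# into the 'Modality' and 'Truth' attributes of an infit.
# Defaults mirror the text: Modality NIL (None), Truth T (True).

MODALS = {"will", "may", "must", "can", "should", "might"}

def parse_root_phrase(phrase):
    words = phrase.split()
    modality = None            # default NIL: a plain assertion
    truth = True               # default T
    if words and words[0] in MODALS:
        modality = words.pop(0)
    if words and words[0] == "not":
        truth = False
        words.pop(0)
    return {"root": " ".join(words), "Modality": modality, "Truth": truth}

print(parse_root_phrase("wins"))         # Modality None, Truth True
print(parse_root_phrase("may not win"))  # Modality 'may', Truth False
```

A bare verb thus yields the default assertion, while a phrase such as "may not win" sets both qualifier attributes without the user attending to grammatical detail.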

5. Processing Different Types of Sentences

There are four basic types of sentences one encounters in English (Robinson, 1982): declarative (D) (e.g. The car is waiting); imperative (I) (e.g. Wait outside); polar interrogative (PI) (e.g. Is the car waiting?); and non-polar interrogative (NPI) (e.g. Where is the car waiting?).


The imperative form is structurally similar to the declarative form, except that the subject here is 'You'. PI forms take a Yes-No answer (or sometimes a May-be/Could-be type of answer), whereas NPI forms take a WH- (Where/How/What ...) structure and a corresponding answer. In ScreenTalk, one can represent all such sentences (I/D/PI/NPI) using infits. But only declarative sentences will be stored as information in ScreenTalk's knowledge base. Imperative and interrogative sentences can occur in reported speech (quotes) in a declarative statement (e.g. He shouted 'Bring the wine!'). In the next section, we show how interrogative sentences are handled in ScreenTalk.

6. Query types, indexing and retrieval

The scheme described so far lets a user input information into a system. One standard way to use such stored information would be to let the user ask questions about it. In a way, this would also test if information has been input in a proper fashion. This is the approach allowed in ScreenTalk.

Information retrieval in ScreenTalk is essentially retrieval of 'matching' infits. A query is very similar to an infit, except that some of its arguments are unknown; this is indicated by putting a '?' in the appropriate value fields of a query. The system tries to match this incomplete infit with the stored infits, and outputs a proper reply. A certain level of similarity can be noted between this method and the Query-By-Example system of Zloof (1977). The actual process of matching is described below.

Polar interrogative forms are also treated as queries; they deal with the truth value of an earlier declarative infit. (e.g. 'Is the car waiting?' may be answered by finding an infit 'The car is waiting' and displaying its truth value.)

The following figures, which show a sample query and the corresponding retrieval (Figs 11 & 12), should give a feel for the process of query handling. In Fig. 11, some of the interaction features are shown, namely the status, prompt, dialogue and menu windows, with appropriate commands and messages in them. The query here is "Who wins what and when, in Mexico-City?"

    Type in any input
    ScreenTalk:=:-) Answering Queries ((1986-07-13|13:26:13))

    ROOT WORD ((ESC) TO EXIT): wins
    WHO: ?
    WHAT: ?
    WHEN: ?
    WHERE: Mexico-City

    --- Type ^A (CR) for HELP ! ---

FIG. 11. A typical ScreenTalk query to ask "Who wins what at Mexico-City, and when?".
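The matching of such a partial infit against stored infits can be sketched as follows (an illustrative Python fragment with invented sample data; ScreenTalk's actual matcher works over its LISP structures):

```python
# Sketch: a query is an infit with '?' in its unknown value fields.
# Retrieval keeps every stored infit whose known fields all agree.

def matches(query, infit):
    return all(infit.get(attr) == value
               for attr, value in query.items() if value != "?")

stored = [
    {"root": "wins", "who": "Argentina", "what": "tournament",
     "when": "June 30 1986", "where": "Mexico-City"},
    {"root": "wins", "who": "Italy", "what": "tournament",
     "when": "July 11 1982", "where": "Madrid"},
]
query = {"root": "wins", "who": "?", "what": "?", "when": "?",
         "where": "Mexico-City"}

hits = [infit for infit in stored if matches(query, infit)]
print(hits[0]["who"])   # Argentina
```

The fields marked '?' impose no constraint; they are simply read off the matching infits when the response is displayed.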


    ScreenTalk:=:-) Answering Queries ((1986-07-13|13:28:14))
    Response 3 to your QUERY follows

    304.1 root : wins
    304.2 who : Argentina
    304.3 what :
        302.1 root : tournament
        302.2 name :
            301.1 : World
            301.2 : Cup
        302.3 game : soccer
    304.4 when :
        303.1 month : June
        303.2 date : 30
        303.3 year : 1986
    304.5 where : Mexico-City
    304.6 Entered-by : Chandrasekar

    Type (CR) to continue, E to edit, n to get nth response, (ESC) to quit!
    --- Type ^A (CR) for HELP ---

FIG. 12. Response to query in Fig. 11.

Note that '?' is used here to specify that a value in an infit is unknown. Here we want to know what this value is, if it is stored in ScreenTalk's data base. We are seeking information here. Figure 12 depicts a typical response to the query above. Note that the display is of the internal representation; so it may not look similar to the manner in which the infit was input. Note also that this is the third response which matches the query conditions (the first two would already have been displayed).

6.1. INDEXING AND RETRIEVAL OPERATIONS

The requirements for a retrieval scheme such as ours are easily specified. Words are at the root of skeletons and infits, and there can be an arbitrary number of words in a skeleton or an infit. The user might supply a target list of any subset of these words and request the retrieval of the infits in which they occur. Thus it is important that each skeleton and infit be indexed on all its constituent words. Hashing transformations, where the key (word) is mapped onto a number in a unique manner, are very useful for such indexing.

But indexing by one word at a time is not enough. We know that the user may specify more than one word in the target list. In that case, we have additional information: each of the matching infits must contain all the specified words. Ramani & Chandrasekar (1986) suggest a method called 'coordinated hashing' to handle this. This method uses columns of bits to represent the absence or presence of words in an infit, and combines this with hashing and the boolean AND operator to return a list of possible matching infit numbers. These are used to retrieve the appropriate infits from a file of infits. This gives multi-key access to skeletons and infits (where each word or phrase is a key), with the error rate due to hashing kept arbitrarily low.
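Coordinated hashing as described above can be sketched as follows (assumptions: 64 hash buckets, Python integers standing in for columns of bits, and a final re-check of candidates because hashing admits false positives; the real scheme is detailed in Ramani & Chandrasekar, 1986):

```python
# Sketch of 'coordinated hashing': one column of bits per hash bucket,
# with bit i set when infit i contains a word hashing to that bucket.
# A multi-word query ANDs the columns of its target words.

NBUCKETS = 64   # arbitrary choice for this sketch

def bucket(word):
    return hash(word) % NBUCKETS

def index(infits):
    columns = [0] * NBUCKETS             # each int is a column of bits
    for i, words in enumerate(infits):
        for w in words:
            columns[bucket(w)] |= 1 << i
    return columns

def candidates(columns, targets):
    bits = ~0
    for w in targets:                    # boolean AND across target columns
        bits &= columns[bucket(w)]
    result, n = [], 0
    while bits:
        if bits & 1:
            result.append(n)
        bits >>= 1
        n += 1
    return result

infits = [{"wins", "Argentina", "tournament"},
          {"reach", "quarter-finals", "tournament"},
          {"wins", "Italy", "tournament"}]
cols = index(infits)
# infits containing both 'wins' and 'tournament'; candidates may
# include collisions, so each is re-checked against the actual infit
hits = [i for i in candidates(cols, {"wins", "tournament"})
        if {"wins", "tournament"} <= infits[i]]
print(hits)   # [0, 2]
```

With more buckets, collisions (and hence wasted re-checks) become rarer, which is how the error rate due to hashing is kept arbitrarily low.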


7. The implementation of ScreenTalk

ScreenTalk was originally implemented on the DECsystem 10 at NCST, Bombay, in about 4100 lines of SIMULA code, and drove a VT100 terminal. The current version of ScreenTalk works on a VAXstation II; it has been implemented in VAX LISP, which is DEC's implementation of Common LISP. This version requires VAX Workstation Software, which drives a 19" bit-mapped screen and a mouse. Since the SIMULA version used a non-intelligent terminal, all the window and screen handling software had to be written. The description below is essentially that of the SIMULA version; the VAX LISP version uses multiple windows, menu selection using the mouse, etc.

The user of an interactive system should ideally have visual indication available at all stages of discourse, to tell him where he is, what he can do, and how he can do it. This is in line with the ideas of Nievergelt & Weydert (1980). To implement this, ScreenTalk uses a screen divided into four windows (see Fig. 13): the status window, the prompt window, the dialogue window and the menu window. The status, prompt and menu windows are updated continuously, and serve to orient the user in his navigation through ScreenTalk. The SIMULA version of this program divides the screen into four areas; all the examples in this paper have been taken from this version. The VAX LISP version uses windowing software to create status, prompt and dialogue windows; error and menu windows are created as necessary.

    Prompt Window:
    ScreenTalk Command: ((ESC) to Exit):

    Status Window:
    ScreenTalk:=:-) Waiting for a command ((1986-07-13|00:01:17))

    Dialogue Window:
    --- Type ^A (CR) for HELP ! ---

    Menu Window:
    1) Build a SKELETON, 2) Build an INFIT, 3) QUERY ScreenTalk Data,
    4) Edit a STRUCTURE, 5) DISPLAY Structures, 6) Save ALL Info,
    7) Provide HELP

FIG. 13. Typical screen layout, annotated to show various windows, at ScreenTalk startup time.

ScreenTalk allows the user to input and edit skeletons as well as infits. In addition, it allows the user to frame queries about the stored information. It also makes contextual help available to the user. To aid the user, pronominal reference is handled by displaying the contents of the 'immediate memory', whenever needed; the user may choose any of the words in this list as a response, simply by choosing from a menu. All functions are chosen through menus. Wherever there is a possibility of a choice between a small number of alternatives, the user is shown a menu of these choices, and allowed to choose.

ScreenTalk saves its information on files automatically at regular time intervals. ScreenTalk also gives the facility of logging all interaction. ScreenTalk has been tested with a knowledge base of brief news items, garnered from a leading English daily over 3 years (1984-1986). These are ideal because they are simple and fairly independent of other information. The data base contains about 500 words and about 400 skeletons and infits.

7.1. LIMITATIONS OF THE CURRENT IMPLEMENTATION

(1) In handling queries in Version 1 of ScreenTalk, several simplifying decisions were taken. For example, it was decided that the result of a query will not be a synthesized sentence; the result will be displayed as it is stored, that is, as a structure. In this way we sidestep the problems associated with the synthesis of text. While synthesis is easier than analysis, it is by no means a trivial problem.

(2) It was also decided that no inference of any sort would be possible in Version 1 of ScreenTalk. So only information stored explicitly is retrievable. Retrieval is based on matching, and not on deduction.

These limitations will be overcome in succeeding versions of ScreenTalk. An experimental version of ScreenTalk written in LOGLISP (on the DEC-10) allowed rules to be specified, along with facts. The system could apply these rules, and deduce new information. Such inferential capability is being incorporated in the VAX LISP version of ScreenTalk.

7.2. ISSUES BEYOND THE CURRENT VERSION

In this section, we present issues that have not been implemented in the current (VAX LISP) version of ScreenTalk. We are working on some of these issues at present. The other issues that we raise are those we feel are intimately connected with the ideas in this paper, and to which further thought has to be given.

Procedural access to information
All the value-expectations and values described in the Appendix are passive; they dictate the 'type' or syntax of the input, but do nothing more. In a system like ScreenTalk, however, it seems important to have procedural access to information. One example of such a restriction could be:

    Age : ($CurrentYear - $BirthYear)

Some simple interpretable language (which should ideally be at a higher level than the ScreenTalk implementation language) has to be defined and used. Calls to executable programs could also be included at this level.

Networks of skeletons?
At present, property sharing in ScreenTalk is implemented through the hierarchical 'root' mechanism, and all attributes of a root are implicitly passed on to its descendent skeletons. There is some doubt as to whether knowledge representation requires a network of such skeletons instead of a tree (as in ScreenTalk). For example, 'dog' could have 'animal' as one root and 'domestic-pet' as another. In such a situation, which path to the root should one take to obtain all the relevant properties of this skeleton? Should we always go all the way to the root? If not, where should we stop? Answers to these questions are not all that easy. We have taken a decision to have only a hierarchy of skeletons, and we find that Charniak (1981) also takes a similar decision. In our implementation, properties are obtained from all skeletons going all the way to the root. This decision is based on the premise that all skeletal trees are likely to be shallow; that is, each skeleton will have very few ancestors.

Controlled property sharing
In the implementation of ScreenTalk, all attributes of a word can be accessed (i.e. are 'visible') at descendent levels. It may sometimes be necessary to control this visibility/accessibility at different levels (in a way this corresponds to the scope definition problem in programming languages). We may also need some mechanism to be able to specify that some objects may have only one 'owner' (e.g. person X cannot have both Y and Z as Mothers).

A context sensitive thesaurus
In systems like ScreenTalk, it becomes necessary to treat two different words as synonyms, in order to process queries. For example, the sentence "Who assassinated Gandhi?" may have to be converted to "Who killed Gandhi?" before it is processed by the system. So some sort of Thesaurus is necessary in this system. But this Thesaurus has to have some intelligence built into it; it must be context sensitive so that it understands which synonyms are relevant at a given stage, and handles only those.

Match modes
Queries in ScreenTalk now retrieve all infits which exactly match a given query (which is nothing but a partially filled infit). It may be necessary to include some logical operators such as ANY, ONE, NOTANY etc. to act on the given query and output only those infits that satisfy the given constraints.

Inter-attribute restrictions and named variables
Consider the skeleton for the word 'commits-suicide', using the root 'kills':

    word : commits-suicide
    root : kills
    who  : X
    whom : X

Here we refer to the person named X killing X, defining the notion of suicide. For a skeleton for homicide, we might have person X killing person Y (X ≠ Y), making it distinct from suicide. It is obvious that such named variables are needed to link up related information within a skeleton. Many other complex relations can be usefully represented using such named variables.
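The role of named variables can be sketched as follows (the unify function, the explicit constraint list and the sample data are our own illustrative notation, not ScreenTalk's):

```python
# Sketch: named variables in a skeleton must bind consistently across
# attributes, and a homicide skeleton adds the constraint X != Y.

def unify(skeleton, infit, constraints=()):
    """Bind the skeleton's named variables against an infit's values."""
    bindings = {}
    for attr, var in skeleton.items():
        value = infit.get(attr)
        if var in bindings and bindings[var] != value:
            return None                  # same variable, different values
        bindings[var] = value
    for x, y in constraints:             # inter-attribute restrictions, X != Y
        if bindings[x] == bindings[y]:
            return None
    return bindings

suicide  = {"who": "X", "whom": "X"}
homicide = {"who": "X", "whom": "Y"}

event = {"root": "kills", "who": "Brutus", "whom": "Caesar"}
print(unify(suicide, event))                  # None: X cannot be both people
print(unify(homicide, event, [("X", "Y")]))   # binds X and Y distinctly
```

The suicide skeleton rejects the event because X would need two different values, while the homicide skeleton accepts it and its X ≠ Y constraint holds.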


Named variables in queries
It should be possible to specify variables in queries, and stipulate relations between the variables in a query. This will permit the selection of solutions which satisfy the basic query, as well as the relations between (that is, the constraints on the values of) the named variables.

Pre-programmed functions
There might be a need sometime for functions like NUMBER, AVERAGE, TOTAL etc. to be defined in ScreenTalk, so that questions like:

"How many infits refer to the exports of machine tools from Germany?"

can be answered. Such a facility would be useful for information-bases on specific topics, for example one on the steel industry in a country.
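Such pre-programmed functions could be sketched as aggregates over the set of matching infits (the function names follow the text, but the matcher, the 'tonnes' attribute and the sample data are illustrative assumptions):

```python
# Sketch: NUMBER, TOTAL and AVERAGE applied to the infits that match
# a query (a partially filled infit, with '?' for unknown fields).

def matches(query, infit):
    return all(infit.get(a) == v for a, v in query.items() if v != "?")

def NUMBER(query, stored):
    return sum(1 for infit in stored if matches(query, infit))

def TOTAL(query, stored, attr):
    return sum(infit[attr] for infit in stored if matches(query, infit))

def AVERAGE(query, stored, attr):
    n = NUMBER(query, stored)
    return TOTAL(query, stored, attr) / n if n else None

stored = [
    {"root": "exports", "who": "Germany", "what": "machine tools", "tonnes": 120},
    {"root": "exports", "who": "Germany", "what": "machine tools", "tonnes": 80},
    {"root": "exports", "who": "Japan",   "what": "steel",         "tonnes": 500},
]
q = {"root": "exports", "who": "Germany", "what": "machine tools"}
print(NUMBER(q, stored), TOTAL(q, stored, "tonnes"),
      AVERAGE(q, stored, "tonnes"))   # 2 200 100.0
```

The count answers "how many infits refer to ...", while TOTAL and AVERAGE aggregate a numeric attribute across the same matching set.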

8. How and why ScreenTalk is different: a comparative study

In this section, we will describe various systems that have similarity to ScreenTalk in some fashion or the other. In each case, a study of the similarities is followed by a discussion on the way ScreenTalk is different.

8.1. THE TI SYSTEM AND SCREENTALK

Conventionally, menus have been used as command selection procedures, where the user navigates through a tree of commands. Typically, the user chooses a menu item among a set of fixed items, and is taken to another lower level of the program, where he has to choose from another menu. In some sense, this operation is purely syntactic.

Tennant et al. (1983) have taken a different approach to menu usage. They build up the system of menus with the application in mind, and then use menus to compose natural language queries or statements. Individual menus (as well as menu items) are activated or deactivated depending on the context. If the user is building up a query about parts, phrases related to this are retained in the menu. As the user proceeds with the query, newer menus become active. These determine which phrases are selectable at each point. In some sense, therefore, this method is semantic.

Since the menu-builder has already decided which combinations of phrases will make a well-formed query, the user cannot introduce queries of arbitrary phrases. Thus the natural language understanding required is easy to program. The user also has the comfort of knowing which phrases he can use in formulating his queries. However, the system is limited to predetermined menus, which are compiled into the system. The user cannot express himself using arbitrary sentences. Again, the system is oriented more towards interfaces to (relational) databases, rather than interfaces to natural language understanding programs. Within these limitations, however, it is an elegant system.

8.2. KRL AND SCREENTALK

KRL (Bobrow & Winograd, 1977) is a language, built on top of INTERLISP, which uses frame (slot-and-filler) structures to represent knowledge. Some of its underlying assumptions are quoted below (from Bobrow & Winograd, 1977):

(1) Knowledge should be organized around conceptual entities with associated descriptions and procedures.
(2) A description must be able to represent partial knowledge about an entity and accommodate multiple descriptors which can describe the associated entity from different viewpoints.
(3) An important method of description is comparison with a known entity, with further specification of the described instance with respect to the prototype.

Entries in KRL are represented as UNITs. Some UNITs represent abstract concepts, while others represent specific instances of these concepts. These correspond to the skeletons and infits of ScreenTalk. Slot fillers in concept UNITs define default values or restrictions on values that may fill the slot ('value-expectations' in ScreenTalk). KRL allows the encoding of incomplete information as well as redundant information. KRL uses SELF (IS-A or 'root') links for property inheritance. KRL has a built-in pattern matcher to answer simple queries. However, the system does not have any theorem proving procedures. In all these respects, ScreenTalk is similar to KRL. However, KRL allows procedural knowledge to be embedded into the essentially declarative framework. This means that, for a few (specific) query types, more than normal pattern matching will be executed, allowing a slight increase in the complexity of the queries.

ScreenTalk, while similar to KRL in the context of knowledge representation details, has a different philosophy altogether. KRL tries to represent knowledge of concepts in a more or less static manner. ScreenTalk interactively takes in sentential information, in a highly dynamic fashion. It is more oriented towards man-machine communication. ScreenTalk uses menus and the two-dimensional format of the VDU screen extensively in this transfer of information.
ScreenTalk acts as an interface to Natural Language Understanding programs; it does not seek to be a general purpose Natural Language interface. The ScreenTalk design philosophy consciously tries to bypass problems specific to natural language information input in the context of man-machine communication. KRL tries explicitly to be flexible, rather than "embody specific commitments about either processing strategies or the representation of specific areas of knowledge".

8.3. ZOG AND SCREENTALK

ZOG (Ramakrishna, 1981; Robertson et al., 1981) is a rapid-response, large-network, menu-selection system for human-computer communication. Design features in ZOG include the following:

(1) Rapid response to user menu-selection.
(2) Simple menu-selection gestures.
(3) A large network of frames, statically organized.
(4) Simple frames, each consisting of some information and a menu.
(5) Total user control.
(6) An active communication role, allowing ZOG to interact with existing programs and to provide guidance in running them.


ZOG provides the user the ability to traverse a graph-structured database. Each node in the graph is a frame. Note that 'frame' is used here in the sense of 'a screen or page of information', rather than in the AI sense of a slot-and-filler schema. ZOG allows the user to design and update this ZOGnet information base. ZOG uses schemas, as well as interaction styles.

While ZOG also uses schemas and schema instantiation, it is markedly different from ScreenTalk as an interface. ZOG interfaces to facilities (programs) and to a ZOGnet of frames of information. It is not primarily designed to be an interface to Natural Language Understanding programs, which ScreenTalk is. Menu selection is used in ZOG as support for browsing, whereas it is used in ScreenTalk to convey information structure and content. The nodes in ZOG's information net contain paragraphs of text, without explicit representation of structure. Different nodes are linked syntactically by menus. In ScreenTalk, individual nodes correspond to highly structured concepts or phrases. The links between nodes are semantic.

RABBIT (Tou et al., 1982) is an intelligent database assistant which provides for retrieval by reformulation (refinement) of target descriptions. The main ideas underlying the theory behind RABBIT are:

(1) Retrieval by constructed descriptions.
(2) Interactive construction of queries.
(3) Critique of example instances.
(4) Dynamic perspectives.

To formulate a query, "the user interactively constructs a description of his target item(s) by criticizing successive example (and counter-example) instances." One major notion in RABBIT is the idea of using a well-defined perspective, inferred from the user's query and the knowledge base, to present appropriate instances. Perspectives are used to control the type and amount of information presented, to help the user understand better the instances shown, to enforce some sorts of semantic consistency and to organize and manage heterogeneous data. RABBIT is of value to casual users with vaguely formulated queries who need to be guided in query formulation.

The KL-ONE formalism (Brachman & Schmolze, 1985) for representing knowledge has been a major influence on the development of RABBIT, which is why we have taken the liberty of discussing RABBIT and KL-ONE together. To start with, RABBIT accesses a KL-ONE network as an experimental database. Instance classes comprising RABBIT descriptions (similar to 'skeletons' of ScreenTalk) are represented using KL-ONE generic concepts. Specific instances (corresponding to ScreenTalk's 'infits') are represented by KL-ONE individual concepts. RABBIT's attribute-value pairs are represented by KL-ONE role-value pairs. Since KL-ONE supports an inheritance network, RABBIT is able to use this directly.

ScreenTalk's representation mechanism closely resembles KL-ONE's, and hence that of RABBIT. But the major idea in RABBIT seems to be the new paradigm for retrieval. The other key point lies in the use of perspectives to control various aspects of the dialogue. Thus while their knowledge representation seems to be similar, their objectives are totally different. It is possible that some notion similar to that of perspectives may be of value in ScreenTalk. We should also note that RABBIT has no aspirations to being an interface to Natural Language Understanding programs.

8.5. KRYPTON AND SCREENTALK

Krypton (Brachman et al., 1983) is a knowledge representation system which clearly distinguishes between definitional and factual information by using both frame-based and logic-based languages. That is, Krypton uses two languages, one for forming descriptions and the other to make statements about the world using these descriptive terms. In addition, Krypton provides a functional view of a database, which is characterized in terms of what knowledge can be added or queried, rather than in terms of the particular structures it uses to represent knowledge.

Krypton has two main components: a terminological one, or T Box, and an assertional one, or A Box. Using the T Box, one can establish taxonomies of structured terms. The T Box supports two types of expressions: concept expressions (like KL-ONE concepts) and role expressions (similar to KL-ONE roles). The T Box language includes various operators which take concepts and roles as arguments, and allows composition of operators to form other concepts and roles, using combination and restriction. The A Box is used to build descriptive theories of domains of interest. Sentences in the A Box are again constructed out of simpler ones. Sentence-forming operators are the standard first order predicate calculus operators: NOT, OR, ThereExists etc. The non-logical symbols in A Box sentences are the terms of the T Box language.

The Krypton knowledge base is viewed as an abstract data type, characterized by the functions permitted on it. This prevents the user from some misuses of the system. The internal structure of the knowledge base is hidden from the user. In fact, the user cannot directly access the hierarchy of concepts of the T Box or the set of first-order clauses of the A Box. However, the user has access to Tell and Ask operations. These operations allow the user to augment the knowledge base and extract information from it. Both Tell and Ask can be definitional or assertional.
These operations allow the user, in the A Box, to assert a statement or to evaluate its truth value. In the T Box, these allow the user to define terms and to check for subsumption and disjointness of terms. In addition to these operations, certain others are necessary, for example to find individuals satisfying a particular property. These operations are not available in this design.

Krypton grew out of disenchantment with frames as a representation language. With frames, it was claimed, "structural and assertional facilities were often confused, the expressive power is limited, ..., and frame systems are defined only in terms of the data structures used to implement them" (Brachman et al., 1983). It was feared that too much may be read into the structuring of the data, and the presence or absence of certain structures. For example, imagine the following situation: there is a description of a structure called "rock"; the concepts "sedimentary rock", "large rock" and "igneous rock" all have an IS-A link with "rock". It is true that all these are 'types' of "rock" in one sense; but these categories are not mutually exclusive. Thus it is incorrect to assume that one can count the kinds of rock (in a geological sense) by counting the number of IS-A links to rock. One has to be careful in imposing an interpretation on the internal structure.

As a knowledge representation system, Krypton has sought to remove some of these problems. But in the process, some new research problems, such as the complexity of subsumption (Brachman & Levesque, 1984), have come up.

Compared to Krypton, ScreenTalk is somewhat frame-oriented, and closer to the T Box, though with some of the expressiveness of the A Box. With the flexibility that lists of attribute-value pairs offer, many of the problems mentioned above do not arise in ScreenTalk. Since ScreenTalk allows skeletons to be dynamically defined, any arbitrary conceptual structure can be represented. Infits are flexible instantiations of skeletons, with the possibility of adding more information to the basic skeleton. ScreenTalk allows incomplete information to be expressed with minimal effort. ScreenTalk also allows only certain types of retrieval, and hence there is no danger of "counting data structures" etc. Issues such as subsumption are handled by the user entering information, and are that much simpler. Thus ScreenTalk, being an interface specific to Natural Language programs, is different in many ways from Krypton.

9. Conclusions

One of the major problems in creating and using sizable bodies of knowledge on machines is caused by the absence of good man-machine interfaces. In this paper, a new approach has been described for the communication of the content and structure of natural language sentences. This involves a departure from the use of linear strings as in human communication. It also involves creating an interactive mechanism capitalizing on the assets of an interactive computer workstation equipped with a large screen and a mouse.

The authors believe that there is a lot more to be done towards the development and use of interfaces of this type. The rapid growth in popularity of different types of menu selection as versatile components for man-machine interfaces should be noted. In menu selection, each response by the user conveys 1 to 3 bits of information. On the other hand, the menu selection technique provides for the machine to convey a thousand bits or more per screen of information to the human user. In contrast, the technique presented here as the basis of the ScreenTalk system aims at the communication of all information normally expressed through human utterances, by the user to a machine. Considerable experimentation with this technique will be necessary to give us feedback for the next step, the design of operational interfaces of this type.

It is a pleasure to thank Ruven Brooks and Josef Bayer for their comments on the subject of this paper. We would also like to thank Professor Nievergelt, Dr Ibramsha, R. Ramanujam, K. Lodaya, S. Arun-Kumar and P. Pandya for their comments and suggestions on an earlier draft of this paper.

References

BOBROW, D. G. & WINOGRAD, T. (1977). An overview of KRL, a knowledge representation language. Cognitive Science, 1(1), 3-46.


BARR, A. & FEIGENBAUM, E. (1981). The Handbook of Artificial Intelligence, Volume I, Chapter IV.
BRACHMAN, R. J., FIKES, R. E. & LEVESQUE, H. J. (1983). Krypton: a functional approach to knowledge representation. IEEE Computer, 16(10), 67-73.
BRACHMAN, R. J. & LEVESQUE, H. J. (1984). The tractability of subsumption in frame-based description languages. In Proceedings of the National Conference on Artificial Intelligence, AAAI-84 (Austin, August 6-10, 1984), AAAI, Menlo Park, pp. 34-37.
BRACHMAN, R. J. & SCHMOLZE, J. G. (1985). An overview of the KL-ONE knowledge representation system. Cognitive Science, 9(2), 171-216.
CHARNIAK, E. (1981). A common representation for problem-solving and language-comprehension information. Artificial Intelligence, 16(3), 225-255.
HAYES, P. J. & CARBONELL, J. G. (1983). A Tutorial on Techniques and Applications for Natural Language Processing. Technical Report CMU-CS-83-158, Department of Computer Science, Carnegie-Mellon University.
HAYES, P. J. & REDDY, D. R. (1983). Steps towards graceful interaction in spoken and written man-machine communication. International Journal of Man-Machine Studies, 19(3), 231-284.
HOEPPNER, W., CHRISTALLER, T., MARBURGER, H., MORIK, K., NEBEL, B., O'LEARY, M. & WAHLSTER, W. (1983). Beyond domain-independence: experience with the development of a German language access system to highly diverse background systems.

Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, pp. 588-594.
KAPLAN, S. J. (1982). Cooperative responses from a portable natural language query system. Artificial Intelligence, 19(2), 165-187.
NIEVERGELT, J. & WEYDERT, J. (1980). Sites, modes and trails: telling the user of an interactive system where he is, what he can do, and how to get places. In R. A. GUEDJ, Ed. Methodology of Interaction, pp. 327-338. Amsterdam: Elsevier.
RAMAKRISHNA, K. (1981). Schematization as an Aid to Organizing ZOG Information Nets. PhD Thesis, Department of Computer Science, Carnegie-Mellon University.
RAMANI, S. & CHANDRASEKAR, R. (1986). Coordinated hash coding: a multiple-key indexing technique. Technical Report, National Centre for Software Technology, TIFR, Colaba, Bombay 400 005, India.
ROBERTSON, G., McCRACKEN, D. & NEWELL, A. (1981). The ZOG approach to man-machine communication. International Journal of Man-Machine Studies, 14, 461-488.
ROBINSON, J. (1982). DIAGRAM: a grammar for dialogues. Communications of the Association for Computing Machinery, 25(1), 27-47.
SAGER, N. (1981). Natural Language Information Processing. Reading: Addison-Wesley.
SCHANK, R. C. (1973). Identification of conceptualizations underlying natural language. In R. C. SCHANK & K. COLBY, Eds. Computer Models of Thought and Language, pp. 187-247. San Francisco: Freeman.
TENNANT, H. (1981). Natural Language Processing. New York: PBI Petrocelli.
TENNANT, H. R., ROSS, K. M. & THOMPSON, C. W. (1983). Usable natural language interfaces through menu-based natural language understanding. In A. JANDA, Ed. Proceedings of the CHI'83 Human Factors in Computing Systems (Boston, December 12-15, 1983), ACM, New York, pp. 154-160.
TOU, F. N., WILLIAMS, M. D., FIKES, R., HENDERSON, A. & MALONE, T. (1982). RABBIT: an intelligent database assistant.
In Proceedings of the National Conference on Artificial Intelligence, AAAI-82 (Pittsburgh, August 18-20, 1982), AAAI, Menlo Park, pp. 314-318.
WINOGRAD, T. (1972). Understanding Natural Language. New York: Academic Press.
WINOGRAD, T. (1983). Language as a Cognitive Process. Volume 1: Syntax. Reading: Addison-Wesley.
WOODS, W. A. (1970). Transition network grammars for natural language analysis. Communications of the Association for Computing Machinery, 13(10), 591-606.
ZLOOF, M. M. (1977). Query-by-Example: a data base language. IBM Systems Journal, 16(4), 324-343.


Appendix: value-expectations and values

The following paragraphs define value-expectations that can be used in ScreenTalk, and the values that satisfy these restrictions. The value-expectations are of different types. ScreenTalk takes in the type specification through a menu, and then takes in additional information.

(1) Range Constraints on Numbers

Form                     Example
RANGE (num1) (num2)      RANGE 100 110
GREATER (number)         GREATER 1900
LESSER (number)          LESSER 99
GREATEREQ (number)       GREATEREQ 0
LESSEREQ (number)        LESSEREQ 2001
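These range forms can be checked mechanically. The following Python sketch (ScreenTalk itself is written in Common LISP; the function name and parsing here are illustrative assumptions) accepts a range restriction in the textual form shown above and tests a candidate number against it:

```python
def satisfies_range(expectation, value):
    """Return True if `value` meets a numeric range restriction such as
    'RANGE 100 110' or 'GREATEREQ 0'."""
    parts = expectation.split()
    kind, bounds = parts[0], [float(p) for p in parts[1:]]
    if kind == "RANGE":                       # inclusive two-sided range
        return bounds[0] <= value <= bounds[1]
    checks = {
        "GREATER":   lambda v, b: v > b,
        "LESSER":    lambda v, b: v < b,
        "GREATEREQ": lambda v, b: v >= b,
        "LESSEREQ":  lambda v, b: v <= b,
    }
    return checks[kind](value, bounds[0])     # one-sided restrictions
```

For example, `satisfies_range("RANGE 100 110", 105)` holds, while `satisfies_range("GREATER 1900", 1900)` does not, since GREATER is strict.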

All these restrictions force a number satisfying a given range restriction to be input. These are meant for use in defining skeletons and cannot be input as a value while defining an infit.

(2) Type specification

Form                Example
OF-TYPE (word)      OF-TYPE date

This is a mechanism by which we can constrain the value of an attribute itself to be an infit or a sub-infit. This is a very powerful construct and lends much to the power of ScreenTalk. As seen in examples in the text, this value-expectation triggers off the instantiation of the skeleton for (word). As a consequence, one might expect problems with recursive definitions like:

Word      man
Name
Father    OF-TYPE man

If every value input is to be a non-null string, the instantiation of this skeleton would never be complete. Therefore, in this case, the user is allowed to avoid this indefinite recursion by using the interpretation suppression mechanism (see item 8 below).

(3) List Membership

Form                              Example
MEMBER-OF ((item1) (item2) ...)   MEMBER-OF (Mon 1 2)
MEMBER-OF (listname)              MEMBER-OF Divisors-of-12
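Both MEMBER-OF forms can be resolved with a small amount of parsing. In this illustrative Python sketch (not ScreenTalk's Common LISP code; `named_sets` is a hypothetical stand-in for ScreenTalk's named lists), an inline set is read from the parenthesized form, and a bare name is looked up:

```python
# Hypothetical store of named lists, keyed by list name.
named_sets = {"Vowels": {"A", "E", "I", "O", "U"},
              "Divisors-of-12": {"1", "2", "3", "4", "6", "12"}}

def satisfies_member_of(expectation, value):
    """`expectation` is either an inline set such as 'MEMBER-OF (Mon 1 2)'
    or a reference to a named list such as 'MEMBER-OF Divisors-of-12'."""
    spec = expectation[len("MEMBER-OF "):].strip()
    if spec.startswith("("):                     # inline set of elements
        elements = set(spec.strip("()").split())
    else:                                        # reference to a named list
        elements = named_sets[spec]
    return value in elements
```

The membership test itself is the same in both cases; only the source of the element set differs.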

In ScreenTalk, value-expectations typically define ranges or sets of acceptable values. Values are more specific, in that a particular element is chosen from among the set of all acceptable ones. In certain cases, though, values themselves may have to be non-specific; they may have to be constrained to belong to a particular set of words or set of numbers. ScreenTalk allows the specification of sets both as value-expectations and as values.


A set can be specified using the MEMBER-OF type by enclosing a list of elements (separated by blanks) in parentheses, for example, "MEMBER-OF (A E I O U)". Sets can also be named lists. For example, you could have a set named Vowels containing (A E I O U) and reference it later as "MEMBER-OF Vowels". For an attribute which has a MEMBER-OF value-expectation, the value input is checked for set membership against the set specified in the value-expectation.

(4) Phrases at a time

Form                   Example
(word1) (word2) ...    a little white car
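The phrase grammar listed next can be recognized by straightforward left-to-right scanning. As a rough illustration (in Python, not ScreenTalk's Common LISP; the word lists are hypothetical stand-ins for a real lexicon), here are recognizers for two of its productions, the noun phrase "[(Determiner)] [(Adjective)]* (Noun)" and the date "(Month) (Date) (Year)":

```python
# Tiny illustrative lexicon; a real system would use its dictionary.
DETERMINERS = {"a", "an", "the"}
ADJECTIVES  = {"little", "white", "big"}
NOUNS       = {"car", "dog", "man"}
MONTHS      = {"January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"}

def is_noun_phrase(words):
    """[(Determiner)] [(Adjective)]* (Noun): optional determiner,
    zero or more adjectives, then exactly one noun."""
    i = 0
    if i < len(words) and words[i] in DETERMINERS:    # optional determiner
        i += 1
    while i < len(words) and words[i] in ADJECTIVES:  # zero or more adjectives
        i += 1
    return i == len(words) - 1 and words[i] in NOUNS  # exactly one noun left

def is_date(words):
    """(Month) (Date) (Year): a month name followed by two numbers."""
    return (len(words) == 3 and words[0] in MONTHS
            and words[1].isdigit() and words[2].isdigit())
```

So "a little white car" is accepted as a noun phrase, while an out-of-order sequence such as "white car a" is rejected.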

As explained in the body of the paper, ScreenTalk 'understands' word groups (phrases) which fall into the set of word sequences described by the following grammar:

[(Determiner)] [(Adjective)]* [(Possessive)] (Noun)
[(Number)] [(Adjective)]* [(Possessive)] (Noun)
[(Adverb)]* (Verb)
(Verb) [(Adverb)]*
(Month) (Date) (Year)
(Noun) [AND (Noun)]
(Noun) [OR (Noun)]
(Word) [- (Word)]*
(Integers)
(Real numbers)

[Key: Non-terminals within square brackets are optional elements. The asterisk indicates repetition: "zero or more of".]

(5) Phrases at a time: sub-infit creation

Form        Example
(word) !    dog !

This is possible only as a value and in some sense corresponds to the "OF-TYPE (word)" value-expectation. It means that the user wants to give more information about (word). In this case, ScreenTalk looks for the skeleton for (word). If such a skeleton exists, the user is allowed to instantiate that skeleton. Otherwise, the user is allowed to add arbitrary attribute-value pairs to define (word).

(6) Default Value Specification

Form              Example
DEFAULT (item)    DEFAULT Madras

This is a mechanism to input default values. The user is prompted for this attribute, but told that he has to hit (Escape) to 'choose' the default value. The user, of course, is free to input any other value, if he wishes to.
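The DEFAULT behaviour amounts to a simple rule: an empty response selects the default, and anything else overrides it. A minimal Python sketch (assuming, for illustration only, that the (Escape) keystroke surfaces as an empty reply):

```python
def resolve_value(user_input, default):
    """Return the default for an empty reply (the user 'hitting Escape'),
    otherwise the user's own value."""
    return default if user_input == "" else user_input
```

With "DEFAULT Madras", an empty reply yields Madras, while typing Bombay yields Bombay.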


(7) Specification of Constants

Form       Example
(item)     kills

(item) specifies any LISP atom, subsuming the categories (word) and (number). The attributes corresponding to these value-expectations are never prompted for. Their values are pre-determined by the skeleton itself.

(8) Suppressing Interpretation

Form
.

A period (.) implies that the corresponding value-expectation is not to be interpreted; instead, the value-expectation itself is to be copied as the value. This symbol can occur only when a value is prompted for; it cannot be input as a value-expectation.

(9) Basic restrictions

Form
NIL
ALPHA
NUMERIC

Having the value-expectation as a null string (NIL) implies that any input is permissible as the value. This is the most general restriction possible. Having the restriction ALPHA implies that only alphabetic input will be taken as valid input. Similarly, NUMERIC signals that only a numeric input will be taken in as valid. These two restrictions do not make any sense if they are used in an infit; so they can be specified only in a skeleton (as value-expectations), and not in an infit.
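The three basic restrictions reduce to simple character-class checks on the input string. An illustrative Python sketch (names hypothetical; NIL is represented here as None):

```python
def satisfies_basic(expectation, value):
    """NIL (None) accepts any input; ALPHA accepts only alphabetic
    strings; NUMERIC accepts only numeric strings."""
    if expectation is None:            # NIL: the most general restriction
        return True
    if expectation == "ALPHA":
        return value.isalpha()
    if expectation == "NUMERIC":
        return value.isdigit()
    raise ValueError("unknown basic restriction: %s" % expectation)
```

Thus "Madras" passes ALPHA but fails NUMERIC, "1986" passes NUMERIC, and anything at all passes NIL.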