Data & Knowledge Engineering 4 (1989) 101-114 North-Holland
Generation of natural language from information in a frame structure

W.A. PERKINS
Artificial Intelligence Center, Org. 96-20, Bldg. 259, Lockheed Research and Development, 3251 Hanover Street, Palo Alto, California 94304, U.S.A.
Abstract. Many expert systems and relational database systems store factual information in the form of attribute values of objects. Problems arise in transforming that attribute database representation into English surface structure and vice versa. In this paper we consider only the generation process. In its interaction with the user, the expert system must generate questions, declarations, and uncertain declarations. Attributes such as COLOR, LENGTH, and ILLUMINATION can be referenced using the template "(attribute name) of (object)" for both questions and declarations. However, many other attributes, such as RATTLES in "What is RATTLES of the light bulb?" and HAS_STREP_THROAT in "HAS_STREP_THROAT of Dan is true.", do not fit this template. We examined over 300 attributes from several knowledge bases and have grouped them into 16 classes. For each class there is one "question" template, one "declaration" template, and one "uncertain declaration" template for generating English surface structure. The internal database identifiers (e.g. HAS_STREP_THROAT and DISEASE_35) must also be replaced by output synonyms. Classifying each attribute, in combination with synonym translation, markedly improved the English surface structure that the system generated.
Keywords. Expert systems, Relational databases, Natural language generation.
1. Introduction
It is becoming increasingly apparent that users want and need expert systems and relational database systems which can interact in natural language. Menus are useful, but they are often constraining. It is generally believed that expert systems will need a natural language capability for handling complex tasks such as DARPA's Strategic Computing projects for Air Land Battle Management and Pilot's Associate. This natural language capability should be based on a deep understanding both of what the user is saying to the system and of the information the system presents to the user.

Until recently there has been much less research in text generation than in natural language understanding. Some of the early work dealt with translation of underlying formal representations [1, 2], while later work has dealt with developing and using grammars [3-5] (e.g. systemic and unification) for producing grammatical text. McKeown [6] has concentrated on the problem of producing multiple sentences in a response to queries of a limited class (e.g. definitions or comparisons of database objects). Mann [7] has also worked on organizing a large framework of knowledge into coherent text.

It is convenient to split the generation process into two parts: a strategic planning component and a tactical or sentence generation component [6]. The location of the split varies among systems. For example, McDonald [8] and Danlos [9] assume that the tactical component will be furnished the complete phrase structure by the planner, while Goldman [10] assumes that the tactical component will be given semantic representations which have only been linearized into sentence-sized chunks.
The present work involves generation starting with the input of a deep meaning expression (i.e. a connected set of case grammar clauses which produce one sentence). No attempt has been made to produce paragraph-length output [6]; we are more concerned with consultation dialogue. The high-level discourse planning is done by the expert system which handles the consultation session, asking questions when it needs help, reporting results, and answering users' queries. The high-level planning component has worked very well, but many of the sentences generated by the strategic and tactical components were ungrammatical.

Expert systems typically store information in an attribute database (usually a frame structure). Although an attribute database or relational database is efficient for storage and retrieval, the form of the information differs greatly from English surface structure. Different grammatical structures are required for different attributes in order to produce appropriate English surface structure. This problem has been seen in many expert systems (see for example [11, 12]); it has usually been handled by storing a "canned" sentence with each attribute. This only partially corrected the problem (e.g. it usually only worked for generating questions), and it required much unnecessary work in creating the knowledge base. Fortunately, we have found that a relatively modest number of different grammatical structures (16), some of which are closely related to each other, is adequate for greatly improving text generated from internal attribute database structures. If the knowledge engineer places each attribute in one of these categories, the system can automatically generate adequate English phrase structure describing these attributes.

We chose case grammar as the starting point of generation (rather than systemic or unification grammar) because the internal understanding structure for goals, rules, events, and queries in our testbed system (LES) is already in this form [13]. Case grammar is often used in understanding systems [14] and in some generation systems [1, 10]. The case grammar input must contain all the information needed for the surface structure other than information about attribute properties and objects, which can be extracted from the knowledge base. For example, in pronominalization the system determines the gender of an object by checking the knowledge base. The main problems in generation for this system are determining which words to use from the knowledge base (e.g. should the attribute's value be mentioned), what auxiliary verbs are needed, and the correct ordering of the selected words. These problems are solved by having templates for each of the 16 types of attributes.
2. Different attribute types
We examined over 300 attributes from several knowledge bases and found that they fall into the 16 types shown in Fig. 1. For each type there is one "question" template, one "declaration" template, and one "uncertain declaration" template for generating English surface structures, as shown in Fig. 2 for the attribute HAS_ITCHY_NOSE of type HAS. Negation is handled as part of the verb selection process. The appendix shows the English surface structure generated for all of the attributes listed in Fig. 1. Typically, expert systems communicate with users by asking questions when they need to know information, making declarations when they are asked questions or have solved a goal, and making uncertain declarations when they are referring to unknown values. The templates, shown in Figs. 3, 4, and 5, are mainly used to determine which constituents are present in the surface structure and their order.
ATTRIBUTE TYPE      TYPICAL ATTRIBUTE       ATTRIBUTE SYNONYM

OF                  LENGTH                  LENGTH
WHO_OF              FRIEND                  FRIEND
OF_VALUE            BREATHING_PATTERN       BREATHING PATTERN
IS                  CORRECT                 CORRECT
IS_VALUE            SEX                     SEX
IS_VERB_VALUE       SCREWED_IN              SCREWED IN
DOES                RATTLE                  RATTLE
WHO_DOES            KNOW                    KNOW
WHAT_DOES           SUPPORT                 SUPPORT
HAS                 HAS_HEADACHE            HEADACHE
HAS_VALUE           TYPE_OF_DISEASE         DISEASE
OBJECT              SUPPORTED_BY            SUPPORTED BY
WHO_OBJECT          SUPERVISED_BY           SUPERVISED BY
SPLIT               BIRTH_DATE              DATE/BIRTH
CAN                 CAN_READ                READ
HOW_WELL_CAN        READING_ABILITY         READ

Fig. 1. Different attribute types used in generating natural language with typical attributes and their synonyms.
QUESTION:
Does Ann have an itchy nose?

DECLARATION:
Ann has an itchy nose. Ann does not have an itchy nose.

UNCERTAIN DECLARATION:
whether Ann has an itchy nose or not.
Fig. 2. Different kinds of clauses which the system must generate.

OF:             What <be> the <attr> of <art> <obj> <u>?
WHO_OF:         Who <be> the <attr> of <art> <obj>?
OF_VALUE:       <be> the <attr> of <art> <obj> <leg val> <u>?
IS:             <be> <art> <obj> <attr>?
IS_VALUE:       <be> <art> <obj> <leg val> <u>?
IS_VERB_VALUE:  <be> <art> <obj> <attr> <leg val>?
DOES:           <do> <art> <obj> <attr>?
WHO_DOES:       Who <do> <art> <obj> <attr>?
WHAT_DOES:      What <do> <art> <obj> <attr>?
HAS:            <do> <art> <obj> <have> a/an <attr>?
HAS_VALUE:      <do> <art> <obj> <have> <leg val> <u>?
OBJECT:         What <be> <art> <obj> <attr>?
WHO_OBJECT:     Who <be> <art> <obj> <attr>?
SPLIT:          What <be> the <attr1> of <art> <obj>'s <attr2>?
CAN:            <can> <art> <obj> <attr>?
HOW_WELL_CAN:   How well <can> <art> <obj> <attr>?

Fig. 3. Question templates.
OF:             the <attr> of <art> <obj> <be> <value> <u>
WHO_OF:         the <attr> of <art> <obj> <be> <value>
OF_VALUE:       the <attr> of <art> <obj> <be> <value> <u>
IS:             <art> <obj> <be> <attr>
IS_VALUE:       <art> <obj> <be> <value>
IS_VERB_VALUE:  <art> <obj> <be> <attr> <value> <u>
DOES:           <art> <obj> <attr>
WHO_DOES:       <art> <obj> <attr> <value>
WHAT_DOES:      <art> <obj> <attr> <value>
HAS:            <art> <obj> <have> a/an <attr>
HAS_VALUE:      <art> <obj> <have> <value> <u>
OBJECT:         <art> <obj> <be> <attr> <value>
WHO_OBJECT:     <art> <obj> <be> <attr> <value>
SPLIT:          the <attr1> of <art> <obj>'s <attr2> <be> <value>
CAN:            <art> <obj> <can> <attr>
HOW_WELL_CAN:   <art> <obj> <can> <attr> <value>

Fig. 4. Declaration templates.
OF:             the <attr> of <art> <obj> <be> determined
WHO_OF:         the <attr> of <art> <obj> <be> determined
OF_VALUE:       the <attr> of <art> <obj> <be> determined
IS:             whether <art> <obj> <be> <attr> or not
IS_VALUE:       the <attr> of <art> <obj> <be> determined
IS_VERB_VALUE:  whether <art> <obj> <be> <attr> or not
DOES:           whether <art> <obj> <attr> or not
WHO_DOES:       who <art> <obj> <attr>
WHAT_DOES:      what <art> <obj> <attr>
HAS:            whether <art> <obj> <have> a/an <attr> or not
HAS_VALUE:      the <attr> of <art> <obj> <be> determined
OBJECT:         what <art> <obj> <be> <attr>
WHO_OBJECT:     who <art> <obj> <be> <attr>
SPLIT:          the <attr1> of <art> <obj>'s <attr2> <be> determined
CAN:            whether <art> <obj> <can> <attr> or not
HOW_WELL_CAN:   how well <art> <obj> <can> <attr>

Fig. 5. Uncertain declaration templates.
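To make the template machinery concrete, the following sketch (our illustration, not the system's implementation; the slot markers, dictionary layout, and fill routine are assumptions) encodes a few of the templates of Figs. 3-5 as Python format strings:

# Illustrative Python sketch of part of the template table (not the original
# system's code). Slots: {be}/{do}/{have} are verb slots filled by the verb
# selection step, {art} is the article, {obj} the object synonym, {attr} the
# attribute synonym, {value} a value, and {units} the units, if any.
TEMPLATES = {
    "OF": {
        "question":  "what {be} the {attr} of {art}{obj}{units}?",
        "declare":   "the {attr} of {art}{obj} {be} {value}{units}.",
        "uncertain": "the {attr} of {art}{obj} {be} determined",
    },
    "IS": {
        "question":  "{be} {art}{obj} {attr}?",
        "declare":   "{art}{obj} {be} {attr}.",
        "uncertain": "whether {art}{obj} {be} {attr} or not",
    },
    "HAS": {
        "question":  "{do} {art}{obj} {have} a/an {attr}?",
        "declare":   "{art}{obj} {have} a/an {attr}.",
        "uncertain": "whether {art}{obj} {have} a/an {attr} or not",
    },
}

def fill(template, capitalize=True, **slots):
    """Fill a template, squeeze spaces left by empty slots, and capitalize
    unless the result is an embedded clause (uncertain declaration)."""
    text = " ".join(template.format(**slots).split())
    return text[0].upper() + text[1:] if capitalize else text

# "What is the length of the block in meters?"
print(fill(TEMPLATES["OF"]["question"],
           be="is", attr="length", art="the ", obj="block", units=" in meters"))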
2.1. Effect of verb tense, person, and number

Some constituents of the templates depend on verb tense and the person and number of the object. Some of the attribute synonyms themselves may undergo morphological transformations. The OF, WHO_OF, OF_VALUE, HAS, and SPLIT type attribute synonyms are singular or plural in agreement with the object. The DOES, WHAT_DOES, and WHO_DOES type attribute synonyms (actually verbs) change with person and tense. The attribute synonyms for the other types do not change.

The surface form of the verbs in the templates depends on the person and number of the object in the actor slot and the tense of the sentence. Actually, the templates shown in Figs. 3-5 are only applicable to the present tense. Templates that handle arbitrary tenses are not as explicit about the type of verb (such as "be" or "do") and have three verb locations which may or may not be used. For example, the IS template for a question has the form:

(verb1) (art.) (obj.) (verb2) (verb3) (attr.)?

For the present tense, (verb1) becomes a "be" verb and both (verb2) and (verb3) are null, resulting in the form shown in Fig. 3. For the past perfect progressive tense, (verb1) becomes a "have" verb, (verb2) becomes a "be" verb, and (verb3) is null. Most of our work has been concerned with the present tense, and this has been adequate for most knowledge bases. As more temporal reasoning is used, the need for other tenses will increase significantly.

The templates for the present tense determine which verb (e.g. "be", "do", or "can") is inserted and where it is inserted. The system selects the proper form of the verb based on person and number. It should be noted that the verbs in the templates are often auxiliary verbs, as some of the attributes become verbs in the generated text (e.g. SUPPORT of type WHAT_DOES in "What does the table SUPPORT?"). In the general case, it is necessary to store irregular tenses for some verbs in the lexicon. While verb tense must be an input parameter for the generator, person and number are handled by checking the knowledge base. The system handles first and second person by recognizing its own internal name and the user's internal name. It then determines the slot (ACTOR, OBJECT, etc.) in which the name (for itself) occurs and the format for that slot in order to choose the appropriate form (I, my, me, or mine).
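As a rough illustration of this verb-selection step, the sketch below covers only "be", "do", and "have" in the present tense (the tables and function are ours, not the system's lexicon):

# Hypothetical present-tense verb tables; a full system would also store
# irregular past and participle forms in the lexicon.
PRESENT = {
    "be":   {("1", "singular"): "am", ("3", "singular"): "is"},
    "do":   {("3", "singular"): "does"},
    "have": {("3", "singular"): "has"},
}
DEFAULT = {"be": "are", "do": "do", "have": "have"}

def verb_form(verb, person, number, negated=False):
    """Pick the present-tense form agreeing with the subject; negation is
    handled as part of verb selection by appending 'not'."""
    form = PRESENT[verb].get((person, number), DEFAULT[verb])
    return form + " not" if negated else form

print(verb_form("have", "3", "singular"))              # has
print(verb_form("do", "3", "singular", negated=True))  # does not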
2.2. Insertion of articles

We now describe the method the system uses to insert articles; it is not a flawless algorithm. The insertion of articles before attribute output synonyms is completely determined by the templates. Some templates (such as the OF type in "What is the length of the block?") require the article "the", while other templates (such as the HAS type) require the article "a", and others require no article (such as the IS type in "Is the circuit faulty?"). (The article "an" is inserted in place of "a" before words starting with vowels.) Inserting articles before object names is determined by a few simple rules. If the object's output synonym is a proper name (i.e. begins with a capital letter), no article is inserted. Otherwise, the object output synonym in the template is preceded by "the", unless it is not a definite object in the knowledge base, in which case it is preceded by "a". The output synonym for an object which is an attribute value is preceded by "a".
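A minimal sketch of these article rules (the function names and the capital-letter test for proper names are our assumptions) might look like this:

def object_article(synonym, definite=True):
    """Proper names (capitalized output synonyms) take no article; definite
    objects take 'the'; indefinite objects take 'a'/'an'."""
    if synonym[:1].isupper():                     # proper name, e.g. "Ann"
        return ""
    if definite:
        return "the "
    return "an " if synonym[0] in "aeiou" else "a "

def attribute_article(template_article, attr_synonym):
    """The template fixes 'the', 'a', or no article; 'a' becomes 'an'
    before a word starting with a vowel."""
    if template_article == "a":
        return "an " if attr_synonym[0] in "aeiou" else "a "
    return template_article + " " if template_article else ""

print(object_article("Ann"))                   # '' (no article)
print(object_article("light bulb"))            # 'the '
print(attribute_article("a", "itchy nose"))    # 'an '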
2.3. Output synonyms

Another problem arises in that the user expects the system to output words with which he/she is familiar instead of those in the database (e.g. HAS_STREP_THROAT and DISEASE_35). (Attribute names such as HAS_STREP_THROAT capture a complete concept and are very useful, although they are not good English.) This problem can be handled by using a lexicon with input and output synonyms. The database names can be transformed into words (output synonyms) with which the user is familiar and that will produce a better surface structure. However, the main use for the lexicon occurs in the understanding process. Many of the templates (IS, DOES, HAS, and CAN) refer to attributes whose value is never mentioned, as it would be "yes", "no", "true", or "false" (e.g. "Is the drawing CORRECT?" and "The drawing is CORRECT"). The values "no" and "false" are handled by insertion of "not".
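A toy lexicon of this kind might map internal identifiers to output synonyms and flag the yes/no-valued attributes whose values are rendered by inserting "not" rather than by naming the value (the entries and field names below are invented for illustration):

# Hypothetical lexicon entries; real knowledge bases would carry many more.
LEXICON = {
    "HAS_STREP_THROAT":  {"output": "strep throat",      "yes_no": True},
    "BREATHING_PATTERN": {"output": "breathing pattern", "yes_no": False},
    "DISEASE_35":        {"output": "bronchitis",        "yes_no": False},
}

def output_synonym(internal_name):
    """Fall back to a lower-cased, de-underscored form when no entry exists."""
    entry = LEXICON.get(internal_name)
    return entry["output"] if entry else internal_name.lower().replace("_", " ")

print(output_synonym("HAS_STREP_THROAT"))   # strep throat
print(output_synonym("TYPE_OF_DISEASE"))    # type of disease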
2.4. Using the templates

To see how the templates are used, consider the HAS type attribute HAS_ITCHY_NOSE with synonym "itchy nose". Assume the object is PATIENT3 with description "Ann". The appropriate form of the "do" and "have" verbs is determined by person, number, negation, and the present need (question, declaration, or uncertain declaration). We substitute "itchy nose" for (attr.), "Ann" for (obj.), and null for (art.) (as Ann is a proper name) into the templates to obtain the results shown in Fig. 2. The appendix shows results for every attribute type.
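A self-contained sketch of this substitution (the HAS templates are restated locally and the helper function is ours; it assumes a third-person singular subject) reproduces the clauses of Fig. 2:

HAS_TEMPLATES = {
    "question":  "{do} {art}{obj} {have} a/an {attr}?",
    "declare":   "{art}{obj} {do} {have} a/an {attr}.",
    "uncertain": "whether {art}{obj} {have} a/an {attr} or not",
}

def realize_has(need, obj, attr, negated=False):
    """Generate a HAS-type clause for a third-person singular subject."""
    if need == "question":
        do, have = "does", "have"
    elif negated:
        do, have = "does not", "have"
    else:
        do, have = "", "has"
    art = "" if obj[:1].isupper() else "the "       # proper names take no article
    a_an = "an" if attr[0] in "aeiou" else "a"      # 'a' -> 'an' before vowels
    text = HAS_TEMPLATES[need].format(do=do, art=art, obj=obj, have=have, attr=attr)
    text = " ".join(text.replace("a/an", a_an).split())
    return text if need == "uncertain" else text[0].upper() + text[1:]

print(realize_has("question", "Ann", "itchy nose"))        # Does Ann have an itchy nose?
print(realize_has("declare", "Ann", "itchy nose"))         # Ann has an itchy nose.
print(realize_has("declare", "Ann", "itchy nose", True))   # Ann does not have an itchy nose.
print(realize_has("uncertain", "Ann", "itchy nose"))       # whether Ann has an itchy nose or not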
2.5. Classifying attributes

A new attribute can be classified by writing down three sentences - question, declaration, and uncertain declaration - using that attribute. These sentences can then be compared with those of the appendix or with the templates of Figs. 3-5 to determine the appropriate type and output synonym. The appearance of the form "(attribute) of" in the sentence is a clue that it is probably one of the OF types (OF, WHO_OF, or OF_VALUE). The legal values of the attribute are also a useful clue. For example, legal values of "YES or NO" indicate an IS, DOES, HAS, or CAN type. If the legal values are to be mentioned in questions, then one of the "..._VALUE" types is appropriate. The SPLIT type is unique in that the two words describing the attribute are often separated and a possessive is used. The OF type of attribute is the most common; the IS and HAS types are also very common. Some examples are given below, followed by a short illustrative sketch of how these clues might be applied automatically:
OF:  NAME, HEIGHT, MASS, VOLUME, RADIUS, LEVEL, COLOR, AMPLITUDE

IS:  FAULTY, LEAKY, PRESENT, REPAIRED, CHECKED, WEALTHY, HUNGRY, MARRIED

HAS: HAS_FEVER, HAS_SORE_THROAT, HAS_COUGH, HAS_ELECTRICAL_POWER, HAS_OUTPUT_SIGNAL, HAS_DEBTS, HAS_STOMACH_PAIN, HAS_SWOLLEN_GLANDS
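The sketch below turns these clues into a rough first-pass classifier; the feature names are ours, and a knowledge engineer would still confirm the type by writing out the three sentences:

def guess_attribute_type(name, legal_values, values_in_question=False):
    """Heuristic guess at the attribute type from naming and legal-value clues."""
    yes_no = {v.upper() for v in legal_values} <= {"YES", "NO", "TRUE", "FALSE"}
    if name.startswith("CAN_"):
        return "CAN"
    if name.startswith("HAS_"):
        return "HAS" if yes_no else "HAS_VALUE"
    if yes_no:
        return "IS"            # could also be DOES; needs the engineer's judgement
    if values_in_question:
        return "OF_VALUE"      # one of the "..._VALUE" types
    return "OF"                # the most common type

print(guess_attribute_type("HAS_HEADACHE", ["YES", "NO"]))                    # HAS
print(guess_attribute_type("CAN_READ", ["YES", "NO"]))                        # CAN
print(guess_attribute_type("BREATHING_PATTERN", ["NORMAL", "RAPID"], True))   # OF_VALUE
print(guess_attribute_type("LENGTH", ["27 METERS"]))                          # OF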
3. The transformation process

The system needs some internal representation for the sentence it wants to generate. It could simply be clauses formed of attribute-object-value triples. In our implementation we used a case-grammar representation, which allows considerable flexibility and is useful as a meaning representation. References to the frame representation appear in some slots of the case-grammar representation.
3.1. Case grammar representation

Traditional case-grammar clauses in our nomenclature have the form:

TYPE_ENTRY      '(action), (state), or (state change)'
SUB_TYPE        '(type of sentence)'
FOCUS           '(actor, object, or act_reason)'
ACTOR           '(actor)'
ACTION_VERB     '(verb)'
OBJECT          '(object)'
INSTRUMENT      '(instrument)'
DONOR           '(entity that gives)'
RECIPIENT       '(entity that receives)'
ACT_REASON      '(reason for action)'
ACT_RESULT      '(result of action)'
ACT_TIME        '(time designation)'
ACT_LOCATION    '(location designation)'
LIKELIHOOD      '(certainty factor)'
CONCATENATION   '(pointer to next clause)'
Subordinate and conjunctive clauses are handled by pointers in slots which point to other case-grammar clauses. In generating text, the subordinate clause plus an appropriate connecting word (e.g. that, because, or and) is inserted in the location for that slot's value. Our use of focus is just to change the order of the sentence by putting the important entity first. FOCUS is given the name of another slot (such as ACT_TIME) and the information generated for that slot appears first. Typically, only five or six slots are used for a clause. For example, a clause with TYPE_ENTRY of ACTION is:
TYPE_ENTRY      'ACTION'
SUB_TYPE        'ASSERTION'
ACTOR           'HUMAN(INTERNAL_NAME = HUMAN1)'
ACTION_VERB     'HIT'
OBJECT          'BALL(INTERNAL_NAME = BALL5)'

The literal translation is that instance HUMAN1 HIT (verb) instance BALL5. HUMAN and BALL are categories in the frame database. In English, this would read "Joe hit the ball" if the name of HUMAN1 were Joe. Most of the information contained in the frame database that we need to represent in sentences falls under the TYPE_ENTRY of STATE or STATE_CHANGE. For example, the statement "the length of the block is 5 meters" is a STATE clause. This information would be stored in the frame database under the attribute (length) of object (block) with value (5 meters). To handle this kind of information, we have extended the case-grammar format to handle "actors" which are attributes of objects, as indicated below:

TYPE_ENTRY      'STATE'
SUB_TYPE        'ASSERTION'
ACTOR           'LENGTH[BLOCK(INTERNAL_NAME = BLOCK1)]'
ACTION_VERB     '='
OBJECT          '5 METERS'
This assumes that the frame database has a category BLOCK with an instance BLOCK1. Concatenated clauses for sentences such as "If the length of BLOCK1 is equal to 5 meters and the width of BLOCK1 is less than 2 meters, then the lift mechanism is a forklift." appear with pointers in specific slots for connecting the clauses, as shown below.

/***** CLAUSE1 ******/
TYPE_ENTRY      'STATE'
SUB_TYPE        'ASSERTION'
ACTOR           'LENGTH[BLOCK(INTERNAL_NAME = BLOCK1)]'
ACTION_VERB     '='
OBJECT          '5 METERS'
CONCATENATION   '-> CLAUSE:2'

/***** CLAUSE2 ******/
TYPE_ENTRY      'STATE'
SUB_TYPE        'ASSERTION'
ACTOR           'WIDTH[BLOCK(INTERNAL_NAME = BLOCK1)]'
ACTION_VERB     '<'
OBJECT          '2 METERS'

/***** CLAUSE3 ******/
TYPE_ENTRY      'STATE'
SUB_TYPE        'ASSERTION'
ACTOR           'LIFT_MECHANISM[BLOCK(INTERNAL_NAME = BLOCK1)]'
ACTION_VERB     '='
OBJECT          'FORKLIFT'
ACT_REASON      '-> CLAUSE:1'
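As a data-structure illustration only (a Python rendering of the slots above, not the system's internal format), such clauses and their pointers could be written as:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Clause:
    """One case-grammar clause; unused slots stay None."""
    type_entry: str                            # 'ACTION', 'STATE', or 'STATE_CHANGE'
    sub_type: str                              # e.g. 'ASSERTION' or 'QUESTION'
    actor: str                                 # may be 'ATTR[CATEGORY(INTERNAL_NAME = ...)]'
    action_verb: Optional[str] = None
    object: Optional[str] = None
    act_reason: Optional["Clause"] = None      # subordinate clause (reason) pointer
    concatenation: Optional["Clause"] = None   # next conjoined clause

# "If the length of BLOCK1 is equal to 5 meters and the width of BLOCK1 is
#  less than 2 meters, then the lift mechanism is a forklift."
length = Clause("STATE", "ASSERTION", "LENGTH[BLOCK(INTERNAL_NAME = BLOCK1)]", "=", "5 METERS")
width = Clause("STATE", "ASSERTION", "WIDTH[BLOCK(INTERNAL_NAME = BLOCK1)]", "<", "2 METERS")
length.concatenation = width
lift = Clause("STATE", "ASSERTION", "LIFT_MECHANISM[BLOCK(INTERNAL_NAME = BLOCK1)]",
              "=", "FORKLIFT", act_reason=length)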
3.2. Transforming from case grammar representation to English With that background, we will proceed to discuss the transformation process. Fig. 6 summarizes the natural language generation process starting with the internal representation in case-grammar and resulting in natural language output.
[Fig. 6 is a diagram in the original. It shows the input clause to the generator (TYPE_ENTRY 'STATE', SUB_TYPE 'QUESTION', ACTOR 'BREATHING_PATTERN[CHILD(INTERNAL_NAME = CHILD_1)]', ACTION_VERB '=', OBJECT 'WHEEZING'), the attribute properties and output synonyms retrieved from the knowledge base, the choice among the question, declaration, and uncertain declaration constructions according to the present need, and the resulting sentence: "Is the breathing pattern of the child normal, rapid, wheezing, or irregular?"]

Fig. 6. Summary of natural language generation process.

CATEGORY NAME: CHILD

/***** ATTRIBUTE PROPERTIES ******/
ATTRIBUTE NAME:     BREATHING_PATTERN
ATTRIBUTE TYPE:     SINGLE-VALUED
LEGAL_VALUES:       NORMAL, RAPID, WHEEZING, or IRREGULAR
TYPE:               OF_VALUE
OUTPUT_SYNONYM:     breathing pattern

/***** INSTANCES ******/
INTERNAL_NAME:      CHILD_1
OUTPUT_SYNONYM:     'child'
SEX:                'MALE'
BIRTH_DATE:         '1986'
AGE:                '2'
TEMPERATURE:        '104'
HAS_FEVER:          'YES'
ALLERGIC_TO_PENICILLIN: 'NO'
FATHER:             'ADULT_2'
TYPE_OF_DISEASE:    ''
BREATHING_PATTERN:  ''

Fig. 7. Part of child frame.
Input to generator:

TYPE_ENTRY      'STATE'
SUB_TYPE        'QUESTION'
ACTOR           'DROOLING[CHILD(INTERNAL_NAME = PATIENT_1)]'
ACTION_VERB
OBJECT
ACT_TIME        'ANYTIME_DURING PRESENT - 3:00:00:00 .. PRESENT'

Knowledge base:

ATTRIBUTE NAME  DROOLING
TYPE            'IS'
LEGAL_VALUES    '(YES or NO)'
TIME_FORMAT     'JULIAN'
TIME_VARYING    'VALID_UNTIL_CHANGED'
OUTPUT_SYNONYM  'DROOLING A LOT'

Result:

Has Ann been drooling a lot during the last three days?

Fig. 8. Generation with temporal information.
In transforming this clause, the system examines the properties of the attribute (BREATHING_PATTERN) in the CHILD category (frame). Fig. 7 shows part of the CHILD category. The TYPE property of the attribute and the "present need" are used to select the template, and the ATTRIBUTE VALUE and LEGAL_VALUES are filled in as required by the template. Synonym translation using the lexicon is done for attribute and object names as indicated. The arrows from the knowledge base (pointing to QUESTION in Fig. 6) are shifted to accommodate the "present need" of the expert system. Sometimes, temporal information about an attribute is important. The expert system uses the ACT_TIME slot, as shown in Fig. 8, to have a temporal field added to the generated template.
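The sketch below ties these steps together for the Fig. 6 example; it is illustrative only: the frame lookup is a plain dictionary, the actor-string parsing is ad hoc, and the OF_VALUE question form is written out directly rather than taken from a template table:

# Minimal stand-in for the attribute properties of Fig. 7.
FRAME = {
    ("CHILD", "BREATHING_PATTERN"): {
        "type": "OF_VALUE",
        "legal_values": ["NORMAL", "RAPID", "WHEEZING", "IRREGULAR"],
        "output_synonym": "breathing pattern",
    },
}
OBJECT_SYNONYMS = {"CHILD_1": "child"}

def generate_question(actor):
    """Turn ACTOR = 'ATTR[CATEGORY(INTERNAL_NAME = ID)]' into an OF_VALUE question."""
    attr, rest = actor.split("[", 1)
    category = rest.split("(", 1)[0]
    internal = rest.split("= ", 1)[1].rstrip(")]")
    props = FRAME[(category, attr)]
    obj = OBJECT_SYNONYMS[internal]
    values = [v.lower() for v in props["legal_values"]]
    legal = ", ".join(values[:-1]) + ", or " + values[-1]
    return f"Is the {props['output_synonym']} of the {obj} {legal}?"

print(generate_question("BREATHING_PATTERN[CHILD(INTERNAL_NAME = CHILD_1)]"))
# Is the breathing pattern of the child normal, rapid, wheezing, or irregular?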
4. Sample session

To test these ideas, we modified a knowledge base (Childhood Disease) that had caused considerable difficulty in natural language generation. We simply designated the attribute types and added synonyms. In the following session the input by the user, who has a sick child, is preceded by "*****". To make the generation process clearer, we have added the attribute type in brackets at the end of each clause. Furthermore, to show how the present system improved the generated text, we have also inserted the "old" generated text surrounded by "|||" whenever the old text differed significantly from the present one.

|||What is the sex of the child?|||
Is the child MALE or FEMALE?                                    [IS_VALUE]
***** male
What is the name of the child?                                  [OF]
***** Dan
|||What is the birth_date of DAN?|||
What is the date of DAN's birth in years?                       [SPLIT]
***** He is very sick with a temperature of 104 degrees!
Yes. What is the date of DAN's birth in years?                  [SPLIT]
***** 1986
|||What is the allergic_to_penicillin of DAN?|||
Is DAN allergic to penicillin?                                  [IS]
***** no
|||What is the father of DAN?|||
Who is the father of DAN?                                       [WHO_OF]
***** Dan has a severe headache!
Yes. Who is the father of DAN?                                  [WHO_OF]
***** John Doe
|||What is the insured_by of JOHN DOE?|||
Who is JOHN DOE insured by?                                     [WHO_OBJECT]
***** why
We have established:
(1) that the sex of DAN is determined;                          [IS_VALUE]
(2) that his name is determined;                                [OF]
(3) that his age is determined;                                 [OF]
|||(4) that the allergic_to_penicillin of DAN is determined;|||
(4) whether he is allergic to penicillin or not;                [IS]
(5) that his father is determined;                              [WHO_OF]
IF we can:
|||(1) show that the insured_by of JOHN DOE is determined|||
(1) show who JOHN DOE is insured by                             [WHO_OBJECT]
THEN it follows from rule FILL_OUT_RECORD:
|||that the complete of the health record is YES.|||
that the health record is complete.                             [IS]
***** Green Shield
|||What is the breathing_pattern of DAN?|||
Is the breathing pattern of DAN NORMAL, RAPID, SQUEAKY, WHEEZING, or IRREGULAR?   [OF_VALUE]
***** wheezing
|||What is the peers_have_colds of DAN?|||
Does DAN have peers with colds during last seven days?          [HAS]
***** Is the record complete?
|||Yes, the complete of the health record is YES.|||
Yes, the health record is complete.                             [IS]
Does DAN have peers with colds during last seven days?          [HAS]
***** no
|||What is the can_touch_chin_to_chest of DAN?|||
Can DAN touch chin to chest?                                    [CAN]
***** yes
|||What is the vomits_often of DAN?|||
Does DAN vomit often?                                           [DOES]
***** yes
|||What is the tonsils_swollen of DAN?|||
Does DAN have swollen tonsils?                                  [HAS]
***** yes
I have come to a conclusion:
|||(1) the type_of_disease of DAN is TONSILLITIS.|||
(1) DAN has TONSILLITIS.                                        [HAS_VALUE]
***** why
I said that because:
|||(1) the has_high_fever of DAN is TRUE,|||
(1) Dan has a high fever, and                                   [HAS]
|||(2) the has_headache of DAN is YES,|||
(2) he has headaches, and                                       [HAS]
|||(3) the vomits_often of DAN is YES,|||
(3) he vomits often, and                                        [DOES]
|||(4) the tonsils_swollen of DAN is YES|||
(4) he has swollen tonsils                                      [HAS]
***** why does he have a high fever?
|||The has_high_fever of DAN is TRUE because:|||
DAN has a high fever because:                                   [HAS]
(1) his current temperature is greater than 103 degrees Fahrenheit   [OF]
5. Discussion
We have identified a problem in the generation of text from an attribute or relational database: different grammatical forms are needed for different attributes. Our solution is to categorize the attributes into types and use type-specific templates for generating the output text. Fortunately, only a small number of types (about 16) is needed to handle the different attributes. This method is very convenient to use if the attributes have properties, since there is a place to store the TYPE, LEGAL_VALUES, OUTPUT_SYNONYM, and UNITS (if any). In creating a knowledge base, the knowledge engineer just fills in these properties to aid the system in its natural language processing.

The present method has been used on several knowledge bases by different knowledge engineers with good results. We found that they could rapidly categorize attributes and select appropriate synonyms. The generation method itself is very fast, with a typical sentence being generated in less than 0.5 seconds.

The transformations discussed here are not very useful for systems in which the information is stored as events (e.g. John hit the ball) because no attribute is involved. However, much work has been done for that situation and good text generation methods already exist (see [9, 10]). In the future we plan to do more work with temporal reasoning, and the high-level planning component of the expert system will be extended to generate different tenses. The generation component will also be extended at that time to handle different tenses (see [15]), as discussed in Section 2.1.
Acknowledgments

The author is grateful to T.J. Laffey and A. Austin for creating the Childhood Disease knowledge base and helping to elucidate generation problems with such knowledge bases. He also appreciates helpful discussions with T.W. Bickmore and W.S. Mark.
Appendix

In this appendix we give the sentences that the system generated for each of the sixteen different attribute types. The typical attributes of Fig. 1 were used. The four sentences are "question", "declaration", "negative declaration", and "uncertain declaration".

Attribute "LENGTH" of type "OF"
What is the length of Block A in meters?
The length of Block A is 27 meters.
The length of Block A is not 27 meters.
The length of Block A is unknown.

Attribute "FRIEND" of type "WHO_OF"
Who is the friend of John?
The friend of John is Tom.
The friend of John is not Tom.
The friend of John is unknown.

Attribute "BREATHING_PATTERN" of type "OF_VALUE"
Is the breathing pattern of the child normal, rapid, wheezing, or irregular?
The breathing pattern of the child is wheezing.
The breathing pattern of the child is not wheezing.
The breathing pattern of the child is unknown.

Attribute "CORRECT" of type "IS"
Is the drawing correct?
The drawing is correct.
The drawing is not correct.
Whether the drawing is correct or not is unknown.

Attribute "SEX" of type "IS_VALUE"
Is the patient male or female?
The patient is female.
The patient is not female.
The sex of the patient is unknown.

Attribute "SCREWED_IN" of type "IS_VERB_VALUE"
Is the light bulb screwed in tightly, loosely, or not at all?
The light bulb is screwed in tightly.
The light bulb is not screwed in tightly.
Whether the light bulb is screwed in or not is unknown.

Attribute "RATTLE" of type "DOES"
Does the light bulb rattle?
The light bulb rattles.
The light bulb does not rattle.
Whether the light bulb rattles or not is unknown.

Attribute "KNOW" of type "WHO_DOES"
Who does John know?
John knows Tom.
John does not know Tom.
Who John knows is unknown.
Attribute "SUPPORT" of type "'WHA 7"_DOES" What does the block support? The block supports the pyramid. The block does not support the pyramid. What the block supports is unknown. Attribute "'HAS_HEADACHE" of type "'HAS" Does John have a headache? John has a headache. John does not have a headache. Whether John has a headache or not is unknown. Attribute "TYPE_OF_DISEASE" of type "'HAS_VALUE" Does the child have croup, bronchitis, pneumonia, or whooping cough? The child has bronchitis. The child does not have bronchitis. The disease of the child is unknown. Attribute "'SUPPORTED_BY" of type "'OBJECT" What is the pyramid supported by? The pyramid is supported by Block A. The pyramid is not supported by Block A. What the pyramid is supported by is unknown. Who John John Who
Attribute "'SUPERVISED_BY" of type "WHO_OBJECT" is John supervised by? is supervised by Tom. is not supervised by Tom. John is supervised by is unknown.
What is the The date of The date of The date of
Attribute "BIR TH_DA TE'" of type "SPLIT" date of John's birth? John's birth is March 14, 1960. John's birth is not March 14, 1960. John's birth is unknown.
Attribute "CAN_REA D" of type "'CA N" Can John read? John can read. John cannot read Whether John can read or not is unknown. How John John How
Attribute "R EA DING _ ABILITY" of type "HO W_ WEL L_ CA N" well can John read'? can read very well. cannot read very well. well John can read is unknown.
References

[1] R.F. Simmons and J. Slocum, Generating English discourse from semantic networks, Commun. ACM 15 (1972) 891-905.
[2] D. Chester, The translation of formal proofs into English, Artificial Intelligence 7 (1976) 261-275.
[3] M. Kay, Functional grammar, Proc. 5th Annual Meeting of the Berkeley Linguistics Society (Berkeley, 1979).
[4] P.S. Jacobs, PHRED: a generator for natural language interfaces, Am. J. Comput. Linguistics 11 (4) (1985) 219-242.
[5] W.C. Mann and C. Matthiessen, A demonstration of the Nigel text generation computer program, in J. Benson and W. Greaves (eds.), Systemic Perspectives on Discourse: Selected Applied Papers (Ablex, Norwood, NJ, 1985).
[6] K.R. McKeown, Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text (Cambridge University Press, Cambridge, 1985).
[7] W.C. Mann, Discourse structures for text generation, Proc. 1984 Coling/ACL Conf. (Stanford, 1984).
[8] D.D. McDonald, Natural language production as a process of decision making under constraint, Ph.D. dissertation, MIT, Cambridge, MA, 1980.
[9] L. Danlos, The Linguistic Basis of Text Generation (Cambridge University Press, Cambridge, 1987).
[10] N.M. Goldman, Conceptual generation, in R.C. Schank (ed.), Conceptual Information Processing (North-Holland, Amsterdam, 1975).
[11] W.J. Van Melle, System Aids in Constructing Consultation Programs (UMI Research Press, Ann Arbor, MI, 1981).
[12] W.R. Swartout, A digitalis therapy advisor with explanations, Proc. 5th Int. Joint Conf. Artificial Intelligence (1977) 819-825.
[13] J.Y. Read, T.P. Howland, and W.A. Perkins, Use of communicating expert systems in fault diagnosis for space station applications, Proc. SPIE Cambridge Symp. on Optical and Optoelectronic Engineering, "Space Station Automation II", 729 (Cambridge, MA, 1986) 30-39.
[14] R.C. Schank, Conceptual Information Processing (North-Holland, Amsterdam, 1975).
[15] V. Ehrich, The generation of tense, in G. Kempen (ed.), Natural Language Generation (Martinus Nijhoff, Boston, 1987) 423-440.