Neural Networks, Vol. 7, No. 3, pp. 575-577, 1994
Copyright © 1994 Elsevier Science Ltd. Printed in the USA. All rights reserved.
0893-6080/94 $6.00 + .00
BOOK REVIEW
Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. By Risto Miikkulainen. Bradford Books, MIT Press, Cambridge, MA, 1993, $45.00, 391 pp., ISBN 0-262-13290-7.

This book merits the quotes on the jacket by eminent connectionists about setting a standard. The author poses a remarkable challenge (Part I, Overview): his model should "... [show] how script-based inferences can be learned from experience; ... [and] how the episodic memory organization can be automatically formed based on the regularities in the example data; and how word semantics can be learned from examples of their use; ... give plausible explanations of how expectations and defaults automatically emerge; ... exhibit plausible memory interference errors, role-binding errors, and lexical errors in overload situations and in the face of noise and damage; ... [demonstrate] that a large-scale natural language processing (NLP) system that performs at the level of symbolic NLP models can be built from distributed artificial neural networks; ... develop general distributed neural network mechanisms [for] high-level cognitive modeling" (Chap. 1, Introduction). He achieves this ambitious goal through conceptually simple, well-chosen representation techniques, and by a well-designed modular organization of his program DISCERN (Distributed SCript processing and Episodic memoRy Network). After outlining these, I'll tell how I think this study relates to what I consider a fundamental goal of cognitive science.

The book describes a fully connectionist system, of scripts, lexicon, and episodic memory, that answers questions about stories that are incomplete instances of stereotypical scripts (a restaurant script, having fancy, fast food, and coffee shop variants, or tracks, and three tracks each of shopping and travel scripts). Each action in a script is represented by roles (agent, patient, instrument, patient modifier) filled by individual constituents of the sentences in any particular instance of the script. (Scripts and comparison of symbolic and distributed methods are reviewed in Chap. 2, Background.) Cullingford's (1981) well-known program, when given a script and a story instantiating part of a script, inferred the omitted information as well, and used it to answer questions. In contrast, Miikkulainen doesn't need to provide the scripts, which his program can derive as well (as encoded expectations) from a collection of fragmentary stories. A question finds its way to a place near the storage locus for the relevant script, from which it can access a memory trace, to derive answers from stored expectations derived from statistical regularities in the example stories it had seen (Chap. 3, Overview of DISCERN).
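To fix ideas, here is a toy sketch, in my own notation rather than the book's actual data format, of the kind of structure being described: a script with named tracks, actions given as case-role fillers, and a fragmentary story that instantiates only part of a track. The particular words and fillers are invented for illustration.

    from dataclasses import dataclass, field

    ROLES = ("agent", "patient", "instrument", "patient-modifier")

    @dataclass
    class Event:
        fillers: dict = field(default_factory=dict)   # role -> word; an absent role is simply unmentioned

    @dataclass
    class Story:
        script: str                                   # "restaurant", "shopping", or "travel"
        track: str                                    # e.g., "fancy", "fast-food", "coffee-shop"
        events: list = field(default_factory=list)

    # A fragment of a fancy-restaurant story. The trained system is expected to
    # infer the unmentioned fillers from regularities in its training corpus.
    fragment = Story(
        script="restaurant",
        track="fancy",
        events=[
            Event({"agent": "customer", "patient": "food"}),
            Event({"agent": "customer", "patient": "bill"}),
        ],
    )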
The program is trained by showing it many stories, instantiating various tracks of various scripts, never a complete instantiation, but only fragments, such as we normally encounter in speech or writing. From the distribution of words in the totality of such fragments, the program evolves expectations for items not mentioned. These allow a recurrent network to assign roles when given word-by-word input of sentences. The role representations of the sentences go one by one to the story parser, which forms a single representation of the whole story, including inferred items. Similarly behaving words in the corpus of stories are now mapped to nearby places in a semantic memory, and similar stories to nearby places in an episodic memory.

However, because numerical similarity of the inputs, highly blurred bitmaps representing only darkness of the printed characters, is irrelevant to semantic and episodic memory, the first task is to reencode the words in a more meaningful way (Part II, Processing Mechanisms). Because the information consists in role assignments, a recurrent backpropagation network (Chap. 4, Backpropagation Networks) can develop meaningful representations by learning these roles. The lexical system reads the story a word at a time. The input is partitioned into subfields for the words, the script, and the track, and the output layer into subfields for roles. The target, in each role subfield, is the current code for the word filling that role. For the code for each input word to appear in the correct output subfield, network weights must come to encode relations between words and roles latent in the distribution of words in the training examples.

In a procedure called FGREP (Forming Global Representations with Extended Back Propagation), meaningful codes evolve through the innovation of using the current activation of a hidden layer representation of a word, at any iteration, as the current symbol for that word, in both input and target in the next iteration (Chap. 5, Developing Representations in FGREP Modules). The method is equivalent to adding an extra input layer preceding the original input layer, which now becomes the first hidden layer. The original input pattern to this layer representing a word is simulated by using a local (1-out-of-N) representation at the new input layer and initializing the weights from that unit to the hidden layer to the initial code for the word. After each iteration of backpropagation, the adjusted set of weights (normalized) becomes the new symbol for the word, to be used in any target containing that word. Through this training, words with similar uses in the training stories become represented by similar symbol vectors. Symbols come to resemble one another according to cross-cutting categories; for example, animate, edible, fragile, used to break something, possessed by people. (A toy sketch of this updating step appears below.)

A self-organizing network now maps these symbols into a two-dimensional array of neural units in such a way that semantically similar symbols try to become close neighbors. These symbols are now used as a lexicon for communication among all the modules in the system, which henceforth can be trained independently of the initial lexical-semantic module. Indeed, the whole program is modularized so that pairs of modules can be trained to work together without involving the rest of the system (Chap. 6, Building from FGREP Modules). From the initial training, the system can tell which word in the sentence represents which role in each sentence of a story.
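Since the FGREP update is the heart of the representation scheme, here is a minimal sketch of the idea as I read it, in plain Python with numpy. The network sizes, the learning rate, the single feedforward pass in place of the book's recurrent reading of a story, and the clipping step in place of the book's normalization are all my own simplifications; this illustrates the mechanism, not DISCERN's actual code.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, DIM, HID = 20, 8, 16        # toy vocabulary size, word-code length, hidden units
    WORDS_IN, ROLES_OUT = 3, 2         # input subfields (words), output subfields (roles)

    lexicon = rng.uniform(0.0, 1.0, (VOCAB, DIM))     # the evolving word representations
    W1 = rng.normal(0.0, 0.1, (WORDS_IN * DIM, HID))
    W2 = rng.normal(0.0, 0.1, (HID, ROLES_OUT * DIM))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One toy training example: words 3, 7, 11 appear in the input subfields, and the
    # targets say word 7 fills role 0 and word 3 fills role 1 (indices chosen arbitrarily).
    word_ids = np.array([3, 7, 11])
    role_word_ids = np.array([7, 3])
    lr = 0.1

    for _ in range(200):
        x = lexicon[word_ids].reshape(-1)       # input = concatenation of current word codes
        t = lexicon[role_word_ids].reshape(-1)  # target = the *current* codes of the role fillers
        h = sigmoid(x @ W1)
        y = sigmoid(h @ W2)

        dy = (y - t) * y * (1.0 - y)            # output delta for squared error
        dh = (dy @ W2.T) * h * (1.0 - h)        # hidden delta
        dx = dh @ W1.T                          # error propagated all the way to the input layer

        W2 -= lr * np.outer(h, dy)
        W1 -= lr * np.outer(x, dh)

        # The FGREP twist: the input-layer error also revises the word codes themselves,
        # and the revised codes are reused as both inputs and targets on later iterations.
        for k, wid in enumerate(word_ids):
            lexicon[wid] -= lr * dx[k * DIM:(k + 1) * DIM]
        np.clip(lexicon, 0.0, 1.0, out=lexicon)  # keep codes in a fixed range (the book normalizes)

Run over a corpus of role-labeled sentences rather than a single example, words that fill similar roles receive similar gradient pressure, which is how the similar symbol vectors described above arise.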
In Part III, Memory Mechanisms, the story, with case roles filled in, is stored, in a single presentation, in an episodic memory module, which is a hierarchy of self-organized networks (Chap. 7, Self-Organizing Maps), in which the top level comes to represent the script (restaurant, shopping, travel), the second level the track, and the third level the specific instantiation. Each level receives only facts specific to the story at that level of description (Chap. 8, Episodic Memory Organization: Hierarchical Feature Maps). A partial story can cue the hierarchical memory to return the stored trace of the most similar story, by being mapped to a point in the episodic memory surface where the resulting focus of neural excitation, through lateral connections pointing to centers of activity, is shifted over to the location of the nearest stored memory; information stored at this location is converted to a full representation of the story (Chap. 9, Episodic Memory Storage and Retrieval: Trace Feature Maps). A question input to the system is converted to a partial representation of the most likely story that fits the question, by filling in as many case roles as can be determined from the question, and filling in expectations (averages of symbols having the same roles in the training corpus). This story cues the episodic memory to obtain the answer.
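Here, in the same spirit, is a small self-contained sketch of the cue-and-retrieve step. A single flat self-organizing map stands in for the book's script/track/instance hierarchy, a dictionary of stored traces stands in for trace feature maps with lateral connections, and the masking of unfilled roles is my own device; all sizes and data are toy values.

    import numpy as np

    rng = np.random.default_rng(1)
    GRID, DIM = 6, 12                                    # 6 x 6 map over 12-component "story vectors"
    weights = rng.uniform(0.0, 1.0, (GRID, GRID, DIM))   # reference vectors of the map units
    traces = {}                                          # (row, col) -> full story vector stored there

    def winner(v):
        d = np.linalg.norm(weights - v, axis=2)
        return np.unravel_index(np.argmin(d), d.shape)

    def train(stories, epochs=40, lr0=0.4, radius0=2.0):
        """Ordinary self-organizing map training: move the winner and its neighbors toward each story."""
        for e in range(epochs):
            lr = lr0 * (1.0 - e / epochs)
            radius = max(radius0 * (1.0 - e / epochs), 0.5)
            for v in stories:
                wi, wj = winner(v)
                for i in range(GRID):
                    for j in range(GRID):
                        h = np.exp(-((i - wi) ** 2 + (j - wj) ** 2) / (2.0 * radius ** 2))
                        weights[i, j] += lr * h * (v - weights[i, j])

    def store(v):
        traces[winner(v)] = v.copy()                     # one-shot storage of the full trace at the winning unit

    def retrieve(partial, mask):
        """Map a partial cue onto the grid, then shift to the nearest unit that holds a stored trace."""
        d = np.linalg.norm((weights - partial) * mask, axis=2)   # compare only the roles the cue fills
        ci, cj = np.unravel_index(np.argmin(d), d.shape)
        nearest = min(traces, key=lambda rc: (rc[0] - ci) ** 2 + (rc[1] - cj) ** 2)
        return traces[nearest]

    stories = rng.uniform(0.0, 1.0, (10, DIM))           # toy stand-ins for role-filled story representations
    train(stories)
    for v in stories:
        store(v)

    mask = np.zeros(DIM, dtype=bool); mask[:5] = True    # a question determines only the first few roles
    cue = np.where(mask, stories[4], 0.0)                # partial story built from the question
    full_trace = retrieve(cue, mask)                     # the complete stored story nearest to the cue

In DISCERN the returned trace is then decoded back into role fillers, from which the answer to the question is read off; the point of the sketch is only the shift from a partial cue to the nearest complete stored memory.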
Chapter 10, Lexicon, discusses the architecture of the lexicon, and models of the human lexical system, comparing processing errors made by each. Part IV, Evaluation, presents extensive tests and annotated transcripts (Chap. 11, Behavior of the Complete Model); describes successes and shortcomings of the model (Chap. 12, Discussion); compares the model with other studies of artificial neural models and human memory (Chap. 13, Comparison to Related Work); and discusses limitations, needed extensions, and directions for future research (Chap. 14, Extensions and Future Work). Appendices contain transcripts and implementation details. The comprehensive treatment could guide many tracks of a reading course, or a course on language that would trace out directions pointed to by this section. The chapter on related work, in itself, could guide a course. I'm glad I don't have to return this book to the library! I appreciated the clear exposition of the entire book, and was grateful to find, not some intricate system I could only envy, but conceptually simple, clear-cut methods I could use. Although much of the very large program is on the Internet, I easily programmed small versions of two basic mechanisms myself, from the book alone, to play with them quickly on a PC.

The book's significance reaches far beyond the present application. Not presuming to compete with the author's comprehensive suggestions for further research, I'll discuss instead what I presume to be a primary goal of cognitive science, toward which this and related work make notable progress; surely, others will find their own applications of this masterful study. Generalizations of these techniques should apply to a much wider range of experience than understanding script-based stories. Because one can't expect to map all experience topologically into two dimensions, such mappings must be coordinated into more global structures through further modular organization. How could this organization arise? The idea that there are roles, and the idea that they should go to particular fields of the target, are both ideas created by the experimenter, not by the system. Could such ideas arise through Piagetian development? Must they be innate?

Print darkness, of course, typifies any source of raw information lacking explicit schematic links with other experiences. A general analog of this explicitly uninteresting input could be raw sensor output or other feelings, or the activity of internal dynamical systems, for which significance may lie only in their patterns of co-occurrence. Miikkulainen extracts implicit information from knowledge of script and roles, specific to his task; for any generalization, a suitable learning task must be devised. Can we generalize this procedure to suit input sequences of experienced feeling complexes in typical scenarios (Greene & Chien, 1993) that can be archetypes of schemas, such as containment, which underlie so much of our thinking (Johnson, 1987)? These might be replayed in time-invariant form (Port & van Gelder, 1991). An obvious candidate target is prediction of the next feeling complex. An unsupervised method of Schmidhuber and Prelinger (1993) predicts not the specific next input (which may be impossible), but a class, discovered by the program, to which it is expected to belong. Will training converge for generalized training examples lacking the combinatorial nature (script, track, individuals) of Miikkulainen's inputs? And if so, will the evolved representations be as useful as his? Codes for feelings and clusters of feelings, and their extension to codes and nearby storage of similar sequences, might lead to compressed representations of the many sequences (and their relations) comprised by a schema such as containment (Johnson, 1987). Rough high-level rules embodied in these feeling-based representations would be combined with representations of a specific situation, to obtain the proper specialization. Symbols for compressed sequences might explain how we discriminate close synonyms we can't define, and instantly apprehend potentialities of our situation that are hard to derive through words and logic. Higher and lower animals differ in ability to apprehend an experience as a coherent scene (Edelman, 1989, 1992), and to apprehend previous and potential experiences--abilities we may be able to study through generalization of the book's techniques, along with recent techniques for compressing complex data (Pollack, 1990) and combination of distributed data into symbolic representations amenable to logic (Legendre, Miyata, & Smolensky, 1991; Smolensky, Legendre, & Miyata, 1992).

Kant, Lotze, Cassirer, Piaget, and others (detailed citations in Greene, 1959) argued that our thoughts are not copies of a definite world of facts, but require forming impressions into representations adequate to sustain logic and abstraction; they recognized that an individual element of experience must contain information about its connections with other elements of actual and potential experience, just to be an element of experience at all. Cassirer showed that bodily feelings, metaphor, and myth can constitute and organize meaning of spatial and other ideas; Lakoff (1987, 1989), Johnson (1987), Talmy (1988), and Greene and Kozak (1990) have abundantly elaborated their primary role in meaning, language, and movement. Action-generated feelings, unlike arbitrary codings, contain internal structure from which we may infer how to act, in relation to how we might act in a myriad of potential variant situations; they must become schematic in the sense of containing information about their relations with other representations.
If a primary problem of cognition is, as argued by Kant, Lotze, and Cassirer, to "concentrate" experiences and "distill" them down to units that are amenable to logic, yet contain principles of ordering and connection with other units, then tools for making this a primary subject of research are at hand.
Peter H. Greene
Computer Science Department
Illinois Institute of Technology
IIT Center, Chicago, IL 60616
REFERENCES

Cullingford, R. (1981). SAM and Micro SAM. In R. C. Schank & C. K. Riesbeck (Eds.), Inside computer understanding (pp. 75-135). Hillsdale, NJ: Erlbaum.
Edelman, G. M. (1989). The remembered present: A biological theory of consciousness. New York: Basic Books.
Edelman, G. M. (1992). Bright air, brilliant fire: On the matter of the mind. New York: Basic Books.
Greene, P. H. (1959). An approach to computers that perceive, learn, and reason. Proceedings of the Western Joint Computer Conference (March 1959) (pp. 181-186).
Greene, P. H., & Chien, G. T. H. (1993). Feeling-based schemas. In Neural architectures and distributed AI: From schema assemblages to neural networks. Center for Neural Engineering Technical Report, University of Southern California.
Greene, P. H., & Kozak, M. (1990). A body-based model of the world. Proceedings of the 5th IEEE International Symposium on Intelligent Control (pp. 81-85).
Johnson, M. (1987). The body in the mind: The bodily basis of meaning, imagination, and reason. Chicago: University of Chicago Press.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Lakoff, G. (1989). A suggestion for a linguistics with connectionist foundations. In Proceedings of the 1988 Summer School (pp. 301-314). San Mateo, CA: Morgan Kaufmann.
Legendre, G., Miyata, Y., & Smolensky, P. (1991). Distributed recursive structure processing. In D. S. Touretzky & R. Lippmann (Eds.), Advances in neural information processing systems 3 (pp. 591-597). San Mateo, CA: Morgan Kaufmann.
Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46, 77-105.
Port, R. F., & van Gelder, T. (1991). Representing aspects of language. Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society.
Schmidhuber, J., & Prelinger, D. (1993). Discovering predictable classifications. Neural Computation, 5, 613-624.
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46, 159-216.
Smolensky, P., Legendre, G., & Miyata, Y. (1992). Principles for an integrated connectionist/symbolic theory of higher cognition (Report CU-CS-600-92). Computer Science Department, University of Colorado at Boulder.
Talmy, L. (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49-100.