Language and representation in information retrieval

Information Processing & Management Vol. 27, No. 2/3, pp. 245-248, 1991. Printed in Great Britain.

0306-4573/91 $3.00 + .00
Copyright © 1991 Pergamon Press plc

BOOK REVIEWS

Language and Representation in Information Retrieval. D.C. BLAIR. Elsevier, Amsterdam and New York (1990). xiv + 335 pp., $89.75, ISBN 0-444-88437-8.

Blair’s book is an assemblage of observations on information retrieval, for the liberally educated reader. Its predominant theme is that the central problem of information retrieval, document representation, can best be attacked by studying the use of language (vs. word definitions). Yet the discussion ranges from practical to metatheoretic concerns, and from introductory to esoteric material. It should be read for its suggestiveness rather than for immediate solutions.

The introductory chapter distinguishes data retrieval from document retrieval. Documents are complex entities in comparison with database fields, and lack simple, unique, mnemonic identifiers. Thus, it is claimed, users can be left floundering in a combinatorial explosion of keywords. The “indeterminacy of representation” (p. 22) comprises both the non-uniqueness of keyword descriptions, and a claim in later chapters that keyword meanings are simply impossible to define. Both senses drive the argument for studying language use, and Blair should distinguish them consistently.

The second chapter presents existing information retrieval technology in a uniform, simple manner reminiscent of a corporate MIS tutorial.

The third chapter addresses system evaluation, reporting in detail on a study conducted by Blair and Maron. The study assesses recall as well as precision for a particular retrieval project, and documents several classes of system failure that are deserving of further study. It approaches an important problem in IR systematically and in depth. Readers interested in this work might appreciate the debate with Salton, in the pages of CACM and IPM.

The fourth chapter introduces semiotics and philosophy of language, primarily Saussure, Eco, Zipf, Wittgenstein, and Austin. The material is well chosen and accurately presented, but contains unclarities attributable to both Blair and the cited works. For one, “The well-known triangles of Ogden and Richards, Pierce, and Frege” are drawn on p. 129 with no further explanation. For another, it is unclear how the critique of mentalism would apply to those theories in which mental entities are understood to have a neurological grounding or encoding. For a third, Blair charges linguists with the assumption that one word has one meaning. This is rather a straw man, and it is unclear what Blair, or Wittgenstein, would say about actual models of lexical semantics, including homonyms and classes of word sense shifts and specializations. The claim is made that statistical methods in IR will inevitably fail to capture the social framework and goals that enmesh our documents and queries.

Chapter four is heavy going, with 36 pages of notes to 92 pages of text, but it does eventually lead to several suggestions for information retrieval systems. They are rather an eclectic lot, and would be greatly strengthened by incorporation into a single information retrieval architecture or by individual experiments. One interesting notion is that the indexing vocabulary distribution should fit Zipf’s Law, since it serves to communicate with information seekers. Another interesting notion is that documents could be represented as communicative acts, as in the COORDINATOR message handling system. Documents would have descriptors for relevant contextual information, such as witnesses and legal advisor for a contract, or the duration of a guarantee. Blair does not discuss the pros and cons of this document representation directly. One wonders how users are coaxed to provide reliable descriptions, and whether these will be of any use in high volume or communicatively homogeneous domains. One would expect it to be intuitive to business users, though.
The fifth chapter concerns relevance feedback, the user’s communication of satisfaction and dissatisfaction to the retrieval system. An application of genetic algorithms to the problem is discussed in detail. Like neural nets, genetic algorithms yield adaptive systems and are therefore very attractive. The open questions include scalability, whether infrequently accessed documents adapt at all, and how genetic algorithms compare with other adaptive techniques.

The sixth chapter introduces philosophy of science, represented by Kuhn, Popper, and Nagel. It makes sociological observations about the IR community that are helpful to the newcomer, and concludes with a list of tasks facing the IR community.

The seventh chapter, the conclusion, reiterates some of the observations in the text and adds a few new ones. The negative assessment of Artificial Intelligence, it should be noted, is specifically a criticism of the most naive expert systems approach to information retrieval. It is not intended to apply to connectionist models, in which Blair expresses interest, nor does it address current trends or primary sources, especially in the language processing and knowledge representation communities. Many natural language and knowledge representation application systems suffer from brittle performance and narrowness of subject domain. However, there have been developments in language pragmatics that are relevant here, such as Cohen, Perrault, Hirschberg, and Hinkelman, as well as more directly IR-oriented work by Lewis and Mauldin. An introduction to the field is Allen.

The bibliography is distinctly partisan, but does not pretend to be otherwise. Authors are indexed into the end notes, making bibliographic information easier to locate than references in the text. The document itself is heir to many of the failings of camera-ready copy, with numerous formatting and typing errors, and several uninformative or misleading section headings.

Overall, the book would be strengthened by a listing or comparison of possible document representations, with some argument for keyword or phrase representations, which are assumed through most of the book. This type of gap will frustrate the more systematic reader. However, I would recommend the book to those reference librarians, MIS professionals, or others interested in playing with some new ideas in IR.

BIBLIOGRAPHY

Allen, J. (1987). Natural language understanding. Menlo Park: Benjamin Cummings.

Cohen, P.R., & Levesque, H.J. (1990). Rational interaction as the basis for communication. In P.R. Cohen, J. Morgan, & M.E. Pollack (eds.), Intentions and communication. Cambridge, MA: MIT Press.

Hinkelman, E.A. (1989). Linguistic and pragmatic constraints on utterance interpretation. Computer Science Department, University of Rochester, Rochester, NY.

Hirschberg, J.A. (1985). A theory of scalar implicature. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania.

Lewis, D.D., Croft, W.B., & Bhandaru, N. (1989). Language-oriented information retrieval. International Journal of Intelligent Systems, 4, 285-318.

Mauldin, M. (1989). Information retrieval by text skimming. Ph.D. thesis, Carnegie Mellon University.

Mauldin, M., Carbonell, J., & Thomason, R.H. Beyond the keyword barrier: Knowledge-based information retrieval. Proc. 29th National Federation of Abstracting and Information Services (NFAIS-87), North-Holland Press, Amsterdam, March 1987.

Perrault, C.R. (1990). An application of default logic to speech act theory. In P.R. Cohen, J. Morgan, & M.E. Pollack (eds.), Intentions and communication. Cambridge, MA: MIT Press.

Center for Information and Language Studies
University of Chicago
Chicago, IL

ELIZABETH ANN HINKELMAN

PC Management: A How-To-Do-It Manual for Selecting, Organizing, and Managing Personal Computers in Libraries. M. SCHUYLER AND J. HOFFMAN. Neal-Schuman, New York (1990). 212 pp., $35.00, (How-to-do-it manuals for librarians; no. 6), ISBN 1-55570-076-4.

PC Management is indeed a how-to manual on fabricating a uniform-appearing computing environment for a library system. Its specific audience is the person or persons who have, by choice or by accident, fallen into the role of local computer “expert”: the person who has the nerve to poke around in the innards of the library’s micros. But it deserves a larger audience. Top administrators can gain insight into how to deal with the mishmash of computer types and software in their libraries. The lone computer user at home or work can use the framework and specific approaches to create a more organized environment. The authors also show how to keep costs down when considering the purchase of software, hardware, supplies, and maintenance.

Schuyler and Hoffman advocate a uniform interface for all microcomputers in a library. Any machine, when turned on, would display the same style of menu, although the choices of software would vary according to the intended purpose of that machine. Software would also be standardized across all machines in the library, thus avoiding transfer problems and the difficulties in learning, for example, several different word processing programs. The machines themselves would be MS-DOS based and, in recognition of the cost savings, would be clones of the IBM-PC standard.

The writing is sprightly, down-to-earth, and intended to convey the authors’ practical experience. They start by illustrating how to write batch files to set up a simple menu system and proceed to examine a more sophisticated interface created with a software program called Saywhat?! The third chapter examines two types of software products: first, those that help the local computer guru diagnose problems and second, the application programs that computer users use to do their jobs (word processors, spreadsheets, database managers, communications programs). This chapter contains spe-