Expert Systems With Applications, Vol. 2, pp. 345-352, 1991. Printed in the USA.
0957-4174/91 $3.00 + .00 © 1991 Pergamon Press plc
Expert Document Retrieval via Semantic Measurement

H. Joel Jeffrey
Computer Science Department, Northern Illinois University, DeKalb, IL, USA

Requests for reprints should be sent to H. Joel Jeffrey, Computer Science Department, Northern Illinois University, DeKalb, IL 60115.
Abstract--A new technology for intelligent full-text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.
1. EXPERT LIBRARIAN VERSUS DOCUMENT RETRIEVAL

A GOOD human reference librarian is an expert. Specifically, he/she is expert at recognizing the subject matter of a library user's request and retrieving for the user documents relevant to that request. In more detail, four criteria are important:
1. All relevant documents should be retrieved.
2. No irrelevant documents should be retrieved.
3. The items retrieved should be ranked in order of relevance.
4. The particular vocabulary used by the user and the author of the document is of no import; the librarian should recognize the relevance of a document regardless of whether the terms in the document are otherwise similar to those of the user's request.

Historically, document retrieval systems were intended to function in this manner, thus replacing the human reference librarian. They have not done so. Although keyword-based retrieval systems have been shown to be valuable search tools, they do not perform as expert librarians. The two most commonly used measures of system performance are precision (percentage of retrieved documents found to be relevant to the request) and recall (percentage of relevant documents retrieved). A remarkable consistency of results is found in the literature of traditional document retrieval system performance: Recall + Precision ≈ 100 (e.g., 50% recall + 50% precision, 90 + 20, 30 + 80, 70 + 30, etc. are the figures reported). Salton (1989) cites an average recall of 30%, and average precision of 51%. Blair and Maron (1985) found that full-text retrieval retrieved only 20% of the relevant documents. Further, even those documents found are not returned in order of relevance. It is a simple matter to rank documents in order of number of term matches, but because term similarity is a poor proxy for conceptual relevance, such a ranking does not correlate strongly with experts' rankings of subject matter similarity. The essence of the problem is that full-text retrieval "is based on the assumption that it is a simple matter for users to foresee the exact words and phrases that will be used in the documents they will find useful, and only in those documents" (Blair & Maron, 1985, p. 295), and this assumption is not borne out by the facts. The situation is discussed in more detail in Blair and Maron (1985).

Thus, while few would question the usefulness of document retrieval systems, it seems clear that one cannot expect them to perform like an expert human reference librarian. (How well human librarians actually do perform is another question, akin to how well human medical diagnosticians actually perform.) By noting that the human librarian is an expert, we gain a useful insight into this inability to perform: the traditional approach to document retrieval has not addressed the problem as an expert system problem. It has attempted to substitute algorithmic techniques (keyword matching and its many variants) for expert skill.

In recent years, researchers have begun to address some of the problems with traditional retrieval systems with intelligent information retrieval systems. These are systems in which inference and probabilistic or fuzzy search are used to match documents with requests. However, the underlying model remains descriptors composed of sets of terms; the intelligence addresses
the related problems of determining better descriptors and improved mechanisms for matching descriptors. In the literature, ordinary keyword retrieval systems are called "traditional," as contrasted with "intelligent"; this paper adheres to that usage. Intelligent information retrieval is discussed in Section 5. The goal of the research presented in this paper was to construct a document retrieval system based on a technique for simulating expert document relevance judgments, and evaluate the performance of the system in response to actual requests by actual users. The system was specifically intended to function as an expert librarian system.
2. THE LIBRARY SYSTEM

2.1. System Overview

LIBRARY is an interactive document retrieval system, accepting requests in ordinary technical English, with no specialized formatting, and producing a list of all documents in the data base relevant to the request, ranked in order of relevance. It is based on a technique for simulating expert judgment of relevance, semantic measurement, developed by P. G. Ossorio (1966) but never used in a working system. The goal of the research was to test the concept of expert retrieval, and the technique of semantic measurement, in a system large enough to be useful to actual users needing information for real work, but not so large as to entail massive programming and online storage.

Retrieval is solely on the basis of content similarity, as found by semantic measurement. In order to be able to evaluate the effectiveness of semantic measurement as a technique, a number of enhancements were deliberately excluded. In particular, LIBRARY includes no Boolean query facilities, no ability to specify citations or particular authors, and no subdocument retrieval capabilities. The user may specify the relevance threshold, instruct the system to alter some of the values it uses in measuring semantic content, and add technical vocabulary by simply giving the system the definition of the new term.

LIBRARY's operation is entirely automatic. Requests are processed automatically, with no formatting of requests or intervention of a human intermediary or search expert, and documents are processed and added to the system automatically. Documents are retrieved in order of relevance. Retrieval is controlled either by a threshold of relevance or by ending retrieval after three consecutive irrelevant documents are retrieved (irrelevant as determined by the user). The following transcripts of user sessions illustrate these capabilities.
Ordinary Retrieval

Enter your request: I'd like whatever you have on paging.
Do you want to alter the retrieval threshold? n
Documents retrieved in response to your request:

Doc. No. 246   Dissimilarity: 1.12
Title: Using Page Residency to Select the Working Set Parameter
By: Barton G. Prieve
Place: CACM October 1973, p. 619
Print Abstract? n

Doc. No. 259   Dissimilarity: 1.42
Title: Empirical Working Set Behavior
By: Juan Rodriguez-Rosell
Place: CACM September 1971, p. 556
Print Abstract? y
Abstract: The working set model for program behavior has been proposed in recent years as a basis for the design of scheduling and paging algorithms, although the words "working set" are now commonly encountered in the literature dealing with resource allocation. . . .

(4 more documents retrieved in this session.)

Judgment Revision

Do you want to negotiate about the system's judgment of your request? y
Your request appears to have the following content:

Content type 10   Analysis of Algorithms / Computational Complexity
  Amount: 5.4   Relative importance: 1.0
Content type 12   Data Structures / Programming Techniques
  Amount: 2.7   Relative importance: 1.0
Content type 34   Graph Theory
  Amount: 5.8   Relative importance: 1.0

All further content amounts less than 2. Do you want them displayed? n
Content type you wish to change: 12
Current amount: 2.7   Relative importance: 1.0
Any changes? y
New amount: 0
New importance: 0

(User feels that the amount of this type of content is irrelevant to his needs.)

Further changes? n
(The relative importance feature was added to permit a user to, in effect, say, "Yes, this request does have this type of content, but do not give it as much weight in measuring similarity." Interestingly, and contrary to expectation, users rarely found the feature valuable.)

Defining Vocabulary
Do you want directions for changing the available vocabulary? n
Do you want to use the system vocabulary? y
(Other vocabularies can be specified by the user.)

Do you want to use additional previously defined terms? n
Do you want to add new terms? y
New item: computational linguistics
Definition of new item: This is the field which deals with the problem of natural language understanding. A good deal of formal language theory and parsing techniques are involved. The usual approach is to define a formal grammar and then add devices for representing the facts about the entities involved to control the parsing. Chomsky started it with his work in modeling a natural language with a context-free language. The field is closely allied to artificial intelligence and formal languages and automata theory.

This is a good example of how a user "defines" a term to LIBRARY: simply by describing it as he/she would to another person. The description, as this example shows, is not so much a formal definition as a discussion of the concept. The better the description, the better LIBRARY's semantic measurement of the term. Unlike other techniques, additional known terms improve the usefulness of the definition rather than causing false drops.

2.2. Performance

LIBRARY's performance is

Recall + Precision = 158

(using a prespecified threshold) and

Recall + Precision = 171

(when retrieval ended after three irrelevant documents). Thus, LIBRARY appears to perform 60% to 70% better than traditional systems. Due to the operation of the semantic measurement technique, items are retrieved in order of relevance. Further, retrieval is not sensitive to the particular vocabulary used in the request and documents; retrieval is on the basis of content similarity, not term similarity.

Recalling that the goal of LIBRARY was to perform specifically as an expert system, i.e., as an expert human reference librarian, it is appropriate to note that there seems to be a qualitative difference here as well as the quantitative one: a system retrieving 85% of the relevant documents (recall), 86% of whose retrieved documents are relevant (precision), can reasonably be said to be acting as a competent librarian. The pilot study would appear to indicate that this form of expert retrieval can perform significantly better than traditional document retrieval approaches, and that semantic measurement is a useful new technique.

3. THE SEMANTIC MEASUREMENT SPACE
The heart of LIBRARY is a semantic measurement space, which measures (in a sense to be defined below) the content of each document in the system and the content of a request, and selects documents whose measured content is most similar to that of the request. A semantic measurement space is an n-dimensional Euclidean vector space in which:
1. there is an orthogonal set of basis vectors;
2. each basis vector represents a type of semantic content;
3. each component of an item's n-component location in the space represents the expert judgment of the degree to which the item has that type of content.

Words and phrases are of course used in the operation of the system, but not in a matching algorithm. The terms of a piece of text (document, subdocument, or query) are used (via the algorithm given below) to calculate the location of the text in the space, which is the measurement of the amount of each (orthogonal) type of content present. (Ordinary retrieval systems calculate a vector for a document and retrieve by similarity of the document's vector and the request vector. However, the vectors are vectors of keywords, perhaps with weights, and do not represent expert knowledge. Further, vector similarity represents similarity of terms between document and request, not expert judgment, as in an expert retrieval system.)

"Semantic content" is a deliberately general term chosen to reflect the variety of criteria experts use to judge the suitability of a document. Examples are subject matter relevance, properties of the elements discussed in the document, and categories of the elements (e.g., mathematical, chemical, information-processing, etc.). The type of content used by LIBRARY as of this writing is subject matter relevance.
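To make the geometry concrete, the following minimal sketch (in Python; the axis names and all numbers are hypothetical, and the actual system was written in FORTRAN) compares items located in such a space by Euclidean distance:

    import numpy as np

    # Hypothetical 3-axis semantic measurement space; each component is
    # an expert-derived measurement of one orthogonal type of content.
    axes = ["Memory Management", "Analysis of Algorithms", "Graph Theory"]

    request = np.array([5.1, 1.0, 0.2])   # measured content of a request
    doc_a   = np.array([4.8, 0.7, 0.0])   # e.g., a paging paper
    doc_b   = np.array([0.3, 5.5, 4.9])   # e.g., a graph-algorithms paper

    # Smaller distance = more similar measured content.
    print(np.linalg.norm(request - doc_a))   # small: doc_a is relevant
    print(np.linalg.norm(request - doc_b))   # large: doc_b is not

The actual axes are the factors produced by the construction in Section 3.1, and the measurements derive from expert ratings, not from raw term counts.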
3.1. Construction of the Space

We will present the construction of the semantic measurement space used in LIBRARY. As will be seen, the construction is quite general.
3.1.1. Domain Selection. As with the construction of any expert system, the first step is to define the domain of expertise. Further, the goal of the work was limited and quite specific: a pilot test of the semantic measurement technology in a system large enough to be a valid pilot study. We were not attempting to explore the limits of the technology. It was decided to construct a system that could answer requests for information within the field of computer science according to the sole criterion of subject matter relevance. (Earlier work by Ossorio [1966] had shown strong evidence that retrieval by subject matter relevance plus category and attribute similarity was significantly superior. Thus, this appeared to be a conservative test of the technology.) Further, it was decided that documents would be abstracts of articles, as this would allow enough text to provide the pilot study, but not present large programming problems. A document base of 600 abstracts was used.

A list of 62 subject-matter fields within computer science was developed by referring to the fields listed in such standard references as Computing Reviews. Examples are analysis of algorithms, automata theory, data structures, graph theory, memory management, operating systems, paging, sorting, and searching. The only restriction was that the fields cover the area of interest. All questions of overlap of fields, and all of the myriad relationships that might hold between them, were deliberately ignored.

3.1.2. Selecting Vocabulary. The second step was to select the vocabulary the system would recognize. A very simple procedure was followed: First, a selection was made of articles in the various fields. The articles were scanned by an expert who simply underlined technical words and phrases. Second, the list of 62 subfields was examined, and technical terms were added to ensure that there were at least three terms from each field. In this way a system vocabulary of 843 technical terms was developed.

3.1.3. Expert Rules. Third, expert human judges rated each item of the system vocabulary with respect to each of the 62 subject matter fields. Numerical values were used to express the relevance of each term to each field, as follows:
A. Irrelevant. Rating: 0.
B. Might be. This term might have some relevance to the field, but would not ordinarily be considered relevant. Rating: 1 or 2.
C. Peripheral. The term has some relevance, but is basically peripheral to the field. Rating: 3 or 4.
D. Relevant. This term is definitely part of the field. Rating: 5 or 6.
E. Highly significant. The item is an important or significant concept in the field. Rating: 7 or 8.
Within categories B-E, the higher value indicates greater relevance. In making the ratings, judges take no account of the relations between fields; they do not analyze implications, but simply give their overall expert opinion as to the case at hand, their "first reaction." The amount of labor needed to acquire ratings of every term with respect to every field is significant, but not overwhelming: several graduate students reported that rating 400 terms with respect to 20 fields required 8 hours.

LIBRARY itself is written in approximately 2,000 lines of FORTRAN. The heart of the system, the routines to calculate a request's location in the vector space and the distance to each document, are about 50 lines each.

As with any expert knowledge base, one must be concerned with reliability. Ossorio (1966) found that to represent an "average" expert, three judges per entry are needed. However, if this requirement is not met, the knowledge base is not defective; it simply represents a particular expert's viewpoint rather than a consensus.

3.1.4. Orthogonal Basis. The result of the above step is a knowledge base of the relevance of terms to fields, in the form of a matrix. This knowledge base could be used for retrieval in this form, because each term found in a document or request has an associated content vector. However, this seemed a likely source of problems and errors, since the subject matter fields are not, in general, an orthogonal basis for the data. To see the problem, consider the following example. Suppose we have documents D1, D2, and D3, and the fields of Analysis of Algorithms (AA), Computational Complexity (CC), and Assembly Language Programming (ALP), with ratings shown in Table 1. Since the fields of analysis of algorithms and computational complexity are more closely related than either is to assembly language programming, D1 and D2 are more closely related than are D1 and D3 or D2 and D3, but this similarity is not reflected in the numerical values. A system using only this form of the knowledge base would thus be unable to properly simulate expert retrieval.

To solve this problem, rather than use the expert judgment matrix directly, we find an orthogonal basis for it in which each basis vector represents a set of correlated subject matter fields, as measured by the correlation coefficients of the judgment matrix. The basis is found by factor analyzing the judgment matrix.
TABLE 1
A Problematical Example

      AA   CC   ALP
D1     8    0    0
D2     0    8    0
D3     0    0    8
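The following sketch illustrates the idea under stated assumptions: it uses a principal-component extraction (eigendecomposition of the correlation matrix) as a stand-in for the factor analysis actually used, and a tiny hypothetical judgment matrix; the real system factor-analyzed ratings of 843 terms against 62 fields.

    import numpy as np

    # Hypothetical judgment matrix: rows = terms, columns = fields
    # (ratings 0-8). AA and CC are deliberately correlated.
    #             AA  CC  ALP
    J = np.array([[8,  7,  0],    # "big-O notation"
                  [6,  8,  0],    # "NP-completeness"
                  [7,  6,  1],    # "divide and conquer"
                  [0,  1,  8],    # "addressing mode"
                  [1,  0,  7]])   # "macro expansion"

    R = np.corrcoef(J, rowvar=False)       # field x field correlations
    eigvals, eigvecs = np.linalg.eigh(R)   # principal axes of R

    # Loadings: each eigenvector scaled by the square root of its
    # eigenvalue, largest components of variance first.
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))
    print(np.round(loadings[:, :2], 2))    # AA and CC load on one factor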
Measurable common factors, plus any necessary unique factors, are the basis vectors. The factor loadings give the relationship between the original subject matter fields and the factors in two ways: (1) the loading is the cosine of the angle between the factor and the field, and (2) the square of the loading is the portion of the variance of the field accounted for by the factor. Due to the meaning of the original data, the basis vectors represent orthogonal types of subject matter content. Correlated fields have been grouped into common factors, and the troublesome redundancy in the original knowledge base is removed. The resulting vector space is the semantic measurement space.

The semantic measurement space for LIBRARY consists of 12 common factors and 44 unique factors. Some examples of the factors, with the loadings of their component fields, are shown in Table 2. The treatment of the field of Assemblers is typical: its content has been represented in the space by two axes, one representing the content the field has in common with others, and one representing its unique content.
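As a quick worked example (using a value from Table 2, with the arithmetic added here): a loading of 0.86 for Compilers & Generators on axis 2 means the angle θ between field and factor satisfies cos θ = 0.86, i.e., θ ≈ 31°, and the factor accounts for 0.86² ≈ 0.74, or about 74%, of that field's variance.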
3.1.5. Term Locations. The space is used by calculating the location in the space of any text: a document, a request, or (as we shall see later) the definition of a new vocabulary term. To do this we must have the location of each of the terms in the space. The original ratings of terms with respect to subject matter fields and the loadings of the fields on the factors are used to calculate a vector for each term. LIBRARY uses the following formula:

    TL_ij = ( Σ_k r_ik · l_kj ) / L_j                                (1)

where
  TL_ij is the value of term i on axis j,
  r_ik is the rating of term i with respect to field k,
  l_kj is the loading of field k on factor j,
  L_j is the maximum of all loadings on factor j.

3.1.6. Document Location. Finally, each document in the system is located in the measurement space. This is done by combining the vectors for each term in the document into a single "content summary" vector. Obviously, a wide range of algorithms could be used for this calculation; LIBRARY uses the following:

    DL_k = log_b [ ( Σ_{j=1..m} b^(t_jk) ) / m ]                     (2)

where
  DL_k is the k-th component of the document location,
  t_jk is the value of term j on axis k (from Eq. 1),
  b is the base for the log average, typically 2 ≤ b ≤ 10,
  m is the number of recognized terms in the document.

In other words, the k-th component of the document's location vector is the log average of the values on that axis of each term found in the document. Here we see the crucial contribution of orthogonality of the axes: we are guaranteed that the value of component k of the document's vector can be calculated entirely from the k-th components of the terms in the document. A log average was used for ease of experimentation: by varying b in the above formula, the contribution due to lower values of the component is increased or decreased.
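A minimal sketch of both calculations under the definitions above (the variable names are ours, not the paper's FORTRAN):

    import numpy as np

    def term_locations(R, L):
        # Equation 1. R: terms x fields ratings (0-8);
        # L: fields x factors loadings.
        Lmax = L.max(axis=0)           # L_j: max loading on each factor j
        return (R @ L) / Lmax          # terms x factors

    def document_location(term_locs, b=2.0):
        # Equation 2. term_locs: m x n values (one row per recognized
        # term in the document). Component k is the log (base b) of
        # the average of b raised to each term's value on axis k.
        m = term_locs.shape[0]
        return np.log(np.power(b, term_locs).sum(axis=0) / m) / np.log(b)

Raising b toward 10 lets one high term value dominate the average; lowering it toward 2 lets low values dilute it, which is exactly the experimental knob the text describes.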
3.1.7. Ready for Use. The procedure just described is for the initial construction of the space. After calculating the locations of the documents, the space is complete and ready for use; none of these steps is part of using the completed system. In particular, vocabulary gathering, expert ratings, and factor analysis are not part of retrieval.

TABLE 2
Selected LIBRARY Axes

Axis   Field                               Loading
2      Assemblers                          0.71
       Compilers & Generators              0.86
       Programming Languages               0.78
       Error Recovery                      0.75
       Translation                         0.86
3      Content Analysis                    0.90
       Retrieval System Evaluation         0.85
       File Searching                      0.93
       Information Retrieval Vocabulary    0.80
5      Finite Automata                     0.80
       Regular Expressions                 0.82
17     Assemblers                          0.47

4. USE OF LIBRARY
4.1. Retrieval
A user has two options for retrieval: by request, or by similarity to a specified document.
4.1.1. Requests. When retrieving by request, the system prompts the user to type his/her request, as complete and detailed as desired. The request is scanned for known vocabulary items, and then located within the measurement space via the algorithm above. It should be noted that this is different from traditional retrieval systems, in which careful phrasing of the query, with no extraneous terms, is quite important. Since the location in the space represents expert judgment of the subject matter content of both documents and requests, similarity of the content of the request to that of any document can now be calculated by calculating the distance from the request to the document. In a sense, all of the work in the foregoing section was done to ensure this characteristic, namely, that texts close in the space have similar content. This is precisely the characteristic that would not have been possible had the original data matrix been used rather than the orthogonal basis. Therefore, the system calculates the distance from the request to each document in the system, sorts the documents by distance, and returns documents with subject matter content similar to that of the request to the user. The user may use either of two criteria for stopping retrieval: distance exceeding a threshold, or three successive returned documents judged irrelevant. The threshold was determined empirically in advance as 4.6; it is user-changeable.

4.1.2. "One Like This." It is not uncommon for a library user to find a document that is particularly relevant and ask the librarian for documents "like this one." LIBRARY provides this capability. The same distance-calculation algorithm is used, but of course the particular document used as criterion already has a location in the space, and is therefore not scanned for terms.
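A sketch of the retrieval step as just described (the function name and the user-judgment callback are hypothetical; the 4.6 default threshold and the two stopping criteria follow the text, and either criterion can be disabled by passing None):

    import numpy as np

    def retrieve(request_loc, doc_locs, threshold=4.6, user_says_relevant=None):
        # Distance from the request to every document, nearest first.
        dists = np.linalg.norm(doc_locs - request_loc, axis=1)
        results, consecutive_bad = [], 0
        for i in np.argsort(dists):
            if threshold is not None and dists[i] > threshold:
                break                           # stopping rule 1: threshold
            results.append((int(i), float(dists[i])))
            if user_says_relevant is not None:
                if user_says_relevant(int(i)):
                    consecutive_bad = 0
                else:
                    consecutive_bad += 1
                    if consecutive_bad == 3:    # stopping rule 2
                        break
        return results

"One like this" retrieval is then the same call with request_loc replaced by the chosen document's stored location.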
4.2. Adding Vocabulary

As with any retrieval system, for a piece of text to be indexed it must contain terms known to LIBRARY. A significant advantage of the semantic measurement approach is that the more known terms in a document or request, the better LIBRARY's estimate of its subject matter content; additional terms do not cause "false drops" due simply to term coincidence, as with nonexpert retrieval systems. Formula 2 shows the reason for this: in the presence of several terms, one extraneous term value, even if high, is "diluted" by the log average.

Since it is highly undesirable for a user to have to know LIBRARY's vocabulary, LIBRARY includes the capability of defining new items. In this mode, LIBRARY asks for the term and then prompts the user, as with a request, to explain the term in as much breadth and depth as the user desires. The term is then added to the system vocabulary and its location in the measurement space is calculated. The term is then available for use in requests or further definitions.

As illustrated above, using semantic measurement to add new terms has some interesting implications. To add a term, the user simply "talks" about the term as he/she might to a colleague, mentioning relevant concepts, use, related terms, perhaps relevant history, etc., rather than giving a formal definition. Since retrieval is not done by term matching of any kind (traditional or expert), additional already-known terms in the definition of the new one do not cause false retrievals or misdirect the search.
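The paper does not give the formula for locating a new term, but the text implies it is located from its definition exactly as a document is located from its text. A sketch under that assumption, reusing the document_location routine from the Section 3.1.6 sketch (scan_terms and all names here are hypothetical):

    def scan_terms(text, vocabulary):
        # Simplistic scanner: known single-word vocabulary items only.
        return [w for w in text.lower().split() if w in vocabulary]

    def add_term(term, definition_text, vocabulary, term_locs, b=2.0):
        # Locate the new term from the known terms in its free-text
        # definition, exactly as a document is located (Equation 2).
        rows = [vocabulary[w] for w in scan_terms(definition_text, vocabulary)]
        new_loc = document_location(term_locs[rows], b=b)
        # Register the new term; the caller appends new_loc as its row.
        vocabulary[term] = term_locs.shape[0]
        return new_loc

Extra known terms in the definition only add evidence to the log average; they cannot trigger spurious matches, which is the point made above.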
4.3. Current Limits

LIBRARY is limited in two respects by the content measurements it can make. First, human expert librarians are sometimes specialized; a physics expert may know nothing of chemistry. LIBRARY has the same limitation: if a request is received with little or no content in LIBRARY's area of expertise, LIBRARY will not be able to judge its content. However, unlike most expert systems, LIBRARY can recognize when it has a request it cannot handle. If all of the components of a request's vector are small, then LIBRARY is judging the request to be basically irrelevant to all of the content types represented. It is easy to test for this condition (a sketch follows at the end of this section).

Second, due to the pilot-study nature of the LIBRARY effort to date, the system has no hierarchical structure in its content measurement. For example, it has no representation of the subspecialties within Artificial Intelligence. While, obviously, such structure can be added, as was done in Operating Systems, Systems Programming, etc., we are limited by the total number of fields: it would not be practical to construct a Library of Congress system by gathering expert ratings for 20,000 terms against 1,000 fields. The result of this lack of finer distinctions is lower precision. For example, since LIBRARY does not have the subfields of game playing and natural language understanding, a request for information on computer chess will retrieve articles on natural language understanding, since both are relevant to artificial intelligence. Interestingly, however, a request about automatic theorem proving gives somewhat better results, since it is evaluated as significantly relevant to another axis, Mathematical Logic. This precision limitation mirrors a human expert limitation: a human reference librarian who did not know of these subfields would make the same error.

As noted earlier, LIBRARY operates on subject matter similarity only. Therefore, it cannot respond properly to distinctions between, e.g., mathematically oriented versus physically oriented articles about operating systems, except insofar as other types of subject matter are also addressed.
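One way the out-of-expertise test might look (the cutoff value is hypothetical; the paper gives none):

    import numpy as np

    def outside_expertise(request_loc, cutoff=1.0):
        # True if every measured content amount is small: the request is
        # basically irrelevant to every content type the system knows.
        return bool((np.abs(request_loc) < cutoff).all())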
5. RELATION TO OTHER WORK

The technique of semantic measurement was first developed by P. G. Ossorio (1966). Ossorio developed the technique itself and established its statistical reliability with respect to number of judges, factor extraction method, and method of rotation. Additionally, his work established the suitability of the technique for subject matter classification, as used in LIBRARY, as well as attribute and category classification. Further, he used the spaces to measure the content of documents and requests, and showed that an actual interactive retrieval system based on this approach seemed possible. The approach lay dormant until the construction of LIBRARY, the first working retrieval system to use semantic measurement.

The concept of expert document retrieval, as contrasted with traditional retrieval, appears to be new. As noted earlier, there is a surface similarity between LIBRARY and traditional document retrieval technology in that both rely on vectors and the matching of vectors for retrieval. However, this similarity is deceptive; beneath the surface the approaches are quite different. There are two important differences, one conceptual and one technical. The conceptual one is that in a semantic measurement space the vector for a piece of text denotes the amount of each of the orthogonal types of content the text has. The concept of quantifying content in this way is unlike either traditional or intelligent retrieval. An excellent discussion of the vector space approach in retrieval systems may be found in Salton (1989). There, the vectors are vectors of terms in the documents (or query), typically weighted 0 or 1 (although other values are sometimes used). Similarity of document to query may be calculated by a variety of techniques, but all similarity is based on the term vectors. The systems do not contain an expert knowledge base.

The technical difference is that while the vector space approach is quite common in document retrieval systems, the vector spaces used do not have an orthogonal basis. Further, the dimensionality is not well-defined, as a document or request may have a variable number of components in its vector. (Certainly one could define the dimensionality of the space as the maximum length of the descriptor vectors, but this issue appears not to have been addressed.)

It was noted above that documents are automatically added to the data base. We have deliberately avoided the term "automatic indexing," because it is misleading in this context. LIBRARY's insertion of a document into the data base is done via the semantic measurement algorithm discussed above. Automatic indexing is the technique for automatically adding a document to a traditional retrieval system, but "indexing" means selection of terms to be used to describe the document. The definition in Salton (1989) is standard: "constructing document surrogates by assigning identifiers to text items" (p. 275). While LIBRARY certainly assigns document surrogates, they are quantified amounts of orthogonal content, not identifiers of the usual sort.

Factor analysis can be used as a clustering technique, and has been used for automatic classification. Semantic measurement is different in two ways. First, the data matrix is expert knowledge, not statistical or probabilistic data. Second, we use the resulting factors as an orthogonal basis for calculating locations, rather than using the factors as clusters.
5.1. Intelligent Information Retrieval

Intelligent retrieval systems attempt to model the behavior of expert retrievers (or a portion thereof). LIBRARY is, thus, in this category. A good example of the approach is the Archivist's Assistant (De Salvo, Glamm, & Liebowitz, 1987), an expert retrieval assistant designed to reproduce the expertise of archivists. The Archivist's Assistant is essentially a rule-based expert system in which rules capture three kinds of expertise: (1) abstracting topics from a query, (2) matching the abstracted topics to general types of documents and specific record series in the document base, and (3) refining the initial match. (In addition, the system itself is controlled by rules, so that the process of retrieval, as well as the judgments used, is modeled.) A recall of 74% and a precision of 66% (corrected to 96% when system errors due to factors their prototype was not expected to handle were accounted for) were calculated. This compares very favorably with traditional systems, and is similar to the performance of LIBRARY.

It is difficult to tell how well the rule-based approach used in the Archivist's Assistant would work on a more standard document base, or how well semantic measurement would compare to the rule-based approach on the archives document base. It seems likely that the key issue is the degree to which the clauses in the rule base overlap conceptually. If they are quite distinct, the factor analysis to produce the orthogonal basis would return essentially the original axes as (unique) factors. The author has produced exactly this sort of orthogonal basis in another context. Experimental verification of this conjecture would be of significant practical value.

A second instructive example of an intelligent retrieval system is I3R, a knowledge-based retrieval system in which user goals and domain knowledge determine the search strategy followed (Croft & Thompson, 1987). The system uses a rule base to recognize concepts based on terms used in the request, and a separate rule-based module to gather additional concepts and query terms. I3R also uses rules to control system operation itself.

It seems fairly well agreed that the nonbinary nature of relevance must be represented somehow in the retrieval system, or performance will be degraded. Probabilistic rules are one way to do this. A second is the use of fuzzy sets, basing retrieval on the intersection of the fuzzy sets representing request and document content. The sets, however, are (fuzzy) sets of terms. Radecki (1979) presents a good discussion of this approach.
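A minimal illustration of the fuzzy-set idea (the standard min-intersection of fuzzy term sets, with hypothetical terms and memberships; this is not Radecki's specific formulation):

    # Fuzzy sets of terms: term -> degree of membership in [0, 1].
    request  = {"paging": 0.9, "working set": 0.7, "scheduling": 0.3}
    document = {"paging": 0.8, "scheduling": 0.6, "compilers": 0.4}

    # Fuzzy intersection: min of memberships, over shared terms.
    overlap = {t: min(request[t], document[t]) for t in request if t in document}

    # One simple relevance score: the strongest shared membership.
    score = max(overlap.values()) if overlap else 0.0
    print(overlap, score)   # {'paging': 0.8, 'scheduling': 0.3} 0.8

Note that, as the text emphasizes, the sets are still sets of terms; the expert content measurement central to LIBRARY has no counterpart here.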
6. FUTURE WORK

LIBRARY was explicitly built as a pilot study. As a pilot study, it appears successful. Perhaps the most obvious next step is a larger system, with a larger community of users and document base, and a careful evaluation of its performance.

In terms of extensions of the technology, one of the most immediate needs is a hierarchical structure of content domains, as described briefly above. This would allow a request for "recent results in computer chess," for example, to be identified first as relevant to AI, more specifically to game playing, and more specifically to computer chess. Implementation of at least one additional criterion for retrieval should yield significant performance improvement. Continuing the same example, one would expect the system to perform better if it is able to distinguish between articles on specialized chess hardware and those dealing with psychological aspects of chess playing. (However, this is not a simple question. A certain amount of this benefit is already achieved by the fact that LIBRARY measures, and uses, relevance to all domains at once, in this case hardware or psychology.)
The indexing algorithm at this time is global; subdocumentary indexing is not done. Human experts, however, can recall, for example, a relevant section of an otherwise irrelevant article. Subdocumentary indexing would appear to be of significant value to users, in much the same way that hypertext systems are. A number of related questions may prove of great interest. For example, if a large article or book is indexed by paragraph, page, chapter, and entire document, how do the locations compare? If each paragraph of an article is indexed, what is the shape of the curve traced by the successive locations? These and similar questions await further investigation.
REFERENCES

Blair, D.C., & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28, 289-299.

Croft, W.B., & Thompson, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science, 38(6), 389-404.

De Salvo, D.A., Glamm, A.E., & Liebowitz, J. (1987). Structured design of an expert system prototype at the National Archives. In B.G. Silverman (Ed.), Expert systems for business (pp. 40-77). Reading, MA: Addison-Wesley.

Ossorio, P.G. (1966). Classification space. Multivariate Behavioral Research, 1, 479-524.

Radecki, T. (1979). Fuzzy set theoretical approach to document retrieval. Information Processing and Management, 15(5), 247-259.

Salton, G. (1989). Automatic text processing. Reading, MA: Addison-Wesley.