Information Processing & Management, Vol. 31, No. 2, pp. 161-171, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
0306-4573/95 $9.50 + .00

0306-4573(94)00043-3
TERM RELEVANCE FEEDBACK AND MEDIATED DATABASE SEARCHING: IMPLICATIONS FOR INFORMATION RETRIEVAL PRACTICE AND SYSTEMS DESIGN
AMANDA SPINK
School of Library and Information Sciences, University of North Texas,
P.O. Box 13796, Denton, TX 76203, U.S.A.
(Received 1 February 1994; accepted in final form 13 June 1994)
Abstract - Research into both the algorithmic and human approaches to information retrieval is required to improve information retrieval system design and database searching effectiveness. This study uses the human approach to examine the sources and effectiveness of search terms selected during mediated interactive information retrieval. The study focuses on determining the retrieval effectiveness of search terms identified by users and intermediaries from retrieved items during term relevance feedback. Results show that terms selected from particular database fields of retrieved items during term relevance feedback (TRF) were more effective than search terms from the intermediary, database thesauri, or users' domain knowledge during the interaction, but not as effective as terms from the users' written question statements. Implications for the design and testing of automatic relevance feedback techniques that place greater emphasis on these sources, and for the practice of database searching, are also discussed.
1. INTRODUCTION
The study reported in this paper developed a classification of search terms according to the source of the search term and a categorization based on the effectiveness of the search term in the retrieval of items judged relevant by the user. The effectiveness of search terms derived from different sources in retrieving relevant documents during mediated information retrieval (IR) was then compared. This paper focuses on the effectiveness of search terms derived during term relevance feedback or those search terms selected from the bibliographic database fields of the retrieved items during 40 mediated database searches. The process of search term selection is a major area within information retrieval research, which seeks to improve the effectiveness of interactive information retrieval. Within information retrieval this research can be divided into two major areas. First, research within the human approach to search term selection examines users’ representations of their question and whatever tools (e.g., thesauri) or domain knowledge they use to derive or modify a set of search terms during the database searching process. The human approach emphasizes the investigation and observation of human decision making, human behavior, and the variables in the human process of search term selection. The human search term selection process may involve the user and information retrieval system and, during a mediated situation with both a user and human intermediary present, the process may also involve a human intermediary and IR system, or all three. The database searching literature contains many book sections and papers on the selection of search terms. An overwhelming majority of these are descriptive or prescriptive in nature, with thoughtful discussions on the selection of search terms from retrieved items by Harter (1986), Blair (1990), and Walker and Janes (1993). 
[A version of this paper was first presented at the ACM SIGIR conference in Dublin (Spink, 1994).]

A number of studies have also compared the retrieval performance of search terms classified as either text-words or
descriptors (Blair & Maron, 1985; Cleverdon, 1962; Dubois, 1987; Keen, 1973; Lancaster, 1980; Parker, 1971). There have been few major research studies within the human approach to search term selection. In a large study of the selection of search terms, Fidel (1991a,b,c) analyzed 281 searches performed by 47 professional search intermediaries, and developed a decision tree model of the search term selection routine based on the searchers’ selection of textwords or descriptors. Her results show that searchers used 3,635 search terms during the 281 searches; 1,607 or 44% were descriptors and 2,028 or 56% were textwords (i.e., free text words). Bates et al. (1993) conducted an analysis of the search terminology used by 22 humanities scholars, and classified each of the 1,068 search terms into one of the following subject categories: works or publications as subject, individuals as subject, geographical name, chronological name, discipline name, other proper term, other common term, or uncertain classification. This categorization revealed the importance of proper or noncommon search terms and individual names for humanities scholars during database searching. No previous study has investigated the sources from which search terms are selected, the effectiveness of search terms based on their source, and whether search terms are selected from retrieved items during mediated database searching (referred to as term relevance feedback (TRF)) and the retrieval effectiveness of those TRF terms. In contrast to the human approach, extensive research within the algorithmic approach has investigated automatic and semi-automatic search term selection techniques for query expansion (Efthimiadis, 1992, 1993; Harper, 1980; Harman, 1992; Robertson, 1990; Sparck Jones, 1971; van Rijsbergen et al. 1981). 
Automatic query expansion techniques utilize the text of a user's question and/or retrieved document(s) found to be relevant by the user, as input for techniques to derive a set of search terms to retrieve additional relevant documents. The emphasis in automatic query expansion techniques or relevance feedback techniques is on formulating and testing algorithms and automatic techniques that select and weight search terms. Different search term selection strategies have been developed for different relevance feedback techniques. Since the early 1960s, researchers have developed different relevance feedback methods for automatic query expansion using the Vector Space Model (Rocchio, 1971; Ide, 1971; Salton & Buckley, 1990). The Vector Space Model "automatically reweights query terms by adding the weights from the actual occurrence of those query terms in the relevant documents, and subtracting the weights of those terms occurring in the nonrelevant documents" (Harman, 1988, p. 3). An alternative probabilistic model proposed by Robertson and Sparck Jones (1976) is based on a term weight, or the query term distribution in nonrelevant and relevant documents. Within this approach each retrieved document is ranked based on the addition of the term weights for terms matching the query terms appearing in the document (Harman, 1988). Many modifications to the Probabilistic Model have been developed (Croft & Harper, 1979; Harper & van Rijsbergen, 1978; Robertson, 1986; Wu & Salton, 1981), including Harper (1980), who found significant performance improvement by ranking a union of all terms from retrieved relevant documents. Harman (1988) found that multiple iterations and the automatic addition of 20 selected noncommon terms from relevant documents, sorted by one of three different techniques, was more effective than adding all terms from relevant retrieved documents.
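The vector-space reweighting idea described above can be illustrated with a short sketch. This is a toy example only, not an implementation from the studies cited: the function name, the dict-based term vectors, and the classic alpha/beta/gamma parameter values are assumptions for illustration.

```python
def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query reweighting (vector space model): add weight for
    terms occurring in relevant documents, subtract weight for terms in
    nonrelevant documents. Vectors are dicts mapping term -> weight."""
    terms = set(query)
    for d in rel_docs + nonrel_docs:
        terms |= set(d)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if rel_docs:
            w += beta * sum(d.get(t, 0.0) for d in rel_docs) / len(rel_docs)
        if nonrel_docs:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrel_docs) / len(nonrel_docs)
        new_query[t] = max(w, 0.0)  # negative weights are usually clipped to zero
    return new_query

q = {"dog": 1.0, "cat": 1.0}
rel = [{"dog": 1.0, "school": 1.0, "role": 1.0}]  # judged relevant by the user
nonrel = [{"cat": 1.0}]                            # judged not relevant
print(rocchio(q, rel, nonrel))
```

With these toy weights, "dog" rises to 1.75, "cat" drops to 0.85, and the new terms "school" and "role" enter the query at 0.75 each, which is the sense in which relevant retrieved documents drive automatic query expansion.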
In a recent study Efthimiadis (1992, 1993) evaluated six algorithms for their effectiveness in ranking terms for query expansion, comparing their performance in term ranking for query expansion to rankings by users. Other automatic and semi-automatic techniques are reviewed in Efthimiadis (1993) and Harman (1992). To improve IR effectiveness further, research is needed in both the “algorithmic” and “human” streams, and further joint efforts utilizing the results of human studies in interface and system design (Belkin et al., 1992). The aim of this study is to contribute to our understanding of the human search term selection process and, particularly, term relevance feedback in order to provide guidelines for the development of improved information retrieval interfaces and database searching practice.
2. RESEARCH QUESTIONS
This paper examines the following research questions:

1. Are term relevance feedback terms selected from retrieved items during mediated database searching?
2. How effective are TRF terms in retrieving relevant items, as compared to other sources from which search terms are selected?
3. What is the role of the user and intermediary in the selection of TRF terms during mediated information retrieval?
4. From which bibliographic database fields of retrieved items are TRF terms selected?

The principal aim of this study was to investigate the human task of term relevance feedback to provide guidelines for more effective information retrieval system design and database searching practice.

3. RESEARCH DESIGN
3.1 Data collection

The analysis presented is part of a large human-oriented study of feedback and search terminology in information retrieval (Spink, 1993a,b). Data analyzed was collected during a pioneering study of mediated information retrieval (Saracevic et al., 1991) in which 40 users with real requests were videotaped participating in a pre-online interview and interactive database searches, using the online database service DIALOG, conducted by one of four professional search intermediaries. Each of the four intermediaries specialized in the databases used during his or her respective mediated database searches, and had an average of 8.5 years' experience as a professional search intermediary. The 40 user-intermediary dialogue transcripts, videotapes, and the online search logs with users' relevance judgments were analyzed. Precision scores ranged from 8% to 100%, with a mean of 57%. The study of term relevance feedback presented here follows analysis reported previously (Spink, 1993a,b; Spink & Saracevic, 1992, 1993).

3.2 Methodology

To determine if search terms are selected through term relevance feedback during mediated information retrieval, the source of each search term was first identified. The method utilized to identify the search terms and their sources has been described in previous papers (Spink, 1992a; Spink & Saracevic, 1992, 1993) and is summarized here. For each database search the set of search terms was extracted from the database search log. Using the user's written question statement, the database search log, and the transcript of the discourse between the user and intermediary, the FIRST mention of each search term was then classified according to its source and categorized according to its retrieval of relevant or nonrelevant items.
A more detailed methodology was developed to examine the effectiveness of TRF as a source of search terms when compared with other sources, including consideration of items retrieved by search statements of two or more search terms selected from different sources with logical operators. The methodology utilized a weighting scheme that distributed credit for retrieval in proportion to search terms in various logical combinations. Each item retrieved (relevant or nonrelevant) was assigned a weighting of 1. Each search term that contributed to the retrieval of that item was identified (matched in the search statement and the displayed relevant documents) and assigned a portion of the weighting of 1. If only one search term was responsible for the retrieval of the relevant item, that search term was assigned a retrieval weighting of 1; but, if more than one search term was responsible for the retrieval of an item, each search term was assigned a proportional retrieval weighting. For example, in the search statement "dog and cat," since both search terms contributed to the retrieval of the relevant item they both received a proportional retrieval weighting of .5. In the search statement "dog or cat," if both search terms were in the item, each term was allocated .5; but if only one search term retrieved the item, that search term was allocated a weighting of 1. The proportional retrieval weighting was then calculated for each search term and then for each search term source.

Provided below is an example of the identification of a new term during TRF, from Search Number 8. During the display of 15 retrieved items, the user identifies the term "school role" in the descriptor field, which is subsequently used as a search term. The user comments:

User: Right. Exactly, uh hum. Let's see we have school role, did you see that one.

To determine the role of the user and intermediary in the selection of TRF terms, an additional micro-analysis was conducted utilizing each transcript of a user-intermediary dialogue in conjunction with the accompanying database search log, to determine the TRF terms identified by the user or intermediary during the database search. The bibliographic database field from which each TRF term was identified was also determined during the micro-analysis.
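The proportional retrieval weighting above can be sketched in code. This is a minimal illustration of the credit-splitting rule, not the study's actual implementation; the function name and the mapping from each retrieved item to its contributing terms are assumptions for the example.

```python
def proportional_weights(retrievals):
    """Each retrieved item carries a credit of 1, split equally among the
    search terms responsible for its retrieval; credits are summed per term.
    `retrievals` maps an item id to the list of terms that retrieved it."""
    credit = {}
    for item, terms in retrievals.items():
        share = 1.0 / len(terms)
        for term in terms:
            credit[term] = credit.get(term, 0.0) + share
    return credit

# "dog and cat": both terms retrieved item1, so each receives 0.5;
# "dog or cat": only "dog" matched item2, so it receives the full 1.
print(proportional_weights({"item1": ["dog", "cat"], "item2": ["dog"]}))
# {'dog': 1.5, 'cat': 0.5}
```

Summing these per-term credits by source gives each source's share of the retrievals, which is how the source-level comparisons in the results were computed.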
4. RESULTS

The results of the study are provided in the following sections, corresponding to the four research questions.
Are term relevance feedback terms selected from retrieved items during mediated database searching?

The 40 mediated database searches used a total of 593 search terms selected from five different sources:

1. User question statement (QS): terms derived from the written statement submitted by the user.
2. User interaction (UI): terms suggested by the users, from their domain knowledge, during their interaction with the intermediary prior to or during the database search.
3. Thesaurus (TH): terms derived from the thesaurus, when not suggested a priori by any other source (e.g., if a search term already appeared in the question statement and was later translated directly into a descriptor from a thesaurus, the thesaurus was NOT credited as a source; the question statement was).
4. Term relevance feedback (TRF): terms extracted by the user or intermediary during the database search from the bibliographic database fields of the retrieved items.
5. Intermediary (IN): terms suggested by the intermediary prior to or during the database search from the intermediary's domain knowledge.

Term relevance feedback (TRF), or terms extracted by the user or intermediary during the database search from the items retrieved, was one of five sources of search terms during the 40 mediated database searches. Table 1 shows the number of terms and percentage of the total search terms utilized from each source. The average number of search terms per search for the 40 searches was 14.8, with a maximum of 43, a minimum of 4, and a standard deviation of 8.77. Although not as large in number as search terms provided by the user, from the question statement or during user interaction, TRF terms contributed a total of 67 (11%) search terms.
How many searches utilized TRF terms? TRF terms were small in total number, but how many database searches utilized these terms? Table 2 shows the number and percentage of searches that used each source to select search terms.
Table 1. Sources of search terms (N searches = 40)

Source                      Number of terms   % of total terms   Cum. %
User
  a. Question statement           227               38%            38%
  b. User interaction             134               23%            61%
Subtotal user                     361               61%            61%
Thesaurus                         113               19%            80%
Term relevance feedback            67               11%            91%
Intermediary                       52                9%           100%
Total                             593              100%
TRF terms were small in total number and were utilized during only 22 (55%) of the forty mediated interactions.

How effective were TRF terms in retrieving relevant items when compared with search terms from other sources?

TRF terms were small in total number, but how effective were they in retrieving relevant items? The analysis of the retrieval effectiveness of each search term (as described in the methodology section) revealed four categories of search terms based on retrieval effectiveness:

1. T(R): search terms that retrieved relevant items only, and nothing but relevant items.
2. T(R + N): search terms that retrieved at times relevant and at other times not relevant items; that is, the retrievals of these search terms were mixed as to relevance.
3. T(N): search terms that retrieved only items judged not relevant by the user.
4. T(Z): search terms that had zero retrievals; that is, retrieved nothing at all.

The first two categories of search terms, T(R) and T(R + N), represent the positive retrievals and were responsible for the retrieval of all relevant items, even though these included at times not relevant as well as relevant items. The categories T(N) and T(Z), the negative retrievals, were responsible for not relevant only and zero retrievals. Table 3 gives the number and percentage of search terms based on their source in relation to the relevance of items they retrieved. As a number of search terms retrieved both relevant and nonrelevant items within the same search, the categories are not mutually exclusive. Table 3 shows that only a small percentage of total search terms (4%) retrieved only relevant items, and more than 60% were mixed retrievals, retrieving both relevant and not relevant items. Thirty-six percent of search terms retrieved only not relevant items or had zero retrievals. How were TRF terms distributed among the four retrieval effectiveness categories?
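The four-way categorization can be expressed as a small helper function. A sketch only: the function name and counting interface are assumptions for illustration, not part of the study's method.

```python
def categorize_term(n_relevant, n_not_relevant):
    """Place a search term into one of the study's four retrieval-effectiveness
    categories, given counts of relevant/not-relevant items it retrieved."""
    if n_relevant == 0 and n_not_relevant == 0:
        return "T(Z)"    # zero retrievals
    if n_relevant > 0 and n_not_relevant > 0:
        return "T(R+N)"  # mixed: both relevant and not relevant items
    if n_relevant > 0:
        return "T(R)"    # relevant items only
    return "T(N)"        # not relevant items only

print(categorize_term(3, 0), categorize_term(2, 5),
      categorize_term(0, 4), categorize_term(0, 0))
# T(R) T(R+N) T(N) T(Z)
```

Note that T(R) and T(R+N) together form the positive retrievals, and T(N) and T(Z) the negative retrievals, as used in Table 3.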
Table 2. Number and percentage of searches that used search terms from each source

Source                     No. of searches   % of searches
Question statement                40              100%
User interaction                  28               70%
Term relevance feedback           22               55%
Intermediaries                    20               50%
Thesaurus                         20               50%
Table 3. Number of search terms in each category of retrieval effectiveness
(number searches = 40; total number of search terms = 593)

Items retrieved                                   No. of terms     %
Relevant items only, T(R)                               25         4%
Relevant and not relevant, T(R + N)                    353        60%
Subtotal with any R (positive retrieval)               378        64%
Not relevant only, T(N)                                147        24%
Zero retrieval, T(Z)                                    68        12%
Subtotal T(N) + T(Z) (negative retrieval)              215        36%
Total                                                  593       100%
Table 4 shows the number and percentage of TRF terms in each retrieval effectiveness category. Although TRF terms were small in number, a large proportion (72%) of these terms contributed to the retrieval of relevant items.

What percentage of TRF terms retrieved relevant items, and how did this compare with other sources?

If a large proportion of TRF terms contributed to the retrieval of relevant items, how did this compare with the retrieval effectiveness of terms from other sources? Table 5 shows the number and percentage of terms from each source that contributed to the retrieval of relevant items. A large proportion of TRF terms were productive in the retrieval of relevant items, as 48 (70%) of the 67 TRF terms contributed to the retrieval of items that were judged relevant by the user. This percentage was less than that of question statement terms, but greater than that of thesaurus, user interaction, or intermediary terms.

What was the mean percentage of relevant items per search that were retrieved by TRF terms?

To further analyze the effectiveness of TRF terms, Table 6 shows the mean percentage of items per search retrieved by search terms from each source. Although TRF terms were used in 22 database searches (Table 2), as Table 6 shows, they contributed to the retrieval of relevant items in only 14 database searches, retrieving a mean of 9% of relevant items per search. This contrasts with question statement terms, which contributed to the retrieval of relevant items in 37 database searches, retrieving a mean of 70% of relevant items per search.

Were TRF terms selected more effectively by users or intermediaries?

As Table 5 shows, a large proportion of TRF terms contributed to the retrieval of relevant items, but who selected the TRF terms: users or intermediaries? Table 7 shows the number and retrieval effectiveness of TRF terms selected by users and intermediaries during the 22 mediated searches where TRF terms were used.
Table 4. Number and percentage of TRF terms in each category of retrieval effectiveness

Category    Number of TRF terms   % of TRF terms
T(R)                 2                  3%
T(R + N)            45                 67%
T(N)                16                 24%
T(Z)                 4                  6%
Total               67                100%
Table 5. Percentage of search terms from each source that retrieved relevant items (number searches = 40)

Source                    No. relevant terms   Total terms from source   % of relevant terms
Question statement               186                    227                     82%
User interaction                  72                    134                     54%
Total for users                  258                    361                  Mean 71%
Term relevance feedback           47                     67                     70%
Intermediaries                    25                     52                     48%
Thesaurus                         50                    113                     44%
Users derived a total of 25 (37%) of the total TRF terms, and 15 (31%) of the 48 relevant TRF terms. Intermediaries derived a total of 42 (63%) of the TRF terms, and 33 (69%) of those terms contributed to the retrieval of relevant items. Despite the users' greater domain knowledge, intermediaries were more effective in selecting TRF terms that contributed to the retrieval of relevant items. This result may be a function of intermediaries' greater experience and training in utilizing the interactive capabilities of information retrieval systems.

From which bibliographic database fields were TRF terms selected?

If a large proportion of TRF terms contributed to the retrieval of relevant items, then from which bibliographic database fields were they selected? To answer this question, an analysis of the bibliographic fields displayed during each search was conducted. Table 8 lists the fields included in the different DIALOG formats and the bibliographic fields displayed by each format. Most searches utilized several formats, particularly format 3, format 5, and format 7. Table 9 shows the DIALOG formats utilized during the 40 database searches and the 22 searches that utilized term relevance feedback. Many of the searches utilized either format 3, format 5, or format 7 to display retrieved items. During the 40 mediated database searches, the title, author, journal name, and descriptor fields were generally displayed; 55% of the formats requested also displayed the abstract.

Which bibliographic fields were displayed during the 40 database searches?

Table 10 shows each database field and the percentage of items retrieved that included that database field during the 40 searches and then the 22 searches that utilized TRF. Table 10 shows that more than half the retrieved items were displayed using formats that included the abstract, but Table 11 indicates there was a low utilization of abstracts for the selection of new search terms.
From which bibliographic database fields were TRF terms selected? Table 11 shows the fields from which TRF terms were selected and the retrieval effectiveness of those terms.
Table 6. Mean percentage of relevant items per search retrieved by search terms from each source (number searches = 40)

Source                    No. searches   Range (%)   Mean (%)
Question statement             37          1-100        70%
User interaction               26           2-91        22%
Thesaurus                      19           2-58        21%
Intermediary                   19           2-30        14%
Term relevance feedback        14           1-26         9%
Table 7. Selection of term relevance feedback search terms by users and intermediaries (number searches = 22)

                  Total number TRF      %     Number relevant TRF      %
                  search terms                search terms
Users                    25            37%           15               31%
Intermediaries           42            63%           33               69%
Total TRF terms          67           100%           48              100%
Table 11 shows that 85% of the 67 TRF terms were selected from the title or descriptor field; 88% of the title selections and 62% of the descriptor selections contributed to the retrieval of relevant items. Although many of the searches included the display of the full abstract (55% of relevant items displayed), only a small number (3%) of TRF terms were selected from the abstract field. The data indicates that users and intermediaries selected TRF terms from titles and descriptors, rather than abstracts.

5. SUMMARY OF RESULTS
5.1 Selection of TRF search terms

The majority (55%) of users and intermediaries utilized term relevance feedback to identify search terms during the 40 mediated database searches, but these terms represented only 11.3% of the total search terms. Thus, the total number of TRF terms (67) represented only a small proportion of the total number of search terms (593) when compared to other sources (less than user and thesaurus terms, but slightly more than intermediary terms).

5.2 Effectiveness of TRF search terms

Due to their small number, TRF terms were not as numerically significant as other sources in retrieving relevant items. On average, TRF terms contributed to the retrieval of only 9% of relevant items per search for the 22 searches in which they were used. But additional analysis (Table 4) presented a different picture, showing that a high proportion of TRF terms were quite productive in retrieving relevant items. The proportion of TRF terms that contributed to the retrieval of relevant items was high (70%), higher than intermediary (48%), thesaurus (44%), or user interaction (54%) terms, but lower than question statement terms (82%).

5.3 User and intermediary selection of TRF search terms

Intermediaries identified the major proportion of the TRF terms that contributed to the retrieval of relevant items, with the users identifying about one third. Of the intermediaries' TRF terms, 79% contributed to the retrieval of relevant items and, of the users' TRF terms, 60% (15/25) contributed to the retrieval of relevant items. This indicates that intermediaries selected a greater number of TRF terms and selected more effective terms, and
Table 8. DIALOG formats and bibliographic fields displayed

Format   Content (bibliographic fields displayed)
2        Full bibliographic record except abstract
3        Bibliographic citation only
5        Full bibliographic record
6        Title only
7        Bibliographic citation and abstract
8        Title and indexing
Table 9. DIALOG formats utilized (number searches = 40)

Format   No. of items, 40 searches    %    No. of items, 22 TRF searches    %
2                   461               7%              207                   5%
3                  1484              24%              841                  21%
5                  1631              26%             1333                  34%
6                   222               4%              261                   7%
7                  1541              25%              837                  21%
8                   904              15%              485                  12%
Total              6225             100%             3964                 100%
end users should be encouraged to utilize the interactive capability of IR systems and identify TRF terms to enhance query expansion and strategy reformulation.

5.4 Database fields

The major proportion of the TRF terms were selected from the title and descriptor fields, and only 3% were selected from abstracts, despite the 55% of documents that included the display of an abstract. The data indicates that users and intermediaries did not utilize abstracts for TRF term selection, possibly due to the complex and lengthy nature of many of the abstracts. The selection of TRF terms from the title and descriptor fields probably reflects the intermediaries' training in using controlled vocabularies. The degree to which abstracts are utilized for the selection of search terms by end users is an area for further research.

6. IMPLICATIONS
6.1 Database searching practice

The overall results of this study have implications for the practice of database searching by end-users and intermediaries. It seems that users and intermediaries in the study were interactive to the extent that they frequently modified their search strategies (Spink, 1993a,b), but they selected relatively few search terms interactively through term relevance feedback. When search terms were selected interactively, they were highly productive in contributing to the retrieval of relevant items. End-users should be trained to identify additional search terms from retrieved items for query expansion, and to reflect their domain knowledge in the construction of a written question statement that can be utilized and modified as a source of search terms during their interaction with an information retrieval system. This study also shows that the value of professional search intermediaries lies not in their search terms from their domain knowledge, but in: (a) identifying search terms from the user's question statement, (b) facilitating the user's identification of additional search
Table 10. Bibliographic fields and percentages (number searches = 40 & 22)

Bibliographic field   % 40 searches   % 22 TRF searches
Title                     100%              100%
Author                     87%               81%
Source                     78%               91%
Abstracts                  54%               55%
Descriptors                48%               51%
Identifiers                48%               51%
Subject headings           44%               45%
Table 11. TRF terms and database fields (number searches = 22)

Field name        No. of TRF        %     No. of relevant        %
                  terms selected          TRF terms selected
Title                  18          27%            16            33%
Journal name            3           4%             2             4%
Abstract                2           3%             1             2%
Descriptor             39          58%            25            52%
Identifier              2           3%             2             4%
Subject heading         3           4%             2             4%
Total                  67         100%            48           100%
terms from their domain knowledge during their verbal interaction, and (c) identifying effective search terms during term relevance feedback. Human search intermediaries should be trained to: (a) facilitate the identification of search terms for query expansion from the user's domain knowledge, and (b) identify query expansion terms in the retrieved items and facilitate the user in this behavior.

6.2 Information retrieval systems design

Relevance feedback research has focused on the automatic selection of search terms for query expansion from retrieved items. Harman (1988) suggested that more effective automatic relevance feedback techniques are based on the selection of between 20 and 40 terms from the relevant retrieved documents. Results of the current study of human behavior suggest the need to test automatic relevance feedback techniques that select, or give more weight to, terms from the title and descriptor fields and terms from the user's domain knowledge. During interactive information retrieval, users make notes and record ideas on paper. An interface could be designed that allows the user to enter a user information problem statement (UIPS) in natural language, preceding the initial search statement. This UIPS could be continuously modified by the user interactively, reflecting changes in the user's information problem and the identification of new search terms resulting from term relevance feedback or user interaction with a human intermediary, a thesaurus, or terms users derived from their own domain knowledge during the interaction. Terms identified by users in a written question statement and through term relevance feedback during the interaction were found to be highly effective in the retrieval of relevant documents. The UIPS or working document could be utilized by the IR system (in an automatic process) or by the user (in a semi-automatic process) to identify search terms for query expansion.
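One way the proposed UIPS working document might be represented is sketched below. Everything here is a hypothetical design illustration, not part of the original study: the class name, source tags (QS, UI, TH, TRF, IN), and the simple preference for question-statement and TRF terms merely mirror the effectiveness ranking reported above.

```python
from dataclasses import dataclass, field

@dataclass
class UIPS:
    """Hypothetical user information problem statement: a natural-language
    statement plus search terms tagged by the source they came from."""
    statement: str
    terms: dict = field(default_factory=dict)  # term -> source tag

    def add_term(self, term, source):
        """Record a term from QS, UI, TH, TRF, or IN as it is identified."""
        self.terms[term] = source

    def expansion_candidates(self):
        """Favor the sources the study found most effective: question
        statement (QS) and term relevance feedback (TRF) terms."""
        return [t for t, s in self.terms.items() if s in ("QS", "TRF")]

uips = UIPS("effect of school role on student outcomes")
uips.add_term("school role", "TRF")      # picked from a descriptor field
uips.add_term("student outcomes", "QS")  # from the written question statement
uips.add_term("education", "TH")         # broad thesaurus term, lower priority
print(uips.expansion_candidates())
# ['school role', 'student outcomes']
```

An interface built on such a structure could update the term list continuously during the search session and feed the candidates into automatic or semi-automatic query expansion.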
Results also highlight the need for IR interfaces that encourage users to utilize their own domain knowledge and examine the items retrieved to select search terms for query expansion.

7. FURTHER RESEARCH
The process of term relevance feedback as an effective source of search terms is of significant interest in information retrieval research. Therefore, further research is currently in progress to examine: (a) the effectiveness of TRF terms during query expansion excluding the original search query, (b) the sources used to select terms prior to and during the database search, and (c) the utilization of TRF terms by end-users. The results of the end-user study will be compared with the results of this study of mediated database searching. Also, the surprisingly low utilization of abstracts for new search term selection indicates that further research is needed to examine the differences between search term selection in bibliographic and full-text databases. A series of articles is currently being completed by the author reporting a full analysis of the results of the large study relating to the sources, effectiveness, stages, and sequences of search term selection, and findings related to search outcome.
Acknowledgement - The author would like to acknowledge the contribution of Prof. Tefko Saracevic to the development of this paper and Zhwei Zhang for his contribution to the data analysis. The valuable comments and suggestions by the anonymous reviewers are also gratefully acknowledged.
REFERENCES

Bates, M.J., Wilde, B.N., & Siegfried, S. (1993). An analysis of search terminology used by humanities scholars: The Getty Online Searching Project Report Number 1. The Library Quarterly, 63(1), 1-39.
Belkin, N.J., Marchetti, P.G., & Cool, C. (1992). BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing & Management, 29(3), 325-344.
Blair, D.C. (1990). Language and representation in information retrieval. New York: Elsevier Science Publishing.
Blair, D.C., & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document retrieval system. Communications of the ACM, 28, 289-299.
Cleverdon, C. (1962). Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. Cranfield, England: College of Aeronautics, Aslib Cranfield Research Project.
Croft, W.B., & Harper, D.J. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35(4), 285-295.
Dubois, C.P.R. (1987). Free text vs. controlled vocabulary: A reassessment. Online Review, 11, 243-253.
Efthimiadis, E.N. (1992). Interactive query expansion and relevance feedback for document retrieval systems. Unpublished Ph.D. dissertation, City University, London, U.K.
Efthimiadis, E.N. (1993). A user-centered evaluation of ranking algorithms for interactive query expansion. Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 16, 146-159.
Fidel, R. (1991a). Searchers' selection of search keys: I. The selection routine. Journal of the American Society for Information Science, 42(7), 490-500.
Fidel, R. (1991b). Searchers' selection of search keys: II. Controlled vocabulary or free-text searching. Journal of the American Society for Information Science, 42(7), 501-514.
Fidel, R. (1991c). Searchers' selection of search keys: III. Searching styles. Journal of the American Society for Information Science, 42(7), 515-527.
Harman, D. (1988). Towards interactive query expansion. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, Grenoble, France.
Harman, D. (1992). Relevance feedback revisited. Proceedings of the 15th International Conference on Research and Development in IR, SIGIR 1992, Copenhagen, Denmark, June 21-24, 1992. N. Belkin, P. Ingwersen, & A.M. Pejtersen (Eds.), pp. 1-10.
Harper, D.J. (1980). Relevance feedback in document retrieval systems: An evaluation of probabilistic strategies. Unpublished Ph.D. dissertation, Jesus College, Cambridge, England.
Harper, D.J., & van Rijsbergen, C.J. (1978). An evaluation of feedback in document retrieval: Using co-occurrence data. Journal of Documentation, 34(3), 189-216.
Harter, S.P. (1986). Online information retrieval: Concepts, principles and techniques. New York: Academic Press.
Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The SMART retrieval system (pp. 337-354). Englewood Cliffs, NJ: Prentice-Hall.
Keen, E.M. (1973). The Aberystwyth Index Language Test. Journal of Documentation, 29, 1-35.
Lancaster, F.W. (1980). Trends in subject indexing from 1957 to 2000. In P.J. Taylor (Ed.), New trends in documentation and information. London: Aslib.
Parker, J.E. (1971). Preliminary assessment of the comparative efficiencies of an SDI system using controlled vocabulary or natural language for retrieval. Program, 5, 26-34.
Robertson, S.E. (1986). On relevance weight estimation and query expansion. Journal of Documentation, 42(3), 182-188.
Robertson, S.E. (1990). On term selection for query expansion. Journal of Documentation, 46(4), 359-364.
Robertson, S.E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129-146.
Rocchio, J.J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system (pp. 313-323). Englewood Cliffs, NJ: Prentice-Hall.
Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 41(4), 288-297.
Saracevic, T., Mokros, H., Su, L., & Spink, A. (1991). Interaction between users and intermediaries in online searching. Proceedings of the 12th Annual National Online Meeting, 12, 329-341.
Sparck Jones, K. (Ed.) (1971). Automatic keyword classification for information retrieval. London: Butterworths.
Spink, A. (1993a). Feedback in information retrieval. Unpublished Ph.D. dissertation, Rutgers University, School of Communication, Information and Library Studies.
Spink, A. (1993b). Interaction with information retrieval systems: Reflections on feedback. Proceedings of the 56th Annual Meeting of the American Society for Information Science, 30, 115-121.
Spink, A. (1994). Term relevance feedback and query expansion: Relation to design. Proceedings of the 17th International Conference on Research and Development in Information Retrieval, Dublin, Ireland.
Spink, A., & Saracevic, T. (1992). Sources and uses of search terminology in mediated online searching. Proceedings of the 55th Annual Meeting of the American Society for Information Science, 29, 249-255.
Spink, A., & Saracevic, T. (1993). Dynamics of search term selection during mediated online searching. Proceedings of the 56th Annual Meeting of the American Society for Information Science, 30, 63-72.
van Rijsbergen, C.J., Harper, D.J., & Porter, M.F. (1981). The selection of good search terms. Information Processing & Management, 17(2), 77-91.
Walker, G., & Janes, J. (1993). Online retrieval: A dialogue of theory and practice. Englewood, CO: Libraries Unlimited.
Wu, H., & Salton, G. (1981). The estimation of term relevance weights using relevance feedback. Journal of Documentation, 37(4), 194-214.