Library & Information Science Research 30 (2008) 86–93
Exploring connections between information retrieval systems and information literacy standards

Laura Saunders

Adjunct Faculty and Ph.D. Student, Simmons College Graduate School of Library and Information Science, 300 The Fenway, Boston, MA 02115, USA
Abstract

Information literacy involves the location and access of information through retrieval systems, and many information retrieval systems are designed with specific capabilities to support these very skills. In some cases, system enhancements go beyond simple support and alleviate some of the searcher's responsibilities by performing certain tasks for them. By comparing information retrieval functions to the Association of College and Research Libraries' information literacy standards, this article investigates the extent of support that these enhanced systems can offer, and gives librarians greater insight into how these design enhancements could have an impact on information literacy instruction.

© 2008 Elsevier Inc. All rights reserved.
E-mail address: [email protected]
doi:10.1016/j.lisr.2007.10.003

One of the primary purposes of information retrieval systems is to locate the most relevant documents in a collection for a particular query. The success of a search is often evaluated in terms of recall (the number of relevant documents retrieved in comparison to the overall number of relevant documents in the collection) and precision (how closely the retrieved documents relate to the user's information need). The ability to navigate information retrieval systems successfully, including locating, evaluating, and using information, is included in the Association of College and Research Libraries' (ACRL) definition of information literacy (Association of College and Research Libraries, 2007).

Both information literacy and information retrieval focus on efficiently and effectively locating and accessing information. Information literacy tends to emphasize the searcher's knowledge and capabilities, while information retrieval focuses on design functions that can support or perhaps even replace the searcher's skills. In traditional information retrieval systems, the success of a search depends on two things: the search terms, or query, entered by the user and the system's ability to match and retrieve documents. As straightforward as the research process might seem, users may face frustrating challenges (e.g., lack of skill in choosing terms or using search functions and the ambiguities inherent in language). In an attempt to alleviate or possibly eliminate some of these burdens, certain information retrieval systems include design enhancements meant to improve the precision and recall of the user's search results by expanding or refining the initial query. A number of methods exist by which the system can match the user's initial query with related terms or concepts and thereby expand an otherwise narrow search. Other systems refine results through relevance feedback. This feedback can range from a list of alternate terms from which the user can select to broaden or narrow the query to more subtle ways of interpreting the user's interaction with the retrieved documents.

Regardless of how the systems expand or refine queries, nearly all are designed with some method of ranking the retrieved documents by relevance. The possible strategies for relevance ranking are vast, ranging from extrapolating information from the documents marked by the user to calculating the amount of screen time users spend reviewing individual documents. Once the potential relevance of each retrieved document has been calculated, the results are presented to the user in an ordered list of descending relevance.

Whatever the design, the ideas and methods behind these systems have potential connections to information literacy. This is especially true for the definition given by the ACRL (Association of College and Research Libraries, 2007), which offers the most widely accepted definition in the field of library and information science. In its Standards for Higher Education, ACRL sees the central aspects of information literacy as six competencies: the abilities to

1. determine the extent of information needed
2. access information efficiently and effectively
3. evaluate information
4. incorporate new information into the knowledge base
5. use information to accomplish a specific task
6. understand the legal, social, economic, and ethical implications of information (Association of College and Research Libraries, 2007).
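The recall and precision measures defined at the outset can be computed directly from a system's retrieval results. A minimal sketch, with purely hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids judged relevant in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7})
# precision = 3/4 = 0.75, recall = 3/6 = 0.5
```

The two measures pull in opposite directions: retrieving more documents tends to raise recall while lowering precision, which is why the enhancements surveyed below aim to improve one without sacrificing the other.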
For each of these competencies, ACRL offers additional performance indicators and objectives that expand the original definition and give more specific examples of the abilities expected of information literate individuals. Of these six competencies, the last three deal with the cognitive abilities of the searcher and occur after the information has been located and evaluated (in other words, after the process of information retrieval has been completed). The first three
components are directly related to the process of information retrieval. As such, they can be particularly affected by the various methods of query formulation and expansion, with their attendant emphasis on relevance feedback and ranking. Whether interactive or automatic in their deployment, these system enhancements focus on improving or even potentially replacing the first three aspects of information literacy for the user.

1. Problem statement

Information literacy competencies include a user's information retrieval abilities. Likewise, information retrieval systems are designed to help users find desired information. Despite the potential connections between these fields, no study has analyzed information retrieval systems within the context of ACRL's definition of information literacy to see how well the systems support those competencies. This study addresses that deficiency by exploring the following questions:

• Do existing information retrieval systems, as they are described in the literature, support specific parts of the ACRL information literacy framework?
• If so, can aspects of these systems be mapped to individual facets of that framework?
• In particular, do system enhancements such as query expansion, relevance feedback, and ranking address skills and competencies that are traditionally seen as the user's responsibility?
• What impact might this system support have on information literacy instruction?

By directly comparing information literacy standards with information retrieval systems, this research adds to the current, extremely small body of literature. The results of this study contribute to the knowledge of system development and design from a user and user instruction perspective. In addition, much of the writing on information literacy focuses on searching skills and pays scant attention to the design context of information retrieval systems.
Thus, this research informs instruction practices in information literacy programs by allowing librarians to better understand and exploit those functions that directly support the information literacy competencies. This research may also extend instruction into typically underserved areas of information literacy, since most allocated instruction time tends to focus on immediately needed retrieval competencies.

2. Literature review

Despite commonalities in searching techniques and refinements in information literacy and information retrieval, the literature of each field rarely references the other. Gordon (2002) acknowledged this split, claiming that the two disciplines have "grown up in parallel worlds, although they share a theoretical foundation." Articles on information literacy focus largely on particular search skills, such as Boolean logic. However, they typically present these skills independent of the systems on which they are employed. Rather than relating skills to systems, these articles mainly focus on the general outcomes expected of information literate students. The literature also includes case studies that detail a specific instruction program tailored to a certain database or collection (e.g., Cheng, 2003; Larkin & Pines, 2005). Going one step further, Grafstein (2007) separated information literacy from technology literacy. She maintained that over-associating the two undermines the cognitive aspects of information literacy by over-emphasizing the skills related to locating and accessing information.

The writings in information retrieval, on the other hand, typically describe the models behind or the workings of particular information retrieval systems. These articles investigate the system's capacity for retrieving superior results. This is ostensibly for the user's benefit, but many largely ignore the user's role in the process. People state information needs verbally as questions every day. Research indicates that
people are better at articulating their needs in natural language. However, Murdock, Kelly, Croft, Belkin, and Yuan (2007) pointed out that the vast majority of search engines rely on keyword queries and Boolean logic. This structure does not account for the user's strengths or familiarities. Losee (2007) described the use and purposes of thesauri and ontologies and acknowledged that the information provided by these products is meant to be used by searchers. He asserted that decisions about thesaurus production criteria should be based on expected user performance.

In discussing the limited use of interactive query expansion outside of experimental settings, White and Marchionini (2007) suggested that the functions needed for this type of searching are often not part of a searcher's established information-seeking behavior. They argued that query expansion options are seldom offered at the user's point of need: when formulating an initial query. These authors described a method of real-time query expansion that attempts to correct for these problems by providing users with additional search terms as they enter their initial query. This enhancement led to improved queries. These authors put the user at the center of the equation, recommending system development based on user preferences and performance. At the opposite extreme, Abdelali, Cowie, and Soliman (2007) argued for removing the user from the equation by employing automatic query expansion methods. They pointed out that searchers often use short, imprecise terms when entering a query and perform poorly with interactive query refinement.

Only a handful of articles reference one topic from the point of view of the other. Lucas and Topi (2004), as well as Shiri and Revie (2005), conducted system-specific research with implications for information literacy.
Shiri and Revie investigated the effects of thesaurus-enhanced searching on a group of faculty and postgraduate students, specifically exploring how domain knowledge affected the users' interaction with the thesaurus. Although their research focused on the system's usability and its influence on search results, the findings also revealed some interesting gaps in the users' search skills. Specifically, although the searchers appreciated the assistance of the thesaurus in broadening or narrowing their search or for suggesting alternative search terms, they had difficulty understanding how to integrate Boolean operators effectively into their search. This revealed that “they were not sure about the difference between the AND and OR operators” (p. 652). The authors concluded that further instruction in these search strategies would be particularly helpful in increasing search success. Lucas and Topi (2004) found that when searchers interacted with a simple interface, even minimal training in Boolean logic improved search results. However, such training had a negative effect when searchers used an assisted interface that offered specific support in formulating Boolean queries. Overall, users of the assisted interface achieved more relevant results than those using the simple interface. Both of these studies investigated the overlap between information retrieval and information literacy to some extent. However, Lucas and Topi pointed out that “it is not the search technology that is under investigation here but rather the users' interactions with that technology” (p. 1183). MacPherson (2004) made a more pointed connection between information literacy and information retrieval. She claimed that the two do not form a hierarchical relationship. 
Instead, there is “an interaction between the two processes such that information retrieval draws necessarily on those aspects of information literacy–including critical thinking and analysis–that enable the process of retrieval to be effective” (p. 230). She emphasized that information literacy is not system-dependent; it should be transferable among various systems and platforms. Her framework for teaching information literacy is based in part on her research into information retrieval systems. The first module of her framework focuses on locating information. It has sections for analyzing the question and developing a search strategy that specifically reference a neural network model of information retrieval. Thus, unlike most of the other researchers, MacPherson links information literacy competencies with particular types of information
retrieval systems. Nevertheless, her focus is still largely on information literacy, especially on the cognitive processes that take place once the search is completed.

The information literacy skills and knowledge described in the literature are largely derived from the ACRL definition. In addition to the six overarching competencies of information literacy, ACRL identified specific performance indicators and outcomes that further define the competencies. These indicators offer specific actions or abilities that compose information literacy. They also guide faculty and librarians in assessing information literacy in their students. The first three competencies in the ACRL definition, along with their performance indicators and outcomes, stress the information literate person's ability to locate and evaluate information sources. These competencies focus on skills such as identifying keywords, employing search functions such as Boolean operators, and assessing the quality of the result set.

The information systems designed to support these information literacy skills may offer different features or search capabilities. In general, they fall into one of several basic categories outlined in a taxonomy developed by Baeza-Yates and Ribeiro-Neto (1999). The taxonomy highlights the major systems, including those predicated on the Boolean, or binary, model and others (e.g., vector and probabilistic, which use special algorithms). The taxonomy (see Fig. 1) only names the systems, but Baeza-Yates and Ribeiro-Neto described each model in detail. They revealed that each model assists the searcher with retrieval tasks: refining searches, employing a variety of methods to match queries to documents for increased recall without loss of precision, and making relevance judgments in displaying results to the user.

3. Procedures

This study relied on concept mapping: exploring and analyzing the interrelationships between groups of ideas and then presenting those interrelationships in the form of a picture or map, "creating a visual, geographic representation of the topic of interest" (Kane & Trochim, 2006). The author analyzed articles containing descriptions of query expansion, relevance feedback, or relevance ranking methods to determine whether and how the search systems implemented designs that supported ACRL's information literacy competencies. Any system capabilities or design measures of information retrieval systems that enhanced or improved the retrieval process were then mapped to the relevant sections of the information literacy framework provided by ACRL.

Review articles were retrieved from the Library and Information Science Abstracts (LISA) database. This database's broad scope includes major journals from both the information literacy and the information retrieval disciplines. The search (kw: query expan⁎ OR kw: relevance feedback OR de: relevance ranking) AND yr: 2004–2007 yielded 58 articles. After eliminating non-English articles, the result was 53 documents. In order to verify this set, a professional reference librarian tested the search string and duplicated the results. Upon further review, certain articles were found to have a scope beyond the purposes of this paper. For instance, many articles focused on systems designed specifically for cross-language retrieval. Others used the World Wide Web as their primary information retrieval system. Once these articles were removed from the analysis, the remaining 26 articles were reviewed and analyzed. Retrieval system capabilities were matched to information literacy skills and knowledge.
4. Findings

The concept map in Fig. 2 identifies the individual systems or system features and links them to the specific competencies, performance indicators, or outcomes from the ACRL standards that they support. Several information retrieval systems or system design features (listed in Fig. 1) relate closely to the first three information literacy competencies, especially those emphasizing the ability to plan and implement a successful search. Nearly all of the systems reviewed included some method for improving upon the user's search by refining or expanding the initial query. Fig. 2 offers a visual overview of the overlap between the two disciplines.
Fig. 1. A taxonomy of information retrieval models.
Fig. 2. Information literacy and information retrieval concept map.
Six of the systems (23.1%) offered thesaurus-enhanced searching, whereby users' search terms are mapped to a controlled vocabulary. Each system then presents users with broader, narrower, and alternative terms with which to continue their search. In addition to improving recall by offering additional search terms, this design feature also tends to improve precision. The terms come from the controlled vocabulary specific to each system, and suggestions for narrowing or focusing the search help give more exact results.

Four of the 26 articles (15.4%) described systems that employ clustering, or concept-searching (Berrocal, Alonso, Figuerola, & Zazo, 2005; Rajapakse & Denham, 2006; Rooney, Patterson, Galushka, & Dobrynin, 2006; Tudhope, Binding, Blocks, & Cunliffe, 2006). Such systems use a variety of methods to compute how terms are related semantically. They then expand queries automatically using words or concepts closely related to the original terms. In some cases, vector space models engage in a type of clustering known as latent semantic indexing, in which the relationship among concepts is inferred from angles among eigenvectors calculated from term frequencies within a document. Another method of concept-searching uses similarity thesauri, which link synonymous terms within the controlled vocabulary. In essence, these systems group articles on the same topic, even if those articles do not necessarily use the same terminology within their texts. By increasing document recall through retrieving related items, each of these query expansion methods makes searches more effective with minimal effort on the user's part. This directly relates to the second competency of the ACRL definition of information literacy: the ability to access information efficiently and effectively.

According to the second competency, individuals should also be able to employ appropriate search operators, including Boolean and proximity operators. Two of the 26 systems (7.7%) use algorithms to generate Boolean queries automatically from search terms selected by the user. In each, searchers choose terms suggested by the system (Choi, Kim, & Raghavan, 2006; Cordón, Herrera-Viedma, & Luque, 2006). Some terms are synonymous and meant to broaden the initial search, while others elaborate on facets of the users' expressed need. Once the users make their selections, the systems employ the appropriate "AND" or "OR" between search terms. This relieves users of needing to understand the difference between the two operators and how to implement them properly.
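The behavior just described, OR-ing synonymous terms to broaden a facet and AND-ing distinct facets to keep results focused, can be sketched as follows. This is an illustrative reconstruction, not the published algorithm of either cited system; the grouping of terms into facets is assumed to come from the system's own term suggestions:

```python
def build_boolean_query(facets):
    """Combine user-selected terms into a Boolean query string.

    `facets` is a list of term groups: terms within a group are
    treated as synonyms (joined with OR to broaden the search),
    while the groups themselves represent distinct facets of the
    information need (joined with AND to narrow the result set).
    """
    clauses = []
    for group in facets:
        if len(group) == 1:
            clauses.append(group[0])
        elif group:
            clauses.append("(" + " OR ".join(group) + ")")
    return " AND ".join(clauses)

# A searcher picks two synonyms for one facet and a second, narrowing facet:
query = build_boolean_query([["teenagers", "adolescents"], ["smoking"]])
# -> "(teenagers OR adolescents) AND smoking"
```

The user never types AND or OR; the operator choice falls out of whether a selected term broadens an existing facet or introduces a new one.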
Relevance feedback, a process by which an information retrieval system uses implicit or explicit feedback from the user to refine a search and/or re-rank the retrieval set, appeared in many of the systems reviewed (e.g., Jones, MacFarlane, Sormunen, & Vakkari, 2004; Vechtomova & Karamuftuoglu, 2006). It directly supports several of the information literacy competencies. In the fourth performance indicator (Competency Two), ACRL indicates that an information literate person should revise and refine his or her search strategy as necessary. By using information from the initial search query and retrieval set to construct a refined query, systems that employ relevance feedback essentially are doing just that. Moreover, these systems typically present users with retrieval sets that are sorted by relevance. As they collect feedback, the systems re-rank documents to better reflect the user's needs. This supports another outcome of the second competency: searchers must evaluate the relevance of information collected.

As previously stated, incorporating new information into the user's existing knowledge base, as described in the third competency, involves cognitive processes that would imply some human interaction with the information. However, two articles described systems that could emulate this learning, remembering, and decision-making process (Choi et al., 2006; Cordón et al., 2006). Each of these systems uses neural networks, which are loosely modeled on the structure of the brain. Index and query terms create interconnected nodes, almost like tree branches. When a search is input, the system follows a particular pathway through the nodes to retrieve documents (Abdi, Valentin, & Edelman, 1999). Generally, the system follows the paths that seem to match the user's query best. However, each system can incorporate information gathered from relevance feedback to refine the process.
This information is "remembered" from session to session, so overall search capabilities can improve over time. Indeed, Abdi et al. (1999) maintained that the goal of neural network systems is to analyze input and output in order to discover patterns or associations, a process they called "learning." Although information retrieval systems may be limited as to how far they can interpret new information, their ability to make changes based on past experiences certainly has implications for improving user searches.

5. Discussion

This research reveals apparent connections between some of the functions of information retrieval systems and information literacy competencies. Nevertheless, some questions remain:

• How well do these systems function to support users in their searches?
• How does this support ultimately affect information literacy instruction?

Below is a more in-depth examination of each of these areas.

5.1. Competency Two, performance indicator two (access of information)

The second competency in the ACRL definition of information literacy states that an information literate person can access information efficiently and effectively. Performance indicators and outcomes suggest that such a person can identify keywords, synonyms, and related terms with which to search (Association of College and Research Libraries, 2007). Typically, searchers begin their search process using natural language to describe their information need. The system takes this query, matches it to terms in documents, and retrieves relevant items. Often, however, the words searchers choose to describe their information need are different from those used in titles, text, or indexes. Relevant items might be missed due to this discrepancy, known as "term mismatch." To overcome this problem, searchers are encouraged to expand their initial queries by using synonyms or other terms that relate to their information needs. The set theoretic and algebraic systems from Fig. 1 are often designed with search features that directly support these particular tasks.
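As a toy illustration of how a system feature can counter term mismatch, the sketch below builds a crude similarity thesaurus from document co-occurrence and uses it to expand a query. The documents and the co-occurrence criterion are invented for illustration; the systems reviewed here use far more sophisticated statistics:

```python
from collections import defaultdict
from itertools import combinations

def build_similarity_thesaurus(documents, min_cooccur=1):
    """Link terms that appear together in the same documents.

    A crude stand-in for a similarity thesaurus: two terms are treated
    as related when they co-occur in at least `min_cooccur` documents.
    """
    cooccur = defaultdict(int)
    for doc in documents:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            cooccur[(a, b)] += 1
    related = defaultdict(set)
    for (a, b), n in cooccur.items():
        if n >= min_cooccur:
            related[a].add(b)
            related[b].add(a)
    return related

def expand_query(query_terms, thesaurus):
    """Add every term the thesaurus relates to an original query term."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded

# Invented two-document collection:
docs = ["heart attack symptoms", "heart attack treatment"]
thesaurus = build_similarity_thesaurus(docs)
# expand_query({"attack"}, thesaurus) now also contains "heart",
# "symptoms", and "treatment", so either document can be retrieved
```

The searcher's single term is silently broadened into the vocabulary the collection actually uses, which is exactly the skill the second competency otherwise asks of the user.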
The classic Boolean model assigns binary weights to query and index terms. Terms either match or they do not; if they do not, the document is not retrieved. Other systems make allowances for partial matches that can be aligned with synonym or related term searching. In the vector model, for instance, documents that match a general description are gathered into a cluster, even if the terms used within those documents are not exactly the same. This allows for partial matches. The terms used by these information retrieval systems to expand queries may be selected through a "goodness of fit" match. That is to say, some systems are designed to match original query terms to synonyms or other conceptually related words (e.g., Berrocal et al., 2005; Shiri & Revie, 2005; Tudhope et al., 2006). To create these clusters, the system computes the degree of similarity between the query and document terms, retrieving only those above a predetermined cut-off point. It displays the results to the user in descending order of similarity. Fuzzy set systems and other types of conceptual searching function in a similar manner, employing a type of "semantic expansion [that] spreads from a given concept over the semantic network to yield a neighborhood of concepts considered semantically close for retrieval purposes" (Tudhope et al., 2006, p. 510).

Macfarlane, Bhogal, and Smith (2007) offered an extensive review of possible methods of query expansion. They focused on designs that consider the context, or words surrounding individual terms, rather than just matching the terms themselves. As the authors noted, the surrounding context often clarifies the meaning of words. It thus helps to increase relevance by resolving some of the ambiguities of language. Similarly, Abdelali et al. (2007) and Vechtomova and Karamuftuoglu (2007) each described a process that considers the context of the search and document words in order to clarify meaning.
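The "degree of similarity with a predetermined cut-off point" idea can be sketched with cosine similarity over term-weight vectors; the term weights, cut-off value, and documents below are illustrative, not drawn from any of the cited systems:

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, cutoff=0.1):
    """Score each document against the query and return (id, score)
    pairs at or above the cut-off, in descending order of similarity."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    return sorted((pair for pair in scored if pair[1] >= cutoff),
                  key=lambda pair: pair[1], reverse=True)

# Invented mini-collection of term-weight vectors:
docs = {
    "d1": {"information": 1.0, "literacy": 1.0},
    "d2": {"information": 1.0, "retrieval": 1.0},
    "d3": {"cooking": 1.0},
}
results = retrieve({"information": 1.0, "literacy": 1.0}, docs)
# d1 matches fully (score 1.0), d2 partially (0.5);
# d3 falls below the cut-off and is never shown to the user
```

Unlike the binary Boolean model, the partial match for d2 survives, which is precisely what allows related-term retrieval.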
Allowing for partial matching helps to avoid term mismatch and to improve recall because "a concept search will return documents that relate to the same concept as the query word, irrespective of the specific word chosen by the user and the specific words in the document" (Tudhope et al., 2006, p. 509). Although an information literate person is expected to use the thesaurus to find and select search terms, some systems' thesauri automatically identify appropriate query expansion terms. These thesauri may be built around hierarchical relationships of words, meaning that they can offer broader and narrower terms to expand or refine a search as appropriate. Further, thesauri can also be designed to find semantically related concepts, much as an information literate person finds synonyms and related concepts with which to search.

Tudhope et al. (2006) and Berrocal et al. (2005) focused on the ability of thesauri to alleviate term mismatch problems through "the integration of semantic closeness in the matching function" (Tudhope et al., p. 530). This allows for the retrieval of documents related to the query that do not employ the same terminology. Tudhope et al. described a system that relies on a semantic closeness algorithm. The algorithm takes into account the number of steps between concepts in order to determine how closely related, or relevant, each is to the initial query. Berrocal et al. constructed a similarity thesaurus that they claimed "makes it possible to expand the complete query (query concept), and not only each individual term separately" (p. 1166). While systems like these focus on improving recall by expanding the query with related terms, many searchers lack the knowledge to choose appropriate terms from the highly specialized vocabulary of a discipline. This is especially true for novices in a technical field, such as medicine or law.
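The step-counting idea behind Tudhope et al.'s semantic closeness algorithm can be sketched as a shortest-path search over a thesaurus graph: the fewer broader/narrower steps between two concepts, the closer they are. The toy hierarchy below is hypothetical, and the published algorithm weights relationship types rather than treating every step equally:

```python
from collections import deque

def steps_between(thesaurus, start, goal):
    """Number of edges between two concepts in a thesaurus graph
    (breadth-first search); fewer steps means semantically closer."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in thesaurus.get(node, ()):
            if nbr == goal:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # no path: the concepts are unrelated

# A toy broader/narrower hierarchy (hypothetical terms):
thesaurus = {
    "vehicle": ["car", "bicycle"],
    "car": ["vehicle", "sedan"],
    "sedan": ["car"],
    "bicycle": ["vehicle"],
}
# "sedan" is 1 step from "car" but 3 steps from "bicycle",
# so a query on "sedan" would expand to "car" before "bicycle"
```

A closeness score such as 1 / (1 + steps) would then rank candidate expansion terms for the matching function.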
Phillips-Wren and Forgionne (2006) described a model that assists the user in choosing terminology. In this system, users select from checkboxes of terms and categories. In some cases, users see terms that appear more like natural language. Behind the scenes, the terms are mapped to the Medical Subject Headings (MeSH), a very specialized and technical controlled vocabulary with which a lay person is unlikely to be familiar. The search engine is designed to use the underlying MeSH term that relates to the searcher's natural language words. In this way, searchers can retrieve
more precise results, even if they are not familiar with the field's technical jargon. These systems perform many of the search functions expected of an information literate person: they automatically expand queries with related concepts or refine queries by mapping to controlled vocabulary.

In addition to specifying that searchers should choose appropriate query terms, the second competency also says that information literate searchers should employ appropriate search tools, such as Boolean operators. Shiri and Revie (2005) found that subject specialists often lacked this skill, prompting them to suggest that training in Boolean logic may be necessary to further improve searching. Choi et al. (2006) addressed this issue within an extended Boolean retrieval context using clustering methods. The system they described depends on relevance feedback, through which the search engine derives expansion terms from documents chosen by the searcher. These terms are then weighted according to their prevalence in relevant and non-relevant documents: highly prevalent terms are automatically "OR-ed" together, terms of middle weight are "AND-ed," and terms with lowest prevalence are "NOT-ed." The system thus creates new queries with which to expand the initial search and automatically constructs a Boolean search with them. Similarly, Cordón et al. (2006) implemented an Inductive Query by Example (IQBE) paradigm that also uses "genetic programming to build Boolean queries for text retrieval through relevance feedback" (p. 615). In both cases, the system is designed to alleviate the searcher's burden of creating proper Boolean searches. This directly correlates to the second competency of the ACRL definition.

5.2. Competency Two, performance indicator four (evaluation of information)

In the fourth performance indicator of the second competency, ACRL (Association of College and Research Libraries, 2007) stressed the importance of evaluating information for its reliability and validity. Evaluating information touches upon a subset of the second competency, which states that the information literate person "assesses the quantity, quality and relevance of results." Information retrieval systems are often designed to evaluate information automatically on the basis of relevance, which is reflected in their display of documents to the searcher. Many ranking methods exist, from Web search engines that order lists according to the perceived popularity of a particular site to systems that employ complex software enhancements to track user interactions with a document, including the amount of time a user spends viewing a particular item.

The classic Boolean model, with its binary weighting system, does not allow for partial matches. Therefore, it does not rank the retrieval set, because it does not distinguish between degrees of relevance. The vector space model and most of the probabilistic systems, on the other hand, allow for partial matches. They compute the degree of similarity between a query and a document based on several measures, including inverse document frequency, which gives more weight to a query term that appears often in a single document but rarely throughout the whole database. The system also considers document length so that longer documents do not necessarily rank higher than shorter ones. Desai and Spink (2005) acknowledged that locating and evaluating relevant documents can be difficult for the user if the documents are not ranked, because potentially useful articles are scattered throughout the retrieval set. They proposed a clustering algorithm that groups documents in the retrieval set according to their relevance or nonrelevance in relation to a given search.
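The inverse-document-frequency and length-normalization ideas described above are commonly combined into a tf-idf weight. A minimal sketch, with documents as token lists and an invented mini-corpus; the exact formula varies from system to system:

```python
import math

def tf_idf(term, doc, corpus):
    """Weight a term highly if it is frequent in this document
    (tf, normalized by document length so long documents do not
    win on raw counts) but rare across the corpus (idf)."""
    tf = doc.count(term) / max(len(doc), 1)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [["query", "expansion", "query"],
          ["relevance", "feedback"],
          ["relevance", "ranking"]]
# "query" is frequent in the first document and absent elsewhere,
# so its weight there is high; "relevance" appears in two of the
# three documents, so its weight is discounted
```

A term that appears in every document gets an idf of log(1) = 0 and contributes nothing to the ranking, which matches the intuition that ubiquitous words do not discriminate between documents.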
Jang, Park, and Ra (2005) suggested using sentence–query similarity as the basis for ranking documents, meaning that documents whose representative sentences best match query terms would receive higher rank. Khoo and Wan (2004) employed a relevancy-ranking algorithm that uses seven different criteria to determine the relevancy of retrieved documents. These criteria include breadth of match, which takes into account the number of matches between query words and
document keywords. Thus, if a three-word query is entered, a document in which all three words appear should rank higher than a document containing just two of the words. Another criterion, section weighting, gives greater weight to words that appear in certain parts of the record, such as the title. The proximity-of-query-words criterion means that documents in which query words appear closer together are judged to be more relevant. Some systems will also consider variant word forms, but they typically give more weight to exact matches. Although these systems rank the initial retrieval set based solely on the initial query, many also employ relevance feedback to re-rank documents to reflect the user's needs (e.g., Choi et al., 2006; Khan and Khor, 2004; Rooney et al., 2006; Sihvonen and Vakkari, 2004). Such systems employ a variety of methods to gather feedback from the user as to the relevance of the initially retrieved documents. They then use this information to reformulate the initial query and re-rank the documents. Relevance feedback can be gathered directly and interactively, for example by asking the user to mark relevant documents (Sihvonen and Vakkari, 2004). Alternatively, systems may gather it automatically based on user actions, such as the length of time spent viewing a document. Khan and Khor (2004) described an algorithm that analyzes initial retrieval sets of Web documents and extracts key phrases to reformulate queries without user feedback. Jung, Herlocker, and Webster (2007) detailed a process by which click data, or the sequence of pages and links a user follows throughout a Web search, can be used as a form of implicit feedback to evaluate the relevance of result sets. Interestingly, the authors indicated that such click data can be used to improve current and future searches: the search engine can “remember” the data, allowing it to group more relevant pages together when similar queries are entered later.
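A toy version of the criteria just described (breadth of match, section weighting for the title, and proximity of query words) might look like the following sketch. The weights and the combination rule are arbitrary illustrations, not the actual algorithm or values published by Khoo and Wan (2004):

```python
def rank_score(query, title, body,
               breadth_w=2.0, title_w=1.5, proximity_w=1.0):
    """Toy relevancy score combining three criteria: breadth of
    match, section (title) weighting, and query-term proximity.
    The weights are illustrative, not published values."""
    q = [t.lower() for t in query.split()]
    title_words = title.lower().split()
    body_words = body.lower().split()

    # breadth of match: fraction of query terms found anywhere
    breadth = sum(1 for t in q if t in body_words or t in title_words) / len(q)

    # section weighting: extra credit for query terms in the title
    title_hits = sum(1 for t in q if t in title_words) / len(q)

    # proximity: query terms appearing close together in the body
    positions = [i for i, w in enumerate(body_words) if w in q]
    if len(positions) >= 2:
        proximity = 1.0 / (1 + positions[-1] - positions[0])
    else:
        proximity = 0.0

    return breadth_w * breadth + title_w * title_hits + proximity_w * proximity

# A document matching all three query words outranks one matching two.
full = rank_score("boolean query ranking",
                  title="Boolean query ranking models",
                  body="boolean query ranking methods compared")
partial = rank_score("boolean query ranking",
                     title="search methods",
                     body="boolean query methods only")
```

The design choice to combine the criteria as a weighted sum keeps each one's contribution inspectable, which is why several of the systems reviewed here report per-criterion weights.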
This sort of learning function also relates to Competency Three of the information literacy framework (incorporating new knowledge), as described below.

5.3. Competency Three: performance indicator four (incorporating new knowledge)

One section of the third competency of the information literacy definition indicates that information literate people can “incorporate selected information into one's knowledge base” (Association of College and Research Libraries, 2007) and draw conclusions based on the new knowledge. Interestingly, relevance feedback can also serve as a type of “learning” for the information retrieval system, allowing the system to make predictions about the user's future behavior. For instance, certain systems are designed to sort and file the user's incoming email automatically, based on decisions the user made about previous emails. If the system makes incorrect assumptions, it questions the user in order to adjust its methods. Other systems create user profiles, or outlines of the user's needs and interests, based on the user's Web searching patterns. These systems annotate links after new searches based on whether or not a site is predicted to be relevant to the user. In some cases, such systems can be set to search and update users on topics of interest based on their profiles. Similarly, these systems may offer suggestions of links to follow if the user is unsure how to proceed with a search. Cordón et al. (2006) and Choi et al. (2006) described extended Boolean systems incorporating a learning mechanism that essentially allows the system to incorporate new knowledge into its database. These systems are based on neural network models in which the system “remembers” the results of previous searches, including the reformulations and re-rankings. The system can update its network of index terms and the connections between terms in order to improve future predictions and return more relevant results.
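As a concrete example of how user judgments can be folded back into a query, the classic Rocchio reformulation moves the query vector toward documents marked relevant and away from those marked nonrelevant. This is a textbook technique offered for illustration; none of the systems cited above necessarily uses this exact formula, and the alpha, beta, and gamma weights below are conventional defaults, not values drawn from those studies.

```python
from collections import Counter

def rocchio_update(query_vec, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query reformulation from relevance feedback.

    Moves the query vector toward the centroid of documents the
    user marked relevant and away from those marked nonrelevant.
    """
    new_q = Counter()
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # drop terms whose weight fell to zero or below
    return {t: w for t, w in new_q.items() if w > 0}

# Hypothetical feedback round: one document marked relevant, one not.
q = Counter({"retrieval": 1.0})
rel = [Counter({"retrieval": 1, "feedback": 2})]
nonrel = [Counter({"retrieval": 1, "football": 3})]
expanded = rocchio_update(q, rel, nonrel)
```

After the update, a term from the relevant document ("feedback") enters the expanded query, while a term associated only with the nonrelevant document ("football") is suppressed; repeating the cycle over several feedback rounds is one simple way a system "remembers" what worked.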
The decision support system for navigating a medical database presented by Phillips-Wren and Forgionne (2006) is an example of such a learning system. Initially, this model offers the user checklists
of search terms that are mapped to the controlled vocabulary. Once the user begins choosing terms, however, the system tracks the user's preferences and forecasts results for different combinations of preferred terms. Further, the system will recommend actions to the user by offering a prioritized list of search terms. It will then weight and rank results based on the user's profile rather than on query terms alone.

5.4. Implications for instruction

Although the information retrieval literature almost never references information literacy, this concept mapping study demonstrates that certain system functions can be mapped to specific expectations of the information literacy standards put forth by ACRL. The literature shows that many systems are designed to identify synonyms and related concepts with which to expand or refine searches. Some are even capable of automatically combining search terms with the proper Boolean operators in order to generate multiterm searches. Furthermore, many of the systems reviewed here rank initial retrieval lists based on relevancy predictions generated from the initial query. Some can incorporate user feedback to re-rank subsequent retrieval sets and make ever more accurate predictions of relevancy. Some systems are even capable of “learning” new bits of information, such as potential links between semantic concepts, and using that new knowledge to improve searching and ranking capabilities. Each of these capabilities directly supports a part of the ACRL information literacy standards, as shown in Fig. 2. In some cases the systems perform these functions automatically, relieving the user of the need to understand and implement these skills. Perhaps more importantly, the majority of studies conclude that information retrieval systems using relevance feedback or thesaurus-enhanced searching perform at least as well as, or better than, the baseline.
This suggests that these systems are indeed able to support certain information literacy tasks. The ability of these information retrieval systems to support and even improve search tasks for users suggests some intriguing implications for instruction librarians. After all, the areas of the information literacy standards that align with these information retrieval systems are typically skills-based competencies, such as identifying keywords or formulating proper searches. Other parts of the standards, however, involve more cognitive processes and critical thinking. Indeed, with an emphasis on integrating information into a knowledge base and using information responsibly and effectively, certain information literacy standards align very well with higher-order concepts outlined in Bloom's taxonomy (Clark, 2007). According to the taxonomy, there are six categories of learning: knowledge, comprehension, application, analysis, synthesis, and evaluation. Of these, analyzing, synthesizing, and evaluating information are considered higher-order categories because they involve comprehending and applying knowledge, rather than just recalling it. Unfortunately, after teaching the skills of accessing information, instruction librarians often have little time left to devote to the other parts of the standards. If systems such as the ones described here could alleviate the need for some of this instruction, instruction librarians would be free to focus on more complex facets of information literacy. Interest in this shift of focus is reflected throughout the profession. In lamenting the overemphasis on technology skills in connection with information literacy, Grafstein (2007) argued for more attention to be paid to the higher-order thinking skills inherent in the final three competencies of the information literacy definition.
In recent information literacy literature, authors such as Andersen (2006), Lloyd (2005), and Simmons (2005) have lamented the fact that current instruction and standards seem to emphasize the skills related to finding and accessing information. They argued that the focus of library instruction should be the understanding and synthesis of information. These authors looked to
genre and discourse theory as a basis for library instruction. They maintained that instruction librarians should teach students the discourse of their chosen field rather than generic tasks such as finding information. One of the driving forces behind this shift in focus is the fact that university administrators, their accrediting bodies, and other stakeholders are demanding accountability from their institutions. They are emphasizing the importance of student learning outcomes, or demonstrable changes in student knowledge that are directly related to a particular course or educational activity. These expectations make it essential for librarians to focus on broader concepts and devote time to outcomes assessment in order to provide “evidence of accountability… to link to the institutional mission and to demonstrate that learning actually occurred” (Hernon and Dugan, 2004, p. xvii). Interestingly, the importance of disciplinary knowledge is supported by several studies in the information retrieval field. Sihvonen and Vakkari (2004) found that experts in the field fared better than novices in employing thesauri to aid in searching, leading them to suggest that increased subject knowledge is a “vital condition for benefiting from a thesaurus” (p. 688). Likewise, Shiri and Revie (2005) found that experts and students used thesauri differently. Experts mainly used the thesaurus to narrow their searches, while the students focused on its ability to broaden a search and suggest new terms. Phillips-Wren and Forgionne (2006) also emphasized the importance of subject knowledge to search success, especially in technical and medical fields. This recent attention to the role of subject knowledge in information retrieval and literacy is also reflected in information literacy standards. In its standards, ACRL acknowledged the importance of disciplinary knowledge.
The standards stated that information literate individuals should understand that knowledge can be “organized into disciplines that influence the way information is accessed,” and they should “implement the search using investigative protocol appropriate to the discipline” (Association of College and Research Libraries, 2007). Nevertheless, these standards are meant as a general guideline for all higher education, with no distinction between fields or levels of study. In June of 2006, ACRL's Task Force on Information Literacy for Science and Technology approved a new set of standards that make special recommendations for science, engineering, and technology. These new standards recognize the “unique challenges in identifying, evaluating, acquiring, and using information” faced by science, engineering and technology students and offer specific recommendations for information literacy instruction in these fields (Task Force on Information Literacy for Science and Technology, 2006, p. 634). For instance, in the general framework, the third performance indicator of the first competency indicates that the information literate student “considers the costs and benefits of acquiring the needed information.” The corresponding section in the Science and Engineering/Technology standards focuses on the student's knowledge of the literature of the discipline. These standards maintain that information literate students in science and engineering/technology must understand how the literature of these specific disciplines is produced and disseminated. They must be aware of relevant sources, such as scientific manuals and handbooks. In addition, the Science and Engineering/Technology standards address the ability of information literate individuals to “use other methods of search term input, such as structure searching and image searching,” which is not addressed in the general standards (Task Force on Information Literacy for Science and Technology, 2006, p. 637). 
This attention to specific information literacy skills and knowledge appropriate to individual disciplines reflects the growing trend toward discourse knowledge and shows that ACRL is working toward this goal.

6. Conclusion

Although instruction librarians still need to teach people how to locate and access information, newly designed information systems
can assist the user with these tasks. This opens up possibilities for instruction librarians to focus on other facets of information literacy. Both the information literacy and information retrieval literatures draw the same conclusions about the importance of discourse familiarity to search success, and, by implication, the importance of instructing students in the discourse of their particular field. If instruction librarians have a good grasp of information retrieval, they can adapt their teaching to focus on parts of information literacy that often get overlooked in favor of access skills. However, a thorough understanding of how the systems work and how much they are capable of doing for the user is necessary before any changes are made to instruction programs. This study begins to explore these possibilities, but more needs to be done. Future research should consider actual information literacy practices, as well as the extent to which current programs already incorporate the information retrieval capabilities discussed here. How aware are instruction librarians of how information retrieval systems work? Do they take these functionalities into account in their instruction? Answering these questions will be the first step in determining the impact that information retrieval capabilities can have on information literacy instruction, especially with the focus on the development of student learning outcomes.

References

Abdelali, A., Cowie, J., & Soliman, H. S. (2007). Improving query expansion using semantic expansion. Information Processing & Management, 43, 705−716. Retrieved October 1, 2007, from ScienceDirect.
Abdi, H., Valentin, D., & Edelman, B. (1999). Neural networks. Thousand Oaks, CA: Sage.
Andersen, J. (2006). The public sphere and discursive activities: Information literacy as sociopolitical skills. Journal of Documentation, 62, 213−228. Retrieved October 1, 2007, from Emerald Insight.
Association of College and Research Libraries. (2007, May 21).
Information literacy competency standards for higher education. Retrieved October 1, 2007, from http://www.ala.org/ala/acrl/acrlstandards/informationliteracycompetency
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press.
Berrocal, J. L., Alonso, C. G., Figuerola, E. R., & Zazo, A. F. (2005). Reformulation of queries using similarity thesauri. Information Processing & Management, 41, 1163−1173. Retrieved October 1, 2007, from the LISA database.
Cheng, G. Y. T. (2003). Educational workshop improved information-seeking skills, knowledge, attitudes and the search outcome of hospital clinicians: A randomized controlled trial. Health Information and Libraries Journal, 20, 22−33. Retrieved October 1, 2007, from the LISA database.
Choi, J., Kim, M., & Raghavan, V. V. (2006). Adaptive relevance feedback method of extended Boolean model using hierarchical clustering techniques. Information Processing & Management, 42, 331−349. Retrieved October 1, 2007, from the LISA database.
Clark, D. (2007, May 6). Learning domains or Bloom's taxonomy. Retrieved October 3, 2007, from http://www.nwlink.com/~donclark/hrd/bloom.html
Cordón, O., Herrera-Viedma, E., & Luque, M. (2006). Improving the learning of Boolean queries by means of a multiobjective IBQE evolutionary algorithm. Information Processing & Management, 42, 615−632. Retrieved October 1, 2007, from the LISA database.
Desai, M., & Spink, A. (2005). An algorithm to cluster documents based on relevance. Information Processing & Management, 41, 1035−1049. Retrieved October 1, 2007, from ScienceDirect.
Gordon, C. A. (2002). Methods for measuring the influence of concept mapping on student information literacy. School Library Media Research, 5. Retrieved October 1, 2007, from http://www.ala.org/ala/aasl/aaslpubsandjournals/slmrb/slmrcontents/volume52002/gordon.cfm
Grafstein, A. (2007). Information literacy and technology: An examination of some issues. Portal: Libraries and the Academy, 7, 51−64.
Retrieved October 1, 2007, from Project Muse.
Hernon, P., & Dugan, R. E. (Eds.). (2004). Outcomes assessment in higher education: Views and perspectives. Westport, CT: Libraries Unlimited.
Jang, M., Park, E., & Ra, D. (2005). Techniques for improving web retrieval effectiveness. Information Processing & Management, 41, 2107−1223. Retrieved October 1, 2007, from ScienceDirect.
Jones, S., MacFarlane, A., Sormunen, E., & Vakkari, P. (2004). Query exhaustivity, relevance feedback, and search success in automatic and interactive query expansion. Journal of Documentation, 60, 109−127. Retrieved October 1, 2007, from the LISA database.
Jung, S., Herlocker, J. L., & Webster, J. (2007). Click data as implicit relevance feedback in web search. Information Processing & Management, 43, 791−807. Retrieved October 1, 2007, from the LISTA database.
Kane, M., & Trochim, W. M. K. (2006). Concept mapping for planning and evaluation. Thousand Oaks, CA: Sage Publications.
Khan, S. M., & Khor, S. (2004). Enhanced web document retrieval using automatic query expansion. Journal of the American Society for Information Science and Technology, 55, 29−40. Retrieved October 1, 2007, from ABI-Inform.
Khoo, C. S. G., & Wan, K. (2004). A simple relevancy-ranking strategy for an interface to Boolean OPACs. Electronic Library, 22, 112−120. Retrieved October 1, 2007, from the LISTA database.
Larkin, J. E., & Pines, H. A. (2005). Developing information literacy and research skills in introductory psychology: A case study. Journal of Academic Librarianship, 31, 40−45. Retrieved October 1, 2007, from the LISA database.
Lloyd, A. (2005). Information literacy: Different concepts, different contexts, different truths? Journal of Librarianship and Information Science, 37, 82−88. Retrieved October 1, 2007, from Sage.
Losee, R. M. (2007). Decisions in thesaurus construction and use. Information Processing & Management, 43, 958−968. Retrieved October 1, 2007, from ScienceDirect.
Lucas, W., & Topi, H. (2004). Training for web search: Will it get you in shape? Journal of the American Society for Information Science and Technology, 55, 1183−1198. Retrieved October 1, 2007, from the LISA database.
Macfarlane, J., Bhogal, A., & Smith, P. (2007). A review of ontology based query expansion. Information Processing & Management, 43, 866−886. Retrieved October 1, 2007, from ScienceDirect.
MacPherson, K. (2004). Undergraduate information literacy: A teaching framework. Australian Academic and Research Libraries, 35, 226−241. Retrieved October 1, 2007, from the LISA database.
Murdock, V., Kelly, D., Croft, B. W., Belkin, N. J., & Yuan, X. (2007). Identifying and improving retrieval for procedural questions. Information Processing & Management, 43, 181−203. Retrieved October 1, 2007, from the LISTA database.
Phillips-Wren, G. E., & Forgionne, G. A. (2006). Aided search strategy enabled by decision support. Information Processing & Management, 42, 503−518. Retrieved October 1, 2007, from ScienceDirect.
Rajapakse, R. K., & Denham, M. (2006). Text retrieval with more realistic concept matching and reinforcement learning. Information Processing & Management, 42, 1260−1275. Retrieved October 1, 2007, from the LISA database.
Rooney, N., Patterson, D., Galushka, M., & Dobrynin, V. (2006). A relevance feedback mechanism for cluster-based retrieval. Information Processing & Management, 42, 1176−1184. Retrieved October 1, 2007, from the LISA database.
Shiri, A., & Revie, C. (2005). Usability and user perceptions of a thesaurus-enhanced search interface. Journal of Documentation, 61, 640−656. Retrieved October 1, 2007, from the LISA database.
Sihvonen, A., & Vakkari, P. (2004). Subject knowledge improves interactive query expansion assisted by a thesaurus. Journal of Documentation, 60, 673−690. Retrieved October 1, 2007, from Emerald Insight.
Simmons, M. H. (2005). Librarians as disciplinary discourse mediators: Using genre theory to move toward critical information literacy. Portal: Libraries and the Academy, 5, 297−311. Retrieved October 1, 2007, from Project Muse.
Task Force on Information Literacy for Science and Technology. (2006).
Information literacy standards for science and engineering/technology. College and Research Libraries News, 67, 634−641.
Tudhope, D., Binding, C., Blocks, D., & Cunliffe, D. (2006). Query expansion via conceptual distance in thesaurus indexed collections. Journal of Documentation, 62, 509−533. Retrieved October 1, 2007, from the LISA database.
Vechtomova, O., & Karamuftuoglu, M. (2006). Elicitation and use of relevance feedback information. Information Processing & Management, 42, 191−206. Retrieved October 1, 2007, from the LISA database.
Vechtomova, O., & Karamuftuoglu, M. (2007). Query expansion with terms selected using lexical cohesion analysis of documents. Information Processing & Management, 43, 849−865. Retrieved October 1, 2007, from ScienceDirect.
White, R. W., & Marchionini, G. (2007). Examining the effectiveness of real-time query expansion. Information Processing & Management, 43, 685−704. Retrieved October 1, 2007, from ScienceDirect.