Weak information work in scientific discovery




Information Processing and Management 43 (2007) 808–820 www.elsevier.com/locate/infoproman

Carole L. Palmer *, Melissa H. Cragin 1, Timothy P. Hogan 2

Graduate School of Library and Information Science, University of Illinois at Urbana–Champaign, 501 E. Daniel Street, Champaign, IL 61820-6212, United States

Received 27 October 2005; received in revised form 11 June 2006; accepted 16 June 2006
Available online 8 September 2006

Abstract

Scientists continually work with information to move their research projects forward, but the activities involved in finding and using information and their impact on discovery are poorly understood. In the Information and Discovery in Neuroscience (IDN) project we investigated the information work involved as researchers make progress and confront problems in the practice of brain research. Through case studies of recent neuroscience projects, we found that the most difficult and time-consuming information activities had parallels with Simon’s explication of weak methods in scientific problem solving. But, while Simon’s weak/strong distinction is an effective device for interpreting information work, his general conception of how discovery takes place is artificially constrained. We present cross-case and case-based results from the IDN project to illustrate how the conditions of problem solving Simon associated with weak methods relate to information work and to identify additional weak aspects of the research process not considered by Simon. Our analysis both extends Simon’s framework of what constitutes the discovery process and further elaborates how weak approaches influence the conduct of research.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Scientific discovery; Information practices; Information seeking; Neuroscience; Research processes

1. Introduction

Information is an essential resource in the process of scientific discovery, and scientists are continually working to gather information from the literature, databases, web resources, and colleagues. In turn, they evaluate, collect, manage, consult, integrate, and apply that information to move research forward. This “information work” has never been assessed on a large scale in terms of time spent or impact on the advancement of science. But its importance is evident in the number of scientific researchers and information scientists striving to find better ways to mobilize and work with the ever-growing body of information resources. In the Information and

* Corresponding author. Tel.: +1 217 244 0653; fax: +1 217 244 3302. E-mail addresses: [email protected] (C.L. Palmer), [email protected] (M.H. Cragin), [email protected] (T.P. Hogan).
1 Tel.: +1 217 244 8729; fax: +1 217 244 3302.
2 Tel.: +1 217 333 3280; fax: +1 217 244 3302.

0306-4573/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.ipm.2006.06.003


Discovery in Neuroscience (IDN) project we investigated the information work involved in the practice of brain research (Palmer, Cragin, & Hogan, 2004). We found that the most difficult and time-consuming activities paralleled Simon’s explication of weak methods in scientific problem solving.

As discussed by Simon, Langley, and Bradshaw (1981), weak problem solving is associated with specific research conditions, including an ill-structured problem space, unclear or unsystematic steps, and a lack of prior domain knowledge. Conversely, strong problem solving is applied when a research problem is well defined, and it tends to proceed through systematic, routine activities and with a high level of domain knowledge. Strong, expert methods are practiced in what Kuhn (1962) referred to as normal science. Revolutionary, paradigm-altering science, on the other hand, advances with methods that have not been refined for the given application. While “crude and cumbersome,” these weak approaches are not a second-best choice for solving research problems but rather “may be the only ones at hand on the frontiers of knowledge, where few relevant special techniques are yet available” (Langley, Simon, Bradshaw, & Zytkow, 1987).

In our analysis of information work in the practice of brain research, Simon’s distinction between weak and strong approaches proved to be an effective device for interpreting the activities involved in finding and using information. However, we found Simon’s more general ideas about the process of discovery to be artificially constrained. Thus, our results extend Simon’s conception of what constitutes the discovery process while further elaborating how weak approaches influence the conduct of research.

In this paper, we begin by introducing the literature on scientific discovery and problem solving that informed our analysis and by describing our case study methods.
Based on our cross-case analysis, we discuss our conception of weak information work (WIW), the prominence of WIW in certain stages of research, and its role in specific modes of information seeking. Two short case studies are presented to provide a more detailed illustration of how the conditions Simon associated with weak methods extend to information work. We conclude by arguing that understanding the dynamics of weak and strong information work is important for determining how information systems and services can make the greatest contribution to the discovery process.

2. Background

2.1. Conceptions of discovery

The mechanisms of scientific discovery have been characterized from different scholarly perspectives. Practicing scientists have written about the process of discovery to raise the awareness of others involved in the scientific enterprise and the interested public (e.g., Root-Bernstein, 1989). Historical accounts are most numerous, with many authors concentrating on the complex and esoteric nature of science or particular high-profile events (e.g., Bernal, 1953; Harwit, 1981; Holton, 1973). Kuhn’s (1962) influential book distinguished revolutionary science from “normal” science, providing a more socially based interpretation than many earlier works. He situated significant research advances in the context of the larger landscape of activities that build and sustain scientific paradigms and disciplines.

Few information scientists have conducted empirical investigations of the discovery process for application to information service and system development, although Bawden’s (1986) discussion of the connection between creativity and information strategies is worthy of note. He recommended certain information technology features for enhancing creative problem solving and discovery, such as access to peripheral material and explicit representation of analogies, patterns, and exceptions.
Similarly, Martyn (1974) argued that the information needed to formulate and solve problems often lies outside of the core material that supports professional competencies and may not appear immediately relevant.

Cognitive science has contributed much to our understanding of scientific thinking. Dunbar’s work (e.g., Dunbar, 1993) in particular has interesting implications for information systems. For example, his finding that the setting of goals impacts the discovery of new concepts suggests that particular kinds of information could assist scientists in reworking goals as new evidence or inconsistent findings emerge.

Other scholars have recognized the importance of information in the discovery process. Newell (1969) identified information acquisition as an important but strictly cognitive component of discovery. Fujimura (1987) accounted for certain types of information gathering and exchange in articulation work: the collecting, coordinating, and integrating tasks that make research projects “doable”.


Within the body of research on computational scientific discovery (e.g., Darden, 1997; Valdés-Pérez, 1999), the work of Simon et al. introduced in Section 1 has important implications for how information activities fuel the research process. Their conceptualization of strong and weak scientific approaches suggests that certain kinds of information activities can make high impact contributions to discovery. Our study confirms this idea, not only showing that WIW is influential but that it is difficult to carry out and not well supported by current information systems and services. At the same time, Simon’s framework for understanding discovery does not account for important segments of the research process.

2.2. Information and the discovery process

All of the scholars discussed in Section 2.1 have associated information with discovery, but none has offered a good representation of the variety and necessity of the work of finding and using information. de Jong and Rip’s (1997) discussion of the future of computer-supported discovery environments (CSDEs) in the practice of science is a notable exception. Using a fictional scenario they illustrate how the process of discovery unfolds as scientists work through inter-related questions, each of which requires a number of information-based activities before the project can move forward. The central role of information in the discovery process is evident as they show, for example, how “in many situations the problem space is not readily available, but has to be constructed from the heterogeneous set of (electronic) resources which scientists have at their disposal” (p. 237). While de Jong and Rip’s ideal scenario succeeds at showing the socio-technical dimensions of the design and use of CSDEs, they offer a truncated picture of the research process.
For instance, in the real projects documented in our study, research almost always took a much more circuitous path and of course took longer than the three days depicted in their sequence of events. Their representation skips over the preliminary yet critical activities of defining an emerging research problem, becoming familiar with associated intellectual domains, identifying and evaluating possible paths of investigation, as well as the ongoing work of interacting with collaborators, colleagues, and competitors.

Simon and his colleagues, like de Jong and Rip, also gave little attention to processes or activities outside the data collection and analysis stages of inquiry. They briefly note that conducting science is different from other kinds of problem solving because research is a social process that often involves many scientists and proceeds over an extended period of time. Nonetheless, they assert that in spite of such differences “the component processes, which when assembled make the mosaic of scientific discovery” have “no special properties” that distinguish them from other problem solving situations (Simon et al., 1981, p. 2). Instead these tasks, many of which involve finding, managing, and using information, are covered by their concept of “meta-activities”. Meta-activities can play a seminal role in discovery, as Simon et al. (1981) point to with the example of how “Mendeleev discovered the periodic table while planning the arrangement of topics for an elementary chemistry textbook” (p. 4). However, the only other meta-activities they explicitly identify are writing and disseminating research results. The “mosaic” they invoke does not cover substantial parts of the research process, especially the activities that precede data collection and analysis.
Whether information work is thought of as a meta-activity (Simon et al., 1981) or as a fundamental part of discovery (de Jong & Rip, 1997), without it little progress would be made in scientific research. Our cases show that it is a consequential and distinct part of the mosaic of discovery that surrounds, connects, and fuels the data and analysis activities emphasized by Simon. Moreover, our results indicate that, like the heuristics of problem solving or the techniques applied in revolutionary vs. normal science, information work processes can also be weak or strong.

2.3. Weak and strong conditions

As discussed in Section 1, Simon et al. (Langley et al., 1987; Simon, 1986; Simon et al., 1981) associated weak scientific methods with certain problem solving conditions, such as an ill-structured problem space, unclear or unsystematic steps, and limited prior domain knowledge. They also explain that weak approaches operate with less information and that a particular weak method may be applicable to many different tasks or domains. Weak approaches tend to be used by novices and for solving problems in novel domains. The novice


Weak elements | Strong elements
Ill-structured problem space | Structured problem space
Low domain knowledge | High domain knowledge
Unsystematic steps | Systematic steps
Wide application | Task specific
Data driven | Theory driven
Seek and search | Recognize and calculate

Fig. 1. Continuum of conditions Simon et al. associated with strong and weak scientific approaches.

problem solver is driven by data, and they search and test to figure out what to do next. While weak methods are less powerful than strong methods, they are often the only ones available on the fronts of science where specialized techniques have not yet been developed. Cutting edge, transformative science proceeds through weak methods by necessity.

In contrast, strong approaches tend to be driven by existing theory and “truths”. They proceed with clear and calculated steps and depend on expert domain knowledge. Since they are more specific in their application, they are not easily extended to new tasks or domains. But, strong approaches are powerful and often allow solutions to be found with little or no search. Instead of seeking and searching, the expert problem solver tends to recognize and calculate (Langley et al., 1987; Simon, 1986; Simon et al., 1981).

Fig. 1 represents a summary of the conditions Simon and his colleagues associated with weak and strong approaches. Note that these conditions describe aspects of the scientific problem under investigation as well as the knowledge and practices of individual scientists. In our analysis we found that these conditions can be applied to different levels of information work. They influence specific information searching techniques such as browsing, as well as more long-term processes such as exploring how to design a new experimental procedure. Thus we use strong and weak to describe situations that involve isolated and routine activities, techniques, approaches, and processes. We use “information work” as a general term to refer to information practices at any of these levels of granularity.

3. Methods

To investigate the information work involved in brain research, the IDN project team developed case studies of neuroscience projects at four laboratories located at three research universities across the country.
The focus of the data collection was initially on specific instances when researchers made progress or confronted problems in the course of research. These incidents served a dual role. They anchored the study to current, significant research activities, and they provided a point of entrance into the larger research projects that defined the parameters of a case. Additional project cases were identified as we worked with participants and became more firmly embedded at the research sites. The results presented here are based on data collected primarily during the first half of the project in 2003–2004. At present, data collection has concluded on all but two cases and analysis is ongoing.

3.1. Sample and body of data

We enrolled a total of 25 participants in the project, 11 of whom were key informants selected because they were leading ongoing research projects. The remaining 14 participants were other senior and junior biological and computer scientists, postdoctoral researchers, graduate students, and laboratory technicians and managers who played important roles in the case projects.

The participants represented four distinctly different laboratories. One laboratory is a small group that does behavioral and neuronal research on learning and memory. The second is a larger operation with a number of research scientists working on brain imaging related to psychiatric disorders such as schizophrenia. The third lab is a large interdisciplinary biology center involved in informatics development. The fourth is run by a single investigator but is involved in several interdisciplinary projects concerning bioinformatics and neurologic diseases.


Some of our participants (32%) were drawn from a group of field testers for the Arrowsmith Project, which is developing a data mining tool that searches MEDLINE for complementary but disconnected literatures (Smalheiser, 2005; Swanson & Smalheiser, 1999). The Arrowsmith team has been an important partner in gaining access to neuroscientists working in a variety of brain research specializations. The field testers’ literature searches were often used as points of entrance into the individual case projects and for learning about related research going on at their laboratories.

3.2. Case development

Thirty-eight projects were tracked, of which eight were developed into full case studies. Case data include face-to-face and telephone interviews, search diary records, observation field notes, and project documents. These largely qualitative data generated rich, local information on how scientific work is carried out within the framework of a specific project (Fujimura, 1987; Vaughan, 1992). Data from the remaining thirty projects provided additional context and details on related research being conducted in the laboratories. We documented the research process at a level that brings important aspects of information work into view, such as the things that make information important, useful, or difficult to find or use, and the social linkages by which information and knowledge move and coalesce.

Interviews: We conducted a total of 71 semi-structured interviews. Key informants were interviewed multiple times, with each session increasing in focus and specificity over time. Typically, we conducted 60–90 minute sessions, with the first one concentrating on the scientist’s projects, their specific interests and responsibilities, the larger laboratory context and related projects, and basic information seeking and use practices.
We then used this background information to inform the next interview, where we probed for project details and specific information incidents, followed up on search diary records, and identified other project participants to be interviewed for a given case. Subsequent interviews followed the trajectory of the projects and related incidents.

Search diary: The field testers used an electronic log developed by our collaborators on the Arrowsmith Project to document literature searches and other types of information activities. The log contained two forms, an Arrowsmith Diary for searches using the literature mining tool and an Information Activity Diary to record other kinds of information seeking. Data from the diary entries were coded and analyzed to identify patterns in anticipated and emergent categories. The analysis was validated through an intercoder reliability process in which the three project team members developed consensus on each category and its application to the data. For those diary entries that did not fall clearly into a category, we consulted with the field tester to confirm or correct our classification. The importance rankings reported in Section 4.3 were specified by the field testers as part of their diary entries. The diary also played an important role in identifying critical incidents for interviewing and adding detail to the more descriptive interview and observation data. Analysis of the diary entries required understanding the participants’ research areas and current projects; therefore, we often returned to our background interview transcripts when coding the diary entries and regularly verified coding decisions in later interviews with the researchers.

Observation: A total of approximately twenty hours of observation was conducted at the laboratory sites, primarily with key informants. These data gave us a broader view of the information activities, resources, and personnel associated with the case projects.
In addition to recording field notes on day-to-day bench work, we also had the opportunity to photograph legacy computer systems and other experimental apparatus, review organizational charts, and look through microscopes.

Project documents: Materials collected for content analysis included lab notes, experiment documentation, and reports, proposals, and publications used or produced by the scientists in the projects being studied. From these sources we will be extracting information about the people and literature referenced by the scientists as well as additional evidence of research progress and information interactions.

Case files consist of transcribed verbatim and descriptive texts of interviews and observations, coded diary entries, and document data. Open coding of case file data was followed by more refined axial coding, and the overall process of coding proceeded through iterative, comparative analysis (Strauss & Corbin, 1998). Each transcript received two to three rounds of descriptive and thematic coding using NVivo, a software package


designed for qualitative analysis and theory building. As is typical of this type of qualitative approach, the goal of analysis was to reveal new insights, not produce comprehensive or widely generalizable results (Becker, 1998; Glaser & Strauss, 1967). Individual cases were analyzed longitudinally to capture progress and changes in research work, and comparative analysis across cases was conducted to identify commonalities and differences in information practices among the different research teams.

4. Analysis

4.1. Weak and strong information work in practice

Our approach in analysis has been to understand information work in the discovery process by focusing on advances that take place in a research project, as well as instances where progress is deterred. As Gerson (2002) notes, to understand discovery we need to analyze the conditions under which effective intersections form and the circumstances that block or retard fruitful intersections. In building these intersections, the researchers we studied explored and gathered diverse information for many purposes: to assess their own work and that of others, to integrate background knowledge, to solve short-term instrumental problems, to consult and talk shop, and to sort out complex intellectual relationships among the vast body of neuroscience findings. The handling and processing of information is part of the task structure of every kind of work. All work “involves some kind of information production/construction/consumption/use” (Gerson as cited in Strauss, Fagerhaugh, Suczek, & Wiener, 1985, p. 253).

In general, the information work we documented was not inherently weak or strong. Instead it tended to be influenced by the conditions listed in Fig. 1, which are related to the scientific problem under investigation and the knowledge and activities associated with individual scientists. And, as we will see in Section 4.2, weak and strong work were aligned with certain stages of the research process.
A few types of activities, however, were found to be generally weak or strong across problems, scientists, and stages of research. Footnote chasing is one example of a consistently strong practice. Most researchers followed references in the literature for various purposes within the course of research. While footnote chasing was common in the early stages of a project, we also documented numerous instances of the activity later in the research process, for example when a researcher needed to relate new findings to a larger or different body of knowledge or locate specifics on a person, lab, or technique used elsewhere. In terms of Simon’s conditions in Fig. 1, footnote chasing follows a clear, structured path of bibliographic references. The searcher recognizes items of interest and calculates their potential relevance and the next step in the chaining process. At times researchers may pursue leads into literature where they have limited domain knowledge, but this does not necessarily deter them from the task at hand of identifying potentially relevant literature.

In contrast, browsing in a large set of texts or a bibliographic database is a weaker literature searching technique, since the path or next steps are not always obvious and a lack of domain knowledge or terminology is more likely to inhibit one’s ability to find relevant materials. Data and literature mining techniques are weaker yet, as they depend on data-driven search and are often conducted with an ill-defined problem focus.

Strong approaches were commonplace in experimental work. Researchers frequently searched for protocols and instrumentation information from standard or locally established sources. The problems encountered tended to be tightly constrained, domain knowledge was usually high, and the steps to be taken were relatively routine. There were also many strong processes at work in what Simon would consider the meta-activity level of discovery.
For example, strong information work was used to rebuild expertise when a scientist had not been active in a core research area for a period of time and needed to do remedial reading and “retooling” to catch up. That kind of work is strong as long as the scientist is highly aware of what they need to brush up on and how to go about it. Similarly, strong information work also tends to be used for “core maintenance,” a strategy used to sustain a firm position in a disciplinary specialization (Palmer, 1999, 2001). Scientists or research teams will maintain productivity through systematic studies in their established core while selectively targeting new, more high-risk opportunities in new areas. For core maintenance, researchers use strong techniques and the well developed expertise in their specialization. The new work at the high-risk research fronts involves much weaker activities of building strategic alliances and collaborations, exploring new domains, and testing new ideas and techniques.


4.2. Stages of research and WIW

Four stages of research were evident in the case projects we followed: preparation, data collection, analysis, and dissemination. The heaviest concentration of WIW was in the preparation stage. We also documented a moderate level of WIW in the analysis stage, with considerably less in dissemination, and only rare occurrences in the data collection (experiment) stage, where most problem solving relates to techniques and tools.

Information work in the preparation stage generally preceded the other stages, laying the foundation for future discovery. However, some preparation activities were performed in tandem with other stages. Current awareness reading is one example. It tends to be an ongoing although not an altogether routine or strong activity. Strong information work was prevalent in preparing or setting a base for discovery, that is, in the work of keeping a laboratory up and running, improving its technical and intellectual abilities, and retaining a research team and other collaborators. But in the preparation stage there were many weaker activities that included determining the feasibility and potential impact of a project, assessing initial hypotheses relative to the existing body of research, enrolling new collaborators, and developing speculative yet convincing funding proposals.

The WIW in the analysis stage tended to be more intermittent. Weak approaches came into play when data interpretation was not straightforward, there were unexpected findings, or it was determined that results might be applied in new ways or extended to a new domain. For instance, in one case, before an investigator could determine the meaning of the recent findings produced by his lab, he needed to first explore other possible explanations for the results, including that the lab might be observing an artifact of imprecise measurement. In another instance, an author had to re-assess data in light of more recently published research from another laboratory.
In the data collection and dissemination stages, information work tended to be stronger because the problems at hand were more structured, the steps to be taken were more routine, and domain knowledge was generally high. For instance, the work of building influence through publication and other forms of scholarly communication proceeded fairly systematically. But, as researchers moved out of their core knowledge base and familiar intellectual and social structures, seemingly strong approaches became much weaker and additional information needs were introduced. For instance, if the results of an experiment have broader implications than originally thought, the dissemination process becomes less systematic. The literature in outside domains may need to be consulted, which may require deciphering unfamiliar terminology and additional reading for background and context. In such cases, the information gathered from far afield will need to be weighed, evaluated, and confirmed, and experts in other fields may need to be consulted. To make further progress, new partnerships may need to be initiated, assessed, and nurtured.

4.3. Weak searching

The Arrowsmith testing conducted by some of our participants offered a unique opportunity to examine weak information searching activities. A data mining tool by design, Arrowsmith was conceived as a system to support weak, data-driven approaches to literature searching. It allows searchers to find links among disconnected literatures or areas of research (Smalheiser & Swanson, 1994, 1996, 1998). The field testers were encouraged by the Arrowsmith team to perform searches to test hypotheses and new ideas in the literature, and they received training to improve their searching abilities with the system and with MEDLINE more generally. They worked cooperatively with us by recording these kinds of search activities and other types of database and Internet searches on a regular basis.
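To make the idea of linking disconnected literatures more concrete, the following is a minimal sketch, not Arrowsmith’s actual algorithm. It assumes that each literature is reduced to a handful of article titles and that a candidate link is simply a non-trivial word that appears in both sets of titles. The function names, the stopword list, and the placeholder titles ("factor X", "disorder Y") are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: a tiny hand-picked stopword list; a real tool would use a far
# richer vocabulary filter than this.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to", "with", "for"})


def title_terms(titles):
    """Collect the lowercase non-stopword words that occur in a list of titles."""
    terms = set()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                terms.add(word)
    return terms


def linking_terms(literature_a, literature_c):
    """Return, in sorted order, the terms shared by two otherwise separate literatures."""
    return sorted(title_terms(literature_a) & title_terms(literature_c))


# Hypothetical placeholder titles, invented for this sketch only.
literature_a = [
    "Dietary factor X alters membrane fluidity",
    "Factor X and synaptic plasticity",
]
literature_c = [
    "Membrane fluidity changes in disorder Y",
    "Sleep disruption in disorder Y",
]

shared = linking_terms(literature_a, literature_c)
print(shared)
```

In this toy setting the shared terms point the searcher toward a possible bridge between the two bodies of work. The search is data driven in the sense used above: the searcher does not need to know in advance which intermediate concept connects the two areas.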
Table 1 presents a typology of reported searches based on motivating situation or impetus for the search. Using the procedures described in Section 3.2, we identified eight primary categories and nine subcategories in a total of 139 diary entries. The typology in Table 1 was developed prior to our WIW analysis and was first presented in Palmer et al. (2004). As might be expected, weak conditions were associated with the searches assigned to categories A, C, and F, where researchers were testing new ideas or searching for information outside of their primary research domain. These searches generally required more time and effort than those in other categories, since scientists often began without knowing the scope of what they were looking for and then needed to undertake multiple iterations of searching and assessing to make progress. There were also interesting instances of weak searching within category B. Most of these situations involved searching literature databases or the Web, often with limited subject knowledge, but may also have included scanning a set of journals or a textbook for a lead.


Table 1
Information search typology

A. Assessing hypothesis
   Own preliminary hypothesis
   Established hypothesis
B. Assessing local finding relative to literature
C. Searching for specific information outside domain
D. Searching deeply in literature of own domain
E. Exploring literature in own domain
   General literature reviews
   Current awareness efforts
F. Exploring outside domain
G. Problem-solving
   Local methods and instrumentation problems
   Intellectual problems (specific questions, fact finding)
H. Known-item searching
   Footnote chasing
   Known-item search
   Known person/author search

[Figure: bar chart titled "Importance ranking (%)". Vertical axis: percent ranked Potentially or Definitely Important. Horizontal axis: categories with importance rankings, covering G (specific questions and fact finding), C (specific search outside domain), B (assessing finding), H (known-item searching), D (deep search, own domain), F (exploring outside domain), A (assessing hypothesis), and E (exploring own domain).]

Fig. 2. Importance of information resulting from searching activities.

The frequency of hypothesis assessment (category A) and specific out-of-domain searching (category C) was lower than expected considering the ease of access and encouragement to use Arrowsmith for this purpose. However, the specific out-of-domain searches that were performed were considered by the field testers to be very important to the research process. As illustrated in Fig. 2, only three instances of category C searching were documented, but each one was ranked as highly important. Interestingly, the next highest ranked category in terms of importance was a strong practice: G, instrumental problem solving and fact finding.

4.4. WIW in context

In this section we present two excerpted cases to further elaborate the nature of WIW processes. They both focus on the preparation stage of research, but each represents a very different contribution to the discovery process. The first case is about testing a new idea, where a junior researcher is a novice in a prospective research area and is highly dependent on a more senior collaborating partner. The second case is about the


development of a new technical procedure that needs to be implemented before an entire area of research can advance. The two cases demonstrate some fairly typical as well as some less intuitive aspects of WIW. They also help to expand Simon’s conception of the components of the discovery process by showing in some detail the range and necessity of the research work that precedes the actual implementation, or experimental part, of a research project.

4.4.1. Case 1: Testing an idea

This case involved a post-doctoral researcher who happened upon a unique opportunity to design and perform a study outside his domain of expertise. The idea for the study emerged from a particular WIW activity, and assessing the feasibility and value of embarking on the study required further WIW. The researcher was part of a laboratory dedicated to studying the neural substrate of learning and memory. He was well-versed in electrophysiology and behavioral methods and his experiments often involved recording neuronal activity in small mammals. One of his long-time interests was theta rhythm, rhythmic activity in the brain thought to be important for the control of voluntary movement and sensory processes. Along with investigations of this kind, he was field testing the Arrowsmith literature mining system to assess the tool’s usefulness in his daily research activities. In one of his Arrowsmith searches he used the query terms ‘‘theta rhythm’’ and ‘‘neurogenesis’’. He characterized this as an undirected exploratory activity and performed the search with various interesting keywords: ‘‘There’s a keyword list that’s published by the Society for Neuroscience. I looked down the list to see things that might be related to theta rhythm, that people haven’t really thought of before. [I] just typed them in.’’ [P1A3, June 9, 2003].
While reviewing the abstracts that resulted from this search, he began thinking about the particular types of experiments he could do to explore theta rhythm and neurogenesis. He explained that this thinking helped him stumble across a somewhat different idea: ‘‘One way to do this study is to anesthetize the animals and induce theta by giving them certain drugs. And the type of theta that is induced by doing that procedure is virtually the same brain profile that occurs during REM sleep, rapid eye movement sleep, which is dream state sleep. And that was interesting . . . that’s an interesting idea.’’ [P1A3, June 9, 2003]. The researcher found himself faced with both challenges and opportunities. Although he was familiar with the literature on theta rhythm, the new project he envisioned was sleep-related, a research domain with which he was much less familiar. Normally he did not read publications about sleep research, and he had little sense of the work currently going on in the field. Also, he only had the resources to do the study in a crude fashion with limited controls. However, a more senior neuroscientist he knew elsewhere through his association with the Arrowsmith project had former collaborators who had expertise in different, more advanced techniques. With their help it would be possible to generate cleaner, more reliable data. The senior colleague offered to assist him by making contact with the other potential collaborators to help him set up the equipment necessary for the study and to work with him to further brainstorm on how to conduct the study. Over the next three months, the researcher engaged in a variety of information work activities in an effort to better grasp the feasibility and value of a study about REM sleep and neurogenesis. 
He spoke with additional scientists at a nearby laboratory, performed new Arrowsmith searches, and began to explore the sleep-related literature, reading about the pros and cons of different techniques he could use to block sleep in the study. Throughout the course of these activities, he regularly emailed status reports, search results, and questions he was wrestling with to the senior associate, who responded in a steady stream of email messages. For the post-doctoral researcher, assessing the feasibility of a REM sleep and neurogenesis study meant exploring a new research domain and reading about and assessing techniques of which he had little knowledge. He had an area of interest and a somewhat clear research question. However, his ideas about an experimental design were ill-structured and undeveloped, and he had no entrée points or direct links to the sleep research community. Both of these situations required WIW, which progressed in a trial-and-error manner with the literature and depended largely on confirmation and leads from his associate. Fig. 3 is an estimation of the position of Simon’s conditions on a weak/strong continuum for this case.

4.4.2. Case 2: Developing a research procedure

Magnetic Resonance (MR) imaging researchers are facing a problem with anonymity and the HIPAA (Health Insurance Portability and Accountability Act). It is conceivable that someone could reconstruct brain

[Figure: continuum running from Weak to Strong, showing the conditions Unsystematic steps, Low domain knowledge, Seek and search, and Ill-structured problem.]

Fig. 3. Simon’s conditions on a weak/strong continuum for Case 1.

images in such a way as to render the ‘‘imaged’’ person recognizable. Institutional Review Boards (IRBs) are working harder to protect human subjects’ anonymity at a time when imaging researchers want to increase the practice of sharing data to support larger and more complex studies. In addition, there is at least one neuroscience journal that requires deposit of image data into a public repository as a condition of publishing in that journal. Therefore, face recognition is a current and critical problem that is being studied in several ways by MR researchers and other scientists. One potential solution has focused on a process called de-facing in which a computer algorithm can trim away certain features (i.e. tissue) from brain scan images. One of our participants in the IDN project is a computational modeling and MR research scientist who has been working on the testing of one de-facing algorithm. Notably, she has had a leading role in this project even though the work was outside her domain of expertise. Her research team had to conduct a statistical assessment of the de-faced images to test the efficacy of the algorithm and also test a group of study subjects to see whether the de-faced images could be recognized as people they knew. A range of issues arose in the course of this study, including a complicated IRB application process and difficulties managing data and writing up the results. There were two questions driving the work: ‘‘What features of a face and head are needed to make it recognizable?’’ And, ‘‘How do we quantify that an image has been properly defaced?’’ The first, more conceptual problem necessitated the most varied information work. The researcher had to determine how best to think about the phenomenon of recognition, understand how it fits with previous research, and develop the behavioral paradigm for conducting the experiment. 
These problems led to literature searches, the use of various information tools and services, and the reconsideration of old coursework materials, all in a series of starts and stops, the kinds of unclear steps associated with weak approaches. To design the behavioral experiment the researcher had to build an understanding of how humans recognize people they know. In addition to ‘‘looking for ways to validate that the algorithm successfully works’’ they had to understand ‘‘what features are important for (face) recognition.’’ The problem-solving process was complex in that she had to use a variety of search strategies and had no ‘‘tried and true’’ way to find the information needed. She described her searching as encumbered by a lack of ‘‘fit’’ among the question she was asking, the search terms she was using, and the literature returned. Based on some local research concerning cranial-facial measurements and schizophrenia, she used Arrowsmith to search for possible connections between, for example, anthropometry and face recognition. However, she noted ‘‘I was going from the viewpoint of cranio-facial, and so how do forensic scientists recreate a face based on a skull that they have; how do they know roughly what the features should be like? And that search was not fruitful . . . doing it that way.’’ [C1A1, February 22, 2005] She moved then to ‘‘just kind of looking through the literature to try to figure out what were the appropriate search terms, and nothing really fit.’’ To get beyond the purely medical literature, she also ran some searches in PsycINFO. A suggestion from a colleague put her on a new path related to the facial features.
She then remembered the concept of ‘‘internal and external facial features,’’ from a graduate school course on computational facial recognition and began ‘‘digging more into that literature.’’ She found the work of a researcher who had done quite a lot of work on how people use facial features to recognize others, read some of her papers and parts of a book. Based on this work, she determined, ‘‘. . . to really do this right, if we’re (going) to have someone recognize an image, (we) really have to look at people who are familiar versus unfamiliar, because it’s possible that the shape of the skull might be enough for me to say, I know who that is.’’ [C1A1, February 22, 2005]. This case presents an unusual example of WIW processes. In most other cases, the activities associated with measurement, instrumentation, and protocol problems tended to be strong. Similar to those more typical cases, the de-facing research problem was well defined, but other weak elements dominated the information work approach overall. The process of carrying out the information activities was rarely linear or continuous. There was considerable seeking and searching to find a fit between information sources and the problem, and

[Figure: continuum running from Weak to Strong, showing the conditions Unsystematic steps, Low domain knowledge, Seek and search, and Structured problem space.]

Fig. 4. Simon’s conditions on a weak/strong continuum for Case 2.

there was low domain knowledge in both the behavioral approaches and facial recognition area. Fig. 4 is an estimation of the position of Simon’s conditions on a weak/strong continuum for this case.

5. Discussion

As stated previously, information work surrounds, connects, and fuels the data collection and analysis activities emphasized in Simon’s conceptualization of the discovery process. Moreover, information work activities are subject to the same conditions Simon associated with weak and strong scientific problem solving. Thus, as conditions from the left side of Fig. 1 increase in number and degree, information work becomes weaker. In the particular cases presented in Section 4.4 and across our cases more generally, two of the conditions of weak work, ill-structured problem and low domain knowledge, were prevalent indicators of WIW. Problem structure and domain knowledge have been examined previously in relation to information needs and practices more generally. For example, MacMullin and Taylor (1984) and Vakkari (1999) equated high problem structure with knowledge of central variables and their interrelations, suggesting that this type of research situation allows for highly determined information requirements, processes, and outcomes. In addition, the impact of low subject or domain knowledge on searching has been widely studied (e.g., Hsieh-Yee, 1993; Sihvonen & Vakkari, 2004; Wildemuth, 2004). But these and related studies have given little attention to how the information practices associated with problem structure and domain knowledge actually impact the research and discovery process. Fry and Talja’s (2004) application of Whitley’s theory of the social organization of scholarly fields provides a perspective that is closer to the actual practice of science.
They identify information features and activities associated with high levels of task uncertainty in a disciplinary specialization, and it is interesting that a number of these examples would constitute WIW, especially the use of material scattered across diverse fields and high reliance on emerging personal networks to interpret information. An additional indicator of WIW in our cases was newness, a condition not identified by Simon. Newness can take many forms in scientific work. In the cases excerpted in 4.4 and numerous other instances documented in our data, weak approaches were the means for doing something new like starting down a new path of inquiry or developing a new technique. In other cases the problem at hand was new and unfamiliar to the researcher but not necessarily ill-structured or out of domain. New working relationships can also be a weakening influence. Developing new collaborations is a much weaker process than relying on an established team of experts. Newness was not only consequential in the information work process; it also tended to be a key attribute of high impact information. Even a small bit of new knowledge can lead to a debate in the field. In the projects we followed, ‘‘new’’ information could be developed or disclosed. Experimental data could lead to the development of substantial new findings, but uncovering an existing but previously unknown study or researcher in a cognate area might also push a project forward or into a new phase. Moreover, there was high value in seeing information presented in new ways. Some of the most salient instances of research advancements in our case studies hinged on new combinations or visualizations of existing data. While it may not be a surprising observation that newness complicates, or weakens, the research process, it is an important area for further investigation. 
Newness may be a more overarching indicator than the conditions identified by Simon of when high levels of WIW will be involved in a project. And it is clearly of importance as a characteristic of information itself. Finding new information and mobilizing old information in new ways are activities that have high potential for making a significant contribution to the research process.

6. Conclusions

In this paper our aim has been to clarify and elaborate Simon’s conception of weak and strong scientific methods in terms of information work. Our empirically based analysis shows how research processes and


practices can be assessed to identify, and possibly predict, the kinds of activities and the stages of research where weak and strong processes will be centralized. In relation to the development of information systems and services, the most productive points of intervention are likely to be at the ends of the weak/strong continuum presented in Fig. 1. As shown in Fig. 2, the weakest and strongest approaches, at least in terms of information searching, were judged by our participants to be the most important for advancing research. This finding raises interesting questions for further study as to why the most influential information work falls at either end of the continuum. Nonetheless, the implications are clear. Information support for very strong, routine activities would generally reduce the information work burden for scientists. Strong information searching does not require high levels of specialized scientific expertise and can therefore be more easily delegated to information professionals. Support is also needed for what should be strong processes involved in managing data and information, especially in the increasingly demanding areas of standardization, archiving, and the development of digital repositories for the dissemination of results. On the other hand, weaker, messier practices have higher potential for promoting innovation and new discoveries. As Simon et al. (1981) note, the ‘‘fundamentality of a piece of scientific work is almost inversely proportional to the clarity of vision with which it can be planned’’ (p. 5). WIW is arduous and often speculative and in many cases not the best use of the expert researcher’s time.
It is true that scientific domain knowledge can always offer an interpretive advantage, but preliminary scanning and ‘‘connecting’’ of literatures such as that done when applying the Arrowsmith searching technique is one layer of WIW that could be performed by information specialists, perhaps best in consultation with domain scientists. Simon’s weak/strong conceptual framework is more powerful than reflected in his narrow conceptualization of the research process. It can be extended beyond data collection and analysis to the various levels of information work at all stages of research production. This has broad implications for service to science. Because of its importance in fueling innovative research and its difficulty in practice, WIW is a locus of activity that could benefit greatly from increased support from information professionals on research teams or through the development of specific systems and information services. The potential for scientific discovery can be improved by making it easier to conduct WIW at the fronts of science where weak processes are commonplace, especially in the preparation stages of research. In addition, delegation of stronger processes to information specialists would allow scientists to spend more time and effort on the intellectual work of discovery and less on finding ways to locate and manage the information they need to carry out that work.

Acknowledgement

We thank Les Gasser for drawing our attention to the relationship between our work and Herbert Simon’s ideas on scientific discovery. We also wish to acknowledge the highly constructive comments offered by the GSLIS Research Writing Group. This research was supported by the National Science Foundation, Grant no. 0222848. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.

References

Bawden, D. (1986). Information systems and the stimulation of creativity.
Journal of Information Science, 12, 203–216.
Becker, H. (1998). Tricks of the trade: How to think about your research while you’re doing it. Chicago: University of Chicago Press.
Bernal, J. D. (1953). Science and industry in the nineteenth century. London: Routledge & Kegan Paul.
Darden, L. (1997). Recent work in computational scientific discovery. In M. Shafto & P. Langley (Eds.), Proceedings of the nineteenth annual conference of the Cognitive Science Society (pp. 161–166). Mahwah, NJ: Lawrence Erlbaum.
de Jong, H., & Rip, A. (1997). The computer revolution in science: Steps towards the realization of computer-supported discovery environments. Artificial Intelligence, 91, 225–256.
Dunbar, K. (1993). Concept discovery in a scientific domain. Cognitive Science, 17, 397–434.
Fry, J., & Talja, S. (2004). The cultural shaping of scholarly communication: Explaining e-journal use within and across academic fields. In L. Schamber & C. L. Barry (Eds.). Proceedings of the American Society for Information Science and Technology annual meeting (vol. 41, pp. 20–30). Medford, NJ: Information Today.
Fujimura, J. H. (1987). Constructing ‘do-able’ problems in cancer research: Articulating alignment. Social Studies of Science, 17(2), 257–293.
Gerson, E. (2002). Premature discovery is failure of intersection among social worlds. In E. B. Hook (Ed.), Prematurity and scientific discovery (pp. 280–291). Berkeley: University of California Press.
Glaser, B. G., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine De Gruyter.
Harwit, M. (1981). Cosmic discovery: The search, scope and heritage of astronomy. Brighton: Harvester Press.
Holton, G. (1973). Thematic origins of scientific thought: Kepler to Einstein. Cambridge: Harvard University Press.
Hsieh-Yee, I. (1993). Effects of search experience and subject knowledge on the search tactics of novice and experienced searchers. Journal of the American Society of Information Science, 44(3), 161–174.
Kuhn, T. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.
Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative process. Cambridge: MIT Press.
MacMullin, S. E., & Taylor, R. S. (1984). Problem dimensions and information traits. The Information Society, 3(1), 91–111.
Martyn, J. (1974). Information needs and uses. Annual Review of Information Science and Technology, 9, 3–23.
Newell, A. (1969). Heuristic programming: Ill-structured problems. In J. S. Aronofsky (Ed.), Progress in operations research: Relationship between operations research and the computer (pp. 361–414). New York: Wiley.
Palmer, C. L. (1999). Structures and strategies of interdisciplinary science. Journal of the American Society for Information Science, 50(3), 242–253.
Palmer, C. L. (2001). Work at the boundaries of science: Information and the interdisciplinary research process. Dordrecht: Kluwer.
Palmer, C. L., Cragin, M. H., & Hogan, T. P. (2004). Information at the intersections of discovery: Case studies in neuroscience. In L. Schamber & C. L. Barry (Eds.). Proceedings of the American Society for Information Science and Technology annual meeting (vol. 41, pp. 448–455). Medford, NJ: Information Today.
Root-Bernstein, R. S. (1989). Discovering: Inventing and solving problems at the frontiers of scientific knowledge. Cambridge: Harvard University Press.
Sihvonen, A., & Vakkari, P. (2004). Subject knowledge improves interactive query expansion assisted by a thesaurus. Journal of Documentation, 60(6), 673–690.
Simon, H. A. (1986). Understanding the processes of science: The psychology of scientific discovery. In T. Ganelius (Ed.), Progress in science and its social conditions: Proceedings of a Nobel symposium (pp. 159–170). Oxford: Pergamon.
Simon, H. A., Langley, P. W., & Bradshaw, G. L. (1981). Scientific discovery as problem-solving. Synthese, 47, 1–27.
Smalheiser, N. R. (2005). The Arrowsmith Project: 2005 status report. In A. Hoffman, H. Motoda, & T. Scheffer (Eds.). Lecture notes in artificial intelligence (vol. 3735, pp. 26–43). Berlin: Springer.
Smalheiser, N. R., & Swanson, D. R. (1994). Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communication, 15, 1–9.
Smalheiser, N. R., & Swanson, D. R. (1996). Linking estrogen to Alzheimer’s disease: An informatics approach. Neurology, 47, 809–810.
Smalheiser, N. R., & Swanson, D. R. (1998). Calcium-independent phospholipase A2 and schizophrenia. Archives of General Psychiatry, 55, 752–753.
Strauss, A. L., & Corbin, J. (1998). The basics of qualitative research: Techniques and procedures for developing grounded theory (second ed.). Thousand Oaks, CA: Sage.
Strauss, A., Fagerhaugh, S., Suczek, B., & Wiener, C. (1985). Social organization of medical work. Chicago: University of Chicago Press.
Swanson, D. R., & Smalheiser, N. R. (1999). Implicit text linkages between Medline records: Using Arrowsmith as an aid to scientific discovery. Library Trends, 48(1), 48–59.
Vakkari, P. (1999).
Task complexity, problem structure and information actions: Integrating studies on information seeking and retrieval. Information Processing & Management, 35, 819–837.
Valdés-Pérez, R. E. (1999). Principles of human-computer collaboration for knowledge discovery in science. Artificial Intelligence, 107(2), 335–346.
Vaughan, D. (1992). Theory elaboration: The heuristics of case analysis. In C. C. Ragin & H. S. Becker (Eds.), What is a case?: Exploring the foundations of social inquiry (pp. 173–202). Cambridge: Cambridge University Press.
Wildemuth, B. M. (2004). The effects of domain knowledge on search tactic formulation. Journal of the American Society for Information Science and Technology, 55(3), 246–258.

Carole L. Palmer is an associate professor at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. Her research explores how information systems and services can best support interdisciplinary inquiry, discovery, and collaboration in the sciences and the humanities.

Melissa Cragin is a doctoral student in the Graduate School of Library and Information Science at UIUC. Her research interests include biomedical information work, data curation, scholarly communication and the roles of libraries in supporting scientific research. She is currently investigating the use of shared digital data collections in neuroscience.

Timothy P. Hogan is a doctoral candidate in the Graduate School of Library and Information Science, University of Illinois at Urbana–Champaign. His research focuses on how people living with chronic and/or acute illnesses interact with and use information and the development of effective consumer health information services and systems.