Facilitating biomedical researchers’ interrogation of electronic health record data: Ideas from outside of biomedical informatics


YJBIN 2534, No. of Pages 9, Model 5G, 12 March 2016. Journal of Biomedical Informatics xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Journal of Biomedical Informatics journal homepage: www.elsevier.com/locate/yjbin


Methodological Review


Gregory W. Hruby a, Konstantina Matsoukas b, James J. Cimino c, Chunhua Weng a,⇑


a Department of Biomedical Informatics, Columbia University, New York, NY, USA
b Memorial Sloan Kettering Library, Memorial Sloan Kettering Cancer Center, New York, NY, USA
c Informatics Institute, University of Alabama at Birmingham, AL, USA

Article info

Article history: Received 20 September 2014; Revised 3 March 2016; Accepted 4 March 2016; Available online xxxx

Keywords: Information storage and retrieval; Electronic health records; Human computer interaction

Abstract

Electronic health records (EHRs) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-the-art approaches and key methodological considerations for this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities (user, channel, and source), by elaborating on channels for query formulation in the context of helping end users interrogate EHR data. We show that current progress in biomedical informatics lies mainly in support for query execution and information modeling, primarily because of an emphasis on infrastructure development for data integration and data access via self-service query tools, and that the user support needed during iterative query formulation, which can be costly and error-prone, has been neglected. In contrast, the information science literature offers elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit our research in facilitating efficient data access in life sciences. © 2016 Published by Elsevier Inc.


1. Introduction


Biomedical research has long benefited from a valuable and cost-effective data source: patient health records [1]. For example, the Apgar Scale [2] and the Goldman multifactorial index of cardiac risks [3] were both derived from analyses of patient health records. With the increasingly pervasive adoption of electronic health records (EHRs) worldwide [4], many have recognized the rich clinical data made available by EHRs as a promising resource for accelerating medical knowledge discovery [5] and for enabling comparative effectiveness research [6–10]. Subsequently, the demand for reusing EHR data for research among biomedical researchers has been rising rapidly [11–15]. Assisting biomedical researchers to interrogate EHR data has been a vital mission for the biomedical informatics research community. However, this task faces significant human and technological barriers [10,16,17]. Current data captured by EHRs are not optimized for secondary uses beyond clinical care or administration-centered documentation practices, so many institutions employ intermediating data analysts to retrieve EHR data for biomedical researchers, with varying degrees of assistance from self-service query tools. The use of intermediaries may not scale to large data networks such as the clinical data research networks (CDRNs) that are part of PCORnet [18], established by the Patient Centered Outcomes Research Institute [19]. For example, the heterogeneity of data representations across institutions and the complex, idiosyncratic local data collection processes that often remain "black boxes" to intermediaries are serious barriers facing users of the data contained in PCORnet. To contain the cost of these expensive operations, many institutions have to charge clinician scientists for reusing data collected during patient care for research. Meanwhile, self-service query support is still at an early stage of development and may not support sophisticated data queries [20,21].

⇑ Corresponding author at: Department of Biomedical Informatics, Columbia University, 622 West 168 Street, PH-20, New York, NY 10032, USA. E-mail address: [email protected] (C. Weng).

http://dx.doi.org/10.1016/j.jbi.2016.03.004 1532-0464/© 2016 Published by Elsevier Inc.

Please cite this article in press as: G.W. Hruby et al., Facilitating biomedical researchers’ interrogation of electronic health record data: Ideas from outside of biomedical informatics, J Biomed Inform (2016), http://dx.doi.org/10.1016/j.jbi.2016.03.004


By identifying and reviewing existing theories and best practices for EHR data interrogation, we aim to inform the design of next-generation EHR data interrogation aids that directly help biomedical researchers autonomously retrieve and reuse these data for clinical and translational research. Towards this goal, this paper contributes a literature review on this topic. We summarize existing approaches, identify research gaps, and recommend research priorities. Although this review focuses on EHR data, the knowledge gained may generalize to interactive end-user data interrogation for other reusable health data resources.


2. Methods


2.1. Development of a conceptual framework for interactive data retrieval


An information retrieval process addresses information needs using a sequence of tasks [22]. The complexity of the task sequence depends on the information retriever's a priori knowledge of the information need, the information retrieval process stipulated by data owners, and the complexity of each task used to complete the information retrieval process [23–27]. Many models have been developed for characterizing the information retrieval process or for investigating how information systems support users during this process [28–37]. For example, the berry-picking model [28] and the sense-making model [30] focus on how users iteratively refine their information needs based on their conceptualizations of the information space. Among all existing models, only the one developed by Byström and Järvelin explicitly defines three entities that influence the complexity of an information retrieval process: user, channel, and source [24]. The user entity focuses on the user's profiles, communication styles, and knowledge of data. The source concerns data representations designed for optimal data retrieval efficiency. The channel masks the complexities of the source and translates user information needs into data representations. In other words, the source is the container of information and the channel guides efficient navigation of the source. We adopted this conceptual framework to organize the literature for interactive EHR data retrieval. In this paper, we survey related methods and theories in the context of EHR data retrieval for secondary use by end users who are unfamiliar with the data, such as biomedical researchers and clinician scientists. Since this paper aims to support end users with improved query formulation, we focus primarily on efforts supporting the user and the channel and only briefly describe existing efforts on the source.
We co-opted the constructs of user, channel, and source, combining them with the concepts of query formulation and query execution, as shown in Fig. 1. For example, a researcher may want to identify an institution's mortality rate among its patients undergoing coronary artery bypass. Query formulation transforms vague data requests (e.g., "adult patients younger than 75 years old with coronary artery bypass surgery last year") into contextualized data requests consisting of specific EHR data elements (e.g., "patient DOB, Current Fiscal Year, billing code for coronary artery bypass billed in this current fiscal year"). The step of query execution further translates this query from contextualized data elements into executable database queries that span disparate data types, use local terminologies, and are written in Structured Query Language (SQL).
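To make the two steps concrete, the following minimal sketch takes the contextualized request from the example above (patient DOB, fiscal year, billing code for coronary artery bypass) and executes it as SQL against a toy database using Python's built-in sqlite3 module. The table names, column names, billing codes, and fiscal-year dates are all hypothetical, and the lower age bound of 18 is an assumed reading of "adult"; real EHR schemas and local terminologies will differ.

```python
import sqlite3

# Hypothetical local schema; real EHR databases are far more complex.
SCHEMA = """
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, dob TEXT);
CREATE TABLE billing (patient_id INTEGER, code TEXT, billed_on TEXT);
"""


def cabg_cohort(conn, cabg_codes, fy_start, fy_end, min_age=18, max_age=75):
    """Return ids of patients aged [min_age, max_age) at billing time who
    have one of the given bypass codes billed within [fy_start, fy_end]."""
    placeholders = ",".join("?" for _ in cabg_codes)
    sql = f"""
        SELECT DISTINCT p.patient_id
        FROM patient p
        JOIN billing b ON b.patient_id = p.patient_id
        WHERE b.code IN ({placeholders})
          AND b.billed_on BETWEEN ? AND ?
          AND (julianday(b.billed_on) - julianday(p.dob)) / 365.25 >= ?
          AND (julianday(b.billed_on) - julianday(p.dob)) / 365.25 < ?
        ORDER BY p.patient_id
    """
    params = [*cabg_codes, fy_start, fy_end, min_age, max_age]
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """Build an in-memory database with four made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?)", [
        (1, "1950-03-01"),  # 65 at surgery: qualifies
        (2, "1930-06-15"),  # 85 at surgery: too old
        (3, "1960-01-20"),  # right age, but a different procedure
        (4, "1970-09-09"),  # right age, bypass billed outside the fiscal year
    ])
    conn.executemany("INSERT INTO billing VALUES (?, ?, ?)", [
        (1, "LOCAL_CABG_1", "2015-10-12"),
        (2, "LOCAL_CABG_2", "2015-11-03"),
        (3, "LOCAL_OTHER", "2015-12-01"),
        (4, "LOCAL_CABG_1", "2014-02-02"),
    ])
    return conn
```

Calling `cabg_cohort(demo_connection(), ["LOCAL_CABG_1", "LOCAL_CABG_2"], "2015-07-01", "2016-06-30")` selects only the first patient. The choices a query analyst has to settle before this code can be written (which codes count as bypass surgery, how the fiscal year is bounded, how age is computed) are exactly what query formulation is meant to resolve.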


2.2. Literature search


We adopted the post-positivist model for research [38], which allowed us to treat all experience as data, and iteratively searched for related work published between 2009 and 2013. Following this model, we searched beyond the obviously relevant fields of biomedical informatics and clinical research informatics and included the literature in informatics and computer and information science. We also categorized all included citations by their focus on user, source, and channel, so that a significant amount of qualitative information was converted into quantitative information that helped us draw the big picture and deduce evidence gaps. We seeded our search with 29 articles proposed by the senior author (CW) [19,23,24,28,29,39–62]. Citation searches within these articles provided an additional 45 references [16,22,25–27,30–34,37,63–102]. These 74 articles served as the basis for the development of the search query. Starting from our initial search query, we iteratively searched and reviewed the identified articles, incorporated new search keywords as they emerged, and revised our search string and article inclusion/exclusion criteria according to relevance as determined by manual review. We surveyed both the information science literature (the ACM Digital Library, http://dl.acm.org) and the biomedical informatics literature (MEDLINE). We limited our search to these two main journal citation databases for the respective fields of information science and biomedical sciences because we believe they provide a representative sample for our topic. Fig. 2 is a flow chart that highlights the final search strings for the PubMed and ACM databases and the inclusion and exclusion criteria for selecting articles for this review. The first author generated the final search string and reviewed the titles and abstracts of the returned articles; articles meeting any of the exclusion criteria were removed from the pool. Next, the first author iteratively reviewed and annotated the 125 included articles using the conceptual framework developed in Section 2.1.
For each annotation, the first author wrote a summary and justification paragraph. After annotation, the first author reviewed the summary and justification paragraphs for each set of articles against the conceptual framework components and derived themes within these sections. For the source, the major themes identified were EHR data modeling (how data is structured and the standards used to store data elements) and warehousing (dedicated silos of data for secondary use). For the user, the major themes identified were information need (defining the complexity of the need) and user modeling (understanding the user attributes and the information seeking strategies used). For the channel, the major themes identified were query formulation (the process of defining an information need) and execution (the process of translating an information need into an executable database query). Table 1 organizes the articles according to our conceptual framework. As shown in Table 1, more work on user modeling, human intermediaries, and the reference interview was found in the field of information science than in the field of biomedical informatics. In the following sections, we synthesize the major themes from each discipline and compare and contrast ideas from the different sources.


3. Source


The barriers to task-based data access in life sciences fall into two categories: (1) human factors (e.g., a user lacking a correct conceptualization of task complexities) and (2) system factors (e.g., technology limitations in existing systems such as data heterogeneity and fragmentation) [17]. We present examples of known barriers and corresponding recommended solutions in Table 2. Human factors relate to the user, while system factors relate to the metadata of the source, or in this instance, the lack of metadata concerning known and underlying EHR data quality issues.


Fig. 1. A conceptual framework for interactive data retrieval.

Fig. 2. The search strings and article selection flowchart (* Articles could be classified into multiple categories as some content spans multiple categories).


Table 1. The distribution of relevant topics in two bodies of literature.

Topic | Biomedical informatics | Information science

Source
EHR data modeling | [8,12,41,46,48,52,70,71,90,103–109] | -
EHR data warehousing | [40,97,110] | -

User: Information need
Information need complexity | [50,53,72,111–113] | [24,60,92]

User: User modeling
Information seeking processes | [54,59,77,84] | [22–24,26,36,57,67]
User cognitive styles | [17,55] | [23,25,28–37,63,85,86,91]

Channel: Task 1, Query formulation
Concept representation | [17,27,46,51,62,70,71,98,105,114–119] | [100]
Characterization of data integrity | [17,114,120,121] | [26]
Query templates | [47,58,72,75,80,81] | [122]
Self-service query tools | [20,41,51,52,73,82,88,97,101,113,123–128] | [42,93,129–133]
Human intermediaries | [1,45,134,135] | [39,44,61,83,89,94,95]
Reference interview | - | [39,43,56,64–66,68,69,74,79,87,94,96,102]

Channel: Task 2, Query execution
Phenotyping | [49,62,76,78,99,117,136–138] | -

Table 2. Known themes regarding EHR data access barriers and solutions.

Level: Human factors
Barriers: Known and latent EHR data quality issues [17,120,121]
Solutions: Transparent reporting of data limitations for intended uses [26,114]

Level: System factors
Barriers: System interoperability [108]; Real time analytics [8,12]; Data fragmentation [108]; Data heterogeneity [121]
Solutions: Streamlining data access workflow [71,52]; Data warehousing [49,78]; Data modeling and integration strategies (i.e., the Observational Medical Outcomes Partnership Common Data Model) [46,48,109]

In the context of this study, we discuss data warehousing at the institutional level rather than at the state or national level, although the same principles may apply to both. Data warehousing has been a focus in the clinical research informatics community for overcoming technical barriers to data access by providing efficient access to integrated EHR data, which can be centralized or federated. The centralized infrastructure [40,41,52] avoids data heterogeneity but makes it challenging to incorporate new or latent data elements over time, because each update affects the entire database and the design therefore does not scale easily. In contrast, the federated architecture is flexible [90,103–105] in allowing autonomous data control and growth over time. It allows heterogeneity in data representation [70,106,107]. It also makes it possible to leverage distributed local expertise for data modeling and data quality control and supports geographic distribution across multiple stakeholders. The National Center for Education Statistics has summarized the tradeoffs between centralized and federated data repositories [139]. Briefly, centralized systems ease data governance, increase data retrieval performance, and provide uniform data for efficient data mining, but they entail a high cost burden for ensuring data currency and completeness and are harder to scale for evolving data needs and different data access workflows [140]. Many institutions have centralized data repositories such as STRIDE [110]. The most widely used platform, i2b2 with its SHRINE architecture, supports distributed and federated data repositories [97]. The federated architecture represents an established model for most large data networks such as PCORnet [18,141].
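The federated pattern described above can be illustrated with a small sketch in which each site keeps its own database and its own local codes, and a shared concept is translated to local codes site by site before per-site counts are combined. Everything here is a made-up assumption for illustration (site names, table layout, local codes, and the `federated_count` helper); it does not reproduce how SHRINE or PCORnet are implemented.

```python
import sqlite3


def make_site(rows):
    """Create one site's private in-memory database (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnosis (patient_id INTEGER, local_code TEXT)")
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", rows)
    return conn


# Two hypothetical sites that represent the same concept with different codes.
SITES = {
    "site_a": {
        "conn": make_site([(1, "DM2"), (2, "HTN"), (3, "DM2")]),
        "concept_map": {"diabetes": ["DM2"]},
    },
    "site_b": {
        "conn": make_site([(10, "E-DIAB"), (11, "E-DIAB-T1"), (12, "E-HTN")]),
        "concept_map": {"diabetes": ["E-DIAB", "E-DIAB-T1"]},
    },
}


def federated_count(sites, concept):
    """Run the same conceptual count at every site using that site's codes.

    Returns (per-site counts, total). The total assumes that no patient is
    registered at more than one site, which real networks cannot assume.
    """
    per_site = {}
    for name, site in sites.items():
        codes = site["concept_map"].get(concept, [])
        if not codes:
            per_site[name] = 0  # this site has no mapping for the concept
            continue
        marks = ",".join("?" for _ in codes)
        sql = (
            "SELECT COUNT(DISTINCT patient_id) FROM diagnosis "
            f"WHERE local_code IN ({marks})"
        )
        per_site[name] = site["conn"].execute(sql, codes).fetchone()[0]
    return per_site, sum(per_site.values())
```

Each site retains control of its data and its concept map, which mirrors the autonomy and representational heterogeneity noted above; the cost is that every site must maintain a correct mapping for every shared concept.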

4. User


4.1. Information need complexity


One component of user modeling is understanding the complexity of the information need, which depends on the diversity of use contexts [72], variations in information seeking behaviors [24], and heterogeneity in the language used to express the information need [92]. Others have proposed a categorical scale of data need complexity that measures the amount of work required to accomplish the task that satisfies the information need [60,86]. Structures have been defined to characterize a complex information need, including a problem statement, an event of interest, a comparison event (if necessary), and potential effects of the event of interest [50]. Unfortunately, little is known about the data needs of biomedical researchers [111–113]. The very few studies available have largely focused on identifying sets of the major data elements needed to facilitate research in particular medical domains of interest. Cimino et al. leveraged these key data elements needed by researchers to inform the development of a user-centric query tool [113]. What has been lacking is a thorough understanding of user preferences and search behaviors, as well as of the communication patterns between biomedical researchers and query analysts for clarifying data needs iteratively.
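As an illustration of the four-part structure attributed to [50] above, a complex information need could be captured in a small record type. The class name, field names, and the completeness check are assumptions made for this sketch; the cited work may organize these components differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InformationNeed:
    """Hypothetical record for the four components listed in the text."""

    problem_statement: str
    event_of_interest: str
    comparison_event: Optional[str] = None  # only when a comparison is needed
    potential_effects: List[str] = field(default_factory=list)

    def unspecified_parts(self) -> List[str]:
        """Return the names of required components that are still empty."""
        missing = []
        if not self.problem_statement.strip():
            missing.append("problem_statement")
        if not self.event_of_interest.strip():
            missing.append("event_of_interest")
        if not self.potential_effects:
            missing.append("potential_effects")
        return missing
```

For example, `InformationNeed("Mortality after coronary artery bypass", "coronary artery bypass surgery").unspecified_parts()` reports that the potential effects have not yet been stated, which is the kind of gap a query analyst would otherwise have to uncover through conversation.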


4.2. User cognitive styles


Five user characteristics influence a user's information seeking tactics:

(1) Phases of mental model of information: confusion, doubt, threat, hypothesis testing, assessing, and reconstructing [63].
(2) Levels of need: visceral, conscious, formal, and compromised [39].
(3) Levels of specificity: new problem, new situation, experiential needs, and well understood situation [23].
(4) Expression: questions connections, and commands gap [23,39].
(5) Mood: invitational, and indicative [63].

Table 3. A taxonomy of clarification questions utilized during need negotiation.

Question type | Goal | EHR data clarification example
Relevance threshold | To establish the user's relevance threshold, a mapping of a continuous scale into a binary decision | What is the A1c threshold for the diabetes patients that you are looking for?
Ambiguity in conjoined facets | To establish the relationship type between conjoined facets | Do you only want patients that have both a diagnosis of diabetes and hypertension or patients with a diabetes diagnosis with or without a hypertension diagnosis?
Example concept | To ensure an identified facet is relevant to the user's information need | Would patients without Diabetes diagnoses, but taking a medication for diabetes be of interest to you?
Closely related or subset concept | To expand or narrow the scope of the user's information need on the specificity of a particular facet | Do you want patients with both types of Diabetes? Type 1 and 2?
Related topical aspects | To establish the user's interest in facets that are conceptually related, but not directly requested | Does it matter what treatment protocol the diabetic patients are on?
Acceptability of summaries | To define both the user's expectation of the information retrieved | Do you just want to know how many patients fit your criteria?

Kuhlthau provides a great amalgamation of these user characteristics in her theoretical foundation of the information seeking process [67], which was well supported by Vakkari's review [57]. Additionally, the information seeking process has been modeled within the biomedical literature. Mendonça et al. [54] and Hung et al. [59] have provided models for the biomedical literature information seeking process, which propose to aid users' search strategies through well-structured clinical queries and by leveraging the knowledge of human search experts, respectively. User cognitive styles shape information seeking processes [36]. Many describe these styles along two orthogonal axes: analytic and descriptive. The analytic cognitive style captures an active approach to information seeking in which conceptual-level questioning is used to resolve an information need, whereas the descriptive cognitive style represents a passive approach in which concentration on the most detailed level of the subject matter is used to resolve an information need. User cognitive styles are either passive (high descriptive and low analytic, with attention to detailed questions) or active (high analytic and low descriptive questioning). The active styles represent more effective and efficient search strategies than passive styles [37]. A user's domain knowledge and technical knowledge are both associated with their cognitive styles and effective search strategies [17,50,91]. Users' cognitive styles can be differentiated by their search tactics [85,86]. Users of varying cognitive styles have different sense-making strategies or processes. Studies of cognitive styles offer a more generalizable mechanism to stratify users and to predict individual information seeking styles. Cognitive science allows us to characterize the demand characteristics of a particular task and to focus on the appropriate problem dimensions [55].
User cognition during EHR data interrogation is rarely studied in the biomedical informatics literature, yet such studies are much needed. Cognitive studies of users can enable user-centered EHR data interrogation designs that aim to improve the user experience.

5. Channel


The complexity of a data source is multidimensional, including heterogeneous semantic representations [46,70,71], opaque data integrity, complex time expressions [98,100], and fragmented knowledge of logical data constructs [17,27,114]. A channel enables users to navigate data despite complexities by providing users with an abstract mechanism to interact with data sources during query formulation or query execution.


5.1. Query formulation


The query formulation component facilitates the iterative interaction between the user and the source to formulate a query in the user's language. Hripcsak et al. investigated two EHR data retrieval channels, AccessMed and Query by Review, and found that neither achieved adequate performance, indicating the difficulty of query formulation for EHR data [51]. Next we review common aids for query formulation and related execution challenges: human intermediaries, query templates, and self-service query tools.


5.1.1. Human intermediaries

Human intermediaries are often employed to formulate user queries to ensure feasibility and precision [44,45,83,89,134]. Intermediaries usually have received formal training and possess a deep understanding of work culture and technical skills for data querying [1,61,95,135]. The biomedical literature provides scant knowledge regarding how human intermediaries operate in the information-rich biomedical domain. Information science has extensively studied human intermediaries, most notably librarians and their development of the reference interview technique [39,94]. Reference interviews elicit tacit user needs, specify vague queries, narrow overly broad questions, and suggest further dimensions of the information need that the user may not have expressed but that are logically related to the user-stated objective. The reference interview is a skillful interrogation process widely adopted by librarians for converting vague and general data requests provided by users into specific data queries expressed in the user's language [39,56,65,66,68,69,87,94,96]. Elicitation strategies used in reference interviews have been explored to improve negotiation of information needs [43,64,74,79]. Specifically, interrogation strategies are primarily developed to obtain the user's objective surrounding the information need. When users are aware of the reference interview's purpose, they are willing to provide additional information on objective and intent. In a related study, Lin et al. analyzed need negotiations and extracted a taxonomy of clarification questions that fit within a set of six classes [102]. The taxonomy applied in the context of EHR data interrogation is illustrated in Table 3, with example clarification questions. Our results imply that in the context of interactive EHR data retrieval, the reference interview may provide human intermediaries with a more efficient workflow for extracting a non-vague description of the data need from the user.

Table 4. The knowledge gaps and recommendations for advancing EHR interrogation.

Aspects | Knowledge gaps | Recommendations
User | Lack of measure of information need complexity | Develop metrics for measuring EHR information need complexities
User | Lack of knowledge of how the cognitive styles of medical researchers affect the information seeking process | Conduct qualitative studies of the information seeking processes of multidisciplinary medical researchers and their barriers to clinical data access
Query formulation channel | Lack of formalized structure for the medical researcher to express their information need | Investigate other formal structures used for document retrieval, i.e. PICO framework
Query formulation channel | Poor understanding of the data need-negotiation process performed by data intermediaries | Support reference interview for EHR data interrogation

5.1.2. Query templates

Templates are another effective technique for expressing standards-based structured data needs free of ambiguity and vagueness [77,84]. A query template provides an organizational structure for users to describe their information need in a non-vague form [80]. Templates have been developed to access clinical data [47] and medical literature [58,72,75,80]. The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is extensively used to explore the medical literature for relevant resources [81,122]. Currently, there is no well-accepted standard template based on community consensus. Instead, many medical institutions require users to complete data requests using free text.

5.1.3. Self-service query tools

Since human intermediaries are expensive and time-consuming, self-service query aids have been pursued in many institutions in recent years [41,51,73,97,101,113,123,125,127,128]. Some are form-based, while others support queries in natural language [42,51]. Visual query formulation is a recent trend and is expected to reduce user cognitive load by presenting information intuitively to the user [129]. The Informatics for Integrating Biology and the Bedside (i2b2) project represents the most widely adopted self-service EHR data retrieval system. The system's terminology explorer and query builder allow the user to search the terminology for applicable terms and build cohorts using a frame system with Boolean constraints [52,82,88,97,113,123–128]. Deshmukh et al. studied the various types of data requests these applications were able to resolve. The study suggested that i2b2 facilitated relatively simple cohort identification queries. They also acknowledged that the majority of requests they studied were "simple" queries [20]. These reports indicate that the majority of self-service tools for EHR data support a limited scope of data specification. Additionally, these tools place the burden of identifying the correct terms associated with a particular medical concept on the user.

Many complex data requests require not just simple constraints, e.g. "all patients diagnosed with diabetes between May and July of 2012", but complex relations between data elements, e.g. "all patients with their first recorded diagnosis of diabetes between May and July of 2012, and all lab glucose tests after their diagnosis and before the start of treatment." For these complex temporal queries, temporal query tools can be used to visualize raw data or concepts over absolute and relative timelines [93,101,130–133]. Though experimental, these tools offer a solution to the complex problem of temporal specification and visualization of EHR data. Meanwhile, significant EHR data processing and transformation are needed for these systems to work appropriately.
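The difference between a simple constraint and a relation between data elements can be seen in a short sketch of the complex diabetes request quoted in Section 5.1.3. The event tuple layout, the event kind labels, and the decision to keep all later glucose tests when no treatment start is recorded are assumptions made for illustration; this is not the method of any of the cited temporal query tools.

```python
from datetime import date


def glucose_between_dx_and_treatment(events, window_start, window_end):
    """For patients whose FIRST diabetes diagnosis falls inside the window,
    return their glucose labs dated after that diagnosis and before the
    first treatment start.

    events: iterable of (patient_id, kind, when, value) tuples, where kind is
    one of the hypothetical labels 'dx_diabetes', 'lab_glucose', or
    'treatment_start'.
    """
    events = list(events)
    first_dx, first_tx = {}, {}
    for pid, kind, when, _ in events:
        if kind == "dx_diabetes":
            first_dx[pid] = min(when, first_dx.get(pid, when))
        elif kind == "treatment_start":
            first_tx[pid] = min(when, first_tx.get(pid, when))

    results = {}
    for pid, dx in first_dx.items():
        if not (window_start <= dx <= window_end):
            continue  # first diagnosis is outside the requested window
        tx = first_tx.get(pid)  # None means no treatment recorded (assumption)
        results[pid] = sorted(
            (when, value)
            for p, kind, when, value in events
            if p == pid and kind == "lab_glucose" and when > dx
            and (tx is None or when < tx)
        )
    return results


SAMPLE_EVENTS = [
    (1, "dx_diabetes", date(2012, 6, 1), None),
    (1, "lab_glucose", date(2012, 5, 20), 160),   # before diagnosis
    (1, "lab_glucose", date(2012, 6, 10), 180),   # between diagnosis and treatment
    (1, "treatment_start", date(2012, 6, 15), None),
    (1, "lab_glucose", date(2012, 6, 20), 140),   # after treatment start
    (2, "dx_diabetes", date(2011, 1, 5), None),   # earlier first diagnosis
    (2, "dx_diabetes", date(2012, 6, 2), None),
    (2, "lab_glucose", date(2012, 6, 5), 150),
]
```

With `SAMPLE_EVENTS` and a window of 1 May to 31 July 2012, only patient 1 is returned, with the single glucose test from 10 June. Patient 2 would satisfy the simple constraint "diagnosed with diabetes between May and July of 2012" but is excluded because the first recorded diagnosis is from 2011, which is the kind of distinction a Boolean cohort builder cannot easily express.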


5.2. Query execution


The query execution component focuses on the conversion of a user query into an executable database query by mapping medical concepts specified by the user to the EHR data elements that define each concept. The Electronic Medical Records and Genomics (eMERGE) consortium [117] has studied the problem of phenotyping disease concepts through enumerable data elements within the EHR. The eMERGE consortium has shown that disease phenotypes are highly heterogeneous and that their underlying elements involve nested Boolean logic, complex temporality, and ubiquitous ICD-9 codes [117,136]. As of 2013, the group has validated 13 phenotypes [62]. Although the temporal nature of EHR data was considered in only some of the eMERGE phenotypes, temporal abstraction is an important technique for EHR phenotyping. Post et al. have established the PROTEMPA method, which allows for the abstraction of temporal data events [99]. Additionally, Shahar's framework on temporal abstraction has described promising methods for formally representing temporal patterns [76,137,138].
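The nested Boolean logic mentioned above can be sketched as a small rule tree evaluated against a patient record. The rule encoding, the `evaluate` function, and the toy rule itself are assumptions for illustration only; the rule is not a validated eMERGE phenotype, and the codes and drug name are included merely as recognizable examples.

```python
def evaluate(rule, record):
    """Evaluate a nested Boolean rule against one patient record.

    rule: a tuple whose first item is 'HAS', 'AND', 'OR', or 'NOT'.
      ('HAS', domain, code) is true when code is in record[domain].
    record: dict mapping a domain name to a set of codes for that patient.
    """
    op = rule[0]
    if op == "HAS":
        _, domain, code = rule
        return code in record.get(domain, set())
    if op == "AND":
        return all(evaluate(sub, record) for sub in rule[1:])
    if op == "OR":
        return any(evaluate(sub, record) for sub in rule[1:])
    if op == "NOT":
        return not evaluate(rule[1], record)
    raise ValueError(f"unknown operator: {op}")


# Toy rule: (a type 2 style diagnosis code OR a diabetes drug)
# AND NOT a type 1 style diagnosis code. Illustrative only.
TOY_PHENOTYPE = (
    "AND",
    ("OR",
     ("HAS", "diagnoses", "250.00"),
     ("HAS", "medications", "metformin")),
    ("NOT", ("HAS", "diagnoses", "250.01")),
)
```

A record such as `{"diagnoses": {"250.00"}}` satisfies the toy rule, while a record with the drug and the excluded diagnosis code does not. Real phenotype definitions add temporal conditions and laboratory thresholds on top of this Boolean skeleton, which is where temporal abstraction methods become relevant.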


6. Discussion


Interactive EHR data retrieval involves complex interactions among users, sources, and channels. The healthcare industry has invested heavily in infrastructure for data integration. To maximize the return on this investment and to use these resources to advance medicine, we need to make such data accessible to biomedical researchers for various computing needs. Self-service query tools have not fully addressed user needs, so human intermediaries remain indispensable in many institutions. These intermediaries engage in a needs-negotiation process with the user. Barriers to this process include the intermediary’s lack of medical knowledge and the user’s lack of technical knowledge. Bridging these knowledge gaps may make communication more efficient. Furthermore, a user often presents a vague understanding and description of their information need. Intermediaries may benefit from a standardized structure through which requests can be organized, which may reduce the ambiguity of the request and allow the intermediary to focus on other tasks, i.e. query execution. Relatively little is known regarding biomedical researchers’ cognitive styles and information-seeking strategies. Table 4 lists key knowledge gaps and potential recommendations for bridging those gaps. Additional exploratory studies are needed to bridge the knowledge gaps concerning how biomedical researchers interrogate EHR data and what their barriers are. We can invest more


effort on interactive information retrieval that augments not only the source but also the user. Information-seeking models explain the sub-optimal outcomes resulting from current methods used for EHR data interrogation. The granularity of data that biomedical researchers are seeking adds more complexity to existing information-seeking models. Additional understanding of process-oriented EHR data access by biomedical researchers is needed. To this end, we extracted from the literature three promising concepts to aid in the construction of an ideal process-oriented EHR data interrogation model. First, both information science and biomedical informatics have established the important role semantics play in optimal information retrieval. For example, the PICO framework is an excellent user aid that helps clinicians organize and express their information needs. In the context of EHR data interrogation, the PICO framework is potentially a good starting point for supporting the expression of biomedical researchers’ EHR data needs. Second, the complexity of an information need shapes information search tactics. A metric for assessing the complexity of an information need can optimize resource allocation while resolving the user’s information need. For example, complex data requests would be directed to an informatician, whereas simple requests would be handled through existing self-service query tools. A standardized method for assessing the complexity of EHR data needs could further enable global resource optimization. Third, the established reference interview has effectively helped librarians to clarify user needs. Informaticians play a role similar to that of librarians but lack guidelines on how to conduct a reference interview for EHR data. At present, experience is the only way for informaticians to gain insights and expertise for this task.
To increase the efficiency and effectiveness of EHR data needs negotiation, an EHR-based reference interview, conducted by an informatician, may aid the query formulation process by translating vague EHR data requests into specific data queries. More studies are needed in this area to enable reference interviews for EHR data. Our study has two major limitations. First, we developed our search criteria based on pre-selected topics. This method may be biased towards self-selected topics and may omit topics that are relevant to this review but not retrievable by queries derived from the pre-selected topics. Nevertheless, we used an established method to identify a focused topic set and believe this review is representative of what is available in the literature. Second, our focus on the most recent four years of literature may have excluded seminal articles in the field from the past; however, we believe our exhaustive citation search should have largely mitigated this problem.
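The first two concepts discussed above, a structured request and a complexity assessment used for triage, can be sketched together in a few lines of Python. Everything in this sketch is an assumption made for illustration: the `DataRequest` fields loosely follow PICO, and the scoring weights and routing threshold are arbitrary rather than an established or validated metric.

```python
from dataclasses import dataclass, field


@dataclass
class DataRequest:
    """A PICO-style structured EHR data request (field names are assumptions)."""
    population: str
    intervention: str = ""
    comparison: str = ""
    outcome: str = ""
    temporal_relations: list = field(default_factory=list)


def complexity_score(req):
    """Toy score: one point per filled PICO slot, two per temporal relation."""
    slots = [req.population, req.intervention, req.comparison, req.outcome]
    return sum(1 for s in slots if s.strip()) + 2 * len(req.temporal_relations)


def route(req, threshold=3):
    """Send low-scoring requests to self-service, others to an informatician."""
    return "self-service" if complexity_score(req) < threshold else "informatician"


simple = DataRequest(population="patients diagnosed with diabetes, May-July 2012")
complex_req = DataRequest(
    population="patients with first recorded diabetes diagnosis, May-July 2012",
    outcome="lab glucose tests",
    temporal_relations=["after diagnosis", "before start of treatment"],
)
```

With these assumed weights, `route(simple)` selects the self-service path and `route(complex_req)` selects the informatician. Which features actually predict the effort required to resolve a request remains an open empirical question.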


7. Conclusions


This review surveys the methodological considerations for interactive EHR data interrogation. We have identified knowledge gaps and research opportunities for advancing EHR data interrogation. Our results show that support for reference interviews for EHR data is a promising direction for improving the autonomy of biomedical researchers during EHR data interrogation. A better understanding of users is needed to enable such support cost-effectively. We suggest that cross-disciplinary translational research between biomedical informatics and information science is needed to apply theories and techniques from information science to facilitate efficient end-user data interrogation in the life sciences.


Acknowledgments


This research was funded under NLM grants R01LM009886 (PI: Chunhua Weng) and 5T15LM007079 (PI: George Hripcsak), and CTSA award UL1 TR000040 (PI: Henry Ginsberg). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NIH.


References


[1] G.W. Hruby, J. McKiernan, S. Bakken, C. Weng, A centralized research data repository enhances retrospective outcomes research capacity: a case report, J. Am. Med. Inform. Assoc. 20 (3) (2013) 563–567.
[2] V. Apgar, A proposal for a new method of evaluation of the newborn infant, Curr. Res. Anesth. Analg. 32 (4) (1953) 260–267.
[3] L. Goldman, D.L. Caldera, S.R. Nussbaum, F.S. Southwick, D. Krogstad, B. Murray, D.S. Burke, T.A. O’Malley, A.H. Goroll, C.H. Caplan, J. Nolan, B. Carabello, E.E. Slater, Multifactorial index of cardiac risk in noncardiac surgical procedures, N. Engl. J. Med. 297 (16) (1977) 845–850.
[4] J. Adler-Milstein, C.M. DesRoches, P. Kralovec, G. Foster, C. Worzala, D. Charles, T. Searcy, A.K. Jha, Electronic health record adoption in US hospitals: progress continues, but challenges persist, Health Aff. (Millwood) 34 (12) (2015) 2174–2180.
[5] E. Borycki, D. Newsham, D. Bates, eHealth in North America, Ybk. Med. Inform. 8 (1) (2012) 3.
[6] L.W. D’Avolio, W.R. Farwell, L.D. Fiore, Comparative effectiveness research and medical informatics, Am. J. Med. 123 (12) (2010) e32–e37.
[7] E. Holve, C. Segal, M.H. Lopez, A. Rein, B.H. Johnson, The Electronic Data Methods (EDM) forum for comparative effectiveness research (CER), Med. Care 50 (Suppl) (2012) S7–S10.
[8] B.J. Miriovsky, L.N. Shulman, A.P. Abernethy, Importance of health information technology, electronic health records, and continuously aggregating data to comparative effectiveness research and learning health care, J. Clin. Oncol. 30 (34) (2012) 4243–4248.
[9] S. Hoffman, A. Podgurski, Big bad data: law, public health, and biomedical databases, Public Health, and Biomedical Databases (October 30, 2012), J. Law, Med. Ethic. (2012) (Forthcoming).
[10] W.R. Hersh, M.G. Weiner, P.J. Embi, J.R. Logan, P. Payne, E.V. Bernstam, H.P. Lehmann, G. Hripcsak, T.H. Hartzog, J.J. Cimino, Caveats for the use of operational electronic health record data in comparative effectiveness research, Med. Care (2013).
[11] D. Blumenthal, M. Tavenner, The “meaningful use” regulation for electronic health records, N. Engl. J. Med. 363 (6) (2010) 501–504.
[12] T.D. Wade, R.C. Hum, J.R. Murphy, A Dimensional Bus model for integrating clinical and research data, J. Am. Med. Inform. Assoc. 18 (Suppl 1) (2011) i96–i102.
[13] E. Holve, C. Segal, M. Hamilton Lopez, Opportunities and challenges for comparative effectiveness research (CER) with Electronic Clinical Data: a perspective from the EDM forum, Med. Care 50 (Suppl) (2012) S11–S18.
[14] A. Rein, Finding Value in Volume: An Exploration of Data Access and Quality Challenges, AcademyHealth: Briefs and Reports, 2012, p. 9.
[15] Y. Yip, Unlocking the potential of electronic health records for translational research. Findings from the section on bioinformatics and translational informatics, Ybk. Med. Inform. 7 (1) (2012) 135.
[16] P. Arzberger, P. Schroeder, A. Beaulieu, G. Bowker, K. Casey, L. Laaksonen, D. Moorman, P. Uhlir, P. Wouters, Promoting access to public research data for scientific, economic, and social development, Data Sci. J. 3 (2004) 135–152.
[17] S. Kumpulainen, K. Järvelin, Barriers to task-based information access in molecular medicine, J. Am. Soc. Inform. Sci. Technol. 63 (1) (2012) 86–97.
[18] F.S. Collins, K.L. Hudson, J.P. Briggs, M.S. Lauer, PCORnet: turning a dream into reality, J. Am. Med. Inform. Assoc. 21 (4) (2014) 576–577.
[19] J.V. Selby, A.C. Beal, L. Frank, The Patient-Centered Outcomes Research Institute (PCORI) national priorities for research and initial research agenda, JAMA 307 (15) (2012) 1583–1584.
[20] V. Deshmukh, S. Meystre, J. Mitchell, Evaluating the informatics for integrating biology and the bedside system for clinical research, BMC Med. Res. Methodol. 9 (1) (2009) 70.
[21] S.M. Meystre, V.G. Deshmukh, J. Mitchell, A clinical use case to evaluate the i2b2 Hive: predicting asthma exacerbations, in: AMIA Annu. Symp. Proc., 2009, pp. 442–446.
[22] A. Spink, T. Wilson, Toward a theoretical framework for information retrieval (IR) evaluation in an information seeking context, in: Mira, 1999.
[23] N.J. Belkin, R.N. Oddy, H.M. Brooks, ASK for information retrieval: Part I. Background and theory, J. Doc. 38 (2) (1982) 61–71.
[24] K. Byström, K. Järvelin, Task complexity affects information seeking and use, Inf. Process. Manage. 31 (2) (1995) 191–213.
[25] C.L. Borgman, Why are online catalogs still hard to use?, JASIS 47 (7) (1996) 493–503.
[26] P. Vakkari, Task complexity, problem structure and information actions: integrating studies on information seeking and retrieval, Inf. Process. Manage. 35 (6) (1999) 819–837.
[27] D. Wang, D.R. Kaufman, E.A. Mendonca, Y.H. Seol, S.B. Johnson, J.J. Cimino, The cognitive demands of an innovative query user interface, Proc. AMIA Symp. (2002) 850–854.
[28] M.J. Bates, The design of browsing and berrypicking techniques for the online search interface, Online Inform. Rev. 13 (5) (1989) 407–424.
[29] B. Dervin, From the mind’s eye of the user: the sense-making qualitative-quantitative methodology, Qual. Res. Inform. Manage. 9 (1992) 61–84.
[30] B. Dervin, Sense-making theory and practice: an overview of user interests in knowledge seeking and use, J. Knowl. Manage. 2 (2) (1998) 36–46.


[31] P.J. Feltovich, R.R. Hoffman, D. Woods, A. Roesler, Keeping it too simple: how the reductive tendency affects cognitive engineering, Intell. Syst., IEEE 19 (3) (2004) 90–94.
[32] G. Klein, B. Moon, R.R. Hoffman, Making sense of sensemaking 2: a macrocognitive model, Intell. Syst., IEEE 21 (5) (2006) 88–92.
[33] G. Klein, B. Moon, R.R. Hoffman, Making sense of sensemaking 1: alternative perspectives, Intell. Syst., IEEE 21 (4) (2006) 70–73.
[34] G. Klein, J.K. Phillips, E.L. Rall, D.A. Peluso, A data-frame theory of sensemaking, Expertise Out Context (2007) 113–155.
[35] M.R. Olsson, Re-thinking our concept of users, Aust. Acad. Res. Libr. 40 (1) (2009) 22–35.
[36] A. Blandford, S. Attfield, Interacting with information, Synth. Lect. Hum.-Centered Inform. 3 (1) (2010) 1–99.
[37] N. Ford, R. Ford, Towards a cognitive theory of information accessing: an empirical study, Inf. Process. Manage. 29 (5) (1993) 569–585.
[38] B.M. Wildemuth, Post-positivist research: two examples of methodological pluralism, Libr. Quart. (1993) 450–468.
[39] R.S. Taylor, Question-Negotiation and Information-Seeking in Libraries, DTIC Document, 1967.
[40] H.R. Warner, J.D. Morgan, High-density medical data management by computer, Comput. Biomed. Res. 3 (5) (1970) 464–476.
[41] H.R. Warner, Knowledge sectors for logical processing of patient data in the HELP system, in: Proceedings of the Annual Symposium on Computer Application in Medical Care, American Medical Informatics Association, 1978, p. 401.
[42] M. Jarke, J. Turner, E.A. Stohr, Y. Vassiliou, N.H. White, K. Michielsen, A field evaluation of natural language for data retrieval, Softw. Eng., IEEE Trans. 1 (1985) 97–114.
[43] B. Dervin, P. Dewdney, Neutral questioning: a new approach to the reference interview, RQ (1986) 506–513.
[44] C.L. Borgman, N.J. Belkin, W.B. Croft, M.E. Lesk, T.K. Landauer, Retrieval systems for the information seeker: can the role of the intermediary be automated?, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 1988, pp. 51–53.
[45] R.B. Merz, C. Cimino, G.O. Barnett, D.R. Blewett, J.A. Gnassi, R. Grundmeier, L. Hassan, Q & A: a query formulation assistant, Proc. Annu. Symp. Comput. Appl. Med. Care (1992) 498–502.
[46] P.A. Schoening, C.A. Abrams, M.G. Kahn, An object model for uniform access to heterogeneous databases, in: Proceedings of the Annual Symposium on Computer Application in Medical Care, American Medical Informatics Association, 1993, p. 502.
[47] J.J. Cimino, A. Aguirre, S.B. Johnson, P. Peng, Generic queries for meeting clinical information needs, Bull. Med. Libr. Assoc. 81 (2) (1993) 195–206.
[48] D.A. Lindberg, B.L. Humphreys, A.T. McCray, The unified medical language system, Meth. Inf. Med. 32 (4) (1993) 281–291.
[49] S.B. Johnson, G. Hripcsak, J. Chen, P. Clayton, Accessing the Columbia clinical repository, Proc. Annu. Symp. Comput. Appl. Med. Care (1994) 281–285.
[50] W.S. Richardson, A.L. Murphy, Ask, and ye shall retrieve, Evidence Based Med. 3 (4) (1998) 100–101.
[51] G. Hripcsak, B. Allen, J.J. Cimino, R. Lee, Access to data: comparing AccessMed with query by review, J. Am. Med. Inform. Assoc. 3 (4) (1996) 288–299.
[52] S.N. Murphy, M.M. Morgan, G.O. Barnett, H.C. Chueh, Optimizing healthcare research data warehouse design through past COSTAR query analysis, in: Proceedings of the AMIA Symposium, American Medical Informatics Association, 1999, p. 892.
[53] E.A. Mendonça, J.J. Cimino, Building a knowledge base to support a digital library, Stud. Health Technol. Inform. 1 (2001) 221–225.
[54] E.A. Mendonça, J.J. Cimino, S.B. Johnson, Y.-H. Seol, Accessing heterogeneous sources of evidence to answer clinical questions, J. Biomed. Inform. 34 (2) (2001) 85–98.
[55] V.L. Patel, J.F. Arocha, D.R. Kaufman, A primer on aspects of cognition for medical informatics, J. Am. Med. Inform. Assoc. 8 (4) (2001) 324–343.
[56] M.M. Wu, Y.H. Liu, Intermediary’s information seeking, inquiring minds, and elicitation styles, J. Am. Soc. Inform. Sci. Technol. 54 (12) (2003) 1117–1133.
[57] P. Vakkari, Task-based information searching, Annu. Rev. Inform. Sci. Technol. 37 (1) (2005) 413–464.
[58] C. Schardt, M.B. Adams, T. Owens, S. Keitz, P. Fontelo, Utilization of the PICO framework to improve searching PubMed for clinical questions, BMC Med. Inform. Decis. Mak. 7 (1) (2007) 16.
[59] P.W. Hung, S.B. Johnson, D.R. Kaufman, E.A. Mendonça, A multi-level model of information seeking in the clinical domain, J. Biomed. Inform. 41 (2) (2008) 357–370.
[60] Y. Li, N.J. Belkin, A faceted approach to conceptualizing tasks in information seeking, Inf. Process. Manage. 44 (6) (2008) 1822–1837.
[61] J.A. Rankin, S.F. Grefsheim, C.C. Canto, The emerging informationist specialty: a systematic review of the literature, J. Med. Libr. Assoc.: JMLA 96 (3) (2008) 194.
[62] K.M. Newton, P.L. Peissig, A.N. Kho, S.J. Bielinski, R.L. Berg, V. Choudhary, M. Basford, C.G. Chute, I.J. Kullo, R. Li, Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network, J. Am. Med. Inform. Assoc. 20 (e1) (2013) e147–e154.
[63] G.A. Kelly, A Theory of Personality: The Psychology of Personal Constructs, Norton, New York, 1963.
[64] G.B. King, Open & closed questions: the reference interview, RQ 12 (2) (1972) 157–160.
[65] S.D. Knapp, The reference interview in the computer-based setting, RQ 17 (4) (1978) 320–324.
[66] M.J. Lynch, Reference interviews in public libraries, Libr. Quart. (1978) 119–142.
[67] C.C. Kuhlthau, Inside the search process: information seeking from the user’s perspective, JASIS 42 (5) (1991) 361–371.
[68] M.D. White, The dimensions of the reference interview, RQ (1981) 373–381.
[69] M.D. White, Evaluation of the reference interview, RQ (1985) 76–84.
[70] M.G. Kahn, The desktop database dilemma, Acad. Med. 68 (1) (1993) 34–37.
[71] M.G. Kahn, Clinical databases and critical care research, Crit. Care Clin. 10 (1) (1994) 37.
[72] W.S. Richardson, M.C. Wilson, J. Nishikawa, R.S. Hayward, The well-built clinical question: a key to evidence-based decisions, ACP J. Club 123 (3) (1995) A12–A13.
[73] S. Steib, R. Reichley, S. McMullin, K. Marrs, T.C. Bailey, W.C. Dunagan, M. Kahn, Supporting ad-hoc queries in an integrated clinical database, in: Proceedings of the Annual Symposium on Computer Application in Medical Care, American Medical Informatics Association, 1995, p. 62.
[74] P. Dewdney, G. Michell, Asking “why” questions in the reference interview: a theoretical justification, Libr. Quart. (1997) 50–71.
[75] C. Counsell, Formulating questions and locating primary studies for inclusion in systematic reviews, Ann. Intern. Med. 127 (5) (1997) 380–387.
[76] Y. Shahar, A framework for knowledge-based temporal abstraction, Artif. Intell. 90 (1) (1997) 79–133.
[77] R. Snowball, Using the clinical question to teach search strategy: fostering transferable conceptual skills in user education by active learning, Health Libr. Rev. 14 (3) (1997) 167–172.
[78] S.B. Johnson, D. Chatziantoniou, Extended SQL for manipulating clinical warehouse data, in: Proceedings of the AMIA Symposium, American Medical Informatics Association, 1999, p. 819.
[79] R. Nordlie, “User revealment”—a comparison of initial queries and ensuing question development in online searching and in human reference interactions, in: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 1999, pp. 11–18.
[80] A. Booth, A.J. O’Rourke, N.J. Ford, Structuring the pre-search reference interview: a useful technique for handling clinical questions, Bull. Med. Libr. Assoc. 88 (3) (2000) 239.
[81] E.V. Villanueva, E.A. Burrows, P.A. Fennessy, M. Rajendran, J.N. Anderson, Improving question formulation for use in evidence appraisal in a tertiary care setting: a randomised controlled trial [ISRCTN66375463], BMC Med. Inform. Decis. Mak. 1 (1) (2001) 4.
[82] S.N. Murphy, G.O. Barnett, H.C. Chueh, Visual query tool for finding patient cohorts from a clinical data warehouse of the partners HealthCare system, in: Proceedings of the AMIA Symposium, American Medical Informatics Association, 2000, p. 1174.
[83] D. Robins, Shifts of focus on various aspects of user information problems during interactive information retrieval, J. Am. Soc. Inform. Sci. 51 (10) (2000) 913–928.
[84] J.W. Ely, J.A. Osheroff, M.H. Ebell, M.L. Chambliss, D.C. Vinson, J.J. Stevermer, E.A. Pifer, Obstacles to answering doctors’ questions about patient care with evidence: qualitative study, BMJ: Brit. Med. J. 324 (7339) (2002) 710.
[85] N. Ford, T. Wilson, A. Foster, D. Ellis, A. Spink, Information seeking and mediated searching. Part 4. Cognitive styles in information seeking, J. Am. Soc. Inform. Sci. Technol. 53 (9) (2002) 728–735.
[86] A. Spink, T.D. Wilson, N. Ford, A. Foster, D. Ellis, Information-seeking and mediated searching. Part 1. Theoretical framework and research design, J. Am. Soc. Inform. Sci. Technol. 53 (9) (2002) 695–703.
[87] J. Janes, Question negotiation in an electronic age, Digit. Ref. Res. Agenda (2003) 48–60.
[88] S.N. Murphy, V. Gainer, H.C. Chueh, A visual interface designed for novice users to find research patient cohorts in a large biomedical database, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2003, p. 489.
[89] S. Small, N. Shimizu, T. Strzalkowski, T. Liu, HITIQA: a data driven approach to interactive question answering: a preliminary report, in: New Directions in Question Answering, 2003, pp. 94–104.
[90] A.B. Wilcox, G. Hripcsak, The role of domain knowledge in automating medical text report classification, J. Am. Med. Inform. Assoc. 10 (4) (2003) 330–338.
[91] B.M. Wildemuth, The effects of domain knowledge on search tactic formulation, J. Am. Soc. Inform. Sci. Technol. 55 (3) (2003) 246–258.
[92] A.R. Diekema, O. Yilmazel, J. Chen, S. Harwell, L. He, E.D. Liddy, Finding Answers to Complex Questions, 2004.
[93] D. Goren-Bar, Y. Shahar, M. Galperin-Aizenberg, D. Boaz, G. Tahan, KNAVE II: the definition and implementation of an intelligent tool for visualization and exploration of time-oriented clinical data, in: Proceedings of the Working Conference on Advanced Visual Interfaces, ACM, 2004, pp. 171–174.
[94] R.D. Lankes, The digital reference research agenda, J. Am. Soc. Inform. Sci. Technol. 55 (4) (2004) 301–311.
[95] P. Hansen, K. Järvelin, Collaborative information retrieval in an information-intensive domain, Inf. Process. Manage. 41 (5) (2005) 1101–1119.
[96] N.J. McCracken, A.R. Diekema, G. Ingersoll, S.C. Harwell, E.E. Allen, O. Yilmazel, E.D. Liddy, Modeling reference interviews as a basis for improving automatic QA systems, in: Proceedings of the Interactive Question, Association for Computational Linguistics, 2006, pp. 17–24.


[97] S.N. Murphy, M.E. Mendis, D.A. Berkowitz, I. Kohane, H.C. Chueh, Integration of clinical and genetic data in the i2b2 architecture, AMIA Annu. Symp. Proc. (2006) 1040.
[98] A.R. Post, A.N. Sovarel, J.H. Harrison Jr., Abstraction-based temporal data retrieval for a Clinical Data Repository, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2007, p. 603.
[99] A.R. Post, J.H. Harrison, Protempa: a method for specifying and identifying temporal sequences in retrospective data for patient selection, J. Am. Med. Inform. Assoc. 14 (5) (2007) 674–683.
[100] T.D. Wang, C. Plaisant, A.J. Quinn, R. Stanchak, S. Murphy, B. Shneiderman, Aligning temporal data by sentinel events: discovering patterns in electronic health records, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2008, pp. 457–466.
[101] C. Plaisant, S. Lam, B. Shneiderman, M.S. Smith, D. Roseman, G. Marchand, M. Gillam, C. Feied, J. Handler, H. Rappaport, Searching electronic health records for temporal patterns in patient histories: a case study with Microsoft Amalga, AMIA Annu. Symp. Proc. (2008) 601–605.
[102] J. Lin, P. Wu, E. Abels, Toward automatic facet analysis and need negotiation: lessons from mediated search, ACM Trans. Inform. Syst. (TOIS) 27 (1) (2008) 6.
[103] M.G. Kahn, D. Batson, L.M. Schilling, Data model considerations for clinical effectiveness researchers, Med. Care 50 (2012) S60–S67.
[104] M.G. Kahn, L.M. Schilling, B.M. Kwan, A. Bunting, C. Uhrich, C. Singleton, Preparing electronic health records data for comparative effectiveness studies, in: Healthcare Informatics, Imaging and Systems Biology (HISB), 2012 IEEE Second International Conference on, IEEE, 2012, p. 2.
[105] C. Tao, G. Jiang, T.A. Oniki, R.R. Freimuth, Q. Zhu, D. Sharma, J. Pathak, S.M. Huff, C.G. Chute, A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data, J. Am. Med. Inform. Assoc. 20 (3) (2013) 554–562.
[106] C.G. Chute, (1) Obstacles and options for big-data applications in biomedicine: the role of standards and normalizations, in: Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on, IEEE, 2012, pp. 1–1.
[107] L. Zhao, S.N.L.C. Keung, A. Taweel, E. Tyler, I. Ogunsina, J. Rossiter, B.C. Delaney, K.A. Peterson, F.R. Hobbs, T.N. Arvanitis, A loosely coupled framework for terminology controlled distributed EHR search for patient cohort identification in clinical research, Stud. Health Technol. Inform. 180 (2012) 519.
[108] S. Rea, J. Pathak, G. Savova, T.A. Oniki, L. Westberg, C.E. Beebe, C. Tao, C.G. Parker, P.J. Haug, S.M. Huff, Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project, J. Biomed. Inform. 45 (4) (2012) 763–771.
[109] P.E. Stang, P.B. Ryan, J.A. Racoosin, J.M. Overhage, A.G. Hartzema, C. Reich, E. Welebob, T. Scarnecchia, J. Woodcock, Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership, Ann. Intern. Med. 153 (9) (2010) 600–606.
[110] H.J. Lowe, T.A. Ferris, P.M. Hernandez, S.C. Weber, STRIDE–an integrated standards-based translational research informatics platform, AMIA Annu. Symp. Proc. 2009 (2009) 391–395.
[111] D. Dowdy, C. Dye, T. Cohen, Data needs for evidence-based decisions: a tuberculosis modeler’s wish list [Review article], Int. J. Tuberc. Lung Dis. 17 (7) (2013) 866–877.
[112] W.R. Carpenter, A.-M. Meyer, A.P. Abernethy, T. Stürmer, M.R. Kosorok, A framework for understanding cancer comparative effectiveness research data needs, J. Clin. Epidemiol. 65 (11) (2012) 1150–1158.
[113] J.J. Cimino, E.J. Ayres, A. Beri, R. Freedman, E. Oberholtzer, S. Rath, Developing a self-service query interface for Re-using De-identified electronic health record data, Stud. Health Technol. Inform. 192 (2013) 632.
[114] T. Edinger, A.M. Cohen, S. Bedrick, K. Ambert, W. Hersh, Barriers to retrieving patient information from electronic health record data: failure analysis from the TREC Medical Records Track, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2012, p. 180.
[115] G. Hripcsak, C. Knirsch, L. Zhou, A. Wilcox, G.B. Melton, Bias associated with mining electronic health records, J. Biomed. Discov. Collab. 6 (2011) 48.
[116] A.B. Wilcox, D.K. Vawdrey, Y.-H. Chen, B. Forman, G. Hripcsak, The evolving use of a clinical data repository: facilitating data access within an electronic medical record, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2009, p. 701.
[117] A.N. Kho, J.A. Pacheco, P.L. Peissig, L. Rasmussen, K.M. Newton, N. Weston, P.K. Crane, J. Pathak, C.G. Chute, S.J. Bielinski, I.J. Kullo, R. Li, T.A. Manolio, R.L. Chisholm, J.C. Denny, Electronic medical records for genetic research: results of the eMERGE consortium, Sci. Transl. Med. 3 (79) (2011) 79re1.
[118] R.C. Price, D. Huth, J. Smith, S. Harper, W. Pace, G. Pulver, M.G. Kahn, L.M. Schilling, J.C. Facelli, Federated queries for comparative effectiveness research: performance analysis, Stud. Health Technol. Inform. 175 (2012) 9.
[119] D.F. Sittig, B.L. Hazlehurst, J. Brown, S. Murphy, M. Rosenman, P. Tarczy-Hornoch, A.B. Wilcox, A survey of informatics platforms that enable distributed comparative effectiveness research using multi-institutional heterogenous clinical data, Med. Care 50 (2012) S49–S59.
[120] K.B. Bayley, T. Belnap, L. Savitz, A.L. Masica, N. Shah, N.S. Fleming, Challenges in using electronic health record data for CER: experience of 4 learning organizations and solutions applied, Med. Care (2013).
[121] N.G. Weiskopf, C. Weng, Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research, J. Am. Med. Inform. Assoc. 20 (1) (2013) 144–151.
[122] O. Vechtomova, H. Zhang, Articulating complex information needs using query templates, J. Inform. Sci. 35 (4) (2009) 439–452.
[123] J.F. Hurdle, S.C. Haroldsen, A. Hammer, C. Spigle, A.M. Fraser, G.P. Mineau, S.J. Courdy, Identifying clinical/translational research cohorts: ascertainment via querying an integrated multi-source database, J. Am. Med. Inform. Assoc. 20 (1) (2013) 164–171.
[124] N. Anderson, A. Abend, A. Mandel, E. Geraghty, D. Gabriel, R. Wynden, M. Kamerick, K. Anderson, J. Rainwater, P. Tarczy-Hornoch, Implementation of a deidentified federated data network for population-based cohort discovery, J. Am. Med. Inform. Assoc. 19 (e1) (2012) e60–e67.
[125] G.M. Weber, S.N. Murphy, A.J. McMurry, D. Macfadden, D.J. Nigrin, S. Churchill, I.S. Kohane, The shared health research information network (SHRINE): a prototype federated query tool for clinical data repositories, J. Am. Med. Inform. Assoc. 16 (5) (2009) 624–630.
[126] A.J. McMurry, S.N. Murphy, D. Macfadden, G. Weber, W.W. Simons, J. Orechia, J. Bickel, N. Wattanasin, C. Gilbert, P. Trevvett, S. Churchill, I.S. Kohane, SHRINE: enabling nationally scalable multi-site disease studies, PLoS ONE 8 (3) (2013) e55811.
[127] G.Q. Zhang, T. Siegler, P. Saxman, N. Sandberg, R. Mueller, N. Johnson, D. Hunscher, S. Arabandi, VISAGE: a query interface for clinical research, AMIA Summits Transl. Sci. Proc. 2010 (2010) 76–80.
[128] M.M. Horvath, S. Winfield, S. Evans, S. Slopek, H. Shang, J. Ferranti, The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement, J. Biomed. Inform. 44 (2) (2011) 266–276.
[129] M. Dörk, C. Williamson, S. Carpendale, Navigating tomorrow’s web: from searching and browsing to visual exploration, ACM Trans. Web (TWEB) 6 (3) (2012) 13.
[130] J. Jin, P. Szekely, QueryMarvel: a visual query language for temporal patterns using comic strips, in: Visual Languages and Human-Centric Computing, VL/HCC 2009, IEEE Symposium on, IEEE, 2009, pp. 207–214.
[131] K. Wongsuphasawat, C. Plaisant, M. Taieb-Maimon, B. Shneiderman, Querying event sequences by exact match or similarity search: design and empirical evaluation, Interact. Comput. 24 (2) (2012) 55–68.
[132] M. Monroe, R. Lan, H. Lee, C. Plaisant, B. Shneiderman, Temporal event sequence simplification, Vis. Comput. Graph., IEEE Trans. 19 (12) (2013) 2227–2236.
[133] R. Lan, H. Lee, A. Fong, M. Monroe, C. Plaisant, B. Shneiderman, Temporal search and replace: an interactive tool for the analysis of temporal event sequences, in: Technical Report HCIL-2013-TBD, University of Maryland, College Park, Maryland, 2013.
[134] M.D. Olvera-Lobo, J. Gutiérrez-Artacho, Question–answering systems as efficient sources of terminological information: an evaluation, Health Inform. Libr. J. 27 (4) (2010) 268–276.
[135] G.W. Hruby, M.R. Boland, J.J. Cimino, J. Gao, A.B. Wilcox, J. Hirschberg, C. Weng, Characterization of the biomedical query mediation process, in: AMIA Summits on Translational Science Proceedings, San Francisco, 2013, p. 5.
[136] M. Conway, R.L. Berg, D. Carrell, J.C. Denny, A.N. Kho, I.J. Kullo, J.G. Linneman, J.A. Pacheco, P. Peissig, L. Rasmussen, Analyzing the heterogeneity and complexity of Electronic Health Record oriented phenotyping algorithms, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2011, p. 274.
[137] R. Moskovitch, Y. Shahar, Medical temporal-knowledge discovery via temporal abstraction, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association, 2009, p. 452.
[138] C. Combi, G. Pozzi, R. Rossato, Querying temporal clinical databases on granular trends, J. Biomed. Inform. 45 (2) (2012) 273–291.
[139] S.L.D.S.G. Program, Centralized vs. Federated: State Approaches to P-20W Data Systems, I.o.E. Sciences, Editor, 2012, p. 6.
[140] A. Wilcox, G. Randhawa, P. Embi, H. Cao, G.J. Kuperman, Sustainability considerations for health research and analytic data infrastructures, EGEMS (Wash DC) 2 (2) (2014) 1113.
[141] R.L. Fleurence, L.H. Curtis, R.M. Califf, R. Platt, J.V. Selby, J.S. Brown, Launching PCORnet, a national patient-centered clinical research network, J. Am. Med. Inform. Assoc. 21 (4) (2014) 578–582.

Please cite this article in press as: G.W. Hruby et al., Facilitating biomedical researchers’ interrogation of electronic health record data: Ideas from outside of biomedical informatics, J Biomed Inform (2016), http://dx.doi.org/10.1016/j.jbi.2016.03.004
