
Natural Language Processing in the Electronic Medical Record: Assessing Clinician Adherence to Tobacco Treatment Guidelines

Brian Hazlehurst, PhD, Dean F. Sittig, PhD, Victor J. Stevens, PhD, K. Sabina Smith, BA, CCRP, Jack F. Hollis, PhD, Thomas M. Vogt, MD, MPH, Jonathan P. Winickoff, MD, MPH, Russ Glasgow, PhD, Ted E. Palen, PhD, MD, Nancy A. Rigotti, MD

From the Center for Health Research, Kaiser Permanente (Hazlehurst, Sittig, Stevens, Smith, Hollis, Vogt), Portland, Oregon; Tobacco Research and Treatment Center, Massachusetts General Hospital, Harvard Medical School (Winickoff, Rigotti), and Department of Ambulatory Care and Prevention, Harvard Medical School (Rigotti), Boston, Massachusetts; and Kaiser Permanente Clinical Research Unit (Glasgow, Palen), Denver, Colorado. Address correspondence and reprint requests to: Brian L. Hazlehurst, PhD, Center for Health Research, Kaiser Permanente, 3800 N. Interstate Ave., Portland OR 97227. E-mail: brian.hazlehurst@kpchr.org.

Background: Comprehensively assessing care quality with electronic medical records (EMRs) is not currently possible because much of the relevant data reside in clinicians' free-text notes.

Methods:

We evaluated the accuracy of MediClass, an automated, rule-based classifier of the EMR that incorporates natural language processing, in assessing whether clinicians: (1) asked whether the patient smoked; (2) advised the patient to stop; (3) assessed the patient's readiness to quit; (4) assisted the patient in quitting by providing information or medications; and (5) arranged for appropriate follow-up care (i.e., the 5A's of smoking-cessation care).

Design:

We analyzed 125 medical records of known smokers at each of four HMOs in 2003 and 2004. One trained abstractor at each HMO manually coded all 500 records according to whether or not each of the 5A's of smoking-cessation care was addressed during routine outpatient visits.

Measurements: For each patient's record, we compared the presence or absence of each of the 5A's as assessed by each human coder and by MediClass. We measured the chance-corrected agreement between the human raters and MediClass using the kappa statistic.

Results:

For "ask" and "assist," agreement among human coders was indistinguishable from agreement between humans and MediClass (p > 0.05). For "assess" and "advise," the human coders agreed more with each other than they did with MediClass (p < 0.01); however, MediClass performance was sufficient to assess quality in these areas. The frequency of "arrange" was too low to be analyzed.

Conclusions: MediClass performance appears adequate to replace human coders of the 5A's of smoking-cessation care, allowing for automated assessment of clinician adherence to one of the most important, evidence-based guidelines in preventive health care. (Am J Prev Med 2005;29(5):434–439) © 2005 American Journal of Preventive Medicine

Introduction

Interest in and support for widespread implementation and adoption of electronic medical records (EMRs) are increasing, fueled in part by the hope that these systems will improve care quality through faster and more accurate clinical data analyses.1–5 A significant portion of these electronic data, however, is unusable by available automated analysis methods because it is not systematically coded. These so-called "free-text" portions of medical records often contain critical information that would allow more comprehensive assessment of evidence-based care. One recent analysis concluded that, of the information necessary for a comprehensive quality assessment of a health plan with a modern EMR, at most 50% could be obtained from administrative data and the clinical codes for lab results, procedure results, vital signs, and signs and symptoms.6 This upper-bound estimate was calculated by considering which quality measures could be addressed by these coding schemes; it does not account for the known usability and process challenges of actually achieving structured data entry with these schemes, which further reduce this coverage in practice.7–9 These problems are often amplified for preventive care activities, such as counseling about smoking cessation, which are based on the content of complex discussions between provider and patient.

One potential solution to this dilemma would be to replace all narrative sections of the EMR with structured data entry. However, clinical notes in the record play an important role in communications within and between providers,10,11 and they have not proven to be "replaceable" with structured data captured at the user interface to the EMR. For complex conversations involving multiple topics, the myriad of alternative choices required to create codes is simply impractical. An alternative solution is to develop computer systems capable of automatically processing the free-text portions of the medical record.12–14 Development of natural language processing (NLP) systems has become more feasible with the increased availability of electronically recorded (but uncoded) clinical data,15 as well as recent advances in data storage capacity, computational power, and programming techniques.16–22 This paper reports on the evaluation of an NLP system, called MediClass (MC, a "medical classifier"), configured to automatically assess delivery of evidence-based smoking-cessation care using information in both the coded and free-text portions of the EMR.

The 5A's of Smoking-Cessation Care

This test of the MediClass system uses smoking cessation because tobacco use is the leading preventable cause of death in the United States,23–25 and because evidence-based guidelines for delivering tobacco-cessation treatments in primary care settings have been developed, widely disseminated, and now constitute standard care.26–30 The recommended treatment model involves five steps, the "5A's": (1) ask patients about smoking status at every visit; (2) advise all tobacco users to quit; (3) assess a patient's willingness to try to quit; (4) assist the patient's quitting efforts (provide smoking-cessation treatments or referrals); and (5) arrange follow-up (provide or arrange for supportive follow-up contacts). The 5A's approach has been widely endorsed by healthcare organizations and used by regulatory organizations to assess healthcare quality (e.g., HEDIS). It has become the national model for tobacco treatment (Table 1).26 Although some of the 5A steps are easily coded on entry into the EMR (e.g., identification of smoking status or prescriptions for smoking-cessation drugs), other steps are typically recorded in free-text progress notes in the EMR (e.g., assessment of readiness to change, provision of behavior-change counseling). Some of these free-text clinical notes are not suitable for coding at the EMR user interface, yet they provide the information essential to continuity of patient care that is critical to tobacco cessation and relapse prevention.

Table 1. The "Five A's" recommended by the current U.S. Public Health Service Clinical Practice Guideline for tobacco treatment and prevention

5A step | Operational definition | Example in free-text section of EMR
Ask | Identify tobacco user status at every visit | "Patient smokes 1 ppd"
Advise | Advise all tobacco users to quit | "It is important for you to quit smoking now"
Assess | Determine patient's willingness to make a quit attempt | "Patient not interested in quitting smoking"
Assist | Aid the patient in quitting | "Started patient on Zyban"
Arrange | Schedule follow-up contact, in person or via telephone | "Follow-up in 2 weeks for quit progress"

Example text segments shown could appear in clinical notes or patient instructions generated for the encounter. EMR, electronic medical record; ppd, pack per day.

The MediClass system was designed and developed by members of the research team. A complete technical description of the system is reported elsewhere.31 In essence, MediClass maps the contents of each encounter to a controlled set of clinical concepts based on (1) phrases detected in free-text sections, and (2) codes detected in structured sections of the medical record. Classifications are performed by context-sensitive rules that select for the clinical concepts of interest for the application. For assessing delivery of the 5A's, these concepts include smoking-cessation medications, discussions, referral activities, and quitting activities, as well as smoking and readiness-to-quit assessments documented by the care provider. This knowledge was encoded into the MediClass system; the process began with the guideline definitions of the 5A's.26 A subgroup of the research team (including clinicians and tobacco-cessation experts) met over several weeks to operationalize these definitions by defining the concepts involved and the types of phrases that provide evidence for each concept. Finally, health plan-specific details about smoking-cessation care were incorporated into the definitions and the system.
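To make the general technique concrete, the sketch below illustrates phrase-based concept detection over free text, combined with coded entries, followed by simple rules mapping concepts to the 5A's. The phrase lists, concept names, and rule logic are illustrative assumptions for exposition only; they are not the actual MediClass knowledge base, which was authored by clinicians and tobacco-cessation experts.

import re

# Illustrative phrase-to-concept map (hypothetical, not MediClass's).
CONCEPT_PHRASES = {
    "SmokerStatus":        [r"\bsmokes\b", r"\bppd\b", r"\btobacco use\b"],
    "QuitAdvice":          [r"\bimportant .* to quit\b", r"\badvised? to quit\b"],
    "ReadinessAssessment": [r"\binterested in quitting\b",
                            r"\bwilling to (try to )?quit\b"],
    "CessationMedication": [r"\bzyban\b", r"\bbupropion\b", r"\bnicotine patch\b"],
    "FollowUpPlan":        [r"\bfollow[- ]?up\b.*\bquit\b"],
}

def detect_concepts(note_text):
    """Map phrases found in free text to a set of clinical concepts."""
    text = note_text.lower()
    return {concept
            for concept, patterns in CONCEPT_PHRASES.items()
            if any(re.search(p, text) for p in patterns)}

def classify_5as(note_text, coded_concepts):
    """Apply simple rules over concepts drawn from both the free-text
    and the coded (structured) sections of one encounter record."""
    concepts = detect_concepts(note_text) | set(coded_concepts)
    return {
        "ask":     "SmokerStatus" in concepts,
        "advise":  "QuitAdvice" in concepts,
        "assess":  "ReadinessAssessment" in concepts,
        "assist":  "CessationMedication" in concepts,
        "arrange": "FollowUpPlan" in concepts,
    }

note = ("Patient smokes 1 ppd. Not interested in quitting smoking. "
        "Follow-up in 2 weeks for quit progress.")
print(classify_5as(note, coded_concepts={"CessationMedication"}))

The actual system's rules are context sensitive (e.g., distinguishing negation and surrounding clinical context), which simple keyword matching cannot capture; the "URI education done RTC PRN quit smoking" example discussed later illustrates that failure mode.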

Methods

This research was conducted in 2003 and 2004 in four nonprofit HMOs: Harvard Pilgrim Health Care in Massachusetts, and three regions of Kaiser Permanente (Northwest, Colorado, and Hawaii). Following Institutional Review Board approval from each of the four health plans, the project requested electronic copies of medical records for about 1000 known smokers at each institution. These EMRs include the relevant data from single office visits with primary care clinicians. The data were extracted from the offline data warehouse at each institution and saved as structured files, which included progress notes, patient instructions, medications ordered, referrals ordered, reason for visit, and other smoking-related data in the EMR. Some EMR systems included fields for assessing smoking status, and one supported structured entry for indicating provision of cessation advice. However, two of the four data systems included no smoking-specific data fields.

In preliminary work, 125 records were randomly selected from each of the four sites, and coded by four trained chart abstractors (see below) and by MediClass (MC). The study assessed disagreements between MC and the trained human raters. MediClass was improved and re-run against these records to ensure that the revised system did not inadvertently introduce new misclassifications. The records used in this preliminary work were then removed from the data pool, leaving 875 records from each health plan in the data pool.

This preliminary work, as well as previous studies, showed that several of the "A's" are typically infrequent in the data.27 Therefore, the validation study used a sample composed of both random and "enriched" portions, as follows. The enriched portions included records with "Assist" (a subgroup with a target size of 15 records) and "Arrange" (a subgroup with a target size of five records) from each of the four health plans. The MediClass system was used to locate these records for inclusion in the enriched portion of the total sample. If the system located more records than were needed for a subgroup, the final records were randomly selected to produce the subgroup. In some cases there were not enough records to fill a subgroup, in which case all of the located records were used and the subgroup was smaller than the target size. The final size of the enriched portion of our sample was 77 records. These records were then removed from the data pool. The remainder of our sample (423 records) was then randomly drawn from the data pool, stratified by health plan. The final sample of 500 records contained 125 records from each of the four health plans (see Figure 1).

[Figure 1. The validation study sample. An "enrichment" process (see text for explanation) was used to select up to 20 records from each health plan, and random selection was then used to fill out each plan's sample to 125 records. The total validation study sample comprised 500 primary care visit records of known smokers.]
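A minimal sketch of this two-stage sampling scheme, under assumed data shapes (the function and variable names are hypothetical, not from the study's code):

import random

TARGETS = {"Assist": 15, "Arrange": 5}  # per-plan enrichment targets from the text

def build_validation_sample(pool_by_plan, mediclass_flags, seed=0):
    """Two-stage sample: enrich with MediClass-located "Assist"/"Arrange"
    records, then fill each plan's sample to 125 records at random.
    pool_by_plan: plan -> list of record ids (after preliminary-work removal);
    mediclass_flags: record id -> set of A's MediClass detected."""
    rng = random.Random(seed)
    sample = {}
    for plan, pool in pool_by_plan.items():
        chosen = []
        # Stage 1: enriched portion; a subgroup may fall short of its target.
        for step, target in TARGETS.items():
            hits = [r for r in pool
                    if step in mediclass_flags.get(r, set()) and r not in chosen]
            rng.shuffle(hits)
            chosen.extend(hits[:target])
        # Stage 2: random portion, stratified by plan, filling to 125 records.
        remaining = [r for r in pool if r not in chosen]
        chosen.extend(rng.sample(remaining, 125 - len(chosen)))
        sample[plan] = chosen
    return sample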


These 500 records, each representing a single primary care visit, were automatically transformed into HTML files for convenient viewing. Four trained medical record chart reviewers coded all 500 records, using a standard web browser application to view the HTML files.32 Abstractors were trained to look for evidence of documentation of the 5A's of smoking-cessation counseling. For each record, they were asked to identify whether each "A" was absent or present. They were encouraged to use information from all parts of the record (i.e., both the coded portions, such as medications ordered, and the free-text or progress notes sections). Each site submitted the results of this work back to our coordinating center as anonymous database records that indicated, for each clinical encounter, the presence or absence of each of the 5A's.

Training Human Abstractors to Identify the 5A's

A 2-day training meeting was attended by one medical record coder from each of the four participating study sites. Before the meeting, each coder was given the evidence-based clinical practice guideline to learn the content area,26 and immediately before training took a brief true/false and multiple-choice test to evaluate their understanding of some of the basic concepts in tobacco cessation. All coders scored at least 80% on the test. The training included a review of the development of the 5A's tobacco-cessation guidelines by a tobacco expert from the research team (JFH). The instructor (KSS) reviewed the coding manual, coding definitions, common tobacco terms, and common medical record terms and abbreviations. The coders were issued the coding manual, and worked on computers set up with data review and entry utilities. An intensive case review of five examples (segments of progress-note narrative relevant to smoking cessation) was conducted, and the proper coding of each case was discussed in detail. All participants then coded ten new examples and entered their results into their respective databases. Results from all coders were compared with each other and discussed until a common understanding was reached. The second day consisted of additional case reviews, and included some example records in HTML format to familiarize the coders with the actual form of the data for the validation study. Several weeks later, a preliminary study allowed the abstractors to try out the entire process on an initial set of 500 records. Significant differences among coders were noted, and follow-up training was conducted by the instructor (KSS) during site visits over the next several weeks. Finally, an additional 500 records were abstracted for the validation study reported here.

Results

Table 2 contains the mean agreement on each of the A's between MC and each of the human raters (four pairs of raters in the "mean MC agreement" group), along with the mean agreement among all pairs of human raters (six pairs of raters in the "mean non-MC agreement" group). These means are averages of the kappa statistic (known as Light's kappa33) for each group. The difference between the two groups was tested for significance using Student's t-test. For two of the A's ("ask" and "assist"), the mean agreement among the human coders is indistinguishable from the agreement between the humans and MediClass (Student's t-test, p > 0.05). For two other A's ("assess" and "advise"), the humans agreed more often with each other than they did with MediClass (Student's t-test, p < 0.01). The fifth A ("arrange") was coded too infrequently by the human abstractors to be compared, and was dropped from further consideration in our analyses.


Table 2. MC agreement with human raters compared with agreement among human raters for each of the 5A's (n = 500)

5A step | Frequency (mean across all four human coders) | Mean MC agreement (Light's kappa) | Mean non-MC agreement (Light's kappa) | p value*
Ask | 83% (82%–84%) | 0.88 (0.83–0.93) | 0.91 (0.87–0.96) | 0.124
Advise | 31% (24%–39%) | 0.71 (0.64–0.78) | 0.82 (0.77–0.87) | 0.006
Assess | 12% (5%–18%) | 0.51 (0.39–0.63) | 0.69 (0.58–0.79) | 0.002
Assist | 13% (2%–24%) | 0.49 (0.41–0.57) | 0.57 (0.47–0.68) | 0.271
Arrange | 0% (0%–1%) | NA | NA | NA

Notes: Mean MC agreement, n = 4 pairs; mean non-MC agreement, n = 6 pairs. *Two-tailed Student's t-test of the difference between MC agreement and non-MC agreement; p < 0.01 considered significant. MC, MediClass; NA, not applicable.
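For reference, Light's kappa (the statistic reported in Table 2) is simply the mean of Cohen's kappa taken over rater pairs. A minimal sketch for binary present/absent codes follows; the example ratings are invented for illustration:

from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length binary (0/1) rating sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                 # marginal "present" rates
    pe = pa * pb + (1 - pa) * (1 - pb)              # chance agreement
    return (po - pe) / (1 - pe)

def lights_kappa(ratings):
    """Mean pairwise Cohen's kappa across raters (Light's kappa)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Invented example: four human raters and MediClass coding six records.
humans = [[1, 1, 0, 1, 0, 1], [1, 1, 0, 0, 0, 1],
          [1, 0, 0, 1, 0, 1], [1, 1, 0, 1, 0, 0]]
mediclass = [1, 1, 0, 1, 1, 1]

# "Mean non-MC agreement": average over the six human-human pairs.
print(lights_kappa(humans))
# "Mean MC agreement": MediClass paired with each of the four humans.
print(sum(cohen_kappa(mediclass, h) for h in humans) / len(humans))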

Another way to analyze these data is to create a "gold standard" using the majority opinion of the human raters (i.e., three or four of the human raters agree on either the presence or absence of a particular "A"), and then compute the accuracy of MediClass against this gold standard. Because there were four human abstractors rating the cases, "ties" had to be adjudicated. To break the ties, an expert on the team who had been involved in training the humans (KSS) independently coded just these cases. A total of 72 of 2000 cases (500 records times four possible codes) required adjudication (4 for "ask," 24 for "advise," 14 for "assess," and 30 for "assist"). With the coding ties broken, MediClass agreed with the gold standard 91% of the time. Table 3 shows MediClass performance as measured by two common statistics, sensitivity and specificity, generated by comparison against the gold standard. As shown in Table 3, point estimates of sensitivity were 0.97, 0.68, 0.64, and 1.0, while for specificity they were 0.95, 1.0, 0.96, and 0.82, respectively, across the four A's ("ask," "advise," "assess," and "assist") for which measurement was possible.

Table 3. MediClass performance against a gold standard created from the human raters

5A step | Frequency in gold standard (n = 500) | Sensitivity | Specificity
Ask | 417 (83%) | 0.97 (0.95–0.99) | 0.95 (0.88–0.98)
Advise | 161 (32%) | 0.68 (0.60–0.75) | 1.0 (0.99–1.0)
Assess | 55 (11%) | 0.64 (0.50–0.76) | 0.96 (0.94–0.98)
Assist | 71 (14%) | 1.0 (0.94–1.0) | 0.82 (0.78–0.85)
Arrange | 1 (0.2%) | NA | NA

NA, not applicable.
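A minimal sketch of the gold-standard construction and the two statistics, assuming one 0/1 code per record per "A" (the function names are hypothetical):

def gold_standard(votes, adjudicate):
    """Majority opinion of the four raters; a 2-2 tie is sent to an
    expert adjudicator, as described above."""
    s = sum(votes)          # votes: list of four 0/1 codes for one record
    if s >= 3:
        return 1
    if s <= 1:
        return 0
    return adjudicate()     # expert independently codes the tied case

def sensitivity_specificity(pred, gold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fn = sum((not p) and g for p, g in zip(pred, gold))
    tn = sum((not p) and (not g) for p, g in zip(pred, gold))
    fp = sum(p and (not g) for p, g in zip(pred, gold))
    return tp / (tp + fn), tn / (tn + fp)

votes_per_record = [[1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
gold = [gold_standard(v, adjudicate=lambda: 1) for v in votes_per_record]
print(gold)                                       # [1, 1, 0]
print(sensitivity_specificity([1, 0, 0], gold))   # (0.5, 1.0)

Applying such a computation per "A," against the adjudicated gold standard, yields point estimates like those in Table 3.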

Discussion

This study is the first to evaluate an automated system that abstracts data from the coded and free-text fields of the EMR to assess clinicians' adherence to tobacco treatment guidelines. Even in state-of-the-art EMR systems, most data are entered as free-text narrative, and are therefore not amenable to currently available automated assessment methods. This hinders the potential of the EMR for assessing healthcare quality. This study evaluated an automated EMR classifier that incorporates natural language processing techniques and handles both free-text and coded record data. The MediClass system was encoded with smoking-cessation guideline knowledge, and its performance in automated coding of the 5A's was compared with that of trained human record abstractors. The system performed similarly to human abstractors in determining whether clinicians at four different health plans performed the five tasks recommended by evidence-based smoking-cessation clinical practice guidelines.

What This Study Adds . . .

Although evidence-based guidelines for tobacco-cessation treatment have been developed, efficient methods for assessing adherence to these guidelines have been lacking. This study is the first to evaluate an automated system that abstracts data from coded and free-text fields of the electronic medical record to assess smoking-cessation treatment. The automated system performed similarly to human abstractors in determining whether clinicians at four different health plans performed the tasks recommended by evidence-based guidelines.

There are strengths and weaknesses in both manual and automated means of quality assessment through chart abstraction. For humans performing manual abstraction, the 5A's coding task is difficult for several reasons: the relevant data can be buried in the middle of clinical notes; clinicians' shorthand makes the relevant portions of a note hard to spot; and variability in word usage and sentence construction makes it hard to judge, from what was written, whether effective counseling occurred. Importantly, the human raters in this study had difficulty applying the 5A's guidelines consistently across all 500 records, despite careful training. This inconsistency shows up as reduced interrater agreement among the human raters (see Table 2, fourth column). The difficulty of the task, possibly in combination with the relatively low prevalence of events in the data, produced lower values of our agreement measure (kappa) than is ideal.34 In particular, the low frequency of "assess" and "assist" in the data may be partly responsible for the lower kappas for these events.

MediClass was entirely consistent (as would be expected of a computer program), yet lacked the ability to detect subtle differences in how clinicians record the encounter, some of which may represent important distinctions in the efficacy of care delivered. The case of "smoking-cessation discussion" provides an example. MediClass was quite literal in interpreting a "smoking-cessation discussion" as a form of "assist." The human raters, however, appeared at times to use implicit social knowledge to decide that a tersely worded note, although it clearly documented a "discussion" about smoking cessation, did not really count as assistance. The humans would often agree about these cases, but MediClass would not make the necessary distinctions. In principle, if this tacit knowledge were made explicit, it could be encoded into the system to improve performance.

Relatively low values of specificity on "assist" and sensitivity on "advise" (Table 3) reveal a significant difference between the classifications of MediClass and those of the human raters. Upon review, 12 cases were found coded as "assist" (and not "advise") by MediClass but as "advise" (and not "assist") in the gold standard. A detailed look at these 12 cases revealed four reasons for the discrepancy: (1) in five cases, language in the progress note such as "counseled for smoking cessation" qualified (for MediClass) as assist and not advise, but was coded conversely by the human raters in the gold standard; (2) similarly, in three cases, the "reason for visit" code for "tobacco-cessation discussion" qualified as assist and not advise for MediClass, but conversely for the humans; (3) in two cases, MediClass made context errors (e.g., interpreting the text segment "URI education done RTC PRN quit smoking" as delivery of smoking-cessation education); and (4) in two cases, MediClass erred by incorrectly spell-correcting words to "counsel," which combined (unfortunately) with local context to indicate that smoking-cessation counseling was given when no such evidence exists. Converting just the 8 cases covered by reasons 1 and 2 to "assist" in the gold standard would increase MediClass sensitivity for "advise" from 0.68 to 0.72, while simultaneously increasing MediClass specificity for "assist" from 0.82 to 0.83.
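As a back-of-the-envelope check of this reclassification arithmetic (the underlying confusion-matrix cells are not reported, so the counts below are inferred from the reported frequencies and rates, and rounding is approximate):

# "Advise": 161 gold positives; sensitivity 0.68 implies roughly 110 true
# positives. Moving 8 gold-advise cases to assist removes 8 false negatives.
tp, pos = 110, 161
print(round(tp / pos, 2), round(tp / (pos - 8), 2))    # 0.68 -> 0.72

# "Assist": 429 gold negatives; specificity 0.82 implies roughly 351 true
# negatives (~78 false positives). The same 8 cases were assist false
# positives, so converting them shrinks both FP and the negative pool.
tn, neg = 351, 429
print(round(tn / neg, 2), round(tn / (neg - 8), 2))    # 0.82 -> 0.83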

Finally, there are significant cost differences between MediClass and human abstractors. Even with the data-system peculiarities that would need to be accommodated at each new installation site, it would be relatively inexpensive to use MediClass to assess smoking-cessation care delivery at additional health plans. Once installed, MediClass is more than two orders of magnitude (>100 times) faster than a trained human abstractor, and it operates at negligible cost. Furthermore, the MediClass system is easily replicated on additional computers, at the cost of purchasing, installing, and maintaining each additional desktop personal computer, multiplying processing throughput with each replication.

Although coded data (e.g., ICD and CPT codes) can provide unambiguous records of diagnoses and treatments, not all important diagnostic and treatment activities fit easily into coded categories. Particularly for counseling treatments, free-text notes may be much more valuable for documenting and facilitating continuity of care between visits and between care providers. This test of a natural language application shows that automated coding of free-text notes can be practical and can potentially provide high-quality measures of health and treatment patterns in large populations.

Conclusion

Informative and essential data within uncoded clinical notes and other text portions of the EMR are unavailable to current automated health and care assessment methods. Given the clinical value of narrative, and the poor acceptance thus far of structured data entry, it is unlikely that wholesale replacement of the EMR narrative with structured data entry will succeed. This study demonstrates the feasibility of an automated coding system for processing the entire EMR, enabling assessment of smoking-cessation care delivery. Such a system can approach the accuracy of trained human coders. Systems such as MediClass can help bridge the gap between the promise and the realization of value in EMRs.

We would like to acknowledge the work of Jessica Warmoth (Kaiser Permanente-Hawaii), William Franson (Harvard Pilgrim Health Care), Marilyn Pearson (Kaiser Permanente-Colorado), and Kim Olson (Kaiser Permanente-Northwest) for their help in reviewing and manually coding all of the medical records used in this study.

This work was supported by a grant from the National Cancer Institute (U19 CA79689) for The HMO Cancer Research Network (CRN2). The Cancer Research Network (CRN) consists of the research programs, enrollee populations, and databases of ten HMOs that are members of the HMO Research Network, including Group Health Cooperative, Harvard Pilgrim Health Care, Henry Ford Health System, HealthPartners Research Foundation, the Meyers Primary Care Institute of the Fallon Healthcare System/University of Massachusetts, and Kaiser Permanente in five regions (Colorado, Hawaii, Northwest [Oregon and Washington], Northern California, and Southern California). The overall goal of the CRN is to increase the effectiveness of preventive, curative, and supportive interventions that span the natural history of major cancers among diverse populations and health systems through a program of collaborative research.

No financial conflict of interest was reported by the authors of this paper.

References

1. Institute of Medicine, Dick RS, Steen EB, Detmer DE. The computer-based patient record: an essential technology for health care. Rev. ed. Washington DC: National Academy Press, 1997.
2. Corrigan JM, Donaldson MS, Kohn LT, eds. Crossing the quality chasm: a new health system for the 21st century. Washington DC: National Academy Press, 2001.
3. Schneider EC, Riehl V, Courte-Wienecke S, Eddy DM, Sennett C. Enhancing performance measurement: NCQA's road map for a health information framework. JAMA 1999;282:1184–90.
4. Thompson TG, Brailer DJ. The decade of health information technology: delivering consumer-centric and information-rich health care. Framework for strategic action. Washington DC: U.S. Department of Health and Human Services, July 21, 2004.
5. Vogt TM, Aickin M, Ahmed F, Schmidt M. The Prevention Index: using technology to improve quality assessment. Health Serv Res 2004;39:511–29.
6. Hicks J. The potential of claims data to support the measurement of health care quality. PhD diss. RAND Graduate School, 2003. (Available as of December 20, 2004, at: www.rand.org/cgi-bin/Abstracts/ordi/getabbydoc.pl?doc=RGSD-171.)
7. McDonald C. Quality measures and electronic medical systems. JAMA 1999;282:1181–2.
8. McDonald C. The barriers to electronic medical record systems and how to overcome them. J Am Med Inform Assoc 1997;4:213–21.
9. Kaplan B. Reducing barriers to physician data entry for computer-based patient records. Top Health Inf Manag 1994;15:24–34.
10. Walsh SH. The clinician's perspective on electronic health records and how they can affect patient care. BMJ 2004;328:1184–7.
11. Coiera E. When conversation is better than computation. J Am Med Inform Assoc 2000;7:277–86.
12. Rottger P, Sunkel H, Reul H, Klein I. New possibilities of statistical evaluation of autopsy records. Computer free text analysis. Methods Inf Med 1970;9:35–44.
13. Fenichel RR, Barnett GO. An application-independent subsystem for free-text analysis. Comput Biomed Res 1976;9:159–67.
14. Sager N, Wong R. Developing a database from free-text clinical data. J Clin Comput 1983;11:184–94.
15. Hripcsak G, Austin JH, Alderson PO, Friedman C. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 2002;224:157–63.
16. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. Proc AMIA Symp 1996:542–6.
17. Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD. Unlocking clinical data from narrative reports: a study of natural language processing. Ann Intern Med 1995;122:681–8.
18. Friedman C, Knirsch C, Shagina L, Hripcsak G. Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries. Proc AMIA Symp 1999:256–60.
19. Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004;11:392–402.
20. Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33:1–10.
21. Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural language processing and the representation of clinical data. J Am Med Inform Assoc 1994;1:142–60.
22. Honigman B, Lee J, Rothschild J, et al. Using computerized data to identify adverse drug events in outpatients. J Am Med Inform Assoc 2001;8:254–66.
23. Centers for Disease Control and Prevention. Cigarette smoking-attributable morbidity—United States, 2000. MMWR Morb Mortal Wkly Rep 2003;52:842–4.
24. U.S. Department of Health and Human Services. The health consequences of smoking: a report of the Surgeon General. Atlanta GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health, 2004.
25. Mokdad AH, Marks JS, Stroup DF, Gerberding JL. Actual causes of death in the United States, 2000. JAMA 2004;291:1238–45.
26. Fiore MC, Bailey WC, Cohen SJ, et al. Treating tobacco use and dependence: a clinical practice guideline. Rockville MD: U.S. Department of Health and Human Services, 2000 (also available at: www.surgeongeneral.gov/tobacco).
27. Quinn VP, Stevens VJ, Hollis JF, et al. Tobacco-cessation services and patient satisfaction in nine nonprofit health plans. Am J Prev Med 2005;29:77–84.
28. Hollis JF, Bills R, Whitlock E, Stevens VJ, Mullooly J, Lichtenstein E. Implementing tobacco interventions in the real world of managed care. Tob Control 2000;9(suppl 1):i18–i24.
29. Lancaster T, Stead L, Silagy C, Sowden A. Effectiveness of interventions to help people stop smoking: findings from the Cochrane Library. BMJ 2000;321:355–8.
30. U.S. Public Health Service, Tobacco Use and Dependence Clinical Practice Guideline Panel. A clinical practice guideline for treating tobacco use and dependence. JAMA 2000;283:3244–54.
31. Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for detecting and classifying encounter-based clinical events in any EMR. J Am Med Inform Assoc 2005;12 (in press) (electronic preprint available at: www.jamia.org).
32. Hripcsak G, Kuperman GJ, Friedman C, Heitjan DF. A reliability study for evaluating information extraction from radiology reports. J Am Med Inform Assoc 1999;6:143–50.
33. Conger AJ. Integration and generalization of kappas for multiple raters. Psychol Bull 1980;88:322–8.
34. Di Eugenio B, Glass M. The kappa statistic: a second look. Comput Linguist 2004;30:95–101.
