Representation of Ophthalmology Concepts by Electronic Systems: Intercoder Agreement among Physicians Using Controlled Terminologies

John C. Hwang, MD, MBA,1 Alexander C. Yu, MD, MPhil,2 Daniel S. Casper, MD, PhD,1 Justin Starren, MD, PhD,2,3 James J. Cimino, MD,2,4 Michael F. Chiang, MD, MA1,2

Objective: To assess intercoder agreement for ophthalmology concepts by 3 physician coders using 5 controlled terminologies (International Classification of Diseases 9, Clinical Modification [ICD9CM]; Current Procedural Terminology, fourth edition; Logical Observation Identifiers, Names, and Codes [LOINC]; Systematized Nomenclature of Medicine, Clinical Terms [SNOMED-CT]; and Medical Entities Dictionary).
Design: Noncomparative case series.
Participants: Five complete ophthalmology case presentations selected from a publicly available journal.
Methods: Each case was parsed into discrete concepts. Electronic or paper browsers were used independently by 3 physician coders to assign a code for every concept in each terminology. A match score representing adequacy of assignment for each concept was assigned on a 3-point scale (0, no match; 1, partial match; 2, complete match). For every concept, the level of intercoder agreement was determined by 2 methods: (1) based on exact code matching, with assignment of complete agreement when all coders assigned the same code, partial agreement when 2 coders assigned the same code, and no agreement when all coders assigned different codes; and (2) based on manual review for semantic equivalence of all assigned codes by an independent ophthalmologist to classify intercoder agreement for each concept as complete agreement, partial agreement, or no agreement. Subsequently, intercoder agreement was calculated in the same manner for the subset of concepts judged to have adequate coverage by each terminology, based on receiving a match score of 2 from at least 2 of the 3 coders.
Main Outcome Measures: Intercoder agreement in each controlled terminology: complete, partial, or none.
Results: Cases were parsed into 242 unique concepts. When all concepts were analyzed by manual review, the proportion of complete intercoder agreement ranged from 12% (LOINC) to 44% (SNOMED-CT), and the difference in intercoder agreement between LOINC and all other terminologies was statistically significant (P<0.004). When only concepts with adequate terminology coverage were analyzed by manual review, the proportion of complete intercoder agreement ranged from 33% (LOINC) to 64% (ICD9CM), and there were no statistically significant differences in intercoder agreement among any pairs of terminologies.
Conclusions: The level of intercoder agreement for ophthalmic concepts in existing controlled medical terminologies is imperfect. Intercoder reproducibility is essential for accurate and consistent electronic representation of medical data. Ophthalmology 2006;113:511-519 © 2006 by the American Academy of Ophthalmology.
Originally received: August 17, 2005. Accepted: January 3, 2006. Manuscript no. 2005-776.
1 Department of Ophthalmology, Columbia University College of Physicians and Surgeons, New York, New York.
2 Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, New York.
3 Department of Radiology, Columbia University College of Physicians and Surgeons, New York, New York.
4 Department of Medicine, Columbia University College of Physicians and Surgeons, New York, New York.
Supported by a Career Development Award from Research to Prevent Blindness, New York, New York (MFC); National Eye Institute, Bethesda, Maryland (grant no.: EY13972 [MFC]); and National Library of Medicine, Bethesda, Maryland (grant no.: LM07079 [ACY]).
The authors have no commercial, proprietary, or financial interest in any of the products or companies described in the article.
Correspondence and reprint requests to Michael F. Chiang, MD, Department of Ophthalmology, Columbia University College of Physicians and Surgeons, 635 West 165th Street, Box 92, New York, NY 10032. E-mail: [email protected].
© 2006 by the American Academy of Ophthalmology. Published by Elsevier Inc. doi:10.1016/j.ophtha.2006.01.017

Electronic health record (EHR) systems are used by approximately 15% to 20% of American medical institutions.1 They have been described by the Institute of Medicine as an essential technology for modern health care, and their popularity is expected to increase.2-5 President George W. Bush established the goal of implementing a national network of computer-based medical records in his 2004 State of the Union address, and the United States Department of Health and Human Services subsequently presented a 10-year strategy for the widespread adoption of interoperable EHRs.6,7
Electronic health record systems offer significant potential advantages for patient care over existing paper-based medical record systems.4,5 First, by facilitating the rapid storage, transmission, and retrieval of patient data, they improve communication among physicians and other health care workers. This is particularly important for image-oriented, diagnostic-intense specialties such as ophthalmology, and is becoming more essential as patients increasingly are being cared for by multiple specialists in geographically disparate locations. Second, digital information management compresses the cycle time of clinical research by reducing data acquisition time. Third, electronic systems help manage administrative burdens associated with timely data transmission to legal and financial agencies and with privacy and security regulations such as the Health Insurance Portability and Accountability Act.8,9 Finally, EHR systems provide opportunities for improving physician productivity while decreasing costs.10

In addition, EHRs drive quality improvement through novel mechanisms. Coded inputs can trigger automated decision support, which creates opportunities to detect potential medical errors and provide clinical guidelines at the point of care.11-14 This assists clinicians in keeping pace with the exponential growth of medical knowledge from clinical trials and scientific innovation.15-17 The flexibility of electronic data retrieval also facilitates performance tracking based on processes of care or measurable clinical outcomes.18

A critical challenge for implementation of EHR systems for ophthalmology is the electronic representation of medical concepts. Because physicians often use inconsistent vocabulary to describe the same concept (e.g., preseptal cellulitis and periorbital cellulitis, or myocardial infarction and heart attack), the transmission and interpretation of meaning by computer systems require the structured representation of medical data using controlled medical terminologies. Existing terminologies include International Classification of Diseases 9, Clinical Modification (ICD9CM),19 and Current Procedural Terminology, fourth edition (CPT-4),20 which are commonly used for billing and reporting; Systematized Nomenclature of Medicine, Clinical Terms (SNOMED-CT),21 which is a comprehensive clinical reference terminology; Logical Observation Identifiers, Names, and Codes (LOINC),22 which provides universal names and codes for laboratory and clinical observations; and Medical Entities Dictionary (MED),13 which is a local terminology developed and used at Columbia University.

To deal with ambiguity caused by physicians' use of natural variations in language to represent the same concepts, controlled terminologies establish a shared vocabulary for effective communication across health care entities and information systems. These terminologies map synonyms to a common concept so that similar things can be categorized together, thereby providing the infrastructure to support powerful functionalities such as data analysis for retrospective research, support for prospective clinical trials, and support for evidence-based practice.5,11,12
Coded data representation is essential for answering simple queries such as "Find all of my patients who underwent pars plana vitrectomy with membranectomy for treatment of epiretinal membrane." In this particular example, a consistent coded data representation is particularly important because several of the terms have commonly used synonyms that would significantly complicate accurate data retrieval.

The standard clinical ophthalmology examination is highly structured and well suited for EHR systems. Many ophthalmologists already record examination findings and impressions using paper-based templates, which could easily be translated into structured electronic templates based on controlled terminologies. Two critical factors must be established for controlled terminologies to support EHR systems adequately for ophthalmology: (1) terminologies must have adequate coverage for representing ophthalmic and general medical concepts,14,23 and (2) concepts must be coded reproducibly by the physicians who use them, because in many existing EHR systems, data entry is performed directly by physicians at the time and point of care.24 If ophthalmic concepts are not coded consistently among physicians, it will be impossible subsequently to retrieve aggregated information accurately and completely from electronic systems. Therefore, the intercoder agreement of ophthalmic concept representation by multiple physicians using controlled terminologies must be sufficiently high. Previous research on coding reproducibility suggests that reliability is relatively low among physicians and between physicians and professional coders.24-35 We are not aware of any previous studies that have examined the issue of intercoder agreement for ophthalmology concepts.

The goal of this study is to evaluate the intercoder agreement of ophthalmology concepts using 5 major controlled terminologies: ICD9CM, CPT-4, SNOMED-CT, LOINC, and MED. Intercoder agreement will be analyzed for concepts found in a set of representative ophthalmologic clinical presentations. This report builds on our previous studies, which showed that SNOMED-CT provides significantly higher coverage of ophthalmic concepts than these other controlled terminologies.36 The current study focuses specifically on intercoder agreement using controlled terminologies. It does not address topics such as content coverage limitations of existing terminologies or provider education regarding coding.
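As a concrete illustration of the retrieval scenario described above (the pars plana vitrectomy query), the following minimal Python sketch shows how records stored with terminology codes can be queried directly. It is an editorial illustration rather than part of the original study; the encounter records and code values are hypothetical placeholders, not actual SNOMED-CT or CPT-4 identifiers.

    # Hypothetical encounter records in which procedures and diagnoses are stored as
    # terminology codes rather than free text. Code values are placeholders only.
    encounters = [
        {"patient": "A001",
         "procedures": {"PPV_WITH_MEMBRANECTOMY"},
         "diagnoses": {"EPIRETINAL_MEMBRANE"}},
        {"patient": "A002",
         "procedures": {"CATARACT_EXTRACTION"},
         "diagnoses": {"NUCLEAR_SCLEROTIC_CATARACT"}},
    ]

    def find_patients(records, procedure_code, diagnosis_code):
        """Return patients whose coded record contains both the procedure and the diagnosis."""
        return [r["patient"]
                for r in records
                if procedure_code in r["procedures"] and diagnosis_code in r["diagnoses"]]

    # "Find all of my patients who underwent pars plana vitrectomy with membranectomy
    # for treatment of epiretinal membrane."
    print(find_patients(encounters, "PPV_WITH_MEMBRANECTOMY", "EPIRETINAL_MEMBRANE"))  # ['A001']

With free-text entries, the same retrieval would require matching every synonym and spelling variant of each term, which is precisely the ambiguity that controlled terminologies are intended to remove.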
Materials and Methods

Source of Data

Five ophthalmology case presentations were selected from the Grand Rounds section of the Digital Journal of Ophthalmology, a publicly available online journal (http://www.djo.harvard.edu, accessed June 15, 2005). Each case included up to 7 sections: history, physical examination, laboratory test results, radiology results, pathological examination, differential diagnosis, and final diagnosis. Case presentations represented both outpatient and inpatient encounters, and one presentation was related to each of the following ophthalmology subspecialties: corneal/external, oculoplastics, neuro-ophthalmology, uveitis, and pediatric/strabismus. An example is shown in Figure 1. Although case presentations were based on actual patient data, no identifying information was present. Therefore, institutional review board approval was not required because this study involved only analysis of existing publicly available data that were not individually identifiable.

Figure 1. Example of "History" (A) and "Ancillary Testing" (B) sections from a representative case presentation. Reproduced with permission from the Digital Journal of Ophthalmology (http://www.djo.harvard.edu).
Controlled Medical Terminologies

Five structured medical terminologies were chosen for this study based on their current or potential use by the ophthalmology community: ICD9CM, CPT-4, LOINC, SNOMED-CT, and MED (Table 1). Standard electronic browsers or paper catalogs were used to search each terminology. Although MED is a terminology developed at Columbia University and New York Presbyterian Hospital,13 it was included in this study because many existing EHR systems rely on locally developed knowledge bases and because MED is among the most advanced such local terminologies.
Parsing, Coding, and Scoring of Cases

For each case, the text of the history, physical examination, radiology, pathology, differential diagnosis, and final diagnosis sections was parsed into discrete concepts by 3 physicians (JCH, ACY, MFC) based on a uniform methodology. Multiple-word precoordinated terms were considered to be a single concept when judged clinically appropriate. For example, terms such as floaters, diabetic retinopathy, and painless progressive visual loss were parsed as single concepts. Concepts parsed by the 3 coders were integrated into a single list of unique concepts for subsequent coding. This process was intended to represent the indexing of individual concepts in a case presentation, so that they could be used to facilitate subsequent information retrieval and reuse for patient care, research studies, or administrative purposes.

Coding and scoring of concepts was performed according to previously published methods.14,36-38
Table 1. Controlled Medical Terminologies Used in Study

Terminology   Version          Method of Access
ICD9-CM       2005             Electronic*
CPT-4         2005             Paper catalog†
SNOMED-CT     January 2005     Electronic‡
MED           2005             Electronic§
LOINC         December 2004    Electronic‖

CPT-4 = Current Procedural Terminology, fourth edition; ICD9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms.
*Yaki Technologies. ICD9-CM at eicd.com. Available at: http://www.eicd.com/EICDMain.htm. Accessed June 4, 2005.
†CPT 2004 Current Procedural Terminology: standard edition. Chicago: American Medical Association; 2004.
‡Virginia-Maryland Regional College of Veterinary Medicine, Virginia Tech. SNOMED-CT browser. Available at: http://snomed.vetmed.vt.edu/sct/menu.cfm. Accessed June 16, 2005.
§Columbia University Department of Biomedical Informatics. Medical Entities Dictionary. MED browsers. Available at: http://med.dmi.columbia.edu/browser.htm. Accessed July 5, 2005.
‖Regenstrief Institute. LOINC and RELMA downloads. Available at: http://www.loinc.org/download. Accessed July 15, 2005.
Electronic or paper browsers were used independently by 3 coders (JCH, ACY, MFC) to assign a code for every concept in each of the 5 terminologies. One coder was a practicing ophthalmologist, and the other 2 were nonpracticing general physicians. Two coders had extensive postdoctoral training in biomedical informatics and controlled terminologies, and the third received several months of focused training and experience in these areas. All codes were assigned at the highest available level of detail. For example, ICD9CM assignments were complete 5-digit codes because that level is required for actual coding and billing.

The adequacy of assignment for each concept was scored by each of the 3 coders on a 3-point Likert scale: 0, if no match for the concept existed in the terminology; 1, if a partial match existed in the terminology; and 2, if an exact match was present. Synonyms were accepted as matches based on definitions provided by each terminology, as well as judgment of the reviewer. Concepts were excluded from reliability analysis if all coders found no match in the terminology. SNOMED-CT terminology permitted generation of complex concepts through postcoordination of multiple simpler concepts.39 For example, "preauricular lymphadenopathy" did not exist as a concept in SNOMED-CT, but could be coded through postcoordination of the existing terms preauricular and lymphadenopathy. These properly postcoordinated terms were accepted as complete matches for the purposes of this study.
Intercoder Agreement

For every concept, the observed level of agreement among codes assigned by the 3 coders was grouped into 1 of 3 categories: complete agreement, partial agreement, and no agreement. This was done by 2 methods:

1. Based on automated determination of exact match of codes assigned by the 3 coders. In this method, coders were classified as having complete agreement when all coders assigned the same code, partial agreement when only 2 coders assigned the same code, and no agreement when all coders assigned different codes.
2. Based on manual review for semantic equivalence of all assigned codes by an independent practicing ophthalmologist (DSC). In this method, clinical judgment was used to classify intercoder agreement for each concept as complete agreement, partial agreement, or no agreement.

Each situation where exact code matching and manual review for semantic equivalence produced differences in classification was reviewed by all authors to identify the cause for discrepancy as (1) no difference in meaning among single codes [e.g., SNOMED-CT 194119004, "Partial oculomotor nerve palsy (disorder)," and SNOMED-CT 194118007, "Partial third nerve palsy (disorder)"], (2) no clinically significant difference in meaning among terms (e.g., MED 44478, "Ciprofloxacin 0.3 %/ml," and MED 28397, "Ciloxan 0.3% Ohpth Sol 2.5 m"), or (3) no difference in meaning because of postcoordinated codes in SNOMED-CT [e.g., 301955006, "Iris normal (finding)," and (181164000)(17621005), "Entire iris (body structure), Normal (qualifier value)"].

Our previous work has demonstrated that ophthalmology content coverage by controlled terminologies is highest for SNOMED-CT and lowest for CPT-4.36 In addition, different coding systems were designed for different purposes, and this affects content coverage.36 To account for this variation and to measure coding reliability only among concepts for which terminologies had adequate coverage, the observed intercoder agreement was also determined for the subset of concepts with adequate coverage by each terminology, based on assignment of a match score of 2 by at least 2 of the 3 coders.
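For illustration, the scoring and agreement rules described above can be summarized in a short Python sketch. This is an editorial illustration rather than the software used in the study; the concept records shown are hypothetical.

    from collections import Counter

    def agreement_by_exact_match(codes):
        """Classify agreement among the 3 coders' codes: complete if all identical,
        partial if exactly 2 identical, none if all 3 differ."""
        most_common = Counter(codes).most_common(1)[0][1]
        return {3: "complete", 2: "partial"}.get(most_common, "none")

    def has_adequate_coverage(match_scores):
        """A concept is judged adequately covered when at least 2 of the 3 coders
        assigned a match score of 2 (exact match)."""
        return sum(1 for score in match_scores if score == 2) >= 2

    # Hypothetical concept records: one assigned code and one match score (0/1/2) per coder.
    concepts = [
        {"concept": "floaters", "codes": ["C1", "C1", "C1"], "scores": [2, 2, 2]},
        {"concept": "preauricular lymphadenopathy", "codes": ["C7", "C9", "C7"], "scores": [2, 1, 2]},
        {"concept": "painless progressive visual loss", "codes": ["C3", "C4", "C5"], "scores": [1, 1, 0]},
    ]

    for c in concepts:
        print(c["concept"],
              agreement_by_exact_match(c["codes"]),
              "adequate" if has_adequate_coverage(c["scores"]) else "inadequate")

The first helper reproduces the exact-code-matching classification, and the second selects the subset of concepts used in the adequate-coverage analysis reported in the Results.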
Data Analysis

Findings from all case presentations were combined, and terminologies were compared based on categorical level of coding agreement (complete, partial, or none) among the 3 coders. Numerical computations were performed using a spreadsheet package (Excel 2003, Microsoft, Redmond, WA). Statistical comparison of categorical findings was performed using the chi-square test or Fisher exact test, with Bonferroni correction where appropriate.
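As an illustration of the pairwise comparisons described above, the following minimal Python sketch (using scipy, not the authors' original spreadsheet computation) applies the chi-square test to the exact-code-matching counts for ICD9CM and LOINC reported in Table 2. The Bonferroni factor of 10, corresponding to all possible pairs among the 5 terminologies, is an assumption about how the correction might be applied, not a statement of the authors' exact procedure.

    from scipy.stats import chi2_contingency

    # Exact-code-matching agreement counts (complete, partial, none) from Table 2.
    icd9cm = [33, 45, 13]  # n = 91
    loinc = [5, 63, 10]    # n = 78

    chi2, p, dof, _ = chi2_contingency([icd9cm, loinc])
    n_pairwise = 10  # 5 terminologies compared 2 at a time (assumed Bonferroni factor)
    p_corrected = min(1.0, p * n_pairwise)
    print(f"chi-square = {chi2:.1f}, df = {dof}, Bonferroni-corrected P = {p_corrected:.2g}")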
Results

Data Set Characteristics and Coding Completeness

The overall data set consisted of 242 unique concepts from 5 case presentations. SNOMED-CT had codes for the largest number of concepts (227/242), followed by MED (160/242), ICD9CM (91/242), LOINC (78/242), and CPT-4 (24/242).
Intercoder Agreement for All Concepts

Level of intercoder agreement and coverage for all unique concepts are displayed graphically in Figure 2 and depicted numerically in Table 2. When comparing levels of agreement among the 3 coders by exact matching (column A), the difference in agreement between LOINC and all other terminologies was statistically significant (P≤0.0002), and the difference between SNOMED-CT and MED was statistically significant (P≤0.001). When comparing levels of agreement among coders using manual review for semantic equivalence by an independent ophthalmologist (column B), the difference in agreement between LOINC and all other terminologies was statistically significant (P<0.004). There was no statistically significant difference in intercoder agreement for other pairs of terminologies.

Table 3 displays sources of discrepancy in levels of agreement between exact code matching and manual review by an independent ophthalmologist. SNOMED-CT had the largest number of discrepancies in level of agreement between exact code matching and manual review (n = 42). Among these discrepancies, 19 of 42 resulted from postcoordination, whereas 22 of 42 resulted from no clinically significant difference in meaning. MED had the next largest number of discrepancies in level of agreement (n = 10, of which 90% resulted from no clinically significant differences in meaning), followed by LOINC (n = 7, of which all resulted from no clinically significant differences in meaning).
Figure 2. Proportion of intercoder agreement for all concepts. Each concept was coded by 3 clinician coders, and level of agreement was classified as complete, partial, or none. For each terminology, 2 sets of bars are displayed. Bar A is based on exact code matching, in which coders were classified to have complete agreement when all coders assigned the same code; partial agreement, when 2 coders assigned the same code; and no agreement, when all coders assigned different codes. Bar B is based on manual review for semantic equivalence of assigned codes by an independent ophthalmologist. CPT-4 = Current Procedural Terminology, fourth edition; ICD-9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms. *With exact code matching, the difference in level of agreement between LOINC and all other terminologies was statistically significant (P≤0.0002). †With exact code matching, the difference in level of agreement between SNOMED-CT and MED was statistically significant (P≤0.001). §With manual review for semantic equivalence, the difference in level of agreement between LOINC and all other terminologies was statistically significant (P<0.004).
Intercoder Agreement for Concepts with Adequate Terminology Coverage

Level of intercoder agreement and coverage for unique concepts judged to have adequate terminology coverage are displayed graphically in Figure 3 and depicted numerically in Table 4. When comparing levels of agreement among the 3 coders by exact matching (column A), the difference in agreement was statistically significant only for LOINC compared with ICD9CM (P = 0.0014). When comparing levels of agreement among coders using manual review for semantic equivalence by an independent ophthalmologist (column B), there were no statistically significant differences among any pairs of terminologies. SNOMED-CT had codes judged to be acceptable for the largest number of concepts (179/242), followed by MED (83/242), ICD9CM (42/242), LOINC (15/242), and CPT-4 (11/242).
Table 2. Intercoder Agreement for All Concepts

Level of      ICD9CM (n = 91)       CPT-4 (n = 24)        LOINC (n = 78)        SNOMED-CT (n = 227)     MED (n = 160)
Agreement     A          B          A          B          A*         B†         A‡          B           A‡         B
Complete      33 (36%)   35 (38%)   10 (42%)   10 (42%)   5 (6%)     9 (12%)    80 (35%)    101 (44%)   49 (31%)   56 (35%)
Partial       45 (49%)   45 (49%)   12 (50%)   12 (50%)   63 (81%)   63 (81%)   97 (43%)    99 (44%)    96 (60%)   92 (58%)
None          13 (14%)   11 (12%)   2 (8%)     2 (8%)     10 (13%)   6 (8%)     50 (22%)    27 (12%)    15 (9%)    12 (8%)

CPT-4 = Current Procedural Terminology, fourth edition; ICD9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms.
*With exact code matching, the difference in level of agreement between LOINC and all other terminologies was statistically significant (P≤0.0002).
†With manual review for semantic equivalence, the difference in level of agreement between LOINC and all other terminologies was statistically significant (P<0.004).
‡With exact code matching, the difference in level of agreement between SNOMED-CT and MED was statistically significant (P≤0.001).
Table 3. Sources of Discrepancy in Level of Agreement for All Concepts between Exact Matching and Manual Review for Semantic Equivalence of Assigned Codes by an Independent Ophthalmologist

Source of Apparent Disagreement         ICD9CM (n = 4)   CPT-4 (n = 0)   LOINC (n = 7)   SNOMED-CT (n = 42)   MED (n = 10)
No difference in meaning                0 (0%)           0 (0%)          0 (0%)          1 (2%)               1 (10%)
No clinically significant difference    4 (100%)         0 (0%)          7 (100%)        22 (52%)             9 (90%)
Postcoordination                        0 (0%)           0 (0%)          0 (0%)          19 (45%)             0 (0%)

CPT-4 = Current Procedural Terminology, fourth edition; ICD9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms.
Discussion

This study evaluated intercoder agreement among 3 physicians for coding ophthalmology concepts using 5 major controlled medical terminologies: ICD9CM, CPT-4, SNOMED-CT, LOINC, and MED. Two main findings may be extrapolated from Table 4:

1. The coverage of existing terminologies for ophthalmic concepts is highly variable and generally low, ranging from 4.5% (11/242 for CPT-4) to 74.0% (179/242 for SNOMED-CT). This is consistent with our previously published results.36
2. Even when terminologies are able to represent ophthalmic concepts adequately, the intercoder agreement among 3 physicians is imperfect, ranging from complete agreement of 33% (5/15 for LOINC) to 64% (27/42 for ICD9CM). This is consistent with previous studies of intercoder reliability.24-35

Consistent and accurate coding of ophthalmic concepts is required to drive improvements in clinical care, research, and regulatory compliance. Without strict coding of concepts, natural language usage by physicians may be inexact and ambiguous, and impossible for electronic systems to process.40 Currently, terminologies such as ICD9CM and CPT-4 are used to code diagnostic and procedural information for administration and billing, and administrative databases containing these codes have been used widely for financial, quality assurance, and retrospective clinical research purposes.14,31,34,41-45 Of course, the validity of this approach will depend heavily on the extent to which identical concepts may be encoded reproducibly in controlled terminologies by multiple physicians.

More broadly, reproducibility of coding is essential for the successful implementation of EHR systems. The imperfect intercoder agreement results from this study raise concerns about the reliability of structured medical data in real-world situations in which codes are either assigned by physicians at the point of care or mapped to terms used in an EHR system that have previously been mapped to a particular code in a controlled medical terminology. In particular, SNOMED-CT recently has been promoted as a reference terminology for EHR systems, and its comprehensive content coverage has been demonstrated by previous studies in ophthalmology and other medical domains.14,36,37,46 In 2003, the National Library of Medicine of the National Institutes of Health signed a 5-year, $32 million agreement with the College of American Pathologists, the developers of the SNOMED terminology, to make SNOMED-CT freely available to all American health care institutions and vendors. The goals of this agreement were to broaden the usage of interoperable EHR systems, improve patient care, and improve patient safety.47,48 However, our current study shows that the complete intercoder agreements for SNOMED-CT based on manual evaluation for semantic equivalence were 44% for all concepts and 55% for concepts judged to have adequate coverage (Tables 2, 4). From another perspective, the proportions of concepts with no intercoder agreement for SNOMED-CT based on manual evaluation were 12% for all concepts and 8% for concepts judged to have adequate coverage (Tables 2, 4). Applications such as clinical research and quality assurance may require higher levels of intercoder agreement.

When considering all concepts, LOINC had a significantly different pattern of intercoder agreement, with fewer complete agreements and more partial agreements than any other controlled terminology (Fig 2). In some respects, LOINC has a finer level of granularity than the other terminologies in this study because it is largely used to communicate very specific details about laboratory tests between electronic systems.49
Figure 3. Proportion of intercoder agreement for concepts judged to have adequate terminology coverage. Only concepts that received a rating of perfect code match for concept in that terminology by at least 2 of 3 coders were included. Level of agreement among 3 clinician coders was classified as complete, partial, or none. For each terminology, 2 sets of bars are displayed. Bar A is based on exact code matching, in which coders were classified to have complete agreement when all coders assigned the same code; partial agreement, when 2 coders assigned the same code; and no agreement, when all coders assigned different codes. Bar B is based on manual review for semantic equivalence of assigned codes by an independent ophthalmologist. CPT-4 = Current Procedural Terminology, fourth edition; ICD-9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms. *With exact code matching, the difference in level of agreement was statistically significant for LOINC compared with ICD-9CM (P = 0.0014). No other difference in level of agreement between pairs of terminologies was statistically significant.
Table 4. Intercoder Agreement for Concepts Judged to Have Adequate Terminology Coverage

Level of      ICD9CM (n = 42)       CPT-4 (n = 11)        LOINC (n = 15)        SNOMED-CT (n = 179)     MED (n = 83)
Agreement     A*         B          A          B          A*         B          A           B           A          B
Complete      25 (60%)   27 (64%)   6 (55%)    6 (55%)    1 (7%)     5 (33%)    79 (44%)    99 (55%)    39 (47%)   46 (55%)
Partial       13 (31%)   12 (29%)   4 (36%)    4 (36%)    9 (60%)    8 (53%)    68 (38%)    66 (37%)    35 (42%)   31 (37%)
None          4 (10%)    3 (7%)     1 (9%)     1 (9%)     5 (33%)    2 (13%)    32 (18%)    14 (8%)     9 (11%)    6 (7%)

CPT-4 = Current Procedural Terminology, fourth edition; ICD9CM = International Classification of Diseases 9, Clinical Modification; LOINC = Logical Observation Identifiers, Names, and Codes; MED = Medical Entities Dictionary; SNOMED-CT = Systematized Nomenclature of Medicine, Clinical Terms.
*With exact code matching, the difference in level of agreement was statistically significant for LOINC compared with ICD9CM (P = 0.0014). No other difference in level of agreement between pairs of terminologies was statistically significant.
It is possible that intercoder agreement might be decreased when clinicians have to choose among highly specific concepts that seem to be very similar (e.g., "Serum rheumatoid factor, titer" [17857-4] and "Serum rheumatoid factor, qualitative" [33910-1]). Furthermore, in situations where a particular concept does not exist in LOINC, coders may be forced to choose a similar LOINC concept that is only a partial match. However, we note that when considering only concepts judged to be adequately represented by each terminology, there were no statistically significant differences among the patterns of intercoder agreement for any pairs of terminologies (Fig 3B). Finally, terminologies such as LOINC and CPT-4 are organized as large lists of concepts, whereas the other terminologies used in this study are organized as hierarchies (e.g., refractive amblyopia and strabismic amblyopia are categorized under amblyopia). It is conceivable that terminologies not organized as hierarchies could be more difficult for physicians and coders to search reproducibly. Furthermore, hierarchical organizations provide an advantage by allowing searches to be focused more broadly or narrowly; for example, a broader search may identify cases that were coded for subvariants of the item under study.

Controlled terminologies are developed for different applications. For example, ICD9CM is maintained primarily for administrative and financial reasons, SNOMED-CT is intended as a comprehensive reference terminology, and LOINC was designed originally to support transmission of laboratory data.39 Our previous work in ophthalmology content coverage by controlled terminologies has suggested that coverage depends on the intended purpose of the terminology, and our present results are consistent with earlier work in demonstrating that SNOMED-CT has the highest ophthalmology content coverage.36 Overall, comparison among terminologies must include analysis of content coverage and intercoder agreement for concepts that are judged to have adequate coverage (such as shown in Fig 3).12 When concepts in this study were divided into subcategories of procedures, diagnoses, or findings according to previously published methodologies,36 there were no statistically significant differences between overall intercoder agreement for concepts with adequate terminology coverage and the subcategories of procedures, diagnoses, or findings (data not shown).
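Returning to the point about hierarchical organization made above, the following minimal Python sketch (an editorial illustration, not drawn from the study; the hierarchy and records are hypothetical) shows how a search on a parent concept can also retrieve records coded with its subvariants.

    # Hypothetical is-a hierarchy: a broader search on "amblyopia" should also retrieve
    # records coded with its subvariants (refractive or strabismic amblyopia).
    children = {
        "amblyopia": ["refractive amblyopia", "strabismic amblyopia"],
        "refractive amblyopia": [],
        "strabismic amblyopia": [],
    }

    def descendants(concept, children):
        """Return the concept plus all of its descendants in the hierarchy."""
        found = {concept}
        for child in children.get(concept, []):
            found |= descendants(child, children)
        return found

    records = [("A001", "strabismic amblyopia"), ("A002", "refractive amblyopia"), ("A003", "cataract")]
    target = descendants("amblyopia", children)
    print([patient for patient, diagnosis in records if diagnosis in target])  # ['A001', 'A002']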
In this study, intercoder agreement was determined by exact code matching, as well as by manual review of codes for semantic equivalence by an independent ophthalmologist. It is instructive to examine the reasons for discrepancy between these 2 methodologies (Table 3). SNOMED-CT allows generation of concepts by postcoordination, a process in which multiple existing concepts are combined to form new concepts. In the majority of situations for this study, codings by multiple physicians that were judged semantically equivalent despite discrepancies in the actual assigned codes were felt to have no clinically significant difference in meaning (Table 3). For example, SNOMED-CT has separate terms for congenital ptosis of upper eyelid (60938005) and congenital ptosis (268163008), as well as ciprofloxacin 0.3% product (330418005) and ciprofloxacin substance (372840008). There are important benefits to terminologies that permit expression of subtle differences in meaning. However, a disadvantage is that this may decrease intercoder agreement because physicians may have difficulty distinguishing among concepts that seem to be very similar, particularly if these decisions must be made quickly at the point of care.

It will be important for terminology developers and coding specialists to develop better guidelines and terminology browsing tools that may be used by ophthalmologists and professional coders to improve coding reliability. Since 2001, the American Academy of Ophthalmology has collaborated with the College of American Pathologists on refining the SNOMED-CT terminology for adequate ophthalmology content coverage.23 Similar interactions between ophthalmologists and developers of other terminologies may help both to ensure adequate content coverage and potentially to increase the likelihood for intercoder agreement.

Several limitations of this study should be noted:

1. Intercoder agreement was based only on results from 3 physician coders, only one of whom was a practicing ophthalmologist. Comparison of codes among the 3 physicians did not reveal any clear pattern of disagreements. However, a larger follow-up study involving a broader cross section of coders may be helpful in determining generalizability among ophthalmologists, other physicians, and nonphysician coders.
2. The scoring methodology for quality of matching (no match, partial match, exact match) has been used in multiple studies of terminology content coverage,14,36,37 but is inherently subjective. However, when data were reanalyzed to determine intercoder agreement for concepts that were scored exact match by any of the 3 coders (rather than 2 of the 3), there were no statistically significant differences in agreement (data not shown).
3. In one component of this study, a single ophthalmologist manually identified concepts that had semantically equivalent code expressions. Although these distinctions seemed to be fairly straightforward in most situations, it is conceivable that opinions may differ on whether particular expressions articulate the same concept.
4. The data set for this study was limited in size, and may not necessarily represent problems encountered in a typical clinical practice. Future studies involving larger clinical data sets with routine as well as complex concepts may be informative. For example, further studies designed specifically to elucidate additional common characteristics of concepts that are coded either consistently or inconsistently by multiple physicians may be useful.
5. This study relied on the coding of concepts in several specific computer-based terminology browsers (Table 1). It is possible that results could be affected based on sophistication of those browsers. For example, some browsers may be able to match for synonyms of the particular term being searched for (e.g., realizing that thyroid eye disease and thyroid ophthalmopathy have equivalent meanings), whereas other existing browsers do not perform this function.

Electronic systems are commonly used by physicians for scheduling as well as laboratory, pharmacy, and radiology review. In ophthalmology, the use of digital imaging technologies and clinical information systems is already widespread, and likely to increase significantly.6,7 These trends create exciting new opportunities such as improved efficiency of clinical care, data analysis for retrospective clinical research, support for prospective studies, computer-based decision support, accelerated diffusion of knowledge with reduced variability in care, strengthened privacy and data protection, promotion of public health, and support for evidence-based medicine.2-5 Many of these important functionalities are based on structured data entry in controlled medical terminologies at the point of care, and require that terminologies have both adequate concept coverage and reproducibility across multiple coders. In this study, we have shown that the intercoder agreement for ophthalmology concepts among 3 physicians using 5 major controlled terminologies is limited. A combination of physician training, terminology refinement, and development of more sophisticated terminology browsers may be required to improve intercoder agreement and to maximize the potential benefits of EHR systems.
References

1. Grove AS. Efficiency in the health care industries: a view from the outside. JAMA 2005;294:490-2.
2. Shortliffe EH. The evolution of electronic medical records. Acad Med 1999;74:414-9.
3. Kassirer JP. The next transformation in the delivery of health care. N Engl J Med 1995;332:52-4.
4. Dick RS, Steen EB, Detmer DE, eds, Committee on Improving the Patient Record, Institute of Medicine. The computer-based patient record: an essential technology for health care. Rev ed. Washington: National Academy Press; 1997:52-3. Available at: http://www.nap.edu/books/0309055326/html/index.html. Accessed December 28, 2005.
5. Committee on Data Standards for Patient Safety, Institute of Medicine. Key capabilities of an electronic health record system: letter report. Washington: National Academy Press; 2003:1-2. Available at: http://www.nap.edu/books/NI000427/html. Accessed December 28, 2005.
6. The White House. State of the Union address (2004). Available at: http://www.whitehouse.gov/news/releases/2004/01/20040120-7.html. Accessed July 31, 2005.
7. United States Department of Health and Human Services. Office of the National Coordinator for Health Information Technology (ONC). Goals of strategic framework. Available at: http://www.hhs.gov/healthit/goals.html. Accessed July 31, 2005.
8. Moody M. HIPAA strengthens business case for electronic report distribution systems. J Healthc Inf Manag 2002;16:47-51.
9. Kilbridge P. The cost of HIPAA compliance. N Engl J Med 2003;348:1423-4.
10. Miller RH, West C, Brown TM, et al. The value of electronic health records in solo or small group practices. Physicians' EHR adoption is slowed by a reimbursement system that rewards the volume of services more than it does their quality. Health Aff (Millwood) 2005;24:1127-37.
11. Hammond WE. Call for a standard clinical vocabulary. J Am Med Inform Assoc 1997;4:454-5.
12. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med 1998;37:394-403.
13. Cimino JJ. From data to knowledge through concept-oriented terminologies: experience with the Medical Entities Dictionary. J Am Med Inform Assoc 2000;7:288-97.
14. Chute CG, Cohn SP, Campbell KE, et al. The content of clinical classifications. For the Computer-Based Patient Record Institute's Work Group on Codes & Structures. J Am Med Inform Assoc 1996;3:224-33.
15. Greenes RA, Shortliffe EH. Medical informatics: an emerging academic discipline and institutional priority. JAMA 1990;263:1114-20.
16. Arndt KA. Information excess in medicine: overview, relevance to dermatology, and strategies for coping. Arch Dermatol 1992;128:1249-56.
17. Humphreys BL, McCutcheon DE. Growth patterns in the National Library of Medicine's serials collection and in Index Medicus journals, 1966-1985. Bull Med Libr Assoc 1994;82:18-24.
18. Kohn LT, Corrigan JM, Donaldson MS, eds, Committee on Quality of Health Care in America, Institute of Medicine. To err is human: building a safer health system. Washington: National Academy Press; 2000:178. Available at: http://books.nap.edu/books/0309068371/html/R1.html. Accessed December 28, 2005.
19. Hart AC, Hopkins CA, eds. ICD-9-CM: For Physicians, 2005 Expert, Vols. 1-2. Salt Lake City: Ingenix; 2004:i-iii.
20. Kirschner CG, Burkett RC, Coy JJ, et al, eds. CPT 2004 Current Procedural Terminology: Standard Edition. Chicago: American Medical Association; 2003:xiii-xv.
21. Wang AY, Sable JH, Spackman KA. The SNOMED clinical terms development process: refinement and analysis of content. Proc AMIA Symp 2002;845-9.
22. Huff SM, Rocha RA, McDonald CJ, et al. Development of the Logical Observation Identifier Names and Codes (LOINC) vocabulary. J Am Med Inform Assoc 1998;5:276-92.
23. American Academy of Ophthalmology. Academy praises standardized clinical terminology system to be available to U.S. users [press release; May 6, 2004]. Available at: http://www.aao.org/news/release/20040506.cfm. Accessed July 29, 2005.
24. Lorence DP, Ibrahim IA. Disparity in coding concordance: do physicians and coders agree? J Health Care Finance 2003;29:43-53.
25. Dixon J, Sanderson C, Elliott P, et al. Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals. J Public Health Med 1998;20:63-9.
26. Nilsson G, Petersson H, Ahlfeldt H, Strender LE. Evaluation of three Swedish ICD-10 primary care versions: reliability and ease of use in diagnostic coding. Methods Inf Med 2000;39:325-31.
27. Lorence DP, Ibrahim IA. Benchmarking variation in coding accuracy across the United States. J Health Care Finance 2003;29:29-42.
28. Lorence D. Regional variation in medical classification agreement: benchmarking the coding gap. J Med Syst 2003;27:435-43.
29. Morimoto T, Gandhi TK, Seger AC, et al. Adverse drug events and medication errors: detection and classification methods. Qual Saf Health Care 2004;13:306-14.
30. Yao P, Wiggs BR, Gregor C, et al. Discordance between physicians and coders in assignment of diagnoses. Int J Qual Health Care 1999;11:147-53.
31. Bentley PN, Wilson AG, Derwin ME, et al. Reliability of assigning correct current procedural terminology-4 E/M codes. Ann Emerg Med 2002;40:269-74.
32. Chao J, Gillanders WG, Flocke SA, et al. Billing for physician services: a comparison of actual billing with CPT codes assigned by direct observation. J Fam Pract 1998;47:28-32.
33. Kikano GE, Goodwin MA, Stange KC. Evaluation and management services: a comparison of medical record documentation with actual billing in a community family practice. Arch Fam Med 2000;9:68-71.
34. Zuber TJ, Rhody CE, Muday TA, et al. Variability in code selection using the 1995 and 1998 HCFA documentation guidelines for office services. J Fam Pract 2000;49:642-5.
35. King MS, Lipsky MS, Sharp L. Expert agreement in Current Procedural Terminology evaluation and management coding. Arch Intern Med 2002;162:316-20.
36. Chiang MF, Casper DS, Cimino JJ, Starren J. Representation of ophthalmology concepts by electronic systems. Adequacy of controlled medical terminologies. Ophthalmology 2005;112:175-83.
37. Humphreys BL, McCray AT, Cheh ML. Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test. J Am Med Inform Assoc 1997;4:484-500.
38. Langlotz CP, Caldwell SA. The completeness of existing lexicons for representing radiology report information. J Digit Imaging 2002;15(suppl):201-5.
39. Cimino JJ. Coding systems in health care. Methods Inf Med 1996;35:273-84.
40. Campbell KE, Oliver DE, Spackman KA, Shortliffe EH. Representing thoughts, words, and things in the UMLS. J Am Med Inform Assoc 1998;5:421-31.
41. Romano PS, Luft HS, Rainwater JA, Zach AP. Report on heart attack 1991-1993, volume 2: technical guide. Sacramento: California Office of Statewide Health Planning and Development; 1997:37-8. Available at: http://www.oshpd.cahwnet.gov/HQAD/Outcomes/Studies/HeartAttacks/ami_9193/V29193.pdf. Accessed December 28, 2005.
42. Morse AR, Yatzkan E, Berberich B, Arons RR. Acute care hospital utilization by patients with visual impairment. Arch Ophthalmol 1999;117:943-9.
43. Chiang MF, Arons RR, Flynn JT, Starren JB. Incidence of retinopathy of prematurity from 1996-2000. Analysis of a comprehensive New York state patient database. Ophthalmology 2004;111:1317-25.
44. Iezzoni LI. Assessing quality using administrative data. Ann Intern Med 1997;127:666-74.
45. Bates DW, Evans RS, Murff H, et al. Detecting adverse events using information technology. J Am Med Inform Assoc 2003;10:115-28.
46. Campbell JR, Carpenter P, Sneiderman C, et al. Phase II evaluation of clinical coding schemes: completeness, taxonomy, mapping, definitions, and clarity. J Am Med Inform Assoc 1997;4:238-51.
47. U.S. Department of Health and Human Services. HHS launches new efforts to promote paperless health care system [press release; July 1, 2003]. Available at: http://www.hhs.gov/news/press/2003pres/20030701.html. Accessed July 31, 2005.
48. National Library of Medicine. Unified Medical Language System. SNOMED license agreement. Available at: http://www.nlm.nih.gov/research/umls/Snomed/snomed_license.html. Accessed July 31, 2005.
49. Bakken S, Cimino JJ, Haskell R, et al. Evaluation of the clinical LOINC (Logical Observation Identifiers, Names, and Codes) semantic structure as a terminology model for standardized assessment measures. J Am Med Inform Assoc 2000;7:529-38.