Impact of different definitions on estimates of accuracy of the diagnosis data in a clinical database


Journal of Clinical Epidemiology 54 (2001) 782–788

Charles R. Woods*

Department of Pediatrics, Wake Forest University School of Medicine, Medical Center Blvd., Winston-Salem, NC 27157, USA

Received 19 April 2000; received in revised form 20 December 2000; accepted 21 December 2000

* Corresponding author: Charles R. Woods, M.D., M.S., Wake Forest University School of Medicine, Medical Center Blvd., Winston-Salem, NC 27157. Tel.: 336-716-6568. E-mail address: [email protected]

Abstract

Computerized medical databases are increasingly used for research. The influence of different definitions of the accuracy of matching on the estimated accuracy of diagnosis data was assessed in a database of visits to a public pediatric clinic. Differences between definitions involved 1) unit of analysis, 2) number of diagnoses required to match per visit, and/or 3) whether database contents are required to match the medical record or medical record contents are required to be matched in the database. Overall, 90% of diagnoses in the database (391/435) were accurately coded relative to the medical record. Alternatively, 77% of diagnoses listed in the medical record (391/506) were accurately coded in the database. When individual visits were used as the unit of analysis, estimates of accuracy using six definitions ranged from 65% to 92%. The most appropriate definition to use for estimating accuracy of diagnosis data likely depends on the purpose of the study. Use of two or more such definitions may enhance portrayal of the accuracy of diagnosis data. © 2001 Elsevier Science Inc. All rights reserved.

Keywords: Databases; Data interpretation, statistical; Data collection; Public health administration; Quality assurance, health care

1. Introduction

Computerized medical databases increasingly are being developed and used for medical and epidemiologic research. Such databases originate from three primary sources: insurance claims, practice data, and electronic medical records. Claims databases are compiled by third-party payors or government agencies and include demographic, diagnosis, and procedure data [1]. Practice databases are compiled by provider organizations and usually include claims data elements plus data from laboratory tests, pharmacy, and/or limited information from visit records. These databases are often designed as decision-support tools for disease diagnosis and patient management [2]. Electronic medical records are much more comprehensive than practice databases and contain all of the medical information from patient encounters, with the ultimate goal of replacing traditional paper medical records.

Claims data, including Medicaid and Medicare claims, have been used as sources for quality and appropriateness of care assessments, assessment of new technologies, health care utilization studies, and health care economic analyses

[3–13]. Practice databases are considered generally more accurate than claims databases and have been used for clinical epidemiologic studies, postmarketing drug surveillance, health care utilization, and as sampling frames for identification of study subjects [1,14–17]. Future increased availability of electronic medical records is expected to further improve data accuracy.

Most medical databases contain coded diagnosis and procedure data. The major advantage of coded diagnosis information is that it is often simpler to analyze than the clinical written text from which it was derived. Major disadvantages include coding biases, inconsistencies in coding among different providers, and potentially high error rates in code selection [18,19]. Claims databases typically contain multiple diagnosis fields to accommodate the frequent occurrence of multiple diagnoses from a single outpatient encounter (visit) or hospital admission [6,20]. The importance of having multiple secondary (nonprincipal) diagnoses available for clinical studies using these databases has been established [20], and a number of studies have relied upon secondary diagnoses to indicate the presence of important comorbidities [12,13,21,22] or potential confounders [23]. Neither the accuracy of secondary diagnoses nor the effect of including them in estimates of the overall accuracy of diagnosis data in a database has been studied.

This study assessed the influence of different definitions



of the accuracy of matching on estimates of accuracy of diagnosis data using a database of visits to a public pediatric clinic. Differences between the definitions involve 1) the unit of analysis, 2) the number of diagnoses required to match when multiple diagnoses per visit are present, and/or 3) whether the database contents are required to match the medical record or the medical record contents are required to be matched in the database. The third condition, which can be described as the direction of matching, may be important to consider when databases have fewer diagnoses coded per visit than are listed per visit in the medical records. In this circumstance, coded data can be highly accurate but provide only a partial view of the overall scope and prevalence of diagnoses in the medical records of the study population. The effects of inclusion of secondary diagnoses and choice of direction of matching on the results of a diagnosis validation process are described.

2. Methods

2.1. The database and selection of visits for validation

As part of an epidemiologic study of otitis media in the pediatric population receiving care at the Reynolds Health Center of Forsyth County, North Carolina, a database of 32,634 pediatric and adolescent clinic visits from July 1994 to December 1996 was constructed from electronically scanned encounter form data. Data categories included diagnoses, screening studies, preventive services, risk factors, vaccinations, procedures, date of onset of illness, and follow-up plans. The database was designed to provide basic decision support for patient care and to generate insurance claims, primarily for Medicaid. For this analysis, it is considered more analogous to a claims database than a practice database. The database captured 96.3% of the pediatric and adolescent visits during this 30-month period. One primary and up to 10 secondary diagnoses could be coded. No records selected for this study contained more than two secondary diagnoses. The clinic was staffed by students and residents supervised by faculty members of the Department of Pediatrics of Wake Forest University School of Medicine [24].

A total of 317 visit records in the database (approximately 1% of the visits) were compared with the corresponding written medical records to validate the diagnosis data in the database. These were a combination of two subsets: 1) 90 visits randomly selected such that there were three from each of the 30 months of the study, and 2) 227 visits from the medical records of 100 randomly selected children who had at least one diagnosis of otitis media in the first 2 years of life. Medical record review forms were prepared in Microsoft Access and used to abstract data from the medical records. The diagnoses written in the medical record were entered into text fields in the order listed. The investigator was unaware of the specific diagnosis codes in the database at the time of medical record review.


2.2. Matching of diagnoses in the database to those in the written medical record

The diagnoses in the scanned database were compared to the listed diagnoses from the medical record for each of the selected visits. The medical record was regarded as the reference standard for all analyses. When more than one diagnosis was listed in the medical record, the first listed diagnosis was defined as the principal diagnosis for the visit. For the purposes of this analysis, matching was defined as an exact text match or an acceptable synonym match to the ICD-9 terminology. Examples of acceptable synonyms included “upper respiratory infection” for “viral syndrome” and “gastroenteritis” for “diarrhea.” Examples of potential but disallowed synonyms included “sinusitis” for “upper respiratory tract infection” and “otitis externa” for “otitis media.” Subtype matching of otitis media diagnoses was not required.

A coded diagnosis of “well child” can mean either that the child was well at the time of the visit or that the purpose of the visit was a “well child evaluation.” In the latter instance a “non-well” diagnosis also might be coded. When this occurred, the well child diagnosis code was present for billing purposes to signify the original purpose of the visit. For this study, if “well child” was present in the primary diagnosis field in the database, and one or more other diagnoses were coded in secondary diagnosis fields, the diagnosis of well child was ignored in the matching analysis. The first secondary diagnosis in the database then was used as the primary diagnosis.

Two units of analysis were used in the estimates of accuracy of matching of diagnosis data. Two estimates were generated using individual diagnoses as the unit of analysis: 1) [total number of matched diagnoses]/[total number of diagnoses coded in the database], and 2) [total number of matched diagnoses]/[total number of diagnoses listed in the medical records]. Estimates also were made with six definitions of matching that used the visit/encounter as the unit of analysis (Fig. 1):

1. the primary diagnosis in the database matches the primary diagnosis in the medical record;
2. the primary diagnosis in the medical record is matched by any diagnosis in the database;
3. the primary diagnosis in the database matches any diagnosis in the medical record;
4. all diagnoses listed in the medical record are matched in the database, without regard to order of listing;
5. all diagnoses in the database match in the medical record, without regard to order; and
6. at least one diagnosis in the database matches any diagnosis in the medical record.

The differences among the six definitions include 1) the number of diagnoses that are required to match per visit, and/or 2) whether the database contents are required to match the medical record or the medical record contents are required to be matched in the database (the direction of matching; Fig. 1).
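As a concrete illustration only (the paper describes these definitions in prose and reports no code), the following Python sketch shows one way the two diagnosis-level estimates and the six visit-level definitions might be operationalized. The synonym table, the is_match() rule, and the greedy one-to-one pairing are assumptions made for illustration; the study specifies its criteria only in the text above.

```python
# A minimal sketch, not the author's code: operationalizing the two
# diagnosis-level estimates and the six visit-level matching definitions.

SYNONYMS = {  # illustrative subset; the study's full synonym list is not given
    frozenset({"viral syndrome", "upper respiratory infection"}),
    frozenset({"diarrhea", "gastroenteritis"}),
}

def is_match(a: str, b: str) -> bool:
    """Exact text match or an allowed synonym pair (hypothetical rule)."""
    a, b = a.strip().lower(), b.strip().lower()
    return a == b or frozenset({a, b}) in SYNONYMS

def effective_db(db: list[str]) -> list[str]:
    """Ignore a primary 'well child' billing code when other diagnoses are
    coded, promoting the first secondary diagnosis (Section 2.2 rule)."""
    if len(db) > 1 and db[0].strip().lower() == "well child":
        return db[1:]
    return db

def matched_count(db: list[str], mr: list[str]) -> int:
    """Greedy one-to-one pairing of database and medical-record diagnoses;
    the paper does not spell out its pairing procedure, so this is a guess."""
    unused, n = list(mr), 0
    for d in db:
        for m in unused:
            if is_match(d, m):
                unused.remove(m)
                n += 1
                break
    return n

# The six visit-level definitions; db and mr are ordered lists, primary first.
def d1(db, mr): return is_match(db[0], mr[0])               # primary matches primary
def d2(db, mr): return any(is_match(x, mr[0]) for x in db)  # MR primary matched by any DB dx
def d3(db, mr): return any(is_match(db[0], m) for m in mr)  # DB primary matches any MR dx
def d4(db, mr): return matched_count(db, mr) == len(mr)     # all MR dx matched in DB
def d5(db, mr): return matched_count(db, mr) == len(db)     # all DB dx match in MR
def d6(db, mr): return matched_count(db, mr) >= 1           # at least one DB dx matched

def visit_level_accuracy(visits, definition) -> float:
    """Proportion of visits meeting one visit-level definition."""
    return sum(definition(effective_db(db), mr) for db, mr in visits) / len(visits)

def diagnosis_level_accuracy(visits) -> tuple[float, float]:
    """The two diagnosis-level estimates: matched/DB total and matched/MR total."""
    pairs = [(effective_db(db), mr) for db, mr in visits]
    matched = sum(matched_count(db, mr) for db, mr in pairs)
    return (matched / sum(len(db) for db, _ in pairs),
            matched / sum(len(mr) for _, mr in pairs))
```

On a toy visit coded as ["otitis media"] in the database but listed as ["otitis media", "diarrhea"] in the medical record, definition 5 passes while definition 4 fails; this is the direction-of-matching asymmetry examined in Section 3.2.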


Fig. 1. The different types of matching that can be applied between a database and the corresponding medical records, when coding of multiple diagnosis fields for each encounter is allowed in the database, are shown. The medical record is used as the gold standard in each type of matching. Bold arrows indicate definitions or types of matching assessed in this study: 1 = proportion of matched diagnoses relative to the total number of diagnoses coded in the database. 2 = proportion of matched diagnoses relative to the total number of diagnoses listed in the medical records. 3 = the primary diagnosis in the database matches the primary diagnosis in the medical record. 4 = the primary diagnosis in the medical record is matched by any diagnosis in the database. 5 = the primary diagnosis in the database matches any diagnosis in the medical record. 6 = at least one diagnosis in the database matches any diagnosis in the medical record. 7 = all diagnoses listed in the medical record are matched in the database, without regard to order. 8 = all diagnoses in the database match in the medical record, without regard to order.

2.3. Statistical analysis

Accuracy was expressed as the proportion of diagnoses or visits where the diagnosis data matched between the database and medical record. Confidence intervals (CI) (95%) for proportions were determined by the Normal Theory Method for Binomial Parameters. The proportions of visits matched in the 90-visit and 227-visit subsets were compared for differences using Pearson chi-square tests with the Yates continuity correction. t tests were used for comparison of means between two samples. Means are presented with standard deviations.
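These computations are standard; for concreteness, a brief sketch follows using Python's scipy, which is an assumed implementation rather than the software actually used in the study. The Yates-corrected chi-square on the Table 1 counts for the "at least one diagnosis" definition reproduces the reported P value of .002.

```python
# A sketch (assumed, not from the paper) of the statistical procedures
# named in Section 2.3.

import math
from scipy import stats

def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% CI for a binomial proportion by the normal (Wald) approximation."""
    p = successes / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

# Worked check against Section 3.1: 391 of 435 database diagnoses matched.
lo, hi = wald_ci(391, 435)
print(f"391/435 = {391/435:.0%}, 95% CI ({lo:.0%}, {hi:.0%})")  # 90% (87%, 93%)

# Subset comparison by Pearson chi-square with Yates continuity correction,
# e.g. the 'at least one diagnosis matched' row of Table 1:
# 76/90 visits vs. 217/227 visits met the definition.
table = [[76, 90 - 76], [217, 227 - 217]]
chi2, p, dof, _ = stats.chi2_contingency(table, correction=True)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")                  # P = .002

# Per-visit diagnosis counts (medical record vs. database) would be compared
# with a paired t test, e.g. stats.ttest_rel(mr_counts, db_counts).
```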

3. Results

3.1. Estimates of accuracy using individual diagnoses as the unit of analysis

The written medical records of the 317 visits contained 506 listed diagnoses (1.60 ± 0.73 per visit), compared with 435 coded diagnoses (1.37 ± 0.53 per visit) in the database (P < .001, two-sided paired sample t test). Two or more diagnoses were listed in 47% of the medical records (Fig. 2). Two or more diagnoses were coded in 35% of the database records. Overall, 391 individual diagnoses matched between the database and the corresponding medical record for the visit. Therefore, 90% of the diagnoses in the database (391/435; 95% CI: 87%, 93%) were accurately coded relative to the medical record. Alternatively, 77% of the diagnoses listed in the medical record (391/506; 95% CI: 73%, 81%) were accurately coded in the database.

3.2. Estimates of accuracy by six definitions that use the visit as the unit of analysis


The estimated accuracy of the diagnosis data in the database for each visit, compared with that in the written medical record for each visit, ranged from 65% to 92% of visits matching, depending on the definition used (Table 1). The lowest estimate occurred when all diagnoses listed in the medical record had to be matched (coded appropriately) in the database. The highest estimate occurred when any one diagnosis in the database, if more than one were coded, could match any diagnosis listed in the medical record. These definitions differed both in the number of diagnoses per visit that had to match and in the direction of matching.

The definition that required the primary diagnosis in the database to match the primary diagnosis in the medical record gave an estimate of 84% accuracy. This definition is unaffected by direction of matching. When all diagnoses coded for a visit in the database were required to match a diagnosis listed in the medical record, the estimate of accuracy was 87% (95% CI: 83%, 91%). When all diagnoses listed for a visit in the medical record had to be matched by a diagnosis coded in the database, the estimate was 65% (95% CI: 60%, 70%). These definitions differed only in the direction of matching (the medical record was the reference standard in each). The difference in the estimates likely reflects the greater number of diagnoses listed per visit in the medical record than coded per visit in the database. When more diagnoses were listed for a visit than coded in the database, the visit could not be counted as accurate when the direction of matching was from medical record to database.

3.3. Differences in estimates of accuracy between the two subsets of visits

The 317 visits reviewed consisted of two subsets of 90 and 227 visits. The larger represented visits by children who had at least one visit for otitis media in the database. The majority of these visits, but not all, contained a diagnosis of otitis media.


Fig. 2. Distribution of the number of diagnoses per encounter listed in the medical record and coded in the database.

The smaller consisted of visits selected from across the 30-month period covered by the database. The 227-visit otitis media subset had a larger proportion of visits with more than one diagnosis per visit than the 90-visit subset, both in the medical record and in the database (Table 1). The number of diagnoses listed per visit in the medical record in the 227-visit otitis media subset (1.70 ± 0.75) was greater than the number per visit listed in the 90-visit subset (1.34 ± 0.58; P = .002). The number of diagnoses coded in the database in the otitis media subset (1.42 ± 0.55) was greater than the number per visit coded in the 90-visit subset (1.26 ± 0.46; P = .001). The ratio of diagnoses listed per visit in the medical records to coded per visit in the database was greater in the otitis media subset (1.27 ± 0.57) than in the 90-visit subset (1.10 ± 0.41; P = .013). Children with otitis media frequently have concurrent viral infections, which readily affords the opportunity for providers to list and code more than one diagnosis for such visits.

Five of the six definitions that used the visit as the unit of analysis gave estimates of accuracy that were higher for the otitis media subset than the 90-visit subset (Table 1). The differences ranged from 7% to 12% and were statistically significant for three definitions that required 1) the primary diagnosis in the database to match any diagnosis listed in the medical record, 2) at least one diagnosis in the database to match any diagnosis listed in the medical record, and 3) all diagnoses coded in the database to match a diagnosis listed in the medical record. The direction of matching was from the database to the medical record in all three. In each case the subset with the larger ratio of diagnoses listed per visit in the medical records to coded per visit in the database had the higher estimate of accuracy.

4. Discussion

Estimates of the accuracy of the diagnostic data in a pediatric public health clinical practice-derived database ranged

from 65% to 92%, depending on how the accuracy of matching between diagnoses in the database and the corresponding medical records was defined. The database allowed for coding of multiple diagnoses per visit, and the definitions that were applied varied primarily in three ways: 1) the unit of analysis, where two definitions used individual diagnoses and others used the visit/encounter, 2) the number of diagnoses per visit required to be matched, and/or 3) the direction of matching (whether the database contents had to match the medical record or the medical record listings had to be matched in the database; Fig. 1). The medical record was the reference standard regardless of direction of matching.

When individual diagnoses were used as the unit of analysis, the estimate of accuracy was 90% when describing accuracy as the proportion of all diagnoses coded in the database that were matched in the medical records. The estimate fell to 77% when describing accuracy as the proportion of all diagnoses listed in the medical records that had a matching code in the database. A similar effect of change in direction of matching was seen when the visit/encounter was used as the unit of analysis. When all diagnoses coded in the database had to match the medical record, the estimate of accuracy was 87%. When all diagnoses listed in the medical record had to be matched in the database, the estimate of accuracy was 65%. These effects of change in direction of matching were due to the greater number of diagnoses listed per visit in the medical records than coded per visit in the database.

When comparing the two subsets of visits, three definitions gave higher estimates of accuracy of diagnosis data for the otitis media subset than for the 90-visit subset. The direction of matching was from the database to the medical record in all three. The otitis media subset had a higher ratio of diagnoses listed per visit in the medical records to coded per visit in the database. These differences in estimates of accuracy between the two subsets of visits can be attributed, at least in part, to the greater number of diagnoses per visit that were available for matching in one subset relative to the other.


Table 1
Matching of diagnoses between the medical record and the database according to different definitions of matching, where the visit/encounter is the unit of analysis

| Matching definition | Visits that met definition, N (N = 317) | Visits that met definition, % (95% CI^b) | 90 visits randomly selected from the entire dataset, N matches (%) | 227 visits of randomly selected patients with otitis media,^a N matches (%) | P value of subgroup comparison^c |
|---|---|---|---|---|---|
| Primary diagnosis in the medical record matches primary diagnosis in the database | 267 | 84 (80, 88) | 71 (79) | 196 (86) | .14 |
| Primary diagnosis in the medical record matches any diagnosis in the database | 279 | 88 (84, 92) | 74 (82) | 205 (90) | .07 |
| Primary diagnosis in the database matches any diagnosis in the medical record | 289 | 91 (88, 94) | 75 (83) | 214 (94) | .004 |
| All diagnoses in the medical record are matched in the database, without regard to order | 206 | 65 (60, 70) | 62 (69) | 144 (63) | .43 |
| All diagnoses in the database are matched in the medical record, without regard to order | 275 | 87 (83, 91) | 70 (78) | 205 (90) | .005 |
| At least one diagnosis in the database is matched in the medical record | 293 | 92 (89, 95) | 76 (84) | 217 (96) | .002 |
| Number of visits with more than one diagnosis listed in the medical record | 149 | 47 | 26 (29) | 123 (54) | .001 |
| Number of visits with more than one diagnosis coded in the database | 111 | 35 | 22 (24) | 89 (39) | .019 |

^a Visits were selected by predetermined rules from a randomly selected set of medical records of children with one or more diagnoses of otitis media in the first 2 years of life.
^b 95% confidence intervals for proportions were determined by the Normal Theory Method for Binomial Parameters.
^c Comparisons were performed by Pearson chi-square tests with Yates continuity correction.

The data flow from which this database was derived was set up in part to generate Medicaid claims for reimbursement for clinic visits. Therefore, it is similar to a claims database. From the earliest days of their existence in the 1970s, Medicaid and other health care claims databases have been seen as potentially valuable sources of information regarding both health care utilization and epidemiology and have become widely used for these purposes [19,25]. A number of studies have illustrated the importance of validating diagnostic data in claims databases against their source medical records when studying specific disease processes [26–30].

In an analysis of the accuracy of ambulatory claims for Maryland Medicaid recipients in 1988, the diagnosis coded on the claim was matched by the 3-digit ICD-9 code terminology in the medical record in 81.8% of cases [31]. For the diagnoses of otitis media and well child, the percentages of claims that matched the medical record were 81.6% and 85.4%, respectively. The definition of matching in the Maryland study was analogous to any one diagnosis in the database matching any diagnosis in the medical record (estimated accuracy of 92% in this study). In a sample of 2579 hospital discharge claims from California Medicare recipients in 1988, the primary diagnosis in the database was present in the medical record in 91% of cases [26]. The definition of matching in the California study was analogous to the primary diagnosis in the database matching any diagnosis in the medical record (estimated accuracy of 91% in this study).

It appears reasonable to conclude that, for the public pediatrics clinic database used in this study, the diagnosis data, in aggregate, provided an 80–90% accurate overall picture of the diagnoses listed in the medical records. This range falls within the 95% CIs of all but one of the definitions applied to the database. It remains unclear which, if any, single estimate would be most appropriate to use. This could vary depending on the intended use of the data. A specific definition may function differently when the number of diagnoses per visit differs between the database and medical record. This occurs when one or more diagnoses listed in the medical record of a visit are not coded in the database. In addition to the eight definitions used in this analysis, other variants that use the visit/encounter as the unit of analysis also could be constructed (Fig. 1).

This study addresses only the issue of accuracy of coded diagnosis data in a database relative to the diagnoses listed in the corresponding medical records. Another equally important issue is whether the diagnoses listed in the medical record truly reflect the condition of the patient at the time of the visit. Error or bias can enter the flow of data at many points. Patients may give inaccurate histories, or providers may fail to elicit essential information. Physical findings may be missed or misinterpreted. Correct clinical impressions may be inadequately or imprecisely recorded in the medical record. These possibilities could not be assessed in this study and likely are beyond the scope of validation efforts for most studies that use claims data. The advent of electronic medical records may help to address such errors by supplying investigators with additional evidence that is easily accessible for validation of recorded diagnoses.


Other studies that have used claims-based databases have applied definitions that required assignment of ICD-9 codes to the written diagnoses in the medical records and then defined agreement between the database and medical record as matching to the third digit of the code. This was considered overly stringent for this pediatric database, as it would have disallowed the equivalence of diagnosis pairs such as acute otitis media–otitis media with effusion and gastroenteritis–diarrhea, each of which was deemed appropriate for the epidemiologic analyses for which the database was constructed. The synonym matching used in this study may have resulted in a few instances of misclassification, which could have affected the estimates of data accuracy. However, the focus of this study was the impact that different definitions of matching might have on estimates of data accuracy, rather than a specific method of determining whether two listings or codings represent the same diagnosis.

The range of the estimates of accuracy of diagnosis matching yielded by the definitions used in this study may be less marked when these are applied to other claims or practice databases. However, the concept that definitions that treat the existence of secondary diagnoses differently will give different estimates of accuracy likely is generalizable to many current claims databases. The identification of meaningful differences among the estimates of diagnosis accuracy generated by this set of definitions of matching in other databases likely would depend on the degree to which fewer diagnoses were coded in the database than listed in the medical record. Future databases derived from electronic medical records should be less likely to have this disparity.

It has been recommended that, when claims data are used for health care research, a random sample of records be compared with their corresponding medical records to estimate the accuracy of the data [28]. Further study of appropriate definitions of the matching of diagnoses between databases and their corresponding medical records needs to be done. Overly stringent requirements may lead to rejection of conclusions drawn from analyses of imperfect but reasonably accurate databases (type II error). Overly lax definitions may lead to acceptance of invalid conclusions from an insufficiently accurate database (type I error).

The specific study question also may determine the type of matching definition to be used. In circumstances where the research question involves a single disease process, definitions that take into account only diagnoses relevant to that disease and disregard issues of multiple different diagnoses for a visit are reasonable. When a picture of the overall accuracy of a database is desired, the use of definitions that consider the issues of multiple diagnoses per visit is appropriate. Ascertainment of the proportions of all diagnoses in the database that are matched in the medical record, and in the medical record that are matched in the database, is a reasonable starting point. The use of two or more measures that use the visit/encounter as the unit of analysis—such as 1) matching of the primary diagnosis in the medical record


to any diagnosis in the database, and 2) matching of all diagnoses in the database to those in the medical record—may further enhance the portrayal of the accuracy of diagnosis data in a database.

Acknowledgments

The author thanks Drs. Michael O'Shea and Robert DuRant for helpful review of the manuscript and Drs. Lynne Wagenknecht, Michael Miller, and Roger Anderson for guidance and review of the Master's thesis project from which parts of this study were derived.

References

[1] Tierney WM, McDonald CJ. Practice databases and their uses in clinical research. Stat Med 1991;10:541–57.
[2] Johnson KB, Feldman MJ. Medical informatics and pediatrics. Decision-support systems. Arch Pediatr Adolesc Med 1995;149:1371–80.
[3] Quam L, Ellis LBM, Venus P, Clouse J, Taylor CG, Leatherman S. Using claims data for epidemiologic research. Med Care 1993;31:498–507.
[4] Baine WB, Yu W, Summe JP, Weis KA. Epidemiologic trends in the evaluation and treatment of lower urinary tract symptoms in elderly male Medicare patients from 1991 to 1995. J Urol 1998;160:816–20.
[5] Berman S, Byrns PJ, Bondy J, Smith PJ, Lezotte D. Otitis media-related antibiotic prescribing patterns, outcomes, and expenditures in a pediatric Medicaid population. Pediatrics 1997;100:585–92.
[6] Katz JN, Barrett J, Liang MH, Kaplan H, Roberts WN, Baron JA. Utilization of rheumatology physician services by the elderly. Am J Med 1998;105:312–8.
[7] Greenland S, Finkle WD. A case-control study of prosthetic implants and selected chronic diseases in Medicare claims data. Ann Epidemiol 1998;8:319–26.
[8] Samsa GP, Bian J, Lipscomb J, Matchar DB. Epidemiology of recurrent cerebral infarction: a Medicare claims-based comparison of first and recurrent strokes on 2-year survival and cost. Stroke 1999;30:338–49.
[9] Newcomer R, Clay T, Luxenberg JS, Miller RH. Misclassification and selection bias when identifying Alzheimer's disease solely from Medicare claims records. J Am Geriatr Soc 1999;47:215–9.
[10] Cooper GS, Yuan Z, Stange KC, Dennis LK, Amini SB, Rimm AA. The sensitivity of Medicare claims data for case ascertainment of six common cancers. Med Care 1999;37:436–44.
[11] Warren JL, Feuer E, Potosky AL, Riley GF, Lynch CF. Use of Medicare hospital and physician data to assess breast cancer incidence. Med Care 1999;37:445–56.
[12] Fortgang IS, Moore RD. Hospital admissions of HIV-infected patients from 1988 to 1992 in Maryland. JAIDS 1995;8:365–72.
[13] Mainous AG, Hueston WJ. The cost of antibiotics in treating upper respiratory tract infections in a Medicaid population. Arch Fam Med 1998;7:45–9.
[14] Fries JF, McShane DJ. ARAMIS: a prototypical national chronic-disease databank in medical information. West J Med 1986;145:798–804.
[15] Kong DF, Lee KL, Harrell FE, Boswick JM, Mark DB, Hlatky MA, Califf RM, Pryor DB. Clinical experience and predicting survival in coronary disease. Arch Intern Med 1989;149:1177–81.
[16] Fries JF, Bloch DA, Segal MR, Spitz PW, Williams C, Lane N. Postmarketing surveillance in rheumatology: analysis of purpura and upper abdominal pain. J Rheumatol 1988;15:348–55.
[17] Rogerson CI, Stimson DH, Simborg DW, Charles G. Classification of ambulatory care using patient-based, time-oriented indexes. Med Care 1985;23:780–8.
[18] Safran C. Using routinely collected data for clinical research. Stat Med 1991;10:559–64.


[19] Berkanovic E. An appraisal of Medicaid records as a data source. Med Care 1974;12:590–5.
[20] Iezzoni LI, Foley SM, Daley J, Hughes J, Fisher ES, Heeren T. Comorbidities, complications and coding bias. Does the number of diagnosis codes matter in predicting in-hospital mortality? JAMA 1992;267:2197–203.
[21] Zhang JX, Iwashyna TJ, Christakis NA. The performance of different lookback periods and sources of information for Charlson comorbidity adjustment in Medicare claims. Med Care 1999;37:1128–39.
[22] White SR, Hand R, Klemka-Walden L, Inczauskis D. Secondary diagnoses as predictive factors for survival or mortality in Medicare patients with acute pneumonia. Am J Med Qual 1996;11:186–92.
[23] Nyman JA, Krahn AD, Bland PC, Griffiths S, Manda V. The costs of recurrent syncope of unknown origin in elderly patients. Pacing Clin Electrophysiol 1999;22:1386–94.
[24] Woods CR. Application of a public health clinic database to the study of the epidemiologic characteristics of, health care utilization by, and the occurrence of otitis media among poor children in a small urban area. Master's thesis. Wake Forest University, Winston-Salem, NC, August 6, 1999.
[25] Roghmann KJ. Use of Medicaid payment files for medical care research. Med Care 1974;12:131–7.
[26] Green J, Wintfeld N. How accurate are hospital discharge data for evaluating effectiveness of care? Med Care 1993;31:719–31.
[27] Strom BL, Carson JL, Halpern AC, Schinnar R, Snyder ES, Stolley PD. Using a claims database to investigate drug-induced Stevens-Johnson syndrome. Stat Med 1991;10:565–76.
[28] Grisso JA, Carson JL, Feldman HI, Cosmatos I, Shaw M, Strom B. Epidemiological pitfalls using Medicaid data in reproductive health research. J Matern Fetal Med 1997;6:230–6.
[29] Fisher ES, Whaley FS, Krushat WM, Malenka DJ, Fleming C, Baron JA, Hsia DC. The accuracy of Medicare's hospital claims data: progress has been made, but problems remain. Am J Public Health 1992;82:243–8.
[30] Sorensen HT, Sabroe S, Olsen J. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol 1996;25:435–42.
[31] Steinwachs DM, Stuart ME, Scholle S, Starfield B, Fox MH, Weiner JP. A comparison of ambulatory Medicaid claims to medical records: a reliability assessment. Am J Med Qual 1998;13:63–9.

[25] Roghmann KJ. Use of Medicaid payment files for medical care research. Med Care 1974;12:131–7. [26] Green J, Wintfeld N. How accurate are hospital discharge data for evaluating effectiveness of care. Med Care 1993;31:719–31. [27] Strom BL, Carson JL, Halpern AC, Schinnar R, Snyder ES, Stolley PD. Using a claims database to investigate drug-induced Stevens-Johnson syndrome. Stat Med 1991;10:565–76. [28] Grisso JA, Carson JL, Feldman HI, Cosmatos I, Shaw M, Strom B. Epidemiological pitfalls using Medicaid data in reproductive health research. J Mater Fetal Med 1997;6:230–6. [29] Fisher ES, Whaley FS, Kushat WM, Malenka DJ, Fleming C, Baron JA, Hsia DC. The accuracy of Medicare’s hospital claims data: progress has been made, but problems remain. Am J Public Health 1992;82:243–8. [30] Sorensen HT, Sabroe S, Olsen J. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol 1996;25: 435–42. [31] Steinwachs DM, Stuart ME, Scholle S, Starfield B, Fox MH, Weiner JP. A comparison of ambulatory Medicaid claims to medical records: a reliability assessment. Am J Med Qual 1998;13:63–9.