Journal of Clinical Epidemiology 58 (2005) 323–337
REVIEW ARTICLE
A review of uses of health care utilization databases for epidemiologic research on therapeutics

Sebastian Schneeweiss, Jerry Avorn

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 1620 Tremont Street (Suite 3030), Boston, MA 02120, USA

Accepted 16 October 2004
Abstract

Objective: Large health care utilization databases are frequently used in a variety of settings to study the use and outcomes of therapeutics. Their size allows the study of infrequent events, their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and their availability at relatively low cost without long delays makes them accessible to many researchers. However, concerns about database studies include data validity, lack of detailed clinical information, and a limited ability to control confounding.

Study Design and Setting: We consider the strengths, limitations, and appropriate applications of health care utilization databases in epidemiology and health services research, with particular reference to the study of medications.

Conclusion: Progress has been made on many methodologic issues related to the use of health care utilization databases in recent years, but important issues persist and merit scrutiny. © 2005 Elsevier Inc. All rights reserved.

Keywords: Utilization databases; Claims data; Therapeutics; Pharmacoepidemiology; Confounding (epidemiology); Adverse drug reactions; Drug utilization
1. Introduction

It is widely accepted that randomized clinical trials (RCTs) cannot provide all necessary information about the safe and effective use of medicines at the time they are marketed. This stems from the inherent limitations of RCTs during drug development: they usually have a small sample size that often under-represents vulnerable patient groups, and they focus on short-term efficacy and safety in a controlled environment that is often far from routine clinical practice. Moreover, the RCT outcome sufficient to win marketing approval (short-term improvement in a surrogate marker compared with the effect of placebo) often fails to answer the more relevant questions that face doctors and patients. Such limitations make it inevitable that epidemiologic research is performed after marketing to define these issues [1]. Although the focus of pharmacoepidemiology is on post-marketing surveillance of drugs, biologics [2], and medical devices [3], the approach has valuable applications in the pre-marketing phase to assess the safety profile of drugs
and put them into the context of the natural history of the condition they are designed to treat [4]. Although pharmacoepidemiology makes use of all epidemiologic study designs and data sources, in recent years there has been enormous growth in the use of large health care databases [5]. These consist of automated electronic records of filled prescriptions, professional services, and hospitalizations; such data are increasingly collected routinely for the payment and administration of health services. Beyond this, electronic medical records often contain detailed clinical information, patients' reports of symptoms, the findings of physical examinations, and the results of diagnostic tests. However, researchers more frequently use insurance data on submitted claims for specific services, procedures, and pharmaceuticals. These are usually less detailed in their clinical contents but are often representative and complete for very large patient populations, including elderly patients, children, the very poor, and those in nursing homes, who are most often under-represented in or totally excluded from clinical trials. Clinical epidemiologists can answer a wide spectrum of research questions with database studies, but they must be aware of the specific issues that can compromise their validity and of recent methodologic advances to address these shortcomings. This article outlines the breadth of research
applications using databases, the issues that may compromise the validity of such studies, and approaches to managing such analytic challenges. The goal is to provide researchers with a methodological framework and to comment on the value of new techniques for more advanced users.
2. Research applications with database studies

Research applications of databases vary broadly. Most take advantage of the strengths of these datasets: (1) their large size allows the study of rare events, (2) their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and (3) their availability at relatively low cost and without long delays makes them accessible and efficient.
2.1. Drug utilization studies

Basic information on the prevalence, incidence, and duration of drug therapy is essential for health system planning and for assessing the quality of prescribing. Such measures can be derived directly from pharmacy dispensing databases more accurately than from patient recall, gross wholesale figures, or records of physician prescribing. A graphical presentation of newly recorded users of a drug used for chronic treatment in a fixed time window (e.g., 1 month) over a longer time period produces a waiting time distribution [6]. For a chronically used drug (e.g., insulin or antidepressants), most current users refill their prescription early (e.g., in the first or second 1-month interval), whereas with each passing month new users come to dominate the waiting time graph (Fig. 1). From that distribution, drug utilization measures, including the number of current (prevalent) users, new (incident) users, duration of use, and seasonality, can be directly derived [6] and compared across drugs and drug classes, producing a comprehensive picture of a population's drug use and its dynamics.
[Figure 1: two panels plotting the monthly number of newly recorded users (0 to 5,000) against calendar month over Years 1 and 2, one panel for Drug A users and one for Drug B users, each divided into prevalent and incident users.]
Fig. 1. Hypothetical example of waiting time distributions [6] for two antidepressant drugs, based on pharmacy dispensing data to identify prevalent (current) and incident (new) drug users. Although the older Drug A was more prevalent at the beginning of the observation period, fewer new users started on Drug A compared with the more recently marketed Drug B, which had fewer current users at the beginning but was more frequently prescribed among new users. Modified from Hallas J, Gaist D, Bjerrum L. The waiting time distribution as a graphical approach to epidemiologic measures of drug utilization. Epidemiology 1997;8:666–70.
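To make the construct concrete, here is a minimal sketch, not from the original article, of tabulating a waiting time distribution from a pharmacy dispensing file; the file name and column names (patient_id, drug, dispense_date) are hypothetical, and the 2-year window mirrors Fig. 1.

```python
import pandas as pd

# Hypothetical dispensing file: one row per filled prescription.
claims = pd.read_csv("dispensings.csv", parse_dates=["dispense_date"])

def waiting_time_distribution(claims, drug, start="2004-01-01", end="2005-12-31"):
    """Count, per calendar month, patients whose FIRST recorded dispensing
    of `drug` within the observation window falls in that month [6]."""
    d = claims[(claims["drug"] == drug)
               & (claims["dispense_date"].between(start, end))]
    first_fill = d.groupby("patient_id")["dispense_date"].min()
    return first_fill.dt.to_period("M").value_counts().sort_index()

wtd = waiting_time_distribution(claims, "drug A")
# Early months are dominated by prevalent (current) users refilling;
# after the initial spike, monthly counts approximate incident users.
print(wtd)
```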
S. Schneeweiss, J. Avorn / Journal of Clinical Epidemiology 58 (2005) 323–337
Such investigations may lead to studies of adherence to chronic therapy in routine care. For example, Medicare and Medicaid database studies showed that adherence to statin therapy among poor elderly patients was below 60% (proportion of days covered) after 6 months and continued to decline thereafter [7].
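The proportion of days covered can be computed directly from dispensing dates and the recorded days supply; a minimal sketch, assuming only a list of (fill date, days supply) pairs per patient:

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, window_start, window_end):
    """Fraction of days in [window_start, window_end] covered by dispensed
    supply; `fills` is a list of (fill_date, days_supply) tuples."""
    covered = set()
    for fill_date, days_supply in fills:
        for k in range(days_supply):
            day = fill_date + timedelta(days=k)
            if window_start <= day <= window_end:
                covered.add(day)
    total_days = (window_end - window_start).days + 1
    return len(covered) / total_days

# A patient who fills a 30-day statin supply only four times in 6 months:
fills = [(date(2004, 1, 1), 30), (date(2004, 2, 5), 30),
         (date(2004, 3, 20), 30), (date(2004, 5, 10), 30)]
pdc = proportion_of_days_covered(fills, date(2004, 1, 1), date(2004, 6, 30))
print(f"PDC = {pdc:.2f}")  # ~0.66, in the adherence range reported above
```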
2.2. Studies of physician prescribing

Drug utilization research using databases can be used to evaluate the appropriateness of drug therapy. Like any evaluation of the quality of care, its usefulness depends on the validity of indicators of appropriate prescribing (e.g., beta-blockers for prevention of mortality after an acute MI) [8]. The Assessing Care of Vulnerable Elders (ACOVE) project is an example of an evidence-based approach, based on advances in the ability to assess and improve drug use [9].

Utilization databases are well suited to help researchers understand the properties and predictors of physicians' prescribing decisions. Doctors' characteristics can be linked to data on the prescriptions their patients fill. This makes it possible to identify provider subgroups that are more likely to prescribe suboptimally [10]; they can then be targeted for educational interventions. For example, database studies in large populations have found that, after adjusting for patient characteristics, more recently trained physicians and specialists are more thorough in prescribing preventive drugs [11] and are more likely to be early adopters of new drugs [12]. Multi-level regression models can be used to adjust standard errors for clustering of patients within physicians, producing provider-specific estimates of the quality of prescribing [13] that form the analytic basis of provider profiling [14]. However, large numbers of patients and physicians are required for stable estimation, which makes large and representative databases an ideal resource available at low cost.

2.3. Adverse drug effects and risk management

Administrative databases have become a useful data source for researchers and regulatory agencies [15] studying the safety of drugs because they include large numbers of patients, often over long periods of time, a useful attribute for the study of rare events. Retrospective studies that rely on patient interview information may be subject to recall bias if cases remember their drug exposure more accurately than controls. In contrast, pharmacy claims databases record the date of dispensing accurately and cannot be biased by knowledge of the study outcome, although some exposure misclassification may persist (see below). With large databases, cohort studies on the incidence of rare events are equally readily done. They have the advantage of producing absolute incidence rates and can assess several disease outcomes of one drug exposure within the same study, which may require sample sizes >100,000 [16]. The longitudinal nature of databases and their availability with short lag times make them well suited to monitor the success of risk management programs after a drug risk is identified [17]. In population-based databases, it is possible to estimate the proportion of the population exposed and thus the population-attributable risk as a more policy-relevant measure of risk [18].

2.4. Beneficial drug effects
Observational post-marketing studies are usually required to study the effectiveness of drugs in populations often excluded from pre-marketing RCTs (e.g., frail elders, children, or pregnant women). Unanticipated beneficial effects may be first documented in such studies [19]. Observational studies of drug effectiveness can be problematic because of the difficulty of adjusting for confounding by indication if a beneficial effect is anticipated and prescribing is rational (i.e., sicker patients are more likely to receive therapy) [20]. A recent example of discrepant results from observational and randomized studies on the benefits of drugs is the cardiovascular effectiveness of hormone replacement therapy [21], for which observational studies repeatedly found a benefit that was not confirmed in later RCTs. A recent example of an unanticipated beneficial drug effect found by a database study [22] and an RCT is the expanded indication for clozapine for the prevention of suicide in schizophrenic patients. A useful and underutilized application of databases in studying beneficial effects is linking utilization databases or disease-specific registries to participants in RCTs for active monitoring during the randomized trial phase and for long-term follow-up [23].

2.5. Health policy research

Utilization databases are useful for evaluating the clinical and economic effects of drug reimbursement policy changes because they measure actual utilization and economic outcomes accurately, are broadly representative, and are large enough to detect small changes in major clinical outcomes (e.g., diagnosis-specific ER admissions). Interrupted time-trend analyses implemented in longitudinal databases can provide implicit adjustment for most patient and provider characteristics [24]. Such database studies were able to quantify the expected drug utilization changes after drug reimbursement restrictions [25] and detected increases in nursing home admissions [26] and temporary increases in physician visits following such restrictions [27].
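Interrupted time-trend analyses of this kind are often implemented as segmented regression of monthly utilization rates. A minimal sketch with simulated data (the series, policy date, and use of statsmodels are illustrative assumptions, not details from the cited studies):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
months = np.arange(24)                               # 24 monthly observations
policy = (months >= 12).astype(float)                # policy starts at month 12
trend_after = np.where(months >= 12, months - 12, 0)
rate = 50 - 0.2 * months - 8 * policy + rng.normal(0, 1, 24)  # simulated series

# Segmented regression: baseline level and trend, immediate level change at
# the policy date, and change in trend afterward (autocorrelation in the
# errors is ignored here for brevity).
X = sm.add_constant(np.column_stack([months, policy, trend_after]))
print(sm.OLS(rate, X).fit().params)
```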
3. General characteristics of databases

3.1. From patients to records and claims databases

One advantage of health care utilization databases (i.e., their representativeness of routine clinical practice in large populations) is also a disadvantage (i.e., the reliance on
previously collected data generated primarily for administrative purposes). In other epidemiologic studies that use primary data collection, the timing of data collection and the detail and accuracy of data are to a large extent under the control of the investigator. By contrast, in administrative databases a record is generated only if there is an encounter with the health care system that is accompanied by a diagnosis (old or new) and one or several procedures or the prescribing of medicines. A further hurdle to generating an electronic record is that the encounter must be filed and coded accurately in a computer system. To generate a complete insurance file, the claim must then be adjudicated by a third-party payer. Each of these steps can lead to bias if such data are used uncritically for epidemiologic studies (Fig. 2). In health care systems with nonuniversal coverage, the fact that an encounter has happened at all may bias the source population of all patients with at least one record in the database toward those with insurance coverage, who may be healthier or more affluent and employed. This may lead to limited generalizability but does not affect the internal validity of a study. Although some errors are less likely to occur (e.g., miscoding of drugs and doses at pharmacies), others have been shown to be prevalent in many databases (e.g., under-reporting of secondary diagnoses). Changes in hardware, software, or coding practice may change the completeness and interpretation of specific data items over time. Mergers of health plans may lead to duplication or sharing of patient ID numbers or incomplete data linkage. Descriptive analyses of the population composition over time can help one determine the integrity of linked administrative databases [28].
[Figure 2 schematic: steps in the record generation process (left) paired with potential sources of bias (right).]
Patient has symptoms, acute illness, etc. → Indigent patients without coverage and patients with insufficient insurance are less likely to seek professional care.
Encounter with health professional: examination, history, diagnostics → Incomplete documentation of clinical status; misdiagnosis; false ranking of the 'primary diagnosis'.
Diagnosis and interventions, including drug prescribing and pharmacy encounter → Miscoding of drug, strength, dose; non-recording of free samples and over-the-counter drugs.
Documentation in electronic medical records or paper-based records → Incomplete record keeping.
Coding of claims → Miscoding of primary and secondary diagnoses; miscoding of procedures; failure to file claims.
Filing and adjudication of final claims → Transaction error; lag time until adjudication and final filing; loss to follow-up if patient has left the system.
Administrative database → Incomplete/false record linkage.
Research database.
Fig. 2. The generation of health care utilization databases and potential sources of errors/bias.
The key difference between claims databases and complete electronic medical records is that the latter have more detailed clinical information, often in the form of text (e.g., patient and family history), but can also contain structured information, including anthropometrics or laboratory results. Each database has its own specific "grammar" that determines the way data are generated; this is often undocumented or not kept up to date. Researchers who perform epidemiologic studies using databases must understand how the data were generated, from the encounter all the way through to the completed database entry. This can be achieved by having a close working relationship with knowledgeable people involved in the various steps of the data generation process. Only then can important mistakes and the resulting biases be avoided and the full potential of databases be realized.

3.2. Types of databases

Health care utilization databases throughout the world differ substantially with regard to their population representativeness and patient turnover rates, the breadth and detail of information they contain, their data quality and completeness, and their linkability with data from other sources (e.g., vital statistics or cancer registries). Checklists exist to assess the usefulness of databases for epidemiologic research [29]. The minimum requirement for drug utilization research is the availability of a pharmacy dispensing database, which allows descriptive analyses of drug use. Hallas [30] demonstrated the use of longitudinal patient-specific drug dispensing information in Denmark for drug safety evaluations in a limited number of scenarios in which an adverse event is typically followed by the prescription of another drug. Diagnostic and procedure information is usually required to define study outcomes and covariates that may confound an association because they are associated with the prescribing decision but are also independent predictors of the study outcome (Table 1) [31]. Most databases have such information from hospital discharge files or ambulatory services or both but differ in their detail. British Columbia's hospital discharge database has 16 diagnosis fields, whereas Medicare and Medicaid databases have 10
fields [32], and others only five. Most databases distinguish between the principal diagnosis and secondary diagnoses. The British Columbia hospital discharge database has an additional code for complications, which is less accurate. HMO databases often have fairly detailed information from ambulatory care records (e.g., Fallon Health Plan, Harvard Pilgrim Health Care, and Kaiser Permanente) [33] that often include laboratory results. Although electronic medical records often contain patient risk factor information like smoking and obesity, such information can be recorded incompletely (e.g., 78% sensitivity for current smoking and 60% for former smoking in the General Practice Research Database) [34] or missing for large proportions of the population (e.g., >20% missing smoking status and >40% missing BMI in another GPRD study) [35]. Some jurisdictions have extensive administrative databases that are fully linkable (e.g., Denmark [36]), whereas others have regulations that make it difficult to link component databases into a meaningful research database (e.g., Germany [37]).

3.3. Protecting the privacy of individuals

All jurisdictions and database owners require careful attention to data privacy. It is common practice to keep the linking process with the original patient identifiers physically separated from the data analysis files, which instead use study-specific coded ID numbers [38]. Camouflaged sampling techniques can be used to contact individual patients specifically identified in a database based on their drug use pattern or diagnoses, without revealing the targeted patients to the personnel who de-identify patient IDs and make the actual contact [39]. Nevertheless, concerns persist that unlikely cases of very small cell sizes in highly cross-classified data may allow the identification of individuals [38]. While investigators are exploring a growing list of useful applications of such databases for purposes ranging from quality improvement to adverse effects research, new privacy regulations threaten to make access to such valuable data more difficult.
Table 1
Basic and advanced data elements for pharmacoepidemiologic research using health care databases (this list is not exhaustive)

Exposure
  Database: pharmacy dispensing records, drug use registries (e.g., anti-TNFα)
  Basic data elements: drug name, strength, dose, quantity, date of dispensing
  Advanced data elements (often found in electronic medical records or other data sources): indication for drug use, directions for use, intended days supply, detailed clinical information at start of therapy

Outcome
  Database: hospitalization, ambulatory visits, vital statistics
  Basic data elements: diagnoses (primary and secondary), procedures, reason for visit, date of service, date/cause of death
  Advanced data elements: patient-reported symptoms, alterations in laboratory test results

  Database: cancer registries, MI registries, birth registries
  Basic data elements: diagnosis with date of disease onset
  Advanced data elements: staging and detailed clinical information

Confounders/effect modifiers
  Database: hospitalization, ambulatory visits, provider files
  Basic data elements: age, sex, comorbidities, provider specialty
  Advanced data elements: prior conditions, smoking, BMI, family history, lab test results

Abbreviations: BMI, body mass index; MI, myocardial infarction. Modified from Stergachis AS. Record linkage studies for postmarketing drug surveillance: data quality and validity considerations. Drug Intell Clin Pharm 1988;22:157–61.
In the United States, the implementation in 2003 of the Health Insurance Portability and Accountability Act (HIPAA) has constrained the availability of health care data for uses other than the direct care of patients. Regulations in Europe and Canada have posed similar problems for researchers. This is unfortunate because modern methods of data anonymization, coupled with oversight by institutional review boards, can make such research possible while adequately protecting the privacy of patients. Many have argued that the societal benefits of such database research in pharmacoepidemiology, ranging from quicker ascertainment of drug risks to a more accurate depiction of the quality of use and cost-effectiveness of comparable medications, are considerable, whereas the risks of properly authorized use of such data are negligible. Many investigators working in this area hope that this initial period of legalistic hypervigilance will soon be followed by a more reasoned regulatory approach to accessing such data for research purposes.
3.4. Drop-out and completeness of response

Some problems that are common in studies relying on primary data collection may be minimal in claims data studies. In many population-based claims databases, patient dropout is rarely a problem of appreciable magnitude if the database describes a program of universal entitlement, such as Medicare for patients 65 years and older in the United States or province-wide databases in Canada. By contrast, selective dropout may be a major problem in some United States HMO databases if sicker patients lose their jobs and drop out of employer-sponsored health plans or if the beneficiary turnover time is short. Overall, the turnover rate (known as "churn") in most HMOs in the United States averages 20% to 30% each year [40]. Patient nonresponse and recall bias do not exist in this form in claims data because all data recording is independent of a patient's memory or agreement to participate in a research study. However, the administrative system may fail to record complete information, randomly or systematically, particularly diagnostic information, which may lead to misclassification bias.

3.5. Precision and statistical inference

Comparing the findings of small observational studies with those from very large databases illustrates the difference between the magnitude of risk estimates and their precision. Comparing simple proportions between two groups often entails inspection of confidence limits or P values to evaluate any differences between groups. In large database studies, however, these proportions are likely to have very narrow and non-overlapping confidence intervals (or "significant" P values) because of the enormous number of observations. This makes it clear that random error is only the first step in assessing the data; a second step is to judge whether the magnitude of the observed differences is clinically or, depending on the main interest, economically relevant, assuming the absence of systematic error. Confidence limits provide more information than simple P values and are more useful in describing the precision of statistical estimation without suggesting any notion of meaningful difference. It is then up to the investigator and the reader to impose their own perspective to determine the meaningfulness of differences that can be estimated with such high precision in large databases.

4. Validity of clinical information

As in any epidemiologic study, one must consider a cascade of potential biases that may come into play between an underlying causal relation and the reported findings of a database study [41]. The following issues are more likely to be present in database studies, although they are not unique to them.

4.1. Misclassification of exposure and outcome

Drug use information in databases is not susceptible to problems of poor patient recall or the failure to fill a prescribed medication, and for these reasons it is one of the most accurate ways of determining drug exposure in very large populations. However, misclassification of drug exposure can occur and lead to bias, with the magnitude and direction of bias depending on the mechanism of the misclassification. The following section illustrates typical misclassification mechanisms relevant to pharmacoepidemiologic database studies.

4.1.1. Drug exposure misclassification and its consequences

Electronic pharmacy dispensing records are considered accurate because pharmacists fill prescriptions with little room for interpretation and are reimbursed by insurers on the basis of detailed, complete, and accurate claims submitted electronically [42–44]. Accuracy is improved in many systems by software that assigns the corresponding drug codes after automatically comparing the spelling and strength with electronic lists of marketed drugs. Therefore, pharmacy dispensing information is usually seen as the gold standard of drug exposure information compared with self-reported information [45] or prescribing records in outpatient medical records [46]. In most databases, hospital discharge files do not contain drug use information. Therefore, every hospital stay represents a period with missing drug exposure information by design, which may become relevant for prolonged stays. Drug claims often contain a field for "days supply," and some insurance plans have rigorous 30-day refill requirements that can support the days supply information. Alternatively, the interval between previously filled prescriptions can be used to estimate intervals of exposure.
[Figure 3 schematic: a sample patient's true drug exposure over time (a free sample from the prescribing physician, a fill of a 30-day supply, a refill of a 30-day supply, then discontinuation) contrasted with (1) exposure as classified from pharmacy claims alone and (2) exposure as classified from claims data plus a "15-day rule" extension of each supply, highlighting person-time that is truly exposed versus classified as exposed.]

Fig. 3. Typical causes of drug exposure misclassification in longitudinal claims database studies.
Combining the strength and quantity of the drug dispensed with some metric of average daily dose (e.g., defined daily dose [47]) is another way to calculate the number of days covered. This can give rise to two typical misclassification problems in drug exposure assessment (Fig. 3). If the calculated or pharmacist-recorded days supply is too short, or if patients decide to stretch a prescription by using a lower dose (e.g., tablet splitting), some person-time will be classified as unexposed when it truly is exposed. Most chronically administered drugs are used for longer periods, resulting in multiple refills; a patient can thus be classified as intermittently unexposed despite continuous exposure. Many investigators therefore extend the calculated days supply by some fraction (e.g., 50%, or 15 days in Fig. 3 [strategy 2]) to avoid this misclassification. However, this strategy aggravates another misclassification that can occur if a patient discontinues drug use without finishing the supply. The right balance between improved sensitivity and reduced specificity of drug exposure assessment depends on how well the days supply is calculated, which in turn depends on the type of drug and how regularly it is taken. Drugs taken sporadically are particularly difficult to model accurately. McMahon [48] showed that for NSAID exposure it may be best to use the calculated exposure duration and proposed a re-sampling method to identify the strongest association as a function of varying length of days supply. The strongest association is chosen based on the theory that random exposure misclassification biases effect estimates toward the null. It would violate epidemiologic principles to assess the time to the next refill and assume that the days between were continuously exposed, because future drug dispensing is likely to be associated with the disease outcome. Using each patient's average time between two past dispensings to calculate a patient-specific expected refill date may be a more acceptable solution [49]. These limitations are unavoidable and inherent to most pharmacoepidemiologic studies. Their effect on validity can be explored by sensitivity analyses.
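As an illustration, a minimal sketch (hypothetical helper, not code from the original article) of collapsing dispensing claims into exposed person-time, with a configurable grace-period extension of the recorded days supply:

```python
from datetime import date, timedelta

def exposure_episodes(fills, grace_days=15):
    """Collapse (fill_date, days_supply) claims into continuous exposure
    episodes; each supply is extended by `grace_days` (strategy 2 in
    Fig. 3) before deciding whether the next fill continues the episode."""
    fills = sorted(fills)
    start, end = fills[0][0], fills[0][0] + timedelta(days=fills[0][1])
    episodes = []
    for fill_date, days_supply in fills[1:]:
        if fill_date <= end + timedelta(days=grace_days):
            end = max(end, fill_date + timedelta(days=days_supply))
        else:
            episodes.append((start, end))  # gap too long: close the episode
            start, end = fill_date, fill_date + timedelta(days=days_supply)
    episodes.append((start, end))
    return episodes

fills = [(date(2004, 1, 1), 30), (date(2004, 2, 10), 30), (date(2004, 5, 1), 30)]
print(exposure_episodes(fills, grace_days=15))
# Two episodes: Jan 1-Mar 11 (the Feb 10 fill bridges the gap) and May 1-May 31.
```

A larger grace period raises sensitivity (fewer true exposure days missed) at the cost of specificity (more unexposed days counted as exposed), mirroring the trade-off described above.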
Free samples handed out by physicians are not recorded in pharmacy dispensing databases and are another potential source of bias by misclassification. Because samples are usually supplied for a brief period of time, rarely longer than 14 days, such misclassification produces appreciable bias only in extreme cases of massive free sample distribution with few patients recorded as unexposed in the database [50], or when studying effects that mostly occur immediately after starting a new drug [51]. Similarly, medications that cost less than a required co-payment may not be recorded in insurance claims. In summary, there is no algorithm that classifies exposure 100% correctly in claims data, although data quality is considered better than self-report or physician notes. The choice of strategy depends on whether one needs to be more concerned about falsely classifying person-time as exposed or as unexposed and on the pharmacology of the hypothesized drug effect.

4.1.2. Outcome misclassification and its consequences

Because utilization databases often lack detailed clinical information, one must consider the possible effect of misclassification of a given disease outcome. To understand the effect of outcome misclassification on effect estimates, it is important to note that a lack of specificity of the outcome measurement is worse than a lack of sensitivity in most situations: if the specificity of the outcome assessment is 100%, then relative risk estimates are unbiased [52]. Given this, the literature on misclassification of claims data diagnoses is not quite as depressing as it first seems. A recent comprehensive study on the misclassification of claims data diagnoses, using medical records review as the gold standard, revealed that the sensitivity of claims diagnoses is often less than moderate, whereas their specificity is usually 95% or greater (Table 2) [53]. A high specificity of diagnostic coding in claims data can be expected because if a diagnosis is coded and recorded in the claims data, it is likely that this diagnosis was made, particularly in hospital discharge summaries [54].
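The arithmetic behind this rule of thumb is easy to verify; a small sketch (illustrative numbers only) of how nondifferential outcome misclassification distorts an observed risk ratio, showing that perfect specificity leaves the ratio intact while imperfect specificity biases it toward the null:

```python
def observed_risk(true_risk, sensitivity, specificity):
    # Apparent outcome risk after nondifferential misclassification:
    # detected true positives plus false positives among non-cases.
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

r_exposed, r_unexposed = 0.02, 0.01   # true relative risk = 2.0
for sens, spec in [(0.6, 1.0), (1.0, 0.95), (0.6, 0.95)]:
    rr = (observed_risk(r_exposed, sens, spec)
          / observed_risk(r_unexposed, sens, spec))
    print(f"sens={sens}, spec={spec}: observed RR = {rr:.2f}")
# With spec=1.0 the observed RR stays 2.0 regardless of sensitivity;
# with spec<1.0 false positives dilute the observed RR toward 1.
```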
Table 2
Misclassification of selected diagnoses in databases for hospital and ambulatory care claims

                             Ambulatory care billing        Hospital discharge
                             diagnoses(a)                   diagnoses(b)
Diagnosis                    Sensitivity   Specificity      Sensitivity   Specificity
Hypertension                 60.6          87.7             65            99.9
COPD                         53.4          87.9             91            98.8
Diabetes                     62.6          97.2             88            99.4
Renal failure                18.6          99.0             88            99.4
Chronic liver disease        27.6          99.8             100           100
Any cancer                   44.8          95.0             91            100
Peptic ulcer disease         27.6          94.6             92(c)         100(d)
Congestive heart failure     41.5          96.1             85(c)         99(d)
AMI                          25.4          96.8             94(c)         100(d)
Neutropenia                  –             –                –             97(e)
Stevens-Johnson syndrome     –             –                –             95(e)

Abbreviations: AMI, acute myocardial infarction; COPD, chronic obstructive pulmonary disease.
(a) Gold standard = medical records review [53].
(b) Coded as primary diagnosis; gold standard = medical records review [54].
(c) Coded as primary diagnosis; gold standard = medical records review [117].
(d) Specificity (Spec) is calculated from the positive predictive value (PPV), sensitivity (Sens), and disease prevalence (Pr) reported by Fisher et al. [117] according to:
    Spec = [(1 − Pr) − Sens × Pr × (1/PPV − 1)] / (1 − Pr)
(e) PPV according to Strom et al. [117]. Because the prevalence of both conditions is very low, the specificity is close to 100 (see formula above).
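Footnote d's back-calculation is straightforward to implement; a minimal sketch with values chosen purely for illustration:

```python
def specificity_from_ppv(ppv, sensitivity, prevalence):
    """Back-calculate specificity from the positive predictive value,
    sensitivity, and disease prevalence (footnote d of Table 2)."""
    pr = prevalence
    return ((1 - pr) - sensitivity * pr * (1 / ppv - 1)) / (1 - pr)

# A rare outcome (prevalence 1%) with sensitivity 0.94 and PPV 0.90 implies
# near-perfect specificity, consistent with the pattern in the table above.
print(round(specificity_from_ppv(ppv=0.90, sensitivity=0.94, prevalence=0.01), 4))
```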
For many conditions, specificity can be further improved by requiring the occurrence of disease-specific procedure codes or minimum length-of-stay requirements [55]. However, diagnoses for ambulatory services may sometimes be based on diagnostic codes for services performed to rule out a condition (e.g., a blood glucose test to rule out diabetes). Requiring two or more recordings of a diagnosis made in an ambulatory setting, possibly combined with a drug or procedure claim specific to a confirmed diagnosis, can help to increase specificity. Validating outcomes in the patient's primary medical record can ensure high specificity so that relative risk estimates are less affected by diagnostic misclassification bias [56,57]. Validation of all events is superior to validating a sample if the availability of validation records is associated with the study exposure or study outcome. For example, blood alcohol levels were not available for people who were seriously injured or died in car accidents in a study on benzodiazepine use and road-traffic accidents [58]. Random misclassification of confounding variables (e.g., comorbid conditions) leads to incomplete control of confounding for these conditions. The Deyo claims data adaptation [59] of the Charlson comorbidity score [60], which performs well in elderly patients [61], was recorded correctly in more than 40% of patients and misclassified by only one
point in an additional 35% of patients when compared with ambulatory medical records [53]. This tool can be expected to perform better when data from hospital records are used. Complications that occur during hospital stays are often miscoded as comorbidities. A recent validation study found that the sensitivity of coding clinical complications is usually less than 50% [62]. However, because the prevalence of complications is fairly low, this may not cause major distortions in most settings.

4.1.3. Strategies to assess the effect of misclassification

The impact of even modest misclassification can be profound but is rarely quantified. If the sensitivity and specificity of a specific exposure or outcome measurement are known, simple algebraic methods can be applied to standard 2×2 tables to assess the impact of misclassification in unmatched [63] and matched analyses [64]. Other techniques use the positive predictive value (PPV) of the outcome measure, determined in a separate validation study [65,66]. This is an attractive approach for database studies because PPVs are typically easier to estimate in internal validation studies than sensitivity and specificity. All these methods are, however, applied only to unadjusted associations (i.e., crude 2×2 tables), which is an unrealistic analysis in pharmacoepidemiology. The Simulation Extrapolation (SIMEX) technique [67,68], originally developed for normally distributed data, was recently applied to pharmacoepidemiologic analysis with a multivariate logistic regression model [69]. In simulations, random misclassification is added to the observed exposure or outcome measurements in defined increments. Increasing random misclassification leads to increasingly attenuated effect estimates. These decreasing simulated effect estimates can be plotted as a function of increasing misclassification. After fitting a (non-)linear regression line through these observations, the curve can be extrapolated back beyond the originally observed estimate to assess what the main study findings would be with less random misclassification. Extending the SIMEX method to categorical data provides a valid sensitivity analysis for multivariate adjusted effect estimates but does not produce corrected estimates if the underlying misclassification mechanism cannot be quantified using a gold standard measure.

4.2. Confounding (by indication) bias

Physicians prescribe drugs in light of the diagnostic and prognostic information available at the time of prescribing. The factors influencing this decision vary by physician and over time [70] and frequently involve patients' clinical, functional, or behavioral characteristics that are not directly recorded in administrative databases. If some of these factors that are imbalanced between drug users and non-users are also independent predictors of the study outcome, then failing to control for them can lead to confounding bias. The confounding thus results from selecting patients into drug
exposure groups (confounding by indication) [31]. Strategies to adjust for such confounding vary depending on whether the potential confounders are measured in a given database.

4.2.1. Measured confounders

If confounders are measured in a particular database, then the usual strategies for controlling confounding can be applied: restriction, stratification, matching, and multivariate modeling. These techniques are well described in standard epidemiology texts [71] and can be directly applied to database studies with the usual caveats. In studying an unexpected and harmful effect of a drug that was unnoticed or insufficiently described in randomized clinical trials, the study outcome is usually rare. A strength of database studies is that the number of subjects is large and the databases contain a large battery of measures of potential confounders. In studies using primary data collection, each confounding factor is usually measured by one pre-defined measure that can be adjusted for in multivariate models; in administrative data, by contrast, there can often be dozens of measures for each construct of a confounding factor, such as comorbidity. If we have no prior knowledge of which measure best fits the construct, the number of covariates can quickly rise, making it difficult to fit multivariate regression models for a limited number of observed outcomes even in large studies [72]. Data reduction techniques have therefore become increasingly popular in clinical epidemiology using such databases. In these approaches, a vector of covariates is combined into a single covariate, either by estimating a prediction rule for exposure status and calculating an exposure propensity score for each individual, or by estimating a prediction rule for the disease outcome independent of exposure (or only in the non-exposed) and calculating a disease risk score for each individual [73].

4.2.1.1. Exposure propensity scores. An exposure propensity score (EPS) is the probability (propensity) of exposure given measured covariates [74]; it can be estimated using a multivariable logistic regression model of exposure. Each patient is assigned an estimated probability of exposure ranging from 0 to 1 that reflects the propensity (rather than the known fact) of being prescribed a given drug, given all measured characteristics. Individuals with the same estimated EPS have, on average, the same chance of receiving that treatment, although they may have very different covariate constellations. EPS can be used by matching on the score, by performing stratified analyses, or by combining these methods with traditional multivariable outcome modeling [75]. Because a number of covariates are "bundled" into one propensity score, the analysis of the exposure-outcome association may lack transparency. Within each EPS stratum, some patients have received the treatment of interest, whereas others have not. This is sometimes described as a "virtual randomization" in which comparable patients are divided between treated and untreated. However, because EPS are conditional on measured
covariates only, there is concern that they cannot control for unmeasured or imperfectly measured variables. Residual confounding bias therefore cannot be excluded, particularly in database studies, which often have limited information on many clinical confounders [76]. Although some questions on the advantages and applications of EPS remain unanswered, EPS offer some clear advantages in database studies. In the evaluation of pharmaceuticals, we often deal with frequent exposures and rare outcomes [77]. Because exposure propensity scores model the relation of covariates and their interactions with the exposure rather than directly with the study outcomes, such study configurations are ideal for applying EPS to avoid the risk of over-fitting that would arise with fewer than 10 outcomes per variable in a traditional outcome model [72,78]. Plotting and comparing the distribution of EPS for exposed and unexposed subjects can be instructive and should be standard procedure in database analyses using this approach (Fig. 4). The amount of non-overlap of these two curves at the extreme ends of the EPS distribution identifies (1) patients who have a very low probability of treatment and are never treated, possibly because of an important contraindication, and (2) patients who would be expected always to receive treatment based on their covariate vector. In these patients there is no equipoise of evidence (or medical practice), and it is therefore questionable whether they should be included in an analysis. If they are, one should keep in mind the implicit distributional assumptions that regression models make to extrapolate into a parameter space that is not supported by adequate data [71].

[Figure 4: overlapping distributions of the number of subjects by exposure propensity score (0 to 1) for treated and untreated subjects, with never-treated subjects concentrated at the low end and always-treated subjects at the high end.]

Fig. 4. The non-overlap of the exposure propensity score distribution among treated and untreated study subjects. In this example, subjects with very low propensity scores are never treated, whereas subjects with very high propensity scores are all treated.

4.2.1.2. Disease risk scores. Just as EPS can estimate the likelihood of a treatment given a set of measured covariates, disease risk scores (DRS) estimate the risk of a clinical outcome given a set of covariates, independent of the exposure being studied [27,79]. DRS can reduce the number of covariates by summarizing all potential confounders in a single multivariable model that predicts the risk of disease based on measured covariates [73,80]. Strata based on ranges of predicted disease risk can then be created and entered together with the exposure into the final multivariable outcome model. Under the assumption that the DRS captures all disease risk information inherent in the measured covariates independent of exposure, an analysis adjusting for DRS produces exposure effect estimates unbiased by measured covariates. In situations of very strong associations between treatment exposure and covariates, which may be unrealistic in most practical settings, DRS may introduce bias compared with EPS [80].
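To make the score-based workflow of this section concrete, here is a minimal sketch with simulated data (scikit-learn is just one suitable library, and the overlap trimming shown is an illustrative choice, not a recommendation from the article) of estimating an EPS, checking Fig. 4-style overlap, and forming strata:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
covariates = rng.normal(size=(n, 5))  # age, comorbidity measures, etc. (simulated)
treated = rng.binomial(1, 1 / (1 + np.exp(-covariates[:, 0])))  # confounded treatment

# Estimate each subject's propensity of treatment from measured covariates.
model = LogisticRegression(max_iter=1000).fit(covariates, treated)
eps = model.predict_proba(covariates)[:, 1]

# Compare the EPS distributions of treated and untreated subjects (Fig. 4)
# and restrict to the region of overlap before stratifying or matching.
low = max(eps[treated == 1].min(), eps[treated == 0].min())
high = min(eps[treated == 1].max(), eps[treated == 0].max())
in_overlap = (eps >= low) & (eps <= high)
strata = np.digitize(eps[in_overlap], np.quantile(eps[in_overlap], [0.2, 0.4, 0.6, 0.8]))
print(f"{in_overlap.mean():.1%} of subjects retained; stratum sizes:",
      np.bincount(strata))
```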
4.2.2. Unmeasured confounders

A good understanding of the imbalance of measured covariates among drug user categories can lead to better control of confounding factors. For example, newer sedative-hypnotics (e.g., zolpidem) are preferentially prescribed to frail elderly patients, who are more likely to experience falls and hip fractures, in an attempt to avoid such outcomes. The construct of frailty is difficult to measure in administrative databases [81], which led to an overestimation of the association of newer sedative-hypnotics with hip fractures when compared with users of traditional benzodiazepines or non-users [82]. In contrast, if drugs are prescribed independent of risk factors for an outcome because that potential effect was unknown at the time of prescribing, then there is no need to adjust for such factors. Such effects may have gone undetected in smaller pre-approval randomized trials in selected populations and include rare adverse effects [83] and (less frequently) beneficial effects of long-term therapy [84,85]. For example, two similarly marketed selective Cox-2 inhibitors, rofecoxib and celecoxib, were likely to be equally prescribed to patients at risk for cardiovascular events, so an increased risk of myocardial infarction seen with one of them is unlikely to be attributable to such confounding by indication [57]. Patient characteristics measured in Medicare claims data were found to be well balanced among users of the two drugs, and more detailed survey information confirmed that the choice between celecoxib and rofecoxib was independent of predictors of cardiovascular events not measurable in the database, such as smoking, body mass index, and aspirin use [86,87]. This insight leads directly to the active comparator design.

4.2.2.1. Active (competing) comparator design. When comparing the effects of two active and competing therapies that are prescribed under the assumption of identical effectiveness and safety, it is much less likely that predictors of the study outcome are imbalanced and cause confounding. The comparative relative risk of Drug 1 versus Drug 2 can be directly estimated in a regression model or as the relative risk of Drug 1 compared with non-exposed patients divided by the relative risk of Drug 2 compared with non-exposed patients. Adjusting for the baseline risks results in the relative excess risk (RER), RER = (RR1 − 1)/(RR2 − 1) [88].

4.2.2.2. Sensitivity analysis. Basic sensitivity analyses of residual confounding try to determine how strong and how imbalanced a confounder would have to be among drug categories to explain the observed effect. A fully adjusted relative risk (RRadj) can be expressed as a function of the unadjusted relative risk (RRunadj), the independent relative risk of the unmeasured confounder on the disease outcome (RRCD), and the prevalence of the confounder in the two drug exposure categories (PC|E) [71]:

RRadj = RRunadj / { [PC|E=1 × (RRCD − 1) + 1] / [PC|E=0 × (RRCD − 1) + 1] }
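A minimal sketch implementing this external adjustment formula (the numbers are purely illustrative):

```python
def externally_adjusted_rr(rr_unadj, rr_cd, p_c_exposed, p_c_unexposed):
    """Adjust an observed relative risk for an unmeasured binary confounder
    with confounder-disease relative risk rr_cd and prevalences p_c_* in
    the exposed and unexposed groups."""
    bias_factor = ((p_c_exposed * (rr_cd - 1) + 1)
                   / (p_c_unexposed * (rr_cd - 1) + 1))
    return rr_unadj / bias_factor

# How much of an observed RR of 1.5 could an unmeasured confounder explain
# if it triples outcome risk and is markedly imbalanced between groups?
print(round(externally_adjusted_rr(1.5, rr_cd=3.0, p_c_exposed=0.4,
                                   p_c_unexposed=0.1), 2))
# -> 1.0: such a confounder would fully explain the observed association.
```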
Psaty et al. [89] demonstrated that very strong risk factors for cardiovascular events would have to be unmeasured and uncontrolled to explain their observed association between the use of calcium channel blockers and acute MI after adjusting for most known cardiovascular risk factors. This type of sensitivity analysis is particularly helpful in database studies but is underutilized. Lash and Fink [90] proposed an approach that resamples the original data, allowing sensitivity analyses for confounding, misclassification, and selection bias in one process. If additional information is available (e.g., a detailed survey of a representative sample of the main database study), such univariate sensitivity analyses can be used to correct for confounders unmeasured in the main study [86]. If internal validation studies are not feasible or are too costly, external data sources can be used under certain assumptions. For example, the Medicare Current Beneficiary Survey studies a representative sample of Medicare beneficiaries to measure a wide variety of characteristics that are not captured in Medicare claims data, such as limitations in activities of daily living [91], cognitive impairment, and physical impairments [92]. This method was recently extended to a multivariate adjustment for unmeasured confounders using a new technique of propensity score calibration, which can be applied when external information is available [93]. In such a validation study, the full database record is available for each subject along with detailed survey information. The goal is to compute, within the validation population, an error-prone exposure propensity score using only database information and an improved exposure propensity score that additionally includes the survey information. The error component of the database propensity score in the validation study is then quantified and can be used to correct the propensity score in the main database study, using established regression calibration techniques [94].

4.2.2.3. Crossover study designs. The underlying idea of crossover study designs is that case patients can serve as
their own controls. Relevant examples are the case-crossover design [95] and prescription symmetry analysis [96]. The case-crossover design uses a case as his or her own control by considering person-time before the case-defining event as control person-time. This makes it possible to address the problem of unmeasured between-person confounding by using previous exposure to the study drug within the same person to quantify drug use in the control person-time, which is then compared with the exposure status just before the case-defining event in the same person. Several applications of case-crossover studies in administrative data demonstrate the design's utility in controlling unmeasured confounding. In a study of selective serotonin reuptake inhibitors and the risk of hip fracture, the relative risk estimate was reduced from 6.3 (case-control design) to 2.0 in a case-crossover design [97]; such a reduction in effect is expected because the crossover design better controls for confounding by frailty [82]. Farrington [98] applied the same design to vaccine safety studies, assuming Poisson-distributed variables. If only pharmacy dispensing data are available, then adverse events (e.g., depression) can be defined by initiation of a drug used to treat such an effect (e.g., antidepressants). Prescription symmetry analysis adjusts for the background risk of starting antidepressant therapy without requiring knowledge of the clinical reasons for starting it [96]. The weak point of crossover study designs is the potential for within-person confounding over time if there is an abrupt change in a patient's clinical status or an increasing or decreasing trend in exposure utilization [99]. For example, early symptoms of a case-defining event may lead to an increase in use of the study drug during the time preceding the actual event. This is less likely for sudden-onset events but can pose a problem in studying insidious outcomes in claims data [100]. A limited assessment and correction of this bias is possible by including time-trend controls [101,102]; alternatively, comparator drugs known to be unrelated to the outcome can be used to calibrate case-crossover designs [100].

4.2.2.4. Instrumental variable estimation. Instrumental variable (IV) estimation can provide unbiased estimates in database studies if several conditions are fulfilled; this is best illustrated by a study on the effect of primary left heart catheterization after an acute MI [103]. The instrumental variable (living close to a cardiac catheterization facility) must not be associated with any potential confounder (i.e., severity of MI) and must not be associated with the outcome (death) other than through the exposure of interest (undergoing catheterization). These assumptions may appear reasonable but are impossible to test in the data. At the same time, the instrumental variable (living close to a facility) must be strongly associated with the exposure of interest (undergoing catheterization). This last assumption is rarely fulfilled in reality; a weak instrument can be thought of as a weak link between quasi-random treatment assignment and the actual treatment [104]. Such a weak association will increase the standard error of the instrumental variable estimate.
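A minimal sketch of the simplest instrumental variable estimator, the Wald ratio for a binary instrument (simulated data; real analyses typically use two-stage least squares and must defend the assumptions above, which the simulation builds in by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
u = rng.normal(size=n)                     # unmeasured severity (confounder)
z = rng.binomial(1, 0.5, n)                # instrument, e.g., lives near facility
x = rng.binomial(1, 0.2 + 0.3 * z + 0.1 * (u > 0))  # treatment, pushed by z and u
y = 1.0 - 0.5 * x + 1.0 * u + rng.normal(size=n)    # outcome; true effect = -0.5

naive = y[x == 1].mean() - y[x == 0].mean()          # confounded by u
wald = ((y[z == 1].mean() - y[z == 0].mean())
        / (x[z == 1].mean() - x[z == 0].mean()))
print(f"naive estimate {naive:.2f}, IV (Wald) estimate {wald:.2f}")  # IV ~ -0.5
```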
IV estimation has been successfully applied in databases for the evaluation of health policy changes using time-trend information [105]. Because a new drug reimbursement policy affects all beneficiaries at the same time and does so indiscriminately, the IV (the date of the policy start) is not associated with any subject characteristics. Further, the date of the policy start does not affect outcomes directly but only through the effects of the policy. Thus, interrupted time-trend analysis and IV estimation are considered the gold standard evaluation methods for drug policy changes. Utilization databases are particularly useful for such evaluations because of their longitudinal nature, the completeness of medical service recording, and their population-based data [106].
5. Special topics related to databases

5.1. Primary medical record review

Validation of diagnoses is generally recommended if there is any doubt about the specificity of the coding of the study outcome because specificity of outcome classification is key for unbiased relative risk estimates. Sudden, highly symptomatic events that lead to hospitalization are likely to have a low rate of false positives, whereas outcomes with insidious onset and less clearly defined diagnostic criteria are likely to be coded less specifically. Findings of published validation studies performed in other populations may guide the decision of when to initiate a case validation, although some authors argue that case validation is always necessary in database studies [107]. The first important choice in this regard is whether to validate a subset of cases or all cases. The decision depends on the number of cases in a specific study, the uniformity of case definitions, and the resources available. If validation is performed in a subset of cases, the response rate must be high because coding accuracy can depend on the availability of medical records or patient information. Validation subjects are identified in the database and their physicians or hospitals contacted. This design allows estimating only positive predictive values as a proxy for specificity. Sample validation studies are often used to improve the case-identification algorithm and optimize its predictive value. In such situations, it is advisable to divide the validation sample into a "training set" and a "test set" or to apply bootstrap techniques to obtain a valid estimate of the predictive value after the algorithm is optimized [55,108]. The final algorithm can then be used to identify all cases in the main study.

5.2. Handling time-varying drug exposure and repeated outcomes

Utilization databases contain a longitudinal string of health service encounters for individual patients, in contrast to primary data collection, which usually collects data at one
or a few predetermined time points using patient interviews. This is a critical strength of database studies because drugs are often not taken continuously; yet a causal relation between a drug and most outcomes usually requires that the drug be present at a biologically active level at the time of the event. Once time-varying drug exposure information is coded and covariate vectors are updated with each change in exposure status, one can use standard epidemiologic regression techniques. Repeated outcomes, such as asthma attacks leading to ER admissions, can multiply the apparent number of subjects, resulting in falsely narrow standard errors. These must be corrected for the clustering of information within individuals using one of three strategies: (1) keep the patient as the unit of analysis and model the number of outcomes, at the cost of statistical power if there are few patients but many events; (2) use events as the unit of analysis and correct standard errors for the correlation of covariate information within subjects using generalized estimating equations (GEE) [109]; or (3) use multi-level models, allowing each individual to have a different underlying tendency for the outcome [110]. GEE produces population-average effects that tend to underestimate a drug's effect in an individual if there is large variability in patients' underlying tendency to develop the outcome (e.g., very frequent versus rare asthma attacks, independent of treatment) [111]. Multi-level models estimate the person-specific effect but make distributional assumptions (usually normality) about how individual patients vary around the population mean with regard to their baseline risk of the outcome. Software for nonlinear multilevel models is imperfect, and results may not be consistent across software packages [111].

5.3. Drug effects that are not constant over time: the new-user design

Risks and benefits of drugs often vary in magnitude over time after the start of a therapeutic intervention. Intolerance may lead to differential dropout and may result in a "survivor cohort" that is much more likely to do well with the therapy. For example, randomized trials showed an initial harmful cardiovascular effect of hormone replacement therapy (HRT) [112,113], whereas observational studies failed to uncover these initial risks [114]. Observational cohorts may have failed partly because they numerically combined the increased risk of a small group of HRT starters with the beneficial effect of a much larger group of prevalent users who tolerated HRT. Randomized trials are by definition composed of inception cohorts (i.e., all subjects are new users of therapy at the point of randomization). Although observational inception cohorts are widely used in prognostic studies, they are rarely used to study drugs. Such new-user designs are underused because of the logistic difficulty of screening large populations to identify new users and the loss of sample size and statistical power with exclusion of prevalent users [84].
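A minimal sketch of how a new-user (inception) cohort could be assembled from dispensing claims (the file name, column names, and 365-day washout are hypothetical, illustrative choices, not specifications from the article):

```python
import pandas as pd

claims = pd.read_csv("dispensings.csv", parse_dates=["dispense_date"])

def new_users(claims, drug, washout_days=365):
    """Return each patient's first dispensing of `drug`, keeping only
    patients with no fill of the drug during a preceding washout period
    (approximated here by requiring the first observed fill of the drug
    to occur at least `washout_days` after the patient's first recorded
    claim of any kind, as a crude proxy for continuous observation)."""
    first_any = claims.groupby("patient_id")["dispense_date"].min()
    d = claims[claims["drug"] == drug]
    first_fill = d.groupby("patient_id")["dispense_date"].min()
    observed_long_enough = (
        (first_fill - first_any.loc[first_fill.index]).dt.days >= washout_days
    )
    return first_fill[observed_long_enough]

cohort = new_users(claims, "drug A")
print(f"{len(cohort)} new users identified")
```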
5.3. Drug effects that are not constant over time: the new-user design

The risks and benefits of drugs often vary in magnitude over time after the start of a therapeutic intervention. Intolerance may lead to differential dropout and may result in a "survivor cohort" that is much more likely to do well on the therapy. For example, randomized trials showed an initial harmful cardiovascular effect of hormone replacement therapy (HRT) [112,113], whereas observational studies failed to uncover these initial risks [114]. Observational cohorts may have failed partly because they numerically combined the increased risk in a small group of HRT starters with the beneficial effect in a much larger group of prevalent users who tolerated HRT. Randomized trials are by definition composed of inception cohorts (i.e., all subjects are new users of therapy at the point of randomization). Although observational inception cohorts are widely used in prognostic studies, they are rarely used to study drugs. Such new-user designs are underused because of the logistic difficulty of screening large populations to identify new users and the loss of sample size and statistical power that comes with excluding prevalent users [84].

In utilization databases, it is logistically easy to identify new drug users by screening very large populations, and this often yields datasets that retain sufficient statistical power. Many linked databases now cover 15 years or more (GPRD, Medicare/Medicaid, British Columbia, Saskatchewan, etc.) and can provide long follow-up for new users. Large database studies are therefore well suited to new-user designs and can thus provide a better understanding of the chronology of the beneficial and adverse health effects of therapeutics. The choice of comparison group in a new-user design must take into account that every new user has been evaluated by a health professional just before receiving the new prescription, which may make new users less comparable to non-users selected at random dates. Choosing as comparators patients starting other drugs known not to be associated with the study outcome may improve the balance of patient risk factors [115].
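A minimal Python (pandas) sketch of new-user identification in a claims extract follows; the table layout, column names (patient_id, rx_date, enroll_start), and the 365-day washout are hypothetical choices for illustration, and a production implementation would additionally handle enrollment gaps, multiple drug classes, and switching:

```python
import pandas as pd

# Hypothetical dispensing claims; column names are assumptions.
claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "rx_date": pd.to_datetime(
        ["2000-03-01", "2001-06-15", "2001-02-10", "1999-11-20", "2000-01-05"]),
    "enroll_start": pd.to_datetime(
        ["1998-01-01", "1998-01-01", "2000-10-01", "1998-05-01", "1998-05-01"]),
})

WASHOUT_DAYS = 365  # exposure-free lookback required before the index fill

# Earliest fill per patient; enroll_start is constant within patient.
first_fill = (claims.sort_values("rx_date")
                    .groupby("patient_id").first().reset_index())

# A first fill qualifies as "new use" only if the patient was enrolled
# (and hence observable) for the full washout before it; any prior fill
# would have appeared as an earlier claim and been selected instead.
new_users = first_fill[
    first_fill["rx_date"] - first_fill["enroll_start"]
    >= pd.Timedelta(days=WASHOUT_DAYS)
]
print(new_users[["patient_id", "rx_date"]])
```

The key design choice embodied here is that a fill counts as new use only when the washout window is fully observable, so that an earlier fill cannot be missed merely because it predated enrollment.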
6. Conclusion

The growing trend toward recording data on all medical encounters in electronic format is making large utilization-based datasets more and more common in health care. Their representativeness, large size, and capacity to hold large quantities of longitudinal clinical data on each patient can make such datasets useful for clinical epidemiologic research, especially on medication utilization and outcomes. Such pharmacoepidemiologic applications extend from studies of physician prescribing and patient compliance to those focusing on adverse drug events, effectiveness, and cost-effectiveness. The increasing availability in electronic medical records of even more detailed clinical information, such as the medical history and the results of diagnostic tests, will further enhance the validity and versatility of such databases. Despite their advantages, these approaches have important limitations that are the subject of much innovative current methodologic research. As the universality and level of detail of utilization databases grow in the coming years, they will offer a powerful tool for the assessment of medication use and outcomes, as long as they are used with appropriate attention to their potential limits.
Acknowledgments

Dr. Schneeweiss received support from the National Institute on Aging (RO1-AG021950) and the Agency for Healthcare Research and Quality (2-RO1-HS10881), Department of Health and Human Services, Rockville, MD. We thank our colleagues in the Division of Pharmacoepidemiology and Pharmacoeconomics at the Brigham and Women's Hospital for their helpful discussions: Robert J. Glynn, PhD, ScD; Til Stürmer, MD, MPH; Allan Brookhart, PhD; and Ken Rothman, DMD, DrPH.
References

[1] Avorn J. Powerful medicines: the benefits, risks, and costs of prescription drugs. New York: Knopf; 2004.
[2] The Centers for Education and Research on Therapeutics (CERTs) Risk Assessment Workshop Participants. Risk assessment of drugs, biologics and therapeutic devices: present and future issues. Pharmacoepidemiol Drug Saf 2003;12:653–62.
[3] Greenland S, Finkle WD. A retrospective cohort study of implantable medical devices and selected chronic disease in Medicare claims data. Ann Epidemiol 2000;10:205–13.
[4] Guess HA. Pharmacoepidemiology in pre-approval clinical trial safety monitoring. J Clin Epidemiol 1991;44:851–7.
[5] Arana A, Rivero E, Egberts TCG. What do we show and who does so? An analysis of the abstracts presented at the 19th ICPE. Pharmacoepidemiol Drug Saf 2004;13:S330–1.
[6] Hallas J, Gaist D, Bjerrum L. The waiting time distribution as a graphical approach to epidemiologic measures of drug utilization. Epidemiology 1997;8:666–70.
[7] Benner JS, Glynn RJ, Mogun H, Neumann PJ, Weinstein MC, Avorn J. Long-term persistence in use of statin therapy in elderly patients. JAMA 2002;288:455–61.
[8] Soumerai SB, McLaughlin TJ, Spiegelman D, Hertzmark E, Thibault G, Goldman L. Adverse outcomes of underuse of beta-blockers in elderly survivors of acute myocardial infarction. JAMA 1997;277:115–21.
[9] Knight EL, Avorn J. Quality indicators for appropriate medication use in vulnerable elders. Ann Intern Med 2001;135:703–10.
[10] Brookhart MA, Solomon DH, Wang P, Glynn RJ, Avorn J, Schneeweiss S. Quantifying sources of explained variation in multilevel models of therapeutic decision making. J Clin Epidemiol (in press).
[11] Levy AR, Tamblyn RM, McLeod PJ, Fitchett D, Abrahamowicz M. The effect of physicians' training on prescribing beta-blockers for secondary prevention of myocardial infarction in the elderly. Ann Epidemiol 2002;12:86–9.
[12] Tamblyn R, McLeod P, Hanley JA, Girard N, Hurley J. Physician and practice characteristics associated with the early utilization of new prescription drugs. Med Care 2003;41:895–908.
[13] Gatsonis CA, Epstein AM, Newhouse JP, Normand SL, McNeil BJ. Variations in the utilization of coronary angiography for elderly patients with an acute myocardial infarction: an analysis using hierarchical logistic regression. Med Care 1995;33:625–42.
[14] Goldfield N. Physician profiling and risk adjustment. 2nd edition. Gaithersburg (MD): Aspen; 1999.
[15] Rodriguez EM, Staffa JA, Graham DJ. The role of databases in drug postmarketing surveillance. Pharmacoepidemiol Drug Saf 2001;10:407–10.
[16] Strom BL. Sample size considerations for pharmacoepidemiology studies. In: Strom BL, editor. Pharmacoepidemiology. 3rd edition. Chichester (UK): Wiley; 2000. p. 31–40.
[17] Weatherby LB, Nordstrom BL, Fife D, Walker AM. The impact of wording in "Dear doctor" letters and in black box labels. Clin Pharmacol Ther 2002;72:735–42.
[18] Rockhill B, Newman B, Weinberg C. Use and misuse of population attributable fractions. Am J Public Health 1998;88:15–9.
[19] Miettinen OS. The need for randomization in the study of intended effects. Stat Med 1983;2:267–71.
[20] Strom BL, Melmon KL, Miettinen OS. Post-marketing studies of drug efficacy: how? Am J Med 1984;77:703–8.
[21] Grodstein F, Clarkson TB, Manson JE. Understanding the divergent data on postmenopausal hormone therapy. N Engl J Med 2003;348:645–50.
[22] Walker AM, Lanza LL, Arellano F, Rothman K. Mortality in current and former users of clozapine. Epidemiology 1997;8:671–7.
[23] The West of Scotland Coronary Prevention Study Group. Computerised record linkage: compared with traditional patient follow-up methods in clinical trials and illustrated in a prospective epidemiological study. J Clin Epidemiol 1995;48:1441–52.
[24] Soumerai SB, Ross-Degnan D, Fortess EE, Abelson J. A critical analysis of studies of state drug reimbursement policies: research in need of discipline. Milbank Q 1993;71:217–52.
[25] Tamblyn R, Laprise R, Hanley JA, Abrahamowicz M, Scott S, Mayo N, Hurley J, Grad R, Latimer E, Perreault R, McLeod P, Huang A, Larochelle P, Mallet L. Adverse events associated with prescription drug cost-sharing among poor and elderly persons. JAMA 2001;285:421–9.
[26] Soumerai SB, Ross-Degnan D, Avorn J, McLaughlin T, Choodnovskiy I. Effects of Medicaid drug-payment limits on admission to hospitals and nursing homes. N Engl J Med 1991;325:1072–7.
[27] Schneeweiss S, Walker AM, Glynn RJ, Maclure M, Dormuth C, Soumerai SB. Outcomes of reference pricing for angiotensin-converting enzyme inhibitors. N Engl J Med 2002;346:822–9.
[28] Hennessy S, Bilker WB, Weber A, Strom BL. Descriptive analyses of the integrity of a US Medicaid claims database. Pharmacoepidemiol Drug Saf 2003;12:103–11.
[29] Sorensen HT, Sabroe S, Olsen J. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol 1996;25:435–42.
[30] Hallas J. Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis. Epidemiology 1996;7:478–84.
[31] Walker AM. Confounding by indication. Epidemiology 1996;7:335–6.
[32] Bright RA, Avorn J, Everitt DE. Medicaid data as a resource for epidemiologic studies: strengths and limitations. J Clin Epidemiol 1989;42:937–45.
[33] Platt R, Davis R, Finkelstein J, Go AS, Gurwitz JH, Roblin D, Soumerai S, Ross-Degnan D, Andrade S, Goodman MJ, Martinson B, Raebel MA, Smith D, Ulcickas-Yood M, Chan KA. Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics. Pharmacoepidemiol Drug Saf 2001;10:373–7.
[34] Lewis JD, Brensinger C. Agreement between GPRD smoking data: a survey of general practitioners and a population-based survey. Pharmacoepidemiol Drug Saf 2004;13:437–41.
[35] Jick H, Zornberg GL, Jick SS, Seshadri S, Drachman DA. Statins and the risk of dementia. Lancet 2000;356:1627–31.
[36] Hallas J. Conducting pharmacoepidemiologic research in Denmark. Pharmacoepidemiol Drug Saf 2001;10:619–23.
[37] Schneeweiss S, Schöffski O, Selke GW. What is Germany's experience on reference based drug pricing and the etiology of adverse health outcomes or substitution? Health Policy 1998;44:253–60.
[38] Willison DJ. Health services research and personal health information: privacy concerns, new legislation, and beyond. CMAJ 1998;159:1378–80.
[39] Lynd LD, Warren L, Maclure M, Paré PD, Anis AH. Using administrative data to recruit study participants while protecting patient privacy: experience with "Camouflaged Sampling". Eur J Epidemiol 2004;19:517–25.
[40] Short PF, Graefe DR, Schoen C. Churn, churn, churn: how instability of health insurance shapes America's uninsured problem. Issue brief. New York, NY: The Commonwealth Fund; 2003.
[41] Maclure M, Schneeweiss S. Causation of bias: the Episcope. Epidemiology 2001;12:114–22.
[42] Stergachis AS. Record linkage studies for postmarketing drug surveillance: data quality and validity considerations. Drug Intell Clin Pharm 1988;22:157–61.
[43] Levy AR, O'Brien BJ, Sellors C, Grootendorst P, Willison D. Coding accuracy of administrative drug claims in the Ontario Drug Benefit database. Can J Clin Pharmacol 2003;10:67–71.
[44] McKenzie DA, Semradek J, McFarland BH, Mullooly JP, McCamant LE. The validity of Medicaid pharmacy claims for estimating drug use among elderly nursing home residents: the Oregon experience. J Clin Epidemiol 2000;53:1248–57.
[45] West S, Savitz DA, Koch G, Strom BL, Guess HA, Hartzema A. Recall accuracy for prescription medications: self-report compared with database information. Am J Epidemiol 1995;142:1103–12.
[46] West S, Strom BL, Freundlich B, Normand E, Koch G, Savitz DA. Completeness of prescription recording in outpatient medical records from a health maintenance organization. J Clin Epidemiol 1994;47:165–71.
[47] WHO Collaborating Centre for Drug Statistics Methodology. ATC index with DDD. Oslo, Norway: WHO; 2003.
[48] McMahon AD, Evans JM, McGilchrist MM, et al. Drug exposure risk windows and unexposed comparator groups for cohort studies in pharmacoepidemiology. Pharmacoepidemiol Drug Saf 1998;7:275–80.
[49] Dormuth C, Schneeweiss S. Rapid monitoring of drug discontinuation rates in response to restrictions in drug reimbursement. Pharmacoepidemiol Drug Saf 2004;13:S310–1.
[50] Jacobus S, Schneeweiss S, Chan KA. Exposure misclassification as a result of free sample drug utilization in automated claims databases and its effect on pharmacoepidemiologic studies of selective COX-2 inhibitors. Pharmacoepidemiol Drug Saf 2004;13:695–702.
[51] Guess HA. Behavior of the exposure odds ratio in a case-control study when the hazard function is not constant over time. J Clin Epidemiol 1989;42:1179–84.
[52] Kelsey JL, Whittemore AS, Evans AS, Thompson WD. Methods in observational epidemiology. 2nd edition. New York: Oxford University Press; 1996.
[53] Wilchesky M, Tamblyn RM, Huang A. Validation of diagnostic codes within medical services claims. J Clin Epidemiol 2004;57:131–41.
[54] Romano PS, Mark DH. Bias in the coding of hospital discharge data and its implications for quality assessment. Med Care 1994;32:81–90.
[55] Kiyota Y, Schneeweiss S, Glynn RJ, Cannuscio CC, Avorn J, Solomon DH. The accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value based on review of hospital records. Am Heart J 2004;148:99–104.
[56] Meier CR, Jick SS, Derby LE, Vasilakis C, Jick H. Acute respiratory-tract infections and risk of first-time acute myocardial infarction. Lancet 1998;351:1467–71.
[57] Solomon DH, Schneeweiss S, Glynn RJ, Kiyota Y, Levin R, Mogun H, Avorn J. The relationship between selective COX-2 inhibitors and acute myocardial infarction. Circulation 2004;109:2068–73.
[58] Barbone F, McMahon AD, Davey PG, Morris AD, Reid IC, McDevitt DG, MacDonald TM. Association of road-traffic accidents with benzodiazepine use. Lancet 1998;352:1331–6.
[59] Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992;45:613–9.
[60] Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chron Dis 1987;40:373–83.
[61] Schneeweiss S, Seeger J, Maclure M, Wang P, Avorn J, Glynn RJ. Performance of comorbidity scores to control for confounding in epidemiologic studies using claims data. Am J Epidemiol 2001;154:854–64.
[62] Quan H, Parsons GA, Ghali WA. Assessing accuracy of diagnosis-type indicators for flagging complications in administrative data. J Clin Epidemiol 2004;57:366–72.
[63] Copeland KT, Checkoway H, Holbrook RH, McMichael AJ. Bias due to misclassification in the estimate of relative risk. Am J Epidemiol 1977;105:488–95.
[64] Greenland S. The effect of misclassification in matched-pair case-control studies. Am J Epidemiol 1982;116:402–6.
[65] Marshall RJ. Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J Clin Epidemiol 1990;43:941–7.
[66] Brenner H, Gefeller O. Use of positive predictive value to correct for disease misclassification in epidemiologic studies. Am J Epidemiol 1993;138:1007–15.
[67] Cook JR, Stefanski LA. Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc 1994;89:1314–28.
[68] Stefanski LA, Cook JR. Simulation-extrapolation: the measurement error jackknife. J Am Stat Assoc 1995;90:1247–56.
[69] Schneeweiss S, Spiegelman DL, Avorn J, Glynn RJ. Sensitivity of multivariate logistic regression results towards random misclassification of exposure status and event dates in a study on antiparkinsonian drug use and sudden onset pathologic somnolence. Pharmacoepidemiol Drug Saf 2003;12:S146–7.
[70] Schneeweiss S, Glynn RJ, Avorn J, Solomon DH. A Medicare database review found that physician preferences increasingly outweighed patient characteristics as determinants of first-time prescriptions for COX-2 inhibitors. J Clin Epidemiol 2005;58:98–102.
[71] Rothman KJ, Greenland S. Modern epidemiology. 2nd edition. Philadelphia: Lippincott Williams & Wilkins; 1998.
[72] Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373–9.
[73] Miettinen OS. Stratification by a multivariate confounder score. Am J Epidemiol 1976;104:609–20.
[74] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
[75] Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf 2004;13:841–53.
[76] Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997;127:757–63.
[77] Braitman LE, Rosenbaum PR. Rare outcomes, common treatments: analytic strategies using propensity scores. Ann Intern Med 2002;137:693–5.
[78] Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003;158:280–7.
[79] Ray WA, Stein CM, Hall K, Daugherty JR, Griffin MR. Non-steroidal anti-inflammatory drugs and risk of serious coronary heart disease: an observational cohort study. Lancet 2002;359:118–23.
[80] Cook EF, Goldman L. Performance of tests of significance based on stratification by a multivariate confounder score or by a propensity score. J Clin Epidemiol 1989;42:317–24.
[81] Glynn RJ, Knight EL, Levin R, Avorn J. Paradoxical relations of drug treatment with mortality in older persons. Epidemiology 2001;12:682–9.
[82] Schneeweiss S, Wang P. Association between SSRI use and hip fractures and the effect of residual confounding bias in claims database studies. J Clin Psychopharmacol 2004;13:695–702.
[83] Vandenbroucke JP. When are observational studies as credible as randomized trials? Lancet 2004;363:1728–31.
[84] Ray WA. Evaluating medication effects outside of clinical trials: new-user designs. Am J Epidemiol 2003;158:915–20.
[85] Giovannucci E, Egan KM, Hunter DJ, Stampfer MJ, Colditz GA, Willett WC, Speizer FE. Aspirin and the risk of colorectal cancer in women. N Engl J Med 1995;333:609–14.
[86] Schneeweiss S, Glynn RJ, Tsai EH, Avorn J, Solomon DH. Assessment of bias by unmeasured confounders in pharmacoepidemiologic claims data studies using external data. Epidemiology 2005;16:17–24.
[87] Velentgas P, Cali C, Diedrick G, Heinen MJ, Verburg KM, Dreyer NA, Walker AM. A survey of aspirin use, non-prescription NSAID use, and cigarette smoking among users and non-users of prescription NSAIDs: estimates of the effect of unmeasured confounding by these factors on studies of NSAID use and risk of myocardial infarction. Pharmacoepidemiol Drug Saf 2001;10:S103.
[88] Suissa S. Relative excess risk: an alternative measure of comparative risk. Am J Epidemiol 1999;150:279–82.
[89] Psaty BM, Koepsell TD, Lin D, Weiss NS, Siscovick DS, Rosendaal FR, Pahor M, Furberg CD. Assessment and control for confounding by indication in observational studies. J Am Geriatr Soc 1999;47:749–54.
[90] Lash TL, Fink AK. Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology 2003;14:451–8.
[91] Saliba D, Orlando M, Wenger NS, Hays RD, Rubenstein LZ. Identifying a short functional disability screen for older persons. J Gerontol 2000;55:750–6.
[92] Eppig FJ, Chulis GS. Matching MCBS and Medicare data: the best of both worlds. Health Care Financing Rev 1997;18:211–29.
[93] Stürmer T, Schneeweiss S, Avorn J, Glynn RJ. Correcting effect estimates for unmeasured confounding in cohort studies with validation studies using propensity score calibration. Am J Epidemiol (in press).
[94] Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 1990;132:734–45.
[95] Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol 1991;133:144–53.
[96] Hallas J. Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis. Epidemiology 1996;7:478–84.
[97] Hubbard R, Farrington P, Smith C, Smeeth L, Tattersfield A. Exposure to tricyclic and selective serotonin reuptake inhibitor antidepressants and the risk of hip fracture. Am J Epidemiol 2003;158:77–84.
[98] Farrington CP. Relative incidence estimation from case series for vaccine safety evaluation. Biometrics 1995;51:228–35.
[99] Vines SK, Farrington CP. Within-subject exposure dependency in case-crossover studies. Stat Med 2001;20:3039–49.
[100] Wang PS, Schneeweiss S, Glynn RJ, Mogun H, Avorn J. Use of the case-crossover design to study prolonged drug exposures and insidious outcomes. Ann Epidemiol 2004;14:296–303.
[101] Suissa S. The case-time-control design. Epidemiology 1995;6:248–53.
[102] Suissa S. The case-time-control design: further assumptions and conditions. Epidemiology 1998;9:441–5.
[103] McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA 1994;272:859–66.
[104] Newhouse JP, McClellan M. Econometrics in outcomes research: the use of instrumental variables. Annu Rev Public Health 1998;19:17–34.
[105] Schneeweiss S, Maclure M, Soumerai SB, Walker AM, Glynn RJ. Quasi-experimental longitudinal designs to evaluate drug benefit policy changes with low policy compliance. J Clin Epidemiol 2002;55:833–41.
[106] Ray WA. Policy and program analysis using administrative databases. Ann Intern Med 1997;127:712–8.
[107] Jick H, Jick SS, Derby LE. Validation of information recorded on general practitioner based computerised data resource in the United Kingdom. Br Med J 1991;302:766–8.
[108] Miller DP, Alfredson T, Cook SF, Sands BE, Walker AM. Incidence of colonic ischemia, hospitalized complications of constipation, and bowel surgery in relation to use of alosetron hydrochloride. Am J Gastroenterol 2003;98:1117–22.
[109] Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.
[110] Snijders T, Bosker R. Multilevel analysis. London: Sage; 1999.
[111] Twisk JWR. Applied longitudinal data analysis for epidemiology. Cambridge, UK: Cambridge University Press; 2003.
[112] Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, Vittinghoff E. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA 1998;280:605–13.
[113] Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J, Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA 2002;288:321–33.
[114] Grodstein F, Manson JE, Colditz GA, Willett WC, Speizer FE, Stampfer MJ. A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Ann Intern Med 2000;133:933–41.
[115] Mamdani M, Rochon P, Juurlink DN, Anderson GM, Kopp A, Naglie G, Austin PC, Laupacis A. Effect of selective cyclooxygenase 2 inhibitors and naproxen on short-term risk of acute myocardial infarction in the elderly. Arch Intern Med 2003;163:481–6.
[116] Fisher ES, Whaley FS, Krushat WM, Malenka DJ, Fleming C, Baron JA, Hsia DC. The accuracy of Medicare's hospital claims data: progress has been made, but problems remain. Am J Public Health 1992;82:243–8.
[117] Strom BL. Data validation issues in using claims data. Pharmacoepidemiol Drug Saf 2001;10:389–92.