ARTICLE IN PRESS
Risk Prediction for Ischemic Stroke and Transient Ischemic Attack in Patients Without Atrial Fibrillation: A Retrospective Cohort Study
Zhong Yuan, MD, PhD,* Erica A. Voss, MPH,† Frank J. DeFalco, BA,† Guohua Pan, PhD,† Patrick B. Ryan, PhD,* Daniel Yannicelli, MD,‡,1 and Christopher Nessel, MD†
Background: Stroke mainly occurs in patients without atrial fibrillation (AF). This study explored risk prediction models for ischemic stroke and transient ischemic attack (TIA) in patients without AF.
Methods: Three US-based healthcare databases (Truven MarketScan Commercial Claims and Encounters [CCAE], Medicare Supplemental [MDCR], and Optum Clinformatics [Optum]) were used to establish patient cohorts without AF during the index period of 2008-2012. The performance of 2 existing models (CHADS2 and CHA2DS2-VASc) for predicting stroke and TIA was examined by fitting a logistic regression to a training dataset and evaluating predictive accuracy (area under the curve, AUC) in a validation dataset, using patients with complete follow-up of 1 or 3 years, separately.
Results: The commercial populations were younger and had fewer comorbidities than the Medicare-eligible population. The incidence proportions of ischemic stroke and TIA during 1 and 3 years of follow-up were .5% and 1.9% (CCAE), .6% and 2.2% (Optum), and 4.6% and 13.1% (MDCR), respectively. The models performed consistently across all 3 databases, with the AUC ranging from .69 to .77 and from .68 to .73 for 1- and 3-year prediction, respectively. Predictive accuracy was lower than in the initial CHADS2 evaluation in patients with AF (AUC: .82), but consistent with a subsequent meta-analysis of CHADS2 (.60-.80) and CHA2DS2-VASc performance (.64-.79).
Conclusion: Although the existing schemes for predicting ischemic stroke and TIA in patients with AF can be applied to patients without AF with comparable predictive accuracy, the evidence suggests that there is room for improvement in these models’ performance.
Key Words: Stroke; transient ischemic attack; risk prediction; stroke prevention.
© 2017 The Authors. Published by Elsevier Inc. on behalf of National Stroke Association. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
From the *Janssen Research & Development, LLC, Titusville, New Jersey; †Janssen Research & Development, LLC, Raritan, New Jersey; and ‡Janssen Scientific Affairs, LLC, Titusville, New Jersey. Received April 29, 2016; revision received February 9, 2017; accepted March 24, 2017. Declaration of financial/other relationships: The authors (Z.Y., E.A.V, F.J.D, G.P., P.B.R, C.N.) are salaried employees of Janssen Research & Development, LLC, USA. Address correspondence to Zhong Yuan, MD, PhD, Janssen Research & Development, LLC, 1125 Trenton-Harbourton Rd, Titusville, NJ 08560. E-mail:
[email protected]. 1 Dr. Yannicelli:
[email protected] (has left Janssen Scientific Affairs, LLC, at the time of resubmission). 1052-3057/$ - see front matter © 2017 The Authors. Published by Elsevier Inc. on behalf of National Stroke Association. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). http://dx.doi.org/10.1016/j.jstrokecerebrovasdis.2017.03.036
Journal of Stroke and Cerebrovascular Diseases, Vol. ■■, No. ■■ (■■), 2017: pp ■■–■■
Introduction
Stroke is the leading cause of adult disability, representing a significant public health problem worldwide. In the United States, stroke is the fourth leading cause of death among all diseases, with an annual incidence of 795,000, resulting in nearly 130,000 deaths a year.1 Although the incidence of stroke increases with age and with the presence of a number of comorbidities, atrial fibrillation (AF) is the most important single predictor of ischemic stroke (primarily through embolism of left atrial appendage thrombi), conferring a nearly fivefold increase in the risk of stroke based on the Framingham study.2 Given this causal relationship, the thrombotic mechanism, and the fact that AF is the most common cardiac arrhythmia (particularly in the elderly), prophylactic anticoagulation has been the cornerstone of stroke prevention in patients with AF for several decades, potentially saving many lives.3,4 Several risk prediction schemes were developed initially to characterize the risk of stroke for patients with AF, including those developed by the Atrial Fibrillation Investigators (AFI) and the Stroke Prevention in Atrial Fibrillation (SPAF) III investigators.5-7 In predicting stroke in patients with AF, the model-based C-statistic (the area under the curve [AUC] of the receiver operating characteristic [ROC] curve) was .68 (95% confidence interval [CI]: .65-.71) for AFI and .74 (95% CI: .71-.76) for SPAF.
Based on data from the National Registry of Atrial Fibrillation, encompassing Medicare beneficiaries aged 65-95 years with nonrheumatic AF who were not prescribed warfarin on hospital discharge, Gage et al showed that the CHADS2 index (congestive heart failure [CHF], hypertension, age ≥75, diabetes, and prior stroke or transient ischemic attack [TIA] [double score]) had an improved performance as compared with AFI and SPAF for predicting stroke, with a C-statistic of .82 (95% CI: .80-.84).8 Because of its simplicity, the CHADS2 index became the most commonly used scoring scheme for stroke prediction in patients with AF. More recently, Lip et al developed the CHA2DS2-VASc score, consisting of CHF, hypertension, age ≥75 years (double score), diabetes mellitus, previous stroke and TIA (double score), vascular disease, age 65-74 years, and sex (female), with an accompanying C-statistic of .606 (95% CI: .513-.699).9 The European Society of Cardiology, the American College of Cardiology/American Heart Association, and the National Institute for Health and Care Excellence all now recommend the use of CHA2DS2-VASc as the preferred risk scoring method to assess stroke risk in AF patients, as it provides more accurate assessment of low-risk patients than the existing methods.10-12 Stroke prevention in patients with nonvalvular atrial fibrillation relies on an assessment of individual risk, and the CHADS2 and CHA2DS2-VASc risk scores are commonly used for this purpose in clinical practice.13 Our previous analyses have demonstrated that the risk of stroke among patients without AF is also positively associated with those risk scores.14 Although approximately 85% of all strokes occur in people without AF, the performance of these schemes for predicting stroke risk has not been comprehensively examined in this population from a statistical perspective.15,16 Unlike patients with AF, there is no clear prophylactic strategy for stroke prevention in patients without AF; a good prediction model could promote the awareness of patients at high risk, thus allowing healthcare providers to enhance treatment planning by aggressively managing underlying diseases.
Therefore, we designed the current study and hypothesized that the commonly used schemes such as CHADS2 and CHA2DS2-VASc scores can be used to predict the risks of stroke and TIA in patients without AF. In addition, we intended to explore additional factors that might easily be identified in clinical practice, and that might improve the model performance, as measured by AUC. To accomplish these study objectives and assess the robustness and consistency of the results, we employed multiple commercially available databases, multiple end points, and different durations of follow-up for ascertaining the outcomes of interest.
Methods
Study Design, Data Sources, and Patient Selection
This was a retrospective cohort study that used commercially available claims databases, including Truven MarketScan Commercial Claims and Encounters (CCAE), Truven MarketScan Medicare Supplemental (MDCR), and Optum Clinformatics (Optum). Briefly, CCAE is an administrative health claims database for active employees, early retirees, the Consolidated Omnibus Budget Reconciliation Act beneficiaries, and their dependents insured by employer-sponsored plans (individuals in plans or product lines with fee-for-service plans and fully capitated or partially capitated plans). CCAE captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. MDCR is an administrative health claims database for Medicare-eligible active and retired employees and their Medicare-eligible dependents from employer-sponsored supplemental plans (predominantly fee-for-service plans). Only plans where both the Medicare-paid amounts and the employer-paid amounts were available and evident on the claims were selected for this database. MDCR also captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. Finally, Optum is an administrative health claims database for members who are fully insured in commercial plans or in administrative services only, Medicaid (prior to July 2010, 1.25 million) and Legacy Medicare Choice (prior to January 2006, 0.36 million). Optum also captures person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services.
These 3 databases were used to establish the study cohort. Patients were considered eligible for the cohort if a medical encounter took place between the years 2008 and 2012, with the first service date set as the index date for the patient. These patients needed to be between 18 and 64 years of age for CCAE and Optum and older than or equal to 65 years of age for MDCR, and have at least 365 days of continuous observation time prior to the index date. Patients with a prior diagnosis of AF and evidence of receiving warfarin or direct oral anticoagulants were excluded from the analyses. Once this base cohort was established, patients were evaluated to determine if they had an end point of interest during a 1-year follow-up period, that is, ischemic stroke or composite of ischemic stroke or TIA. Patients who did not experience an end point of interest were required to have at least 1 year of complete observable time after the index date, whereas patients who experienced an end point of interest did not need to meet that requirement (Fig 1). As part of the sensitivity analysis, we also repeated the analyses using a 3-year observable time window after the index date.

Figure 1. Flowchart for the cohort definition.
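The selection rules above can be sketched in code. This is a minimal illustration, not the study's implementation (the analyses were run in R against the source databases). The `Patient` record, its field names, and both function names are hypothetical. The sketch also assumes that the 2008-2012 window is inclusive, that a patient is excluded for either a prior AF diagnosis or prior anticoagulant use (the wording in the text is ambiguous on this point), and that an event counts if it falls on or before the last day of the follow-up window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    database: str                  # "CCAE", "Optum", or "MDCR"
    index_date: date               # first service date in the index period
    age_at_index: int
    observation_start: date
    observation_end: date
    prior_af: bool
    prior_anticoagulant: bool      # warfarin or a direct oral anticoagulant
    first_event_date: Optional[date] = None  # first ischemic stroke/TIA after index


def meets_entry_criteria(p: Patient) -> bool:
    """Base-cohort rules: index period, age band, prior observation, exclusions."""
    if not (date(2008, 1, 1) <= p.index_date <= date(2012, 12, 31)):
        return False
    if p.database in ("CCAE", "Optum"):
        age_ok = 18 <= p.age_at_index <= 64
    elif p.database == "MDCR":
        age_ok = p.age_at_index >= 65
    else:
        return False
    prior_obs_ok = (p.index_date - p.observation_start).days >= 365
    # Assumption: either condition alone is enough to exclude the patient.
    excluded = p.prior_af or p.prior_anticoagulant
    return age_ok and prior_obs_ok and not excluded


def outcome_label(p: Patient, follow_up_days: int = 365) -> Optional[bool]:
    """True = event within the window; False = no event with complete
    follow-up; None = no event and incomplete follow-up (not analyzable)."""
    window_end = p.index_date + timedelta(days=follow_up_days)
    if p.first_event_date is not None and p.index_date < p.first_event_date <= window_end:
        return True  # patients with an event need not have complete follow-up
    if p.observation_end >= window_end:
        return False
    return None
```

For the 3-year sensitivity analysis, the same function would be called with `follow_up_days=3 * 365`, again as an assumption about how the window is counted.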
Main Outcomes Measure
The composite of ischemic stroke or TIA (at 1-year or 3-year observable time window, respectively) was the main outcome of interest because the current study is intended to assess the performance of the existing risk schemes (i.e., CHADS2 and CHA2DS2-VASc) for its prediction. The outcome after the index date was identified using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes present in any diagnosis field in the database (ischemic stroke: 433.x1, 434.1, 434.x1; TIA: 435.x). We used all diagnosis fields to ascertain the study end point, because a prior validation study in patients with AF by Thigpen et al17 showed that, although the diagnosis using the primary position had an excellent positive predictive value (97.2%), the diagnosis using the nonprimary position accounted for about 20% of all valid stroke events, with a positive predictive value of 83.7%, which is still reasonably good for secondary database studies. Furthermore, in our definition of ischemic stroke, we did not use the ICD-9-CM code 436 (acute, but ill-defined cerebrovascular disease) primarily for 2 reasons: (1) in the aforementioned study, this code identified fewer than 3% of all stroke events (in patients with AF), and (2) the accuracy of this code in patients without AF has not been comprehensively examined. Finally, given that TIA may be considered a soft
end point, we also assessed the end point of ischemic stroke alone to corroborate the primary analysis.17
For descriptive purposes, a number of baseline comorbidities of interest were identified using the ICD-9-CM codes, including diabetes, CHF and left ventricle dysfunction, myocardial infarction, chronic obstructive pulmonary disease, heart failure, vascular disease, hypertension, hyperlipidemia, hyperthyroidism, thromboembolism, liver disease, renal disease, cancer, ischemic stroke, and TIA. Appendix S1 presents the ICD-9-CM codes for these comorbid conditions.
Statistical Analyses
Descriptive statistics were provided for patient demographics and baseline comorbidities. Means and standard deviations were reported for continuous variables, whereas counts and frequencies were reported for categorical variables. The incidence of ischemic stroke and TIA was presented as an incidence proportion for each fixed observable time period along with standard errors (SE). Because the sample size was quite large for each database, we did not present CIs for the incidence proportions, as the interval limits were nearly identical to the point estimates. As expected, the SE converged toward 0 as sample size increased, but it should be interpreted in a proper context because it does not take other factors into consideration (e.g., potential systematic error, measurement error, or misclassification).
The scores for the existing schemes of CHADS2 (CHF; hypertension; age older than or equal to 75 years; diabetes mellitus; and prior stroke, TIA, or thromboembolism) and CHA2DS2-VASc (CHADS2 + vascular disease, age between 65 and 74 years, and sex category) were calculated based on baseline comorbidities.8,9 For CCAE and Optum, because all patients were younger than 65 years of age, the CHADS2 and CHA2DS2-VASc scores were modified and calculated without the age category. To assess the performance of these 2 schemes for predicting ischemic stroke and TIA, we fitted a logistic regression model with the risk score as the independent variable and the outcome of interest as the dependent variable. The model classifies each patient (outcome or not), and the predicted probabilities were used to plot an ROC curve of sensitivity against 1 - specificity.
The AUC for this ROC curve is also known as the C-statistic,18,19 and ranges from .5 (no discrimination) to a theoretical maximum of 1. The model performance was compared across risk schemes (CHADS2 and CHA2DS2-VASc), databases (CCAE, Optum, MDCR), and different observable time periods (1-year and 3-year complete observable time). To explore whether additional parameters would improve the patient-level prediction (PLP), we employed a regularized logistic regression model, which included a large number of baseline covariates, including age, sex, diagnoses, procedures, comorbidities, medications, comorbidity
indices, and factors associated with resource utilization (e.g., number of outpatient visits, hospitalizations).20 For these analyses, we divided each study cohort into 2 groups: one for the training dataset and one for the validation dataset. Summary statistics were reported on full cohorts created in each dataset, and prediction model statistics were reported based on the validation dataset. The AUC scores from the PLP models were compared with the AUC scores from the CHADS2 and CHA2DS2-VASc models (respectively) to show the models’ improvements, if any. Only de-identified patient-level data were analyzed for this study and institutional review board oversight was not required. All analyses were conducted in R version 3.2.1 (Vienna, Austria) and the main package used was the PatientLevelPrediction (PLP) package generated from the Observational Health Data Sciences and Informatics open-source community.21,22
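The scoring and discrimination steps described in this section can be illustrated with a short sketch. It is not the study's code: the `Baseline` record and all function names are hypothetical, and the point assignments follow the score definitions given in the text. The C-statistic is computed here directly from the raw score as the proportion of case/non-case pairs in which the case has the higher score (ties count one half). For a logistic model with a single predictor and a positive coefficient, the predicted probability is a monotone function of the score, so this pairwise value should match the AUC of the fitted model on the same data. The sketch omits the training/validation split used in the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Baseline:
    # Hypothetical baseline record, for illustration only.
    age: int
    female: bool
    chf: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_tia_te: bool   # prior stroke, TIA, or thromboembolism
    vascular_disease: bool


def chads2(b: Baseline, use_age: bool = True) -> int:
    """CHADS2; use_age=False mimics the modified score without the age item."""
    score = int(b.chf) + int(b.hypertension) + int(b.diabetes)
    score += 2 * int(b.prior_stroke_tia_te)
    if use_age and b.age >= 75:
        score += 1
    return score


def cha2ds2_vasc(b: Baseline, use_age: bool = True) -> int:
    """CHA2DS2-VASc; use_age=False drops both age items."""
    score = int(b.chf) + int(b.hypertension) + int(b.diabetes)
    score += 2 * int(b.prior_stroke_tia_te)
    score += int(b.vascular_disease) + int(b.female)
    if use_age:
        if b.age >= 75:
            score += 2
        elif b.age >= 65:
            score += 1
    return score


def c_statistic(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

The double loop is quadratic in cohort size, which is fine for a toy example but not for millions of patients; a rank-based formulation would be the natural replacement at that scale.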
Results
A total of 12,006,960 (CCAE), 5,318,574 (Optum), and 1,371,352 (MDCR) patients were included in the final analysis from each database. The baseline characteristics are presented in Table 1. Hypertension, hyperlipidemia, and cancer were the most prevalent comorbidities among the study patients. As expected, the privately insured patient populations (CCAE and Optum) were younger and had fewer comorbidities than the Medicare-eligible population (MDCR), corresponding to CHADS2 and CHA2DS2-VASc scores (standard deviation) of .3 (.6) and .9 (.8) for CCAE, .3 (.6) and .8 (.8) for Optum, and 1.7 (1.2) and 3.2 (1.3) for MDCR. Gender distribution was generally balanced for the CCAE and Optum patients, whereas about 56.2% of patients were females for MDCR, which is consistent with the gender distribution in the overall MDCR database. These observations are generally consistent with the findings from previous investigations of patient characteristics across databases.23,24
The incidence proportions (SE) of ischemic stroke and TIA during 1 and 3 years of follow-up were .5% (.00002) and 1.9% (.00006) for CCAE, .6% (.00004) and 2.2% (.00010) for Optum, and 4.6% (.00020) and 13.1% (.00039) for MDCR, respectively.
Within each database cohort, the predictive accuracy of the 2 models (CHADS2 and CHA2DS2-VASc scores) was similar regardless of follow-up time, with the AUC difference less than .01. Overall, the models performed similarly for the MDCR cohort, with the AUC ranging from .68 (95% CI: .68-.69) to .70 (95% CI: .69-.70) regardless of follow-up time period (Table 2). For the CCAE and Optum cohorts, the models generally performed slightly better for 1-year follow-up time (the AUC ranged from .72 to .74) as compared with 3-year follow-up time (the AUC ranged from .69 to .70), with the AUC difference between .03 and .04 for the same model across the two observation periods.
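The magnitude of the standard errors quoted for the incidence proportions can be checked with the usual binomial formula, SE = sqrt(p(1 - p)/n). The article does not state which formula was used, so this is an assumption, and the function name is hypothetical. The inputs below are the rounded proportion and the Table 1 cohort size, so the result only approximates the published figure.

```python
import math


def incidence_proportion_se(p: float, n: int) -> float:
    """Binomial standard error of a proportion p observed in n patients."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# CCAE, 1-year composite outcome: p = .5%, N = 12,006,960 (rounded inputs).
se_ccae = incidence_proportion_se(0.005, 12_006_960)
print(f"{se_ccae:.5f}")
```

With cohorts of this size the sampling error is negligible next to potential systematic error, which is the caveat the Methods section raises about interpreting the SE.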
Table 1. Baseline characteristics of the study patients without atrial fibrillation

Characteristic | CCAE | Optum | MDCR
Study cohort, N | 12,006,960 | 5,318,574 | 1,371,352
Baseline demographics | | |
Mean age, years | 43 | 42 | 76
Standard deviation | 13 | 13 | 7
Sex (female) (%) | 50.9 | 50.9 | 56.3
Baseline comorbidity of interest | | |
Diabetes (%) | 7.7 | 7.0 | 23.4
Congestive heart failure and left ventricle dysfunction (%) | 1.0 | .9 | 9.8
Myocardial infarction (%) | .5 | .3 | 3.1
Chronic obstructive pulmonary disease (%) | 2.0 | 1.7 | 15.2
Heart failure (%) | .7 | .6 | 8.4
Vascular disease (%) | 2.5 | 2.2 | 15.6
Hypertension (%) | 22.1 | 21.4 | 64.7
Hyperlipidemia (%) | 25.3 | 26.2 | 45.9
Hyperthyroidism (%) | 1.3 | 1.1 | 1.4
Thromboembolism (%) | .1 | .0 | .4
Liver disease (%) | 5.2 | 5.2 | 6.9
Renal disease (%) | .6 | .6 | 5.4
Cancer (%) | 11.5 | 10.8 | 36.0
Ischemic stroke, TIA (%) | .8 | .6 | 8.8
Ischemic stroke, TIA, and thromboembolism (%) | .9 | .6 | 9.1
Ischemic stroke, TIA, and thromboembolism (within 60 days of index) (%) | .1 | .1 | .8
CHADS2 (SD) | .3 (.6) | .3 (.6) | 1.7 (1.2)
CHA2DS2-VASc (SD) | .9 (.8) | .8 (.8) | 3.2 (1.3)

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; SD, standard deviation; TIA, transient ischemic attack.
Table 2. CHADS2 and CHA2DS2-VASc scores for predicting outcomes of interest among patients without AF: AUC (95% CI) by risk prediction model

Outcome, model, and follow-up | CCAE | Optum | MDCR
Study cohort, N | 12,009,924 | 5,318,577 | 1,373,502
Outcome: composite of ischemic stroke or TIA | | |
CHADS2, with 1-year complete follow-up | .73 (.73-.73) | .74 (.73-.74) | .70 (.69-.70)
CHADS2, with 3-year complete follow-up | .70 (.70-.70) | .71 (.70-.71) | .69 (.68-.69)
CHA2DS2-VASc, with 1-year complete follow-up | .72 (.72-.72) | .73 (.72-.73) | .69 (.69-.69)
CHA2DS2-VASc, with 3-year complete follow-up | .69 (.69-.69) | .70 (.70-.70) | .68 (.68-.69)
Outcome: ischemic stroke | | |
CHADS2, with 1-year complete follow-up | .76 (.76-.76) | .77 (.76-.78) | .71 (.70-.71)
CHADS2, with 3-year complete follow-up | .73 (.72-.73) | .73 (.72-.74) | .70 (.69-.70)
CHA2DS2-VASc, with 1-year complete follow-up | .74 (.73-.74) | .75 (.74-.76) | .70 (.69-.70)
CHA2DS2-VASc, with 3-year complete follow-up | .71 (.70-.71) | .71 (.71-.72) | .69 (.69-.69)

Abbreviations: AF, atrial fibrillation; AUC, area under the curve; CI, confidence interval; CCAE, Truven MarketScan Commercial Claims and Encounters; Optum, Optum Clinformatics; MDCR, Truven MarketScan Medicare Supplemental; TIA, transient ischemic attack.
Table 3. Common predictors for all 3 databases for the outcome of stroke and TIA looking back 365 days with associated betas

Description | CCAE beta | MDCR beta | Optum beta
Intercept | −6.1026 | −3.8255 | −6.0192
Number of distinct conditions observed in 365 days on or prior to cohort index | −.0090 | .0140 | .0211
Number of distinct drug ingredients observed in 365 days on or prior to cohort index | .0038 | .0125 | .0076
Number of distinct procedures observed in 365 days on or prior to cohort index | .0031 | −.0042 | −.0100
Number of visits observed in 365 days on or prior to cohort index | −.0084 | .0010 | −.0009
Number of ER visits observed in 365 days on or prior to cohort index | .0370 | −.0026 | −.0731
Charlson index (Romano adaptation), using conditions all time on or prior to cohort index | .0764 | .0094 | .0247
Diabetes Comorbidity Severity Index, using conditions all time on or prior to cohort index | .0224 | .0457 | .0547
CHADS2, using conditions all time on or prior to cohort index | .5308 | .1977 | .7552
Condition era record observed during anytime on or prior to cohort index: type 2 diabetes mellitus | −.4195 | −.1959 | −.4671
Condition era record observed during anytime on or prior to cohort index: transient cerebral ischemia | .5423 | .2339 | .2015
Condition occurrence record observed during 365 days on or prior to cohort index: cerebral infarction due to thrombosis of cerebral arteries | 1.0777 | .5913 | .6942
Condition era record observed during anytime on or prior to cohort index: cerebral infarction due to thrombosis of cerebral arteries | .2526 | .1107 | 1.1025
Condition era record observed during anytime on or prior to cohort index: adult health examination | −.0112 | −.0106 | −.0073
Number of ingredients within the drug group observed all time on or prior to cohort index: DRUGS USED IN DIABETES | −.0596 | .0091 | −.1495
Number of ingredients within the drug group observed all time on or prior to cohort index: ANALGESICS | −.0420 | −.0094 | .1270

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; ER, emergency room; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; TIA, transient ischemic attack. Beta: coefficient from regularized regression (average shrinkage estimate). Condition era: a condition era is defined as a span of time when the person is assumed to have a given condition. Condition eras are chronological periods of condition occurrence records. A 30-day gap between records was used (http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:condition_era).
The sensitivity analysis showed that the performance of the models for ischemic stroke alone generally followed a similar pattern (Table 2). In addition, the models performed slightly better for ischemic stroke alone than for the composite of ischemic stroke or TIA, with the AUC difference between .02 and .03 for the same database and same observation period, particularly for the commercial population (CCAE and Optum databases).
The regularized regression included a large number of baseline covariates in the PLP models. As compared with univariate models that included only CHADS2 and CHA2DS2-VASc scores, the regularized regression improved the prediction accuracy (AUC) by .05 to .10, an improvement that was generally more pronounced for patients in the CCAE and Optum databases than in the MDCR database (Fig 2, A,B). The AUCs from the PLP models were statistically superior to the corresponding models based on CHADS2 and CHA2DS2-VASc scores (all P < .008). Using the end point of ischemic stroke and TIA with 1-year complete follow-up time as an example, the PLP model ended up with 110 predictors for CCAE, 101 predictors for Optum, and 228 predictors for MDCR. Fifteen predictors were identified in all databases (Table 3), including CHADS2 scores. These additional factors can be broadly summarized into 3 categories: (1) general baseline health condition of the patient and resource utilization (e.g., number of distinct conditions observed, number of distinct drug ingredients observed, number of distinct procedures observed, number of visits); (2) risk scores, for example, Charlson index and CHADS2 scores; and (3) prior stroke and TIA. In addition, we identified several factors that were PLP model predictors for the CCAE and Optum databases, but not for MDCR (Table 4). Not surprisingly, the oldest age categories of 50-54, 55-59, and 60-64 were identified as independent predictors for the CCAE and Optum databases. However, other predictors need to be interpreted with caution, particularly because those same factors showed directionally different impact on outcomes between the CCAE and Optum databases.
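To give a sense of scale for the betas in Table 3, a logistic regression coefficient can be converted to an odds ratio by exponentiation. The sketch below does this for the CHADS2 coefficient in each database. The dictionary and function names are hypothetical, and the values are read from the CHADS2 row of Table 3. Because the coefficients come from a regularized model (shrunken, and conditional on every other selected covariate), the resulting odds ratios are illustrative rather than effect estimates.

```python
import math

# CHADS2 betas per database, as read from Table 3 (per one-point increase).
CHADS2_BETA = {"CCAE": 0.5308, "MDCR": 0.1977, "Optum": 0.7552}


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


for database, beta in CHADS2_BETA.items():
    # Multiplicative change in the odds of stroke/TIA per additional CHADS2
    # point, holding the other covariates in the model fixed.
    print(f"{database}: OR per CHADS2 point = {odds_ratio(beta):.2f}")
```

The same conversion applies to any row of Tables 3 and 4; a negative beta yields an odds ratio below 1.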
Figure 2. (A) The receiver operating characteristic (ROC) curves for various models across databases at 1-year complete follow-up time. (B) The ROC curves for various models across databases at 3-year complete follow-up time. Panels in each of A and B: CCAE, Optum, MDCR. The AUCs from the PLP models were statistically superior to the corresponding models based on CHADS2 and CHA2DS2-VASc scores (all P < .008). Abbreviations: AUC, area under the curve, also known as C-statistic; CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; PLP, patient-level prediction using a regularized regression; TIA, transient ischemic attack.

Discussion
Electronic medical records have been increasingly used as a valuable source for medical and scientific research. Although CHADS2 and CHA2DS2-VASc are the 2 most commonly used risk schemes for predicting stroke in patients with AF, the performance of these models has not been comprehensively evaluated in patients without AF, which motivated us to conduct the current study. Using 3 large healthcare databases in the United States, our analyses showed that these 2 models had generally modest to good performance in predicting the composite of ischemic stroke or TIA among patients without AF (with the AUC ranging from .68 to .74), with slightly better results
Table 4. Common predictors for CCAE and Optum but not MDCR for the outcome of stroke and TIA looking back 365 days with associated betas

Description | CCAE beta | MDCR beta | Optum beta
Age group: 50-54 | .5335 | - | .4087
Age group: 55-59 | .7032 | - | .4449
Age group: 60-64 | .7610 | - | .8652
Number of distinct observations observed in 365 days on or prior to cohort index | .0058 | - | −.0009
Condition era record observed during anytime on or prior to cohort index: low back pain | −.0000 | - | .0020
Condition era record observed during anytime on or prior to cohort index: pure hypercholesterolemia | .0247 | - | .0288
Procedure occurrence record observed during 365 days on or prior to cohort index: established patient office or other outpatient, visit typically 25 minutes | .1173 | - | .1251
Condition era record observed during anytime on or prior to cohort index: vaccination required | −.1323 | - | −.2064
Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: chem. metabolic function tests | .2089 | - | .0904
Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: surgical pathology procedure | −.0000 | - | .0810
Drug era record observed during 365 days on or prior to cohort index within drug group: corticosteroids acting locally | .0000 | - | .0878
Number of ingredients within the drug group observed all time on or prior to cohort index: SEX HORMONES AND MODULATORS OF THE GENITAL SYSTEM | −.0230 | - | −.0424
Number of ingredients within the drug group observed all time on or prior to cohort index: ANTIBACTERIALS FOR SYSTEMIC USE | .0295 | - | −.0016
Number of ingredients within the drug group observed all time on or prior to cohort index: COUGH AND COLD PREPARATIONS | −.0980 | - | −.1067
Number of ingredients within the drug group observed all time on or prior to cohort index: ANTI-INFLAMMATORY AND ANTIRHEUMATIC PRODUCTS | .0406 | - | −.0603
Number of ingredients within the drug group observed all time on or prior to cohort index: PSYCHOANALEPTICS | .0645 | - | −.0087
Procedure occurrence record observed during 365 days on or prior to cohort index within procedure group: imaging of brain | .0000 | - | .0081

Abbreviations: CCAE, Truven MarketScan Commercial Claims and Encounters; MDCR, Truven MarketScan Medicare Supplemental; Optum, Optum Clinformatics; TIA, transient ischemic attack. Beta: coefficient from regularized regression (average shrinkage estimate). A hyphen indicates that the factor was not a predictor in that database. Condition era: a condition era is defined as a span of time when the person is assumed to have a given condition. Condition eras are chronological periods of condition occurrence records. A 30-day gap between records was used (http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:condition_era).
for the end point of ischemic stroke alone (with the AUC ranging from .70 to .77 and a general improvement of the AUC from .02 to .03), particularly for younger patient cohorts at 1 year of complete follow-up time. These findings appeared clinically intuitive and not surprising. As patients aged and were followed up longer, the baseline characteristics may have lost significance, whereas other health behaviors and (postbaseline) comorbidities may have become more prominent. Those results were robust and generally consistent across different patient cohorts and with various sensitivity measures. The risk schemes of CHADS2 and CHA2DS2-VASc were both developed for patients with AF. In essence, the CHADS2 score was the combination of 2 stroke prediction schemes developed earlier: the AFI and the SPAF III. As previously mentioned, Gage et al showed that the CHADS2 scheme (C-statistic of .82; 95% CI: .80-.84) had better prediction of stroke as compared with the AFI and
SPAF schemes (C-statistic of .68 [95% CI: .65-.71] and .74 [95% CI: .71-.76], respectively).8 In later work, using the combined clinical trial datasets from SPORTIF III and SPORTIF V (Stroke Prevention using ORal Thrombin Inhibitor in atrial Fibrillation), Lip et al developed and validated the CHA2DS2-VASc scheme.9 Although the model performance was relatively modest with a C-statistic of .65 (95% CI: .61-.68), this new risk scheme identified the greatest proportion of AF patients at high risk for stroke as compared with other risk stratification schemes. Currently, the treatment guidelines generally all recommend the use of CHA2DS2-VASc as the preferred risk scoring method to assess stroke risk in AF patients.25,26 The initial work associated with the CHADS2 and CHA2DS2-VASc schemes was compelling, but interestingly a recent systematic review and meta-analysis suggested that the performances of these schemes in patients with AF were heterogeneous and study population-dependent, with the
C-statistic ranging from .60 to .80 (median: .683) for CHADS2 and from .64 to .79 (median: .673) for CHA2DS2-VASc.27,28 Given these findings, our analyses suggest that these 2 existing risk schemes perform similarly in patients without AF and could be used in clinical practice to manage these patients, particularly given that the large majority of stroke events occur in patients without AF.15,16 Conceptually, clinicians could use these models to identify patients without AF who are at much higher risk for stroke, perhaps necessitating more aggressive therapies to manage underlying diseases (e.g., diabetes, hypertension, heart failure). Clinicians can also advise high-risk patients to watch for early signs and symptoms of stroke, so that, if a stroke occurs, early therapeutic intervention can help prevent the devastating consequences of an ischemic event. Based on the empirical data presented in this study, clinicians should also be aware that the models predicted events better at 1 year than at 3 years, which is biologically intuitive.
Of note, other researchers have also investigated factors associated with the risk of ischemic stroke. However, those studies may be limited by their reliance on community studies, non-US populations, or patient populations with a particular condition.29-32 From that perspective, our study population tended to be more robust and representative of a general patient population without AF.
In addition to examining the performance of the existing risk schemes, we employed a regularized regression approach to explore a large number of baseline characteristics, with the intention of improving model performance and identifying additional risk factors that are not part of the current risk scores. Several findings are worth noting. First, a number of baseline characteristics were identified as risk factors across all databases. With the addition of those factors, model performance improved substantially, with the C-statistic approaching or exceeding .8, particularly for the younger patient populations (CCAE and Optum). Not surprisingly, these factors generally reflected poor health conditions of patients. Unlike the individual components of the current risk schemes, however, most of the additional factors might not be easily included in the risk calculation quantitatively, because a more comprehensive evaluation of the patient’s medical history may be required. Nevertheless, clinicians may still take those factors into consideration when evaluating the risk of ischemic stroke and TIA for patients without AF. In addition, even with CHADS2 scores in the model, history of ischemic stroke and TIA was identified as an independent risk factor, suggesting that this parameter might need to be assigned more than 2 points in the original risk scheme. Finally, our results also suggested that the PLP models generally performed better for ischemic stroke alone than for the composite of ischemic stroke and TIA. This could be due, in part, to differences in how stroke and TIA are coded in electronic healthcare records. For instance, stroke is a severe clinical event and often requires an intensive work-up before a diagnosis is recorded, for example, “ICD9 434.1-cerebral embolism,” which falls under ischemic stroke code sets. In contrast, the codes for TIA may be less specific (e.g., “ICD9 435.3-Vertebrobasilar artery syndrome”) and may or may not be associated with thrombotic risk factors. However, from the perspective of prevention, there is value in predicting TIA early, as it might help manage underlying diseases more aggressively and prevent more severe events from occurring (e.g., cerebral infarction).
Our study has limitations, and our results should be interpreted in that context. The CCAE and Optum databases reflect privately insured populations (mainly patients <65 years of age), whereas MDCR reflects privately insured patients with supplemental Medicare coverage (mainly >65 years of age). These databases capture healthcare claims that are gathered primarily for reimbursement purposes rather than to answer a particular medical or scientific question. Therefore, the data suffer from limitations similar to those of other claims databases (e.g., coding practice, accuracy, and completeness of the data reported). However, previous studies have suggested that clinically important end points, such as stroke, myocardial infarction, and cancer, can be reliably identified through those electronic healthcare records.33,34 Although our primary analysis focused on the composite of ischemic stroke and TIA, where TIA may be considered a soft end point, the results of the sensitivity analysis that focused on ischemic stroke alone were largely consistent with the primary findings, supporting the robustness of the primary results.
It is encouraging that our regularized regression identified a number of baseline characteristics associated with thrombotic risk and substantially improved on the performance of the existing risk schemes. Importantly, however, the actual utility of these models has not been tested in clinical practice. Given the large sample sizes included in our study, some risk factors, although statistically significant, need to be interpreted in the context of clinical importance, and the biological mechanisms linking these risk factors to stroke (e.g., when a risk factor was a predictor in one database but not in another) warrant further investigation.
The existing schemes (CHADS2 and CHA2DS2-VASc scores) for stroke prediction in patients with AF can be applied to patients without AF with similar predictive accuracy. Our results also suggest that these models can be improved upon, but further research is required to validate our findings. Because the majority of stroke events occur in patients without AF, our findings highlight an important clinical question: how patients who are at high risk for ischemic stroke and TIA could be managed more effectively in clinical practice.
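As a point of reference for the 2 schemes named above, the following minimal Python sketch shows the standard published point assignments for CHADS2 and CHA2DS2-VASc, together with a pairwise-concordance calculation of the C-statistic. It is illustrative only and is not the analysis pipeline used in this study (which fit logistic regression models in R): the `Patient` record, its field names, and the function names are hypothetical, and the C-statistic here is computed directly on the raw score.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative assumptions.
    age: int
    female: bool = False
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia: bool = False
    vascular_disease: bool = False


def chads2(p: Patient) -> int:
    """CHADS2: 1 point each for heart failure, hypertension, age >= 75,
    and diabetes; 2 points for prior stroke or TIA (range 0-6)."""
    return (
        int(p.heart_failure)
        + int(p.hypertension)
        + int(p.age >= 75)
        + int(p.diabetes)
        + 2 * int(p.prior_stroke_tia)
    )


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc: 2 points each for age >= 75 and prior stroke/TIA;
    1 point each for heart failure, hypertension, diabetes, vascular
    disease, age 65-74, and female sex (range 0-9)."""
    age_points = 2 if p.age >= 75 else 1 if p.age >= 65 else 0
    return (
        int(p.heart_failure)
        + int(p.hypertension)
        + age_points
        + int(p.diabetes)
        + 2 * int(p.prior_stroke_tia)
        + int(p.vascular_disease)
        + int(p.female)
    )


def c_statistic(scores, events):
    """Share of (event, non-event) patient pairs in which the event patient
    has the higher score; tied scores count as half a concordant pair."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(pos, neg)
    )
    return concordant / (len(pos) * len(neg))
```

For example, a hypothetical 78-year-old man with hypertension and a prior TIA, `Patient(age=78, hypertension=True, prior_stroke_tia=True)`, scores 4 on CHADS2 and 5 on CHA2DS2-VASc. A C-statistic of .5 corresponds to no discrimination and 1.0 to perfect separation of patients with and without events.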
Acknowledgment: The authors would like to thank Jesse Berlin, ScD, Senior Vice President and Global Head of Epidemiology, Johnson & Johnson, for his critical review of the manuscript.
Appendix: Supplementary Material Supplementary data to this article can be found online at doi:10.1016/j.jstrokecerebrovasdis.2017.03.036.
References
1. Centers for Disease Control and Prevention. Stroke facts. Available at: http://www.cdc.gov/stroke/facts.htm. Accessed March 17, 2016.
2. Wolf PA, Abbott RD, Kannel WB. Atrial fibrillation as an independent risk factor for stroke: the Framingham Study. Stroke 1991;22:983-988.
3. Hart RG, Halperin JL. Atrial fibrillation and stroke: concepts and controversies. Stroke 2001;32:803-808.
4. Go AS, Hylek EM, Phillips KA, et al. Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the AnTicoagulation and Risk Factors in Atrial Fibrillation (ATRIA) Study. JAMA 2001;285:2370-2375.
5. Pearce LA, Hart RG, Halperin JL. Assessment of three schemes for stratifying stroke risk in patients with nonvalvular atrial fibrillation. Am J Med 2000;109:45-51.
6. Atrial Fibrillation Investigators. Risk factors for stroke and efficacy of antithrombotic therapy in atrial fibrillation: analysis of pooled data from five randomized clinical trials. Arch Intern Med 1994;154:1949-1957.
7. The SPAF III Writing Committee for the Stroke Prevention in Atrial Fibrillation Investigators. Patients with nonvalvular atrial fibrillation at low-risk of stroke during treatment with aspirin. JAMA 1998;279:1273-1277.
8. Gage BF, Waterman AD, Shannon W, et al. Validation of clinical classification schemes for predicting stroke: results from the National Registry of Atrial Fibrillation. JAMA 2001;285:2864-2870.
9. Lip GYH, Nieuwlaat R, Pisters R, et al. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on Atrial Fibrillation. Chest 2010;137:263-272.
10. Camm AJ, Lip GY, De Caterina R, et al. 2012 focused update of the ESC Guidelines for the management of atrial fibrillation: an update of the 2010 ESC Guidelines for the management of atrial fibrillation. Eur Heart J 2012;33:2719-2747.
11. January CT, Wann LS, Alpert JS, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society. Circulation 2014;130:2071-2104.
12. National Institute for Health and Care Excellence. Atrial fibrillation: clinical guidelines 2014. Available at: http://www.nice.org.uk/guidance/cg180/evidence/atrial-fibrillation-update-full-guideline-243739981. Accessed December 12, 2015.
13. Reiffel JA. Atrial fibrillation and stroke: epidemiology. Am J Med 2014;127:e15-e16. doi:10.1016/j.amjmed.2013.06.002.
14. Yuan Z, Makadia R, Ryan P, et al. Incidence of ischemic stroke or transient ischemic attack in patients with multiple risk factors with or without atrial fibrillation: a retrospective cohort study. Curr Med Res Opin 2015;31:1257-1266.
15. Hughes M, Lip GY. Stroke and thromboembolism in atrial fibrillation: a systematic review of stroke risk factors, risk stratification schema and cost effectiveness data. Thromb Haemost 2008;99:295-304.
16. Bunch TJ, May HT, Bair TL, et al. Atrial fibrillation ablation patients have long-term stroke rates similar to patients without atrial fibrillation regardless of CHADS2 score. Heart Rhythm 2013;10:1272-1277.
17. Thigpen JL, Dillon C, Forster KB, et al. Validity of international classification of disease codes to identify ischemic stroke and intracranial hemorrhage among individuals with associated diagnosis of atrial fibrillation. Circ Cardiovasc Qual Outcomes 2015;8:8-14.
18. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
19. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928-935.
20. Suchard MA, Simpson SE, Zorych I, et al. Massive parallelization of serial inference algorithms for a complex generalized linear model. ACM Trans Model Comput Simul 2013;23:1-17.
21. R Foundation for Statistical Computing. R: a language and environment for statistical computing. Vienna, Austria: R Core Team, 2015.
22. Schuemie MJ, Suchard MA, Ryan PB, et al. Package “PatientLevelPrediction”. 1.1.0, 2015. Available at: https://github.com/OHDSI/PatientLevelPrediction. Accessed March 17, 2016.
23. Voss EA, Ma Q, Ryan PB. The impact of standardizing the definition of visits on the consistency of multi-database observational health research. BMC Med Res Methodol 2015;15:13.
24. Voss EA, Makadia R, Matcho A, et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc 2015;22:553-564.
25. Durrant J, Lip GYH, Lane DA. Stroke risk stratification scores in atrial fibrillation: current recommendations for clinical practice and future perspectives. Expert Rev Cardiovasc Ther 2013;11:77-90.
26. Chao T, Liu C, Wang K, et al. Should atrial fibrillation patients with 1 additional risk factor of the CHA2DS2-VASc score (beyond sex) receive oral anticoagulation? J Am Coll Cardiol 2015;65:635-642.
27. Keogh C, Wallace E, Dillon C, et al. Validation of the CHADS2 clinical prediction rule to predict ischaemic stroke: a systematic review and meta-analysis. Thromb Haemost 2011;106:528-538.
28. Chen JY, Zhang AD, Lu HY, et al. CHADS2 versus CHA2DS2-VASc score in assessing the stroke and thromboembolism risk stratification in patients with atrial fibrillation: a systematic review and meta-analysis. J Geriatr Cardiol 2013;10:258-266.
29. Ohira T, Shahar E, Chambless LE, et al. Risk factors for ischemic stroke subtypes: the Atherosclerosis Risk in Communities study. Stroke 2006;37:2493-2498.
30. Lip GY, Lin HJ, Chien KL, et al. Comparative assessment of published atrial fibrillation stroke risk stratification schemes for predicting stroke, in a nonatrial fibrillation population: the Chin-Shan Community Cohort Study. Int J Cardiol 2013;168:414-419.
31. Welles CC, Whooley MA, Na B, et al. The CHADS2 score predicts ischemic stroke in the absence of atrial fibrillation among subjects with coronary heart disease: data from the Heart and Soul Study. Am Heart J 2011;162:555-561.
32. Ntaios G, Lip GY, Makaritsis K, et al. CHADS(2), CHA(2)S(2)DS(2)-VASc, and long-term stroke outcome in patients without atrial fibrillation. Neurology 2013;80:1009-1017.
33. Fisher ES, Whaley FS, Krushat WM, et al. The accuracy of Medicare’s hospital claims data: progress has been made, but problems remain. Am J Public Health 1992;82:243-248.
34. Wahl PM, Rodgers K, Schneeweiss S, et al. Validation of claims-based diagnostic and procedure codes for cardiovascular and gastrointestinal serious adverse events in a commercially-insured population. Pharmacoepidemiol Drug Saf 2010;19:596-603.