
ORIGINAL SCIENTIFIC ARTICLES

Risk Adjustment in the American College of Surgeons National Surgical Quality Improvement Program: A Comparison of Logistic Versus Hierarchical Modeling

Mark E Cohen, PhD, Justin B Dimick, MD, MPH, Karl Y Bilimoria, MD, MS, Clifford Y Ko, MD, MS, MSHS, FACS, Karen Richards, Bruce Lee Hall, MD, PhD, MBA, FACS

BACKGROUND: Although logistic regression has commonly been used to adjust for risk differences in patient and case mix to permit quality comparisons across hospitals, hierarchical modeling has been advocated as the preferred methodology because it accounts for clustering of patients within hospitals. It is unclear whether hierarchical models would yield important differences in quality assessments compared with logistic models when applied to American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) data. Our objective was to evaluate differences in logistic versus hierarchical modeling for identifying hospitals with outlying outcomes in the ACS-NSQIP.

STUDY DESIGN: Data from ACS-NSQIP patients who underwent colorectal operations in 2008 at hospitals that reported at least 100 operations were used to generate logistic and hierarchical prediction models for 30-day morbidity and mortality. Differences in risk-adjusted performance (ratio of observed-to-expected events) and outlier detections from the two models were compared.

RESULTS: Logistic and hierarchical models identified the same 25 hospitals as morbidity outliers (14 low and 11 high outliers), but the hierarchical model identified 2 additional high outliers. Both models identified the same eight hospitals as mortality outliers (five low and three high outliers). The values of observed-to-expected events ratios and p values from the two models were highly correlated. Results were similar when data were permitted from hospitals providing <100 patients.

CONCLUSIONS: When applied to ACS-NSQIP data, logistic and hierarchical models provided nearly identical results with respect to identification of hospitals' observed-to-expected events ratio outliers. As hierarchical models are prone to implementation problems, logistic regression will remain an accurate and efficient method for performing risk adjustment of hospital quality comparisons. (J Am Coll Surg 2009;209:687–693. © 2009 by the American College of Surgeons)

Received March 20, 2009; Revised August 12, 2009; Accepted August 19, 2009.

From the Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL (Cohen, Bilimoria, Ko, Richards); Michigan Surgical Collaborative for Outcomes Research and Evaluation (M-SCORE), Department of Surgery, University of Michigan, Ann Arbor, MI (Dimick); Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL (Bilimoria); Department of Surgery, University of California, Los Angeles, and Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, CA (Ko); and Department of Surgery, John Cochran Veterans Affairs Medical Center, Department of Surgery, School of Medicine, Washington University in St Louis, and Barnes Jewish Hospital, Center for Health Policy, and the Olin Business School at Washington University in St Louis, St Louis, MO (Hall).

Correspondence address: Mark E Cohen, PhD, American College of Surgeons, 633 N Saint Clair St, 22nd Fl, Chicago, IL 60611-3211. email: [email protected]

Disclosure Information: Nothing to disclose.

American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) Disclaimer: The ACS-NSQIP and the hospitals participating in the ACS-NSQIP are the source of the data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors. This article represents the personal viewpoint of the authors and cannot be construed as a statement of official ACS or ACS-NSQIP policy.



Abbreviations and Acronyms

ACS = American College of Surgeons
BLUP = best linear unbiased predictor
NSQIP = National Surgical Quality Improvement Program
O/E = observed events-to-expected events ratio


The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) collects robust patient-level data on preoperative risk factors and postoperative morbidity and mortality outcomes to assess surgical quality at >200 hospitals.1 For purposes of comparing adverse outcomes across hospitals, event rates are adjusted by the surgical risk profile of patients treated at each hospital and with consideration of differences in case types performed. The primary statistic used to compare hospital quality in ACS-NSQIP is the ratio of observed events to expected events (O/E).2,3 Observed events are the actual number of patients who had at least one complication or who died, and expected events are the expected number of such events, statistically estimated based on risk profiles for patients and the procedures performed at each hospital. Values of O/E that are significantly different from 1.0 identify hospitals that are doing better (if O/E is significantly <1.0; a low outlier) or worse (if O/E is significantly >1.0; a high outlier) than expected based on their case mix. Depending on outcomes, ACS-NSQIP typically uses p value criteria of 0.10, 0.05, or 0.01. Statistically significant differences are those where the confidence interval for the O/E ratio does not overlap 1.0.

ACS-NSQIP uses logistic regression to estimate O/E ratios for each hospital. Logistic regression has well-documented advantages for this purpose, including being a robust methodology that can incorporate many predictors using efficient, well-documented, and well-maintained statistical software; being well understood and widely accepted within the health research community; yielding results that are superior or equivalent to most alternative methods; being easily revised and associated with report-generating software so that analyses can be routinely reassessed and documents updated within tight production schedules; and producing information that is conceptually accessible to a wide range of different users.4

Hierarchical models have been advocated as a superior methodology for identifying hospital outliers and might be appropriate for ACS-NSQIP.5-13 An important advantage of hierarchical models is their capacity to adjust for the fact that patients are not selected as independent observations but are clustered, or nested, within hospitals. The adjustment for this lack of independence theoretically results in more accurate performance estimates, and a frequent associated result is that fewer statistical outliers are identified. The magnitude of these effects will depend on features unique to each dataset, including homogeneity among hospitals (eg, ACS-NSQIP versus Veterans Affairs NSQIP versus all hospitals) and the level of clustering or lack of data independence (often investigated as "intraclass correlation"). The importance of these cluster effects will depend on programmatic goals. It is unknown whether hierarchical models will produce important differences in hospital quality assessments compared with logistic models in ACS-NSQIP.

The objective of this study is to evaluate the magnitude and form of differences in estimated O/E ratios and p values using logistic versus one specific implementation of hierarchical modeling for morbidity and mortality after colorectal operations. Colorectal operations were selected for study because they are common procedures, with nontrivial rates of mortality and postoperative complications, for which predictive models have been reasonably successful.
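As a purely illustrative example with hypothetical numbers (not taken from this study): if the predicted probabilities for a hospital's patients sum to E = 20 expected complications and O = 30 complications are actually observed, then O/E = 30/20 = 1.5; that hospital would be reported as a high outlier only if the confidence interval for the ratio excluded 1.0 at the chosen criterion.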

METHODS

Data acquisition and patient selection

The developmental history and current details of ACS-NSQIP, including sampling strategy, data abstraction procedures, variables collected, outcomes, and structure, are well described elsewhere.1,14,15 In brief, the program collects detailed data on patient demographics, preoperative risk factors and laboratory values, operative variables, and postoperative events using standardized definitions. From the ACS-NSQIP database for January 1, 2008, through December 31, 2008, patients 16 years of age or older who underwent a major colorectal operation were identified using Current Procedural Terminology codes, and these data were used for model development. Preexisting blinded data were used. This work was not based on any data from the Veterans Affairs NSQIP.

Preoperative variables

A set of predictive variables, historically useful in modeling outcomes after colorectal operations in the ACS-NSQIP, was constructed from ACS-NSQIP data fields. For purposes of simplicity and consistency with routine ACS-NSQIP modeling, variables were made appropriately categorical and, to preclude empty cells that would adversely affect fitting algorithms, selected categories were collapsed. Patient demographic variables of age (younger than 65, 65 to 74, 75 to 84, 85+ years), gender, and lifestyle factors of smoking status (within 1 year of operation) and alcohol consumption (more than 2 drinks per day for 2 weeks before admission) were used. General factors considered were American Society of Anesthesiologists class (I/II, normal healthy/mild systemic disease; III, severe systemic disease; IV/V, severe systemic disease that is a constant threat to life/moribund), preoperative functional status (independent, partially dependent, totally dependent), dyspnea (none, moderate exertion, at rest), and body mass index (normal, 18.5 to <25; underweight, <18.5; overweight, 25 to <30; 3 levels of obese: 30 to <35, 35 to <40, and ≥40). Comorbidities included ventilator dependence, sepsis (eg, systemic inflammatory response syndrome, sepsis, septic shock), a history of COPD, hypertension requiring medication, current pneumonia, ascites, coronary heart disease (includes angina, myocardial infarction within 30 days before operation, percutaneous cardiac intervention, or cardiac artery bypass operation), peripheral vascular disease (includes revascularization for peripheral vascular disease, claudication, rest pain, amputation, or gangrene), neurologic event or disease (includes stroke with or without residual deficit, transient ischemic attack, hemiplegia, paraplegia, quadriplegia, or impaired sensation), diabetes (oral medication or insulin-dependent), disseminated cancer, steroid use, weight loss (>10% in the last 6 months), bleeding disorders, and current chemotherapy or radiotherapy.

Current Procedural Terminology codes were used to categorize surgical procedure by extent and type (eg, abdominoperineal resection, partial colectomy, total colectomy, total proctocolectomy, proctectomy). Operations were also categorized according to indication, using the first entered ICD-9 code for the case, ie, neoplasm (benign or malignant), diverticulitis, colitis or enteritis, obstruction or perforation, hemorrhage, rectal prolapse, vascular insufficiency, intestinal volvulus, and other. Operations were also described by wound class (clean, clean/contaminated versus contaminated, dirty/infected) and whether emergent.

Preoperative laboratory variables examined included sodium, albumin, blood urea nitrogen, creatinine, hematocrit, platelet count, WBC, partial thromboplastin time, and prothrombin time. Values were categorized using ACS-NSQIP definitions of normal and abnormal, and missing data constituted a third categorical level, an indicator variable.2 It should be noted that except for laboratory values (derived from tests that might or might not be ordered, depending on each patient's particular clinical situation), missing values are rare among ACS-NSQIP predictors.
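The three-level coding of laboratory values described above (normal, abnormal, and missing as its own indicator level) can be sketched as follows. This is an illustration only, not the authors' code: the dataset name, variable names, and the 3.0 g/dL albumin cutoff are assumptions, not ACS-NSQIP definitions.

```sas
/* Illustrative sketch: three-level categorization of a preoperative
   laboratory value, keeping missingness as its own level.
   Dataset, variable names, and the cutoff are assumptions. */
data analytic;
  set colorectal;
  length albumin_cat $8;
  if missing(albumin)    then albumin_cat = 'MISSING';
  else if albumin < 3.0  then albumin_cat = 'ABNORMAL';
  else                        albumin_cat = 'NORMAL';
run;
```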

Outcomes

Outcomes were assessed at 30 days regardless of whether the patient was discharged, remained hospitalized, or was admitted to a different institution. Outcomes of interest were 30-day morbidity and mortality. Morbidity included superficial surgical site infection (without preoperative wound infection), deep incisional surgical site infection (without preoperative wound infection), organ space surgical site infection (without preoperative wound infection), pneumonia (without preoperative pneumonia), unplanned intubation (without preoperative ventilator dependence), progressive renal insufficiency (without preoperative renal failure or dialysis), urinary tract infection, deep venous thrombosis, CVA or stroke, myocardial infarction, cardiac arrest requiring cardiopulmonary resuscitation, pulmonary embolism, ventilator dependence >48 hours (without preoperative ventilator dependence), acute renal failure (without preoperative renal failure or dialysis), bleeding complication defined by transfusion of >4 U, and sepsis or septic shock (if preoperative sepsis exists, it must worsen postoperatively by NSQIP criteria).


Statistical analysis

Only data from hospitals that provided at least 100 colorectal operations were included in the primary analysis. As ACS-NSQIP guidelines often equate to a roughly 20% sampling protocol, this requirement limits participation to hospitals that perform >500 colorectal operations per year. Because differences between logistic and hierarchical models might be influenced by the number of operations reported from each hospital, we also explored results when hospitals with <100 reported cases were permitted in the analysis.

Forward stepwise (with inclusion criterion set at p < 0.05) logistic regression models were constructed for mortality and morbidity. For each outcome, variables selected for inclusion in the logistic model were then included in a random intercepts, fixed slopes hierarchical model, using the adaptive quadrature likelihood approximation method in SAS PROC GLIMMIX (SAS Institute).16-18 Patient-level predicted probabilities were estimated using only the fixed portion of the model and were invoked by specifying the NOBLUP SAS option. (If random [hospital] effects had been included in the predicted probabilities [BLUP, best linear unbiased predictor], the expected value of every hospital O/E would be 1.0, although statistical noise would cause variation about that point. If the intent had been to estimate a standardized incidence rate for each hospital rather than an O/E ratio, then the BLUP estimate would be appropriate.)

Regression equations yield expected event probabilities for individual patients, and the sum of these probabilities for patients at each hospital is E in the hospital O/E ratio. p Values for these ratios were then computed using an exact procedure.19 A 95% CI was used for detecting morbidity outliers and a 90% CI for mortality outliers. Different CI criteria were used because differences in sample sizes and event rates affect the number of outliers; adjusting CI ranges to influence the number of outliers detected is routine ACS-NSQIP practice, so that models provide a useful number of outliers for quality-improvement purposes.

Pearson correlation coefficients were computed for O/E ratios and p values derived from logistic versus hierarchical models, and a linear regression equation was defined for predicting the hierarchical model O/E p value from the logistic model O/E p value. All data manipulation and analysis were done with SAS version 9.2.
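To make the workflow concrete, the SAS sketch below follows the steps described above; it is an illustration under stated assumptions, not the authors' production code. The dataset name (colorectal), hospital identifier (hospital_id), and predictor and outcome names are hypothetical; the GLIMMIX step re-lists candidate predictors rather than only those retained by the stepwise logistic model; and the confidence limits use the standard chi-square form of the exact Poisson interval, in the spirit of the cited standardized-ratio method rather than as a claim about the exact procedure used.

```sas
/* 1. Forward stepwise logistic model (inclusion criterion p < 0.05). */
proc logistic data=colorectal;
  class asa_class wound_class surg_extent indication / param=ref;
  model morbidity(event='1') = asa_class wound_class surg_extent indication
        age_cat copd weight_loss bmi_cat func_status
        / selection=forward slentry=0.05;
  output out=logit_pred p=p_logit;   /* patient-level expected probabilities */
run;

/* 2. Random-intercepts, fixed-slopes hierarchical model fit by adaptive
      quadrature; NOBLUP restricts predictions to the fixed portion. */
proc glimmix data=colorectal method=quad;
  class hospital_id asa_class wound_class surg_extent indication;
  model morbidity(event='1') = asa_class wound_class surg_extent indication
        age_cat copd weight_loss bmi_cat func_status
        / dist=binary link=logit solution;
  random intercept / subject=hospital_id;
  output out=hier_pred pred(noblup ilink)=p_hier;
run;

/* 3. Hospital-level O, E, and O/E; exact limits for the ratio via the
      chi-square/Poisson relationship (illustrative, 95% interval). */
proc sql;
  create table oe as
  select hospital_id,
         sum(morbidity)              as O,
         sum(p_hier)                 as E,
         calculated O / calculated E as OE_ratio
  from hier_pred
  group by hospital_id;
quit;

data oe_ci;
  set oe;
  alpha = 0.05;
  if O = 0 then lower = 0;
  else lower = cinv(alpha/2, 2*O) / (2*E);
  upper = cinv(1 - alpha/2, 2*(O + 1)) / (2*E);
  high_outlier = (lower > 1);   /* CI entirely above 1.0 */
  low_outlier  = (upper < 1);   /* CI entirely below 1.0 */
run;
```

The NOBLUP and ILINK options request fixed-portion predictions on the probability scale, consistent with the text's point that random hospital effects are excluded when computing E.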

RESULTS

Of 211 hospitals in the 2008 dataset, 108 met the criterion of having at least 100 qualifying colorectal operations. These 108 hospitals yielded 17,050 colorectal operation patients (Table 1). Overall 30-day morbidity rate (patients having at least 1 complication) was 22.1% (3,765 patients), and 30-day mortality rate was 3.6% (613 patients).


Table 1. Characteristics of 17,050 Patients Who Underwent Colorectal Operations in 2008 from 108 Hospitals That Had at Least 100 Operations

Variables                                          n        %
Age* (y)
  Younger than 65                                  9,183    53.9
  65-74                                            3,650    21.4
  75-84                                            3,061    18.0
  Older than 84                                    1,156     6.8
Gender
  Female                                           8,894    52.2
  Male                                             8,156    47.8
Surgical extent
  Partial colectomy                               14,028    82.3
  Proctectomy                                      1,353     7.9
  Total colectomy                                    953     5.6
  Total proctocolectomy                              492     2.9
  Abdominoperineal resection                         224     1.3
Indication
  Neoplasm                                         8,122    47.6
  Diverticulitis                                   2,705    15.9
  Enteritis/colitis                                1,439     8.4
  Obstruction/perforation                            891     5.2
  Rectal prolapse                                    485     2.8
  Vascular insufficiency                             363     2.1
  Intestinal volvulus                                260     1.5
  Hemorrhage                                         229     1.3
  Other                                            2,556    15.0
ASA
  I/II: Normal healthy/mild systemic disease       8,381    49.2
  III: Severe systemic disease                     7,146    41.2
  IV/V: Severe systemic disease/moribund           1,523     8.9
FHS
  1: Independent                                  15,480    90.8
  2: Partially dependent                           1,036     6.1
  3: Totally dependent                               534     3.1

*Mean ± SD = 61.8 ± 16.1 years.
ASA, American Society of Anesthesiologists; FHS, functional health status.

Morbidity

Stepwise logistic regression yielded a 19-variable model for morbidity (Hosmer-Lemeshow = 0.001; c = 0.669), which was, in order of selection: American Society of Anesthesiologists classification, indication, wound class, surgical extent, COPD, weight loss, body mass index class, functional status, prothrombin, alcohol use, emergent condition, smoking, steroid use, sepsis, creatinine, disseminated cancer, albumin, pneumonia, and platelet count. Except for platelet count (p = 0.077), all 18 remaining variables were statistically significant in the hierarchical model (at p < 0.05). O/E ratios from logistic and hierarchical models were very close (r > 0.999, p < 0.001), and the same 14 hospitals with low O/E values and 11 hospitals with high O/E values were identified as statistical outliers (Fig. 1A). The hierarchical model identified 2 additional high outliers that were not identified by the logistic model. p Values were highly correlated (r = 0.991, p < 0.001), and the slope was very close to 1.0 (Fig. 2A).

Mortality

Stepwise logistic regression yielded a 16-variable model for mortality (Hosmer-Lemeshow = 0.150; c = 0.926), which was, in order of selection: functional status, American Society of Anesthesiologists classification, sepsis, age group, albumin, creatinine, disseminated cancer, platelet count, dyspnea, ascites, indication, weight loss, white blood count, COPD, peripheral vascular disease, and blood urea nitrogen. All 16 variables were statistically significant in the hierarchical model (at p < 0.05). O/E ratios from logistic and hierarchical models were very close (r > 0.999; p < 0.001), and the same 5 hospitals with low O/E values and 3 hospitals with high O/E values were identified as statistical outliers (Fig. 1B). p Values were highly correlated (r = 0.998; p < 0.001), and the slope was very close to 1.0 (Fig. 2B).

Patients per hospital

Substantial similarity in results from logistic and hierarchical models remained when hospitals with <100 patients were permitted in the analysis. Correlations between O/E values and p values from logistic versus hierarchical models were very high, and near-perfect consistency in the outliers identified was maintained when patients from hospitals with as few as 50 patients (159 hospitals) or 25 patients (188 hospitals) were permitted in the dataset. For morbidity among the 22,097 patients in 188 hospitals that provided at least 25 patients, the correlation between logistic and hierarchical O/E values was >0.999, and the correlation between p values was 0.994; there were 18 low and 17 high outliers in common, and the hierarchical model identified 1 additional low outlier and 2 additional high outliers. For mortality, the correlation between logistic and hierarchical O/E values was >0.999, and the correlation between p values was 0.997; the same 5 low and 7 high outliers were identified in each model.

Figure 1. Observed events-to-expected events (O/E) ratios for (A) morbidity and (B) mortality derived from logistic and hierarchical models. Small gray circles identify sites that are not outliers, large open circles sites that are outliers, and half white and half black circles sites that are outliers only for the hierarchical model. O/E values derived from logistic and hierarchical models are highly consistent.

DISCUSSION

Although logistic regression has often been used to adjust risk for differences in hospital case mix, hierarchical modeling is frequently recommended as an approach that provides more accurate estimates of performance.5-13 The term hierarchical model applies to a class of methods that adjusts estimates for lack of independence among data. Patients treated by specific surgeons, or at specific hospitals, might be more likely to share similar characteristics (including outcomes) than patients not in these cohorts. If these intraclass correlations are ignored, as they are in logistic regression, and if they are nontrivial, then statistical error will be underestimated. A possible end result of this underestimation is that logistic regression could yield p values that are improperly small and could identify some hospitals as statistical outliers when they actually are not.
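As one illustration of how the size of this clustering effect can be summarized (not a quantity reported in this article), the intraclass correlation for a random-intercepts logistic model is often approximated on the latent scale as ICC = σ²_hospital / (σ²_hospital + π²/3), where σ²_hospital is the between-hospital intercept variance estimated by the hierarchical model. Values near zero indicate weak clustering, which would be consistent with the near-identical logistic and hierarchical results observed here.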

We compared logistic regression with hierarchical modeling for quality assessment of ACS-NSQIP hospitals with regard to colorectal operation morbidity and mortality outcomes. O/E ratios, p values for those ratios (with respect to the null hypothesis that O/E = 1.0), and identified hospital outliers were nearly identical for the 2 approaches. Despite hierarchical models having an advantage in accounting for lack of independence because of patients being nested in hospitals, we did not observe fewer outliers with this implementation of hierarchical modeling, as has been reported in other studies.5,7-10,20

The ACS-NSQIP Semi-Annual Report provides member hospitals with performance data on >40 outcomes (or outcomes for specific surgical groups), with plans for many additional outcomes to be added over time.2 In selecting a risk-adjustment methodology it is necessary to consider whether distributional assumptions are met for all outcomes or for only certain outcomes; whether the methodology is computationally robust, so that it yields results for the sample sizes, event rates, and predictor sets typically encountered; whether available software for the methodology incorporates procedures to streamline model development (eg, stepwise selection of variables, generation of c-statistics); whether use of increasingly complex methodologies, or more than one type of methodology, might compromise consistency, transparency, and conceptual grasp by end users; and how the methodology might contribute to motivating quality improvement. It is unlikely that a single approach will be superior on all dimensions.

We have not identified any functional advantage of this particular implementation of hierarchical modeling when applied to this ACS-NSQIP dataset for quality-improvement purposes. On the contrary, although logistic modeling was shown to yield nearly identical O/E ratios and p values as hierarchical models and to identify either the same number of outlier hospitals or sometimes slightly fewer, this was accomplished using a well-understood, robust methodology that is more fully featured than most implementations of hierarchical modeling. In certain contexts (eg, pay for performance and public reporting), statistical rigor might be essential regardless of the actual magnitude of the associated cost-to-benefit ratio for those efforts. For quality-improvement purposes, the nearly identical results from logistic models would seem to make them quite appropriate.

Figure 2. Scatter plot of p values and regression lines for (A) morbidity and (B) mortality derived from logistic and hierarchical models. The horizontal line at y = 0.025 and the vertical line at x = 0.025 (A) denote significance with a 95% CI. The horizontal line at y = 0.05 and the vertical line at x = 0.05 (B) denote significance with a 90% CI. p Values are highly correlated, with intercept close to 0.0 and slope close to 1.0.

Limitations

The hierarchical model studied here (a random-intercepts model) is only one of many forms of hierarchical models, including random intercepts, random slopes, and Bayesian models.17,21-27 Each of these hierarchical methods and logistic regression might or might not yield different results (depending on features of the particular dataset and outcomes studied),28-30 and different findings might or might not be equally valid, depending on whether assumptions implicit in each model approach are met.7,29,31,32 Bayesian methods, incorporating a reliability adjustment, will almost certainly (except in the case of large samples for all hospitals) result in fewer statistical outliers than those identified in routine logistic regression.33 The issue of reliability adjustment is distinct from that of accurate risk adjustment and merits a separate discussion.
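As a brief, hedged illustration of the reliability adjustment referred to above (not performed in this study), such methods typically shrink each hospital's estimate toward the overall mean using a weight of the general form w = σ²_between / (σ²_between + σ²_within/n), where n is the hospital's caseload. Because w is small for low-volume hospitals, their estimates move toward the mean, and fewer of them tend to be flagged as outliers.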


The demonstrated absence of an effect under one scenario does not prove that the effect will be absent in all situations; indeed, other evidence contradicts such a generalization. Certainly, the assumption that observations within an institution are independent is theoretically incorrect, but it is also true that intraclass correlation can be low. We argue that the impact of multilevel modeling is not always substantive and must be considered in the context of the task at hand. Our comparison is based on data for a single group of surgical procedures and for two outcomes: one with relatively high incidence (overall morbidity) and one with low incidence (mortality). It is not unreasonable to expect that our findings would be consistent across subgroups of ACS-NSQIP data.

In conclusion, logistic regression is a simple, well-understood methodology that provides risk-adjusted O/E ratios and CIs that can identify well and poorly performing hospitals. The analytic output is conceptually accessible to a wide range of users for a spectrum of functions. Although hierarchical modeling might, theoretically, provide more correct estimates of performance and has been reported in other studies to identify fewer hospitals as statistical outliers, this was not the case here. Because an aim of the ACS-NSQIP is to extend evaluation to smaller hospitals and to target specific operations, there will be a need to develop unique prediction models using smaller sample sizes. Hierarchical modeling software often lacks variable selection methodology and can present greater convergence problems under some anticipated modeling conditions. Although accuracy is one attribute in assessing statistical methods, the best approach will also depend on functionality and programmatic objectives. For reasons of complexity, accessibility, feasibility, and analytic goals, logistic regression might remain a viable alternative to hierarchical modeling for certain applications, including analysis of ACS-NSQIP data.

Author Contributions

Study conception and design: Cohen, Bilimoria, Ko, Hall
Acquisition of data: Cohen, Bilimoria
Analysis and interpretation of data: Cohen, Dimick, Bilimoria, Ko, Hall
Drafting of manuscript: Cohen, Bilimoria
Critical revision: Dimick, Ko, Hall

REFERENCES

1. Khuri SF, Henderson WG, Daley J, et al. Successful implementation of the Department of Veterans Affairs' National Surgical Quality Improvement Program in the private sector: the Patient Safety in Surgery study. Ann Surg 2008;248:329–336.
2. American College of Surgeons National Surgical Quality Improvement Program. Semi-annual report. Chicago, IL: American College of Surgeons; 2008.
3. Iezzoni LI, ed. Risk adjustment for measuring health care outcomes. 3rd ed. Chicago: Health Administration Press; 2003.
4. Shaughnessy PW, Hittle DF. Overview of risk adjustment and outcome measures for Home Health Agency OBQI reports: highlights of current approaches and outline of planned enhancements. Denver, CO: Center for Health Services Research, University of Colorado Health Sciences Center; 2002.
5. Austin PC, Tu JV, Alter DA. Comparing hierarchical modeling with traditional logistic regression analysis among patients hospitalized with acute myocardial infarction: should we be analyzing cardiovascular outcomes data differently? Am Heart J 2003;145:27–35.
6. Birkmeyer JD, Shahian DM, Dimick JB, et al. Blueprint for a new American College of Surgeons National Surgical Quality Improvement Program. J Am Coll Surg 2008;207:777–782.
7. Christiansen CL, Morris CN. Improving the statistical approach to health care provider profiling. Ann Intern Med 1997;127:764–768.
8. Shahian DM, Normand SL, Torchiana DF, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001;72:2155–2168.
9. Shahian DM, Torchiana DF, Shemin RJ, et al. Massachusetts cardiac surgery report card: implications of statistical methodology. Ann Thorac Surg 2005;80:2106–2113.
10. DeLong E. Hierarchical modeling: its time has come. Am Heart J 2003;145:16–18.
11. D'Errigo P, Tosti ME, Fusco D, et al. Use of hierarchical models to evaluate performance of cardiac surgery centres in the Italian CABG outcome study. BMC Med Res Methodol 2007;7:29.
12. Krumholz HM, Brindis RG, Brush JE, et al. Standards for statistical models used for public reporting of health outcomes: an American Heart Association Scientific Statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention and the Stroke Council. Endorsed by the American College of Cardiology Foundation. Circulation 2006;113:456–462.
13. Shahian DM, Blackstone EH, Edwards FH, et al. Cardiac surgery risk models: a position article. Ann Thorac Surg 2004;78:1868–1877.


14. ACS-NSQIP program specifics. ACS NSQIP data: participant use data file. Available at: http://acsnsqip.org/pug/pufrequesthomepage.aspx. Accessed January 26, 2009.
15. Khuri SF. The NSQIP: a new frontier in surgery. Surgery 2005;138:837–843.
16. Dai J, Li Z, Rocke D. Hierarchical logistic regression modeling with SAS GLIMMIX. Available at: http://www.lexjansen.com/wuss/2006/Analytics/ANL-Dai.pdf. Accessed September 17, 2009.
17. Houchens R, Chu B, Steiner C. Hierarchical modeling using HCUP data. Rockville, MD: US Agency for Healthcare Research and Quality; 2007.
18. SAS Institute. The GLIMMIX procedure, June 2006. Available at: http://support.sas.com/rnd/app/papers/glimmix.pdf. Accessed September 17, 2009.
19. Sun J, Ono Y, Takebuchi Y. A simple method for calculating the exact confidence interval of the standardized mortality ratio with an SAS function. J Occup Health 1996;38:196–197.
20. Peterson ED, DeLong ER, Muhlbaier LH, et al. Challenges in comparing risk-adjusted bypass surgery mortality results: results from the Cooperative Cardiovascular Project. J Am Coll Cardiol 2000;36:2174–2184.
21. Austin PC. A comparison of Bayesian methods for profiling hospital performance. Med Decis Making 2002;22:163–172.
22. Austin PC, Naylor CD, Tu JV. A comparison of a Bayesian vs. a frequentist method for profiling hospital performance. J Eval Clin Pract 2001;7:35–45.
23. DeLong ER, Peterson ED, DeLong DM, et al. Comparing risk-adjustment methods for provider profiling. Stat Med 1997;16:2645–2664.
24. Goldstein H, Browne W, Rasbash J. Multilevel modelling of medical data. Stat Med 2002;21:3291–3315.
25. Sullivan LM, Dukes KA, Losina E. Tutorial in biostatistics. An introduction to hierarchical linear modelling. Stat Med 1999;18:855–888.
26. Brown H, Prescott R. Applied mixed models in medicine. Chichester, UK: John Wiley & Sons, Ltd; 1999.
27. Leyland AH, Goldstein HE. Multilevel modelling of health statistics. Chichester, UK: John Wiley & Sons, Ltd; 2001.
28. Liu Y, Sun XW, Kurtz S, Tabak YP. Logistic and hierarchical logistic models for profiling hospital performance: a study of patients hospitalized for hemorrhagic stroke in Pennsylvania. Boston, MA: International Society for Quality in Health Care; 2007.
29. Glance LG, Dick AW, Osler TM, Mukamel D. Using hierarchical modeling to measure ICU quality. Intensive Care Med 2003;29:2223–2229.
30. Hannan EL, Wu C, DeLong ER, Raudenbush SW. Predicting risk-adjusted mortality for CABG surgery: logistic versus hierarchical logistic models. Med Care 2005;43:726–735.
31. Fedeli U, Brocco S, Alba N, et al. The choice between different statistical approaches to risk-adjustment influenced the identification of outliers. J Clin Epidemiol 2007;60:858–862.
32. Glance LG, Dick A, Osler TM, et al. Impact of changing the statistical methodology on hospital and surgeon ranking: the case of the New York State cardiac surgery report card. Med Care 2006;44:311–319.
33. Dimick JB, Birkmeyer JD. Are we being fooled by randomness? Using reliability adjustment to improve hospital quality rankings. J Surg Res 2009;151:279.