Journal of Pediatric Surgery (2011) 46, 226–231
www.elsevier.com/locate/jpedsurg
Outcomes research in pediatric surgery part 2: how to structure a research question☆
David C. Chang a, Daniel S. Rhee b, Dominic Papandria b, Gudrun Aspelund c,1, Robert A. Cowles c,1, Eunice Y. Huang d,1, Catherine Chen e,1, William Middlesworth c,1, Marjorie J. Arca f,1, Fizan Abdullah b,⁎,1
a Department of Surgery, University of California San Diego School of Medicine, San Diego, CA 92103, USA
b Division of Pediatric Surgery, Johns Hopkins University School of Medicine, Baltimore, MD 21287-0005, USA
c Department of Pediatric Surgery, Columbia University College of Physicians and Surgeons, Morgan Stanley Children's Hospital of New York-Presbyterian, New York, NY 10032, USA
d Division of Pediatric Surgery, University of Tennessee Health Science Center, Memphis, TN 38111, USA
e Department of Surgery, Children's Hospital Boston, Harvard Medical School, Boston, MA 02115, USA
f Department of Pediatric Surgery, Children's Hospital of Wisconsin, Medical College of Wisconsin, Milwaukee, WI 53201, USA
Received 27 September 2010; accepted 30 September 2010
Key words: Surgical outcomes; Pediatric surgery; Study design; Outcomes research; Comparative effectiveness research
Abstract
Innovative treatments and procedures are essential to the advancement of surgery. Outcomes research provides the mechanism to analyze these new treatments as they enter clinical practice and evaluate them against established therapies. Information gained through this methodology is essential because new techniques and innovations often gain rapid acceptance before clinical trials can be conducted to assess them. Increasing national emphasis is placed on comparative effectiveness as health care costs rise. Surgeons must take the lead in surgical outcomes and comparative effectiveness research, with the goal of identifying the most efficient and effective treatment for our patients. The authors show how to structure and design a research project involving pediatric surgical outcomes. The model consists of the following 3 phases: (1) study design, (2) data preparation, and (3) data analysis. The model we present provides the reader with a basic format and research structure to serve as a guide to performing high-quality surgical outcomes research.
© 2011 Elsevier Inc. All rights reserved.
☆ Presented at the 2010 Annual Meeting of the American Pediatric Surgical Association, Orlando, FL.
⁎ Corresponding author. Tel.: +1 410 955 1983; fax: +1 410 502 5314. E-mail address: [email protected] (F. Abdullah).
1 For the 2010 APSA Outcomes Committee.
0022-3468/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jpedsurg.2010.09.095

1. Background
Comparative effectiveness or outcomes research is defined by the Agency for Healthcare Research and Quality
as research “designed to inform health care decisions by providing evidence on the effectiveness, benefits, and harms of different treatment options. The evidence is generated from research studies that compare drugs, medical devices, tests, surgeries, or ways to deliver health care” [1-3]. In surgical practice, this most often involves the comparison of alternative forms of surgical treatment, such as minimally invasive to open surgery. In its analysis of surgical outcome, comparative effectiveness research can evaluate predictors beyond those at the patient level and may also include
national, regional, hospital, and provider factors. The power of this type of research lies in its ability to inform patients, providers, and policy makers about which interventions are most effective for which patients under various circumstances [4].

There are important fundamental differences between randomized clinical trials (RCTs) and comparative effectiveness research. Randomized clinical trials are designed to evaluate the safety and efficacy of a new drug or a new surgical technique in a very controlled setting. An independent variable (x) is varied between members of a population, and an outcome or dependent variable (y) is analyzed. The differences in the members of the population are controlled at the outset by randomizing the members assigned to each group. Outcomes research takes the patient populations in the natural practice setting, evaluates the effectiveness of the treatment, and accounts for the differences in the populations through analysis.

Although RCTs remain the gold standard for research, they have important limitations. The controlled setting may be so specific that the measured outcomes of the trial are difficult to generalize to the population at large. In contrast, because outcomes research evaluates effectiveness of therapy in the natural practice setting, its results are easier to adapt to the population as a whole. Randomized clinical trials isolate a single variable and its causal association with an outcome measure. Outcomes research allows us to look at factors that influence outcome but are difficult to factor into a clinical trial, such as surgeon variability, surgeon volumes, hospital volumes, type of hospital, and regional characteristics such as the density of providers in a particular area [5-9].

Another limitation of an RCT involves the length of time required to accrue, analyze, and disseminate results.
When a new intervention is introduced, there is a learning curve among surgeons. Randomized clinical trials comparing this new intervention and a gold standard should take place when surgeons are equally facile with both procedures. In addition, a significant number of patients may be needed to adequately power a well-designed RCT. With the current information exchange, new interventions are often introduced into practice despite inadequate evidence of superior benefit or safety. Practice patterns may change rapidly and may therefore erase equipoise in the surgical community. When this occurs, conducting the clinical trial may be difficult; its results, although valid, may not be readily incorporated into practice.

In contrast, comparative effectiveness studies can be performed even when new surgical techniques have become accepted practice. Data can be quickly collected for both the new and the old technique, and relative benefits and disadvantages can be compared as they occur in actual practice. Thus, in certain circumstances, surgical outcomes research may be a better way to generate evidence for the integration of new surgical techniques into clinical care. It can certainly complement information garnered by RCTs.

It is crucial to use the most appropriate database to gather information for the study. There are many multispecialty
databases that are publicly available, including the Nationwide Inpatient Sample, the Kids' Inpatient Database, and the National Surgical Quality Improvement Program (NSQIP) [10-12]. Although NSQIP has included only adult patients, the Pediatric NSQIP has been piloted, and data are currently being collected at more than 25 hospitals. Specialty-specific databases, including Surveillance Epidemiology and End Results, the National Trauma Data Bank, and the United Network for Organ Sharing, are available for use as well [13-15]. This vast resource of data allows for rapid and relatively inexpensive evaluation of outcomes in large nationally representative patient populations.

Outcomes research plays an expanding role in benchmarking quality of care, comparing treatment effectiveness, refining treatment strategies, and educating patients as well as third-party payers. Thus, it is important for surgical researchers to learn the basic methods in the design and execution of studies that use population-based data.
2. Designing an outcomes research project
Outcomes research is hypothesis driven and is not merely “data mining.” The investigator must be involved throughout the process. Relying on a statistician alone to perform the analysis may compromise the clinical relevance of the study and undermine the analysis of the data. The next sections discuss in detail the phases of structuring an outcomes research project: study design, data preparation, and data analysis (Table 1). As a running example, we use the comparison of laparoscopic with open surgical fundoplication for treatment of gastroesophageal reflux disease (GERD) in children.
2.1. Study design
The study design phase includes the development of a question, definition of the population to be studied (including any subsets of that population that may be of relevance), delineating clinical outcomes, identifying the primary comparison, and specifying the covariates.

Table 1 Components of an outcomes research study
Study design                    Data preparation                    Data analysis
1. Question development         1. Select database                  1. Univariate
2. Define population            2. Link databases, if necessary     2. Bivariate
3. Define subset                3. Select data elements             3. Multivariate
4. Define outcomes              4. Generate new data elements       4. Sensitivity
5. Define primary comparison                                        5. Subset analysis
6. Define covariates
As with any research study, the process begins with a relevant question. There are 2 types of questions that can be asked: open ended and closed ended. Open-ended questions ask what, why, when, and how, such as “What is the incidence of appendicitis in 11-year-old children?” or “What is the typical presentation of appendicitis in children?” The data acquired for open-ended questions are descriptive in nature. Because there is no specific testable hypothesis, there are no comparisons or statistical tests. P values are not applicable in descriptive studies because the figures reported describe subsets of a single population rather than a comparison between populations.

When open-ended questions are applied to 2 different populations, a comparative analysis can be made. In this situation, statistical testing can be applied using measures of significance. The difference between descriptive and comparative analyses is illustrated in Fig. 1. On the left, 43% of population A has a certain characteristic, whereas 57% of the same population does not. No statistical test can be used here because we are describing the incidence of the characteristic within one population. By contrast, in a comparative analysis, we would compare the incidence of a certain characteristic between population A (43%) on the left and population B (45%) on the right of the figure. This would involve a comparison of 2 different populations and, hence, 2 denominators; in such a case, statistical testing can be applied.

Closed-ended questions are the second type of question that can be posed. Unlike open-ended questions, which are exploratory, closed-ended questions require a specific knowledge of the issue to be meaningful. Closed-ended questions should have a hypothesis and must be testable. Four main components need to be identified to formulate a closed-ended question.
These components are represented by the mnemonic PICO: Patients (populations), Interventions, Comparison groups, and Outcomes of interest. Each of these must be clearly defined at the outset, although in no particular order. The outcome(s) of interest is a very important component in designing an outcomes study but is often oversimplified. Identifying an appropriate outcome requires knowledge of what would best answer the question at hand and what is
measurable from the data available. In our example of laparoscopic vs open fundoplication, mortality is an easily measured outcome. Because death is rare after fundoplication, it is not a particularly useful outcome for comparison. Although recurrence of reflux symptoms would be a more appropriate outcome, it may not be easily measurable within the data available. Other outcomes of interest might include specific surgical complications, length of hospital stay, and total hospital charges. Thus, both clinical and administrative outcomes may be evaluated.

Fig. 1 Looking at the percentages of A alone would be used in a descriptive analysis for an open-ended question. A comparative analysis for a closed-ended question would look at the differences in percentages between populations A and B and would require tests for statistical significance.

The interventions under investigation and criteria for comparisons also need to be specified. The intervention in many surgical outcome studies is a surgical procedure for a specific condition, such as open vs laparoscopic fundoplication for GERD. Comparison groups are decided by the more specific questions of the study. Surgical outcome studies typically compare patients undergoing a new technique or innovation to the current standard of care. The provider, hospital size/type, and other geographic/regional factors can also be chosen for comparison. In our example, we compare outcomes for the new technique, laparoscopic fundoplication, to those of the more established open fundoplication. Comparison groups are then further defined according to database criteria, such as International Classification of Diseases, Ninth Revision, diagnosis and Current Procedural Terminology procedure codes.

Patients or populations of interest must be defined to select the databases to be queried. The population is easily identifiable at the outset but must be refined depending on the type of questions asked. Determining inclusion and exclusion criteria that restrict the database contents to the patients of interest is the next step.
For the example of laparoscopic vs open surgery, it must be decided whether to include all patients assigned an International Classification of Diseases, Ninth Revision, diagnosis code for GERD; all patients with Current Procedural Terminology procedure codes for either laparoscopic or open fundoplication; or only patients assigned both the procedure code and the diagnosis code. For our example, the last option should be chosen. If patients were selected by diagnosis code alone, the cohort would include patients who were not surgically treated. Selecting patients on the basis of procedure code alone might result in inclusion of patients who were treated for other diagnoses. It is essential to design the inclusion criteria to capture the targeted population and only this targeted population. This often requires incorporation of both procedural and diagnostic codes. In studying laparoscopic operations, the question sometimes arises of how to identify laparoscopic cases when a separate code is not available. In these cases, a general laparoscopy code can be used.

In the pediatric population, it is particularly important to decide which ages to include in the study. Should all pediatric patients be included or only those within a certain age range? Should neonates be included? Although populations with a broad age range can be included when studying adults, the differences between infants and
teenagers warrant careful attention in deciding on the patients of interest to a particular study. Compared to adults, children are more likely to live in poverty, are treated more frequently for acute illnesses as opposed to long-term conditions, and are dependent on caregivers. In addition, children are more likely to receive preventive care and undergo rapid physical, cognitive, and social development compared to adults [16,17]. Although these differences present challenges to the accurate study of surgical outcomes in children, they also represent an opportunity to explore other factors that impact child health and the pediatric surgical patient.

Similarly, for exclusion criteria, we look to exclude any patients who fit the inclusion criteria but not the question under study. Returning to our example, we would want to exclude any patient who had both laparoscopic and open fundoplication, which would mean excluding any patient who returned to the operating room during the same hospitalization for a second procedure. In this example, we would distinguish patients undergoing a laparoscopic procedure that was converted to open from those who had a laparoscopic operation but returned to the operating room at a later date by checking procedure dates. Cases converted from laparoscopic to open are generally treated as open cases.

After determining the intervention and comparison criteria, defining the patient population, and specifying the outcomes of interest, the covariates must then be identified. Covariates are variables that may influence outcome irrespective of the research question and, so, confound the association between the primary variable of comparison and the outcome. The purpose of including covariates in the study is to account for any differences in the patient groups being compared that are not primarily associated with the variable of comparison (which in our example is laparoscopic vs open fundoplication).
For example, if patients undergoing open operations are younger or have more comorbidities than patients undergoing laparoscopic procedures, we will need to adjust for these factors to compare patients of similar status. Otherwise, we may not know whether differences in outcome are associated with the type of operation or with baseline patient characteristics. Other patient-level covariates may include race, sex, and age. Provider, hospital, and regional characteristics can be included as covariates as well. Identification of appropriate covariates requires in-depth understanding of the study question and should be supported by a basic literature search for factors known to influence the outcomes of interest. This is a major distinction between randomized controlled trials and outcomes research. In clinical trials, known and unknown covariates are controlled for by randomization. In outcomes research, patients are not randomized, and so, defining appropriate covariates is very important to achieve comparable study populations. Unknown covariates can introduce bias.
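The inclusion and exclusion rules described earlier in this section can be sketched in code. The following is a minimal illustration, not the authors' implementation: the code labels (`DX_GERD`, `PX_LAP_FUNDO`, `PX_OPEN_FUNDO`) are placeholders rather than real diagnosis or procedure codes, the record layout is hypothetical, and the default age range of 0 to 18 years is an assumption chosen only for the example.

```python
from dataclasses import dataclass

# Placeholder labels standing in for real diagnosis and procedure codes (assumption).
GERD_DX = "DX_GERD"
LAP_PX = "PX_LAP_FUNDO"
OPEN_PX = "PX_OPEN_FUNDO"


@dataclass(frozen=True)
class Admission:
    patient_id: int
    age_years: float
    dx_codes: frozenset   # diagnosis codes recorded for the admission
    procedures: tuple     # (procedure code, hospital day) pairs


def assign_group(adm, min_age=0.0, max_age=18.0):
    """Return 'laparoscopic', 'open', or None when the record is excluded."""
    if not (min_age <= adm.age_years < max_age):
        return None                      # outside the age range under study
    if GERD_DX not in adm.dx_codes:
        return None                      # procedure done for another diagnosis
    fundo = [(code, day) for code, day in adm.procedures
             if code in (LAP_PX, OPEN_PX)]
    if not fundo:
        return None                      # diagnosis without surgical treatment
    if len({day for _, day in fundo}) > 1:
        return None                      # returned to the operating room later
    if {code for code, _ in fundo} == {LAP_PX}:
        return "laparoscopic"
    return "open"                        # open, or converted on the same day


records = [
    Admission(1, 2.0, frozenset({GERD_DX}), ((LAP_PX, 1),)),
    Admission(2, 5.0, frozenset({GERD_DX}), ((OPEN_PX, 2),)),
    Admission(3, 1.0, frozenset({GERD_DX}), ()),
    Admission(4, 3.0, frozenset({"DX_OTHER"}), ((LAP_PX, 1),)),
    Admission(5, 7.0, frozenset({GERD_DX}), ((LAP_PX, 1), (OPEN_PX, 1))),
    Admission(6, 9.0, frozenset({GERD_DX}), ((LAP_PX, 1), (OPEN_PX, 4))),
    Admission(7, 30.0, frozenset({GERD_DX}), ((LAP_PX, 1),)),
]
groups = {r.patient_id: assign_group(r) for r in records}
print(groups)
```

Requiring both the diagnosis and a procedure code mirrors the choice made in the text. Treating two fundoplication codes on the same hospital day as a conversion is a simplification; a real database may need procedure dates or a dedicated conversion code.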
Once the PICO elements have been defined, we state the research question in an analytical format, represented by the following equations:

$$y = mx$$

$$y = m_1 x_1 + m_2 x_2 + m_3 x_3,$$

where y is the dependent outcome of interest and each x is an independent variable. For each x, there is an m value that represents the relationship between that x and y. Estimating these m values and testing the estimates statistically is what enables us to test specific research hypotheses about differences between the groups. Each analytical question can handle multiple independent variables (x) but yields only 1 outcome (y). Thus, a research project can have multiple “questions” or independent variables, but each question can only examine 1 outcome at a time.

To fully explore the research question, it is also important to identify subsets within the population. These subsets are specialized groups whose members fit the inclusion criteria but have characteristics that one believes may influence outcome. The subset can then be analyzed separately to assess whether the findings are different from those of the overall study population. This subset analysis can help to decide whether to exclude a certain group of patients. If that group of patients is markedly different, with very different expected outcomes, then one might exclude them. However, if it is not clear that the subgroup is different, one can choose to include them in the analysis and perform a subsequent subset analysis to evaluate the effects of the intervention on comparison groups from that subset alone. In our example, laparoscopic procedures converted to open might form a subset for separate analysis.
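The analytical format y = m1 x1 + m2 x2 introduced above can be made concrete with a small worked example. The sketch below is illustrative only: the data are invented, the variable names are hypothetical, and an intercept term is added, which the simplified equation in the text omits. It estimates the m values by ordinary least squares and contrasts the adjusted estimate for the primary variable with the unadjusted difference in means.

```python
from statistics import mean


def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (X'X) m = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Invented data: x1 = 1 for laparoscopic and 0 for open, x2 = age in years,
# y = length of stay in days. Laparoscopic patients are older in this sample.
lap = [0, 0, 0, 1, 1, 1]
age = [2, 4, 6, 8, 10, 12]
los = [3.0, 4.0, 5.0, 5.0, 6.0, 7.0]

rows = [(1.0, float(x1), float(x2)) for x1, x2 in zip(lap, age)]
intercept, m1, m2 = fit_linear(rows, los)

# Unadjusted comparison: difference in mean length of stay between groups.
unadjusted = (mean(y for y, g in zip(los, lap) if g == 1)
              - mean(y for y, g in zip(los, lap) if g == 0))

print(round(m1, 3), round(m2, 3), round(unadjusted, 3))
```

In this constructed data set the unadjusted difference suggests a longer stay after laparoscopic surgery, whereas the coefficient m1, estimated with age in the model, points the other way. That reversal is the reason covariates are specified before the analysis.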
2.2. Data preparation
After completion of the study design, the next step in the process is the data preparation phase. First, an appropriate database must be chosen. The selected database will often lack some of the information required to answer the study question. One solution is to link databases to combine data, provide new variables for analysis, and thereby create a more robust patient sample. The authors have successfully used a nonoverlapping combination of the Nationwide Inpatient Sample and the Kids' Inpatient Database, allowing data from both to be used in a single study. Data can also be linked to physician profiles, hospital characteristics, and system characteristics, allowing us to answer questions about how physician or hospital characteristics affect outcomes.

Once a database is chosen, data elements are selected based on the criteria decided during study design. Data elements for the outcomes of interest, the primary independent variable, and the covariates must now be determined. A data dictionary specifically defining all of the variables should be created.
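Linking patient records to hospital characteristics, as described above, amounts to a keyed join. The following is a minimal sketch under stated assumptions: the field names (`admission_id`, `hospital_id`, `teaching`, `annual_volume`) are hypothetical and do not correspond to the variables of any particular database.

```python
def link_hospital_characteristics(admissions, hospitals):
    """Attach hospital-level fields to each admission, matched on hospital_id.

    Returns the linked rows and the admission ids that found no match,
    so that unmatched records can be reviewed rather than silently lost.
    """
    by_id = {h["hospital_id"]: h for h in hospitals}
    linked, unmatched = [], []
    for adm in admissions:
        hosp = by_id.get(adm["hospital_id"])
        if hosp is None:
            unmatched.append(adm["admission_id"])
            continue
        row = dict(adm)
        row.update({k: v for k, v in hosp.items() if k != "hospital_id"})
        linked.append(row)
    return linked, unmatched


# Hypothetical example records.
admissions = [
    {"admission_id": 101, "hospital_id": "H1", "approach": "laparoscopic"},
    {"admission_id": 102, "hospital_id": "H2", "approach": "open"},
    {"admission_id": 103, "hospital_id": "H9", "approach": "open"},
]
hospitals = [
    {"hospital_id": "H1", "teaching": True, "annual_volume": 40},
    {"hospital_id": "H2", "teaching": False, "annual_volume": 5},
]

linked, unmatched = link_hospital_characteristics(admissions, hospitals)
print(len(linked), unmatched)
```

Reporting unmatched records separately is a design choice made for this sketch; counting them is a simple check on how complete the linkage is before any analysis begins.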
Some components of the research question will probably not be readily evident or available. For instance, comorbidity indices are generally not available in these databases. If required data elements are not found in individual or linked databases, then new data elements may need to be generated, or a preexisting index may be required. In our fundoplication example, significant comorbidities such as congenital heart disease or cerebral palsy may need to be defined. Generating new data elements can be the most time-consuming step of the outcomes analysis.
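Generating a new data element such as a comorbidity flag can be sketched as follows. This is an illustration under assumptions, not a validated comorbidity index: the code groupings use placeholder labels rather than real diagnosis codes, and the record layout is hypothetical. A small data dictionary documents each derived variable, in line with the recommendation to define all variables explicitly.

```python
# Placeholder groupings; the labels are NOT real diagnosis codes (assumption).
COMORBIDITY_CODES = {
    "congenital_heart_disease": {"DX_CHD_A", "DX_CHD_B"},
    "cerebral_palsy": {"DX_CP"},
}

# Data dictionary entries for the derived variables.
DATA_DICTIONARY = {
    "congenital_heart_disease": "1 if any congenital heart disease code is present, else 0",
    "cerebral_palsy": "1 if any cerebral palsy code is present, else 0",
    "comorbidity_count": "number of comorbidity flags equal to 1",
}


def add_comorbidity_flags(record):
    """Return a copy of the record with one 0/1 flag per comorbidity and a count."""
    dx = set(record["dx_codes"])
    out = dict(record)
    for name, codes in COMORBIDITY_CODES.items():
        out[name] = int(bool(dx & codes))
    out["comorbidity_count"] = sum(out[name] for name in COMORBIDITY_CODES)
    return out


patient = {"patient_id": 1, "dx_codes": ["DX_GERD", "DX_CP"]}
flagged = add_comorbidity_flags(patient)
print(flagged["cerebral_palsy"], flagged["comorbidity_count"])
```

Keeping the code groupings in one table makes the definition of each comorbidity easy to review and to revise, which matters because this step tends to be the most time-consuming.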
2.3. Analysis
After the database is selected and the data elements are chosen or generated, the analysis is performed. The analysis phase begins with a descriptive or univariate analysis, in which each variable is reported individually with an appropriate measure (mean, median, or proportion). This is summarized in Table 2.

Table 2 Univariate analysis describing the total patient population
Characteristic or outcome    Value
Age                          Mean years
Sex                          %
Race                         %
Death                        %
Complications                %
Length of stay               Mean/median days
Sample table for univariate analysis. The characteristics and outcomes describe the total patient population.

Next, a bivariate analysis compares characteristics and the outcomes of interest. A bivariate analysis examines 2 variables simultaneously to see whether one variable is related to the other. It tests the hypothesis of an association between the two variables, which by itself does not establish causality. The equation stated above is y = mx, where y is the outcome (dependent) variable and x is the independent variable. To set up the analysis, a bivariate table is constructed (Table 3). The categories of the primary independent variable are listed in columns and cross-referenced against rows containing the outcomes and pertinent secondary variables. These comparisons by the primary independent variable involve statistical analysis with measures of significance. P values or confidence intervals should be presented as part of the bivariate analysis.

Table 3 Bivariate analysis comparing characteristics and outcomes by laparoscopic and open Nissen fundoplication in children
                  Laparoscopic    Open          P
Age               Mean years      Mean years
Sex               %               %
Race              %               %
Death             %               %
Complications     %               %
Length of stay    Mean days       Mean days
Sample table for bivariate analysis. The characteristics and outcomes between 2 populations (eg, laparoscopic vs open Nissen fundoplication in children) are listed.

Choosing the appropriate statistical test is essential. To choose the correct test, one must ask the following 3 questions:
1. What are the variables being cross-referenced? Classify data elements as either outcomes or predictors.
2. What types of data are there for each? Ask if the outcomes and predictors are continuous or categorical data elements.
3. Using Fig. 2 as a reference, which of the 4 combinations should be used? Is the outcome categorical or continuous, and is the predictor categorical or continuous?

Distinguishing between the outcome and the predictor is critical because confusing the two can lead to the wrong test. Fig. 2 provides the appropriate test depending on the type of outcome and predictor chosen for the study. For example, a categorical outcome (dependent variable) with a categorical predictor (independent variable) would warrant a χ2 test for analysis, whereas the same outcome with a continuous variable as the predictor would require a receiver operating characteristic (ROC) analysis.

Fig. 2 This diagram shows which statistical test is appropriate based on the type of data (categorical or continuous) for both the outcome and predictor in the study. ROC indicates receiver operating characteristic.

Multivariate analysis controls for other variables included in the study. It mathematically adjusts for the preidentified covariates, such as age, sex, race, or hospital characteristics, to allow for an appropriate comparison that isolates the primary variable as the main predictor of outcome. This allows comparison of patients across populations that are very different in demographics. Either logistic or linear regression analysis is used, depending on the type of outcome data studied (categorical/binary or continuous, respectively). The format for displaying the results of the multivariate regression is presented in Table 4.

Table 4 Multivariate analysis with adjusted odds ratios of outcomes for laparoscopic compared with open Nissen fundoplication in children
Outcome           Odds ratio    P
Death
Complications
Length of stay
Sample table for multivariate analysis. Adjusted odds ratios of outcomes for laparoscopic vs open Nissen fundoplication in children.

Further analysis can be performed beyond the model presented here, which is designed to function as a framework and to show the essential components of a pediatric surgical outcomes study.
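The choice of test by data type and the bivariate comparison of a categorical outcome can be sketched as follows. Only two of the four pairings are stated in the text (categorical outcome with categorical predictor, and categorical outcome with continuous predictor); the other two entries are conventional choices assumed here, not taken from Fig. 2. The counts are invented, and the χ2 statistic is computed without a continuity correction.

```python
import math

# Keys are (outcome type, predictor type). The first two entries follow the
# text; the last two are conventional choices assumed for this sketch.
TEST_BY_DATA_TYPE = {
    ("categorical", "categorical"): "chi-square test",
    ("categorical", "continuous"): "receiver operating characteristic analysis",
    ("continuous", "categorical"): "t test or analysis of variance (assumed)",
    ("continuous", "continuous"): "correlation or linear regression (assumed)",
}


def choose_test(outcome_type, predictor_type):
    """Look up a test from the outcome and predictor data types."""
    return TEST_BY_DATA_TYPE[(outcome_type, predictor_type)]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], 1 degree of freedom."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Invented counts: complications (yes, no) for laparoscopic (10, 90)
# and for open (20, 80) fundoplication.
test_name = choose_test("categorical", "categorical")
stat, p_value = chi_square_2x2(10, 90, 20, 80)
unadjusted_or = odds_ratio(10, 90, 20, 80)
print(test_name, round(stat, 3), round(p_value, 4), round(unadjusted_or, 3))
```

The odds ratio computed here is unadjusted. The adjusted odds ratios reported in the format of Table 4 come from a logistic regression that includes the covariates, which this sketch does not attempt.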
3. Conclusion
The development of innovative new treatments is essential to advance pediatric surgical care. We must rigorously examine evidence of the benefits and harms of new therapies to promote best practices and enhance safety, effectiveness, and efficiency. The national emphasis on comparative effectiveness research and the vast resources of data now available have positioned surgical outcomes research as the primary method for generating evidence that will direct the practice of surgery. Outcomes research can best examine the integration of novel surgical techniques into active clinical practice. This is essential for the advancement of pediatric surgical care. We must promote the proper use of data and rigorous study design of outcomes research to generate reliable evidence. We have presented a detailed, structured method for performing a surgical outcomes study, and we promote this model for widespread use in the future.

Acknowledgments
Author contributions: Article concept and design: Abdullah and Chang; drafting of the manuscript: Abdullah, Chang, and Rhee; critical revision of the manuscript: Abdullah, Rhee, Papandria, Aspelund, Cowles, Huang, Chen, Middlesworth, and Arca.
References
[1] Agency for Healthcare Research and Quality. What is comparative effectiveness research? http://www.effectivehealthcare.ahrq.gov/index.cfm/what-is-comparative-effectiveness-research1. Accessed August 2010.
[2] Initial National Priorities for Comparative Effectiveness Research. Institute of Medicine of the National Academies. Washington, DC: The National Academies Press; 2009.
[3] To err is human: building a safer health system. Washington, DC: Institute of Medicine of the National Academies; 1999.
[4] Tunis SR, Benner J, McClellan M. Comparative effectiveness research: methods development, policy context and research infrastructure. Stat Med 2010;29:1963-76.
[5] Birkmeyer JD, Siewers AE, Finlayson EV, et al. Hospital volume and surgical mortality in the United States. N Engl J Med 2002;346:1128-37.
[6] Hall BL, Hsiao EY, Majercik S, et al. The impact of surgeon specialization on patient mortality: examination of a continuous Herfindahl-Hirschman index. Ann Surg 2009;249:708-16.
[7] Birkmeyer JD, Stukel TA, Siewers AE, et al. Surgeon volume and operative mortality in the United States. N Engl J Med 2003;349:2117-27.
[8] Ponsky TA, Huang ZJ, Kittle K, et al. Hospital- and patient-level characteristics and the risk of appendiceal rupture and negative appendectomy in children. JAMA 2004;292:1977-82.
[9] Safford SD, Pietrobon R, Safford KM, et al. A study of 11,003 patients with hypertrophic pyloric stenosis and the association between surgeon and hospital volume and outcomes. J Pediatr Surg 2005;40:967-72.
[10] Agency for Healthcare Research and Quality. Introduction to the HCUP Nationwide Inpatient Sample (NIS). http://www.hcup-us.ahrq.gov/nisoverview.jsp. 2008. Accessed April 2010.
[11] Agency for Healthcare Research and Quality. Introduction to the HCUP Kids' Inpatient Database (KID). http://www.hcup-us.ahrq.gov/kidoverview.jsp. 2006. Accessed April 2010.
[12] American College of Surgeons. National Surgical Quality Improvement Program. http://acsnsqip.org/main/about_overview.asp. Accessed April 2010.
[13] National Cancer Institute. Surveillance epidemiology and end results. http://seer.cancer.gov/about. Accessed April 2010.
[14] American College of Surgeons Committee on Trauma. National Trauma Data Bank. http://www.facs.org/trauma/ntdb/index.html. Accessed April 2010.
[15] Health Resources and Services Administration. Organ Procurement and Transplantation Network. http://www.unos.org/donation/index.php?topic=data. Accessed April 2010.
[16] McDonald KM, Davies SM, Haberland CA, et al. Preliminary assessment of pediatric health care quality and patient safety in the United States using readily available administrative data. Pediatrics 2008;122:e416-425.
[17] Seid M, Varni JW, Kurtin PS. Measuring quality of care for vulnerable children: challenges and conceptualization of a pediatric outcome measure of quality. Am J Med Qual 2000;15:182-8.