Journal of Clinical Epidemiology 72 (2016) 56–65
Deficiencies in reporting of statistical methodology in recent randomized trials of nonpharmacologic pain treatments: ACTTION systematic review

Jordan D. Dworkin (a), Andrew McKeown (b), John T. Farrar (a), Ian Gilron (c), Matthew Hunsinger (d), Robert D. Kerns (e), Michael P. McDermott (f, g), Bob A. Rappaport (h), Dennis C. Turk (i), Robert H. Dworkin (j), Jennifer S. Gewandter (j, *)

(a) Department of Biostatistics and Epidemiology, University of Pennsylvania, 423 Guardian Dr., Philadelphia, PA 19104, USA
(b) City University of New York-Hunter College, 695 Park Ave, New York City, NY 10065, USA
(c) Department of Anesthesiology, Queen’s University, 99 University Ave, Kingston, Ontario K7L3N6, Canada
(d) School of Professional Psychology, Pacific University, 222 SE 8th Ave, Hillsboro, OR 97123, USA
(e) VA Connecticut Healthcare System, 950 Campbell Ave, West Haven, CT 06516, USA
(f) Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, 265 Crittenden Rd, Rochester, NY 14642, USA
(g) Department of Neurology, University of Rochester School of Medicine and Dentistry, 601 Elmwood Ave, Rochester, NY 14642, USA
(h) Arlington, VA, USA
(i) Department of Anesthesiology and Pain Medicine, University of Washington, 1959 NE Pacific St, Seattle, WA 98195, USA
(j) Department of Anesthesiology, University of Rochester School of Medicine and Dentistry, 601 Elmwood Ave, Rochester, NY 14642, USA

Accepted 26 October 2015; Published online 18 November 2015
Abstract
Objective: The goal of this study was to assess the quality of reporting of statistical methods in randomized clinical trials (RCTs), including identification of primary analyses, missing data accommodation, and multiplicity adjustment, in studies of nonpharmacologic, noninterventional pain treatments (e.g., physical therapy, cognitive behavioral therapy, acupuncture, and massage).
Study Design: Systematic review of 101 articles reporting RCTs of pain treatments that were published between January 2006 and June 2013 in the European Journal of Pain, the Journal of Pain, and Pain.
Setting: Systematic review.
Results: Sixty-two percent of studies identified a primary outcome variable, 46% identified a primary analysis, and of those with multiple primary analyses, only 21% adjusted for multiplicity. Slightly over half (55%) of studies reported using at least one method to accommodate missing data. Only four studies reported prespecifying at least one of these four study methods.
Conclusion: This review identified deficiencies in the reporting of primary analyses and methods to adjust for multiplicity and accommodate missing data in articles disseminating results of nonpharmacologic, noninterventional trials. Investigators should be encouraged to indicate whether their analyses were prespecified and to clearly and completely report statistical methods in clinical trial publications to maximize the interpretability of trial results.
© 2016 Elsevier Inc. All rights reserved.
Keywords: Nonpharmacologic trials; Reporting; Missing data; Multiplicity; Prespecification; Clinical trials
Conflict of interest: The views expressed in this article are those of the authors and no official endorsement by the Food and Drug Administration (FDA) or the pharmaceutical and device companies that provided unrestricted grants to support the activities of the Analgesic, Anesthetic, and Addiction Clinical Trial Translations, Innovations, Opportunities, and Networks (ACTTION) public–private partnership should be inferred.
Funding: Financial support for this project was provided by the ACTTION public–private partnership which has received research contracts, grants, or other revenue from the FDA, multiple pharmaceutical and device companies, and other sources. J.D.D. did not receive payment for his time.
* Corresponding author. Tel.: +1 585-276-5661; fax: +1 585-2447271. E-mail address: [email protected] (J.S. Gewandter).
http://dx.doi.org/10.1016/j.jclinepi.2015.10.019
0895-4356/© 2016 Elsevier Inc. All rights reserved.
1. Introduction
In randomized clinical trials (RCTs), many statistical issues must be considered in the study design and data analysis plan. When these issues are improperly addressed, the validity and utility of the study results may be compromised. In order for readers to understand and evaluate the results of the analyses and the conclusions that the authors present when they publish clinical trials, it is important that investigators prespecify and thoroughly report the statistical methods used to analyze the data.
What is new? Articles reporting randomized clinical trials (RCTs) of nonpharmacologic pain treatments (e.g., physical therapy, cognitive behavioral therapy, acupuncture, and massage) have deficiencies in reporting primary analyses and methods to adjust for multiplicity and accommodate missing data. This article is the first to assess these statistical issues in nonpharmacologic treatments for pain and compares the quality of reporting to previous studies of pharmacologic and invasive treatments. This article demonstrates that authors, reviewers, and editors should pay close attention to how statistical details are described in reports of RCTs for nonpharmacologic pain treatments to increase the validity and transparency of the conclusions drawn pertaining to treatment benefits.
Two statistical issues that can have a substantial impact on study conclusions are (1) the use of multiple hypothesis tests (i.e., multiplicity) in the primary analysis and (2) the amount of missing data and how the unavailability of these data is addressed [1–7]. Whether or not multiple analyses are considered primary, specifying the primary analysis a priori ensures that the results of numerous analyses are not examined before one is designated as primary, a practice that inflates the chance of obtaining false positive results [1,2]. Although multiple hypotheses are often of equal importance to the investigator, when multiple statistical tests are considered primary, some adjustment for multiplicity must be used to prevent an increase in the probability of a type I error (i.e., falsely concluding that there is a treatment effect). For example, an investigator may wish to compare multiple treatment groups, compare treatment groups at multiple time points, or compare treatment groups with respect to multiple outcome variables. Statistical methods that can be used to control the overall probability of a type I error include but are not limited to (1) defining “coprimary” outcome variables (i.e., to declare a treatment successful, it is required to have a significant benefit on all coprimary analyses at the 0.05 level), (2) Bonferroni correction [8], (3) various stepwise procedures (e.g., Holm [9], Hochberg [10], and Hommel [11] corrections), (4) specialized methods for multiple group comparisons (e.g., the Tukey–Kramer method [12] or Dunnett’s test [13]), and (5) gatekeeping or hierarchical procedures [14,15].
Although the number of statistical tests performed in the primary analysis of an RCT and statistical methods to control for multiplicity should be planned by the investigators, missing data are less predictable and methods to accommodate them are less straightforward. As a result, it is recommended that researchers make every effort to minimize the amount of missing data in clinical trials [4]. Missing data, however, are often unavoidable [3,4]. Because the pattern of missing data can be different in each situation, guidelines dictating precisely when to use each method are not feasible. Each method to accommodate missing data uses an assumption regarding the probability that a set of values is missing given the observed data and the true values of the unobserved data (see Table 1 for definitions of these assumptions). Although it is almost always impossible to determine the exact probability model, reasonable assumptions about the model (i.e., the missing data mechanism) can often be made and methods can be selected that are best suited for those assumptions. With large amounts of missing data, however, the validity of the results and conclusions of RCTs can still be jeopardized. To minimize the effects of missing data, statisticians [3,17–19], the Food and Drug Administration [20], the European Medicines Agency [21], and the National Research Council [4] all suggest that studies use multiple methods (i.e., a primary method and secondary methods, or sensitivity analyses) that make different assumptions about the missing data mechanism. When the assumption underlying the method used is correct, bias in the treatment effect will be minimal, if not eliminated, and variability estimates will be appropriate. Because the choice of methods for accommodating missing data and correcting for multiplicity can substantially affect the conclusions of the trial, it is important that the methods used to address these issues in primary analyses of RCTs be prespecified to prevent biased selection of analyses that produce the most favorable results.
Two recent studies investigated the reporting of multiplicity adjustment, missing data methods, and primary analyses in RCTs of pharmacologic and interventional (i.e., invasive) treatments for pain [16,22]. Deficiencies were identified in the reporting of each of these aspects of data analysis, and in some cases, these deficiencies likely reflected a failure to use appropriate methods. Reporting of these statistical issues has not been investigated in RCTs of nonpharmacologic treatments that are not invasive (e.g., physical therapy, acupuncture, cognitive-behavior therapy, and massage) for pain. Nonpharmacologic treatments are very common in chronic pain patients and have been recommended in a number of clinical practice guidelines [23,24]. For example, approximately 44% of people with pain seek complementary and alternative medicine (CAM) treatments, a subset of nonpharmacologic treatments, at some point. Additionally, these treatments are generally well received by patients, a majority of whom report a “great deal” of benefit from them [25]. In RCTs of pharmacologic treatments, the main outcome of interest is often pain intensity, whereas the major goals of nonpharmacologic treatments typically include not only reducing pain intensity, but also improving physical functioning, decreasing emotional distress, and improving health-related quality of life. Because of these multiple goals of nonpharmacologic treatments and the lack of knowledge of the mechanisms of action of many such treatments, it can be very challenging for nonpharmacologic investigators to choose just one primary outcome variable [26]. In addition, nonpharmacologic treatments are generally not regulated by governmental agencies as are pharmaceuticals and devices, and they are often perceived to have less harm potential, both of which could lead to less rigorous methodology and statistical reporting, as was seen in a review of adverse event reporting in nonpharmacologic trials [27].
The objective of this study was to investigate the quality of statistical reporting related to methods used to address multiplicity and accommodate missing data in nonpharmacologic trials published in three major pain journals. If appreciable inadequacies in the reporting of statistical methods exist, increased attention to the reporting of statistical methods in nonpharmacologic trials has the potential to reduce the prevalence of inappropriate statistical practices, enhance the interpretability and credibility of the results, and thereby potentially lead to greater recognition of the value, or lack thereof, of nonpharmacologic treatments.

Table 1. Definitions of assumptions for missing data mechanisms

Assumption: Missing Completely at Random
Definition: Missing data mechanism is independent of observed and unobserved outcomes. “Implies that missing data are unrelated to the study variables” [3]. For example, the participant was a passenger in a bus accident and could not come in for a treatment visit at which data would have been collected.

Assumption: Missing at Random
Definition: Conditional on the observed outcomes, missing data mechanism is independent of the unobserved outcomes. “Implies that recorded characteristics can account for differences in the distribution of missing variables for observed and missing cases” [3]. For example, at the last visit the participant attended, she reported zero pain relief from baseline. Sometime before the next visit, she decided the study was no longer worth her time because she was not experiencing pain relief and decided not to go back for her next visit. In this situation, the missing data are related to the patient’s poor pain relief, which was recorded by the research team at the last visit she attended.

Assumption: Missing Not at Random
Definition: Missing data mechanism may depend on the unobserved outcomes. “Implies that recorded characteristics do not account for differences in the distribution of the missing variables for observed and missing cases” [3]. For example, a participant in a flexible dosage trial comes in for his second visit and is not feeling any pain relief and also reporting no side effects. The research physician increases the treatment dosage to the highest level and sends the participant home. The participant’s pain worsens over the next week. He decides to stop participating in the trial and does not contact the research team again. The research team is completely unaware that the participant’s pain worsened after he left the clinic.

Reproduced from reference number [16].
2. Methods
2.1. Article selection
Reports of nonpharmacologic RCTs published between January 2006 and June 2013 were selected from three major pain journals (i.e., the European Journal of Pain, the Journal of Pain, and Pain). These journals were investigated because they are, respectively, the official journals of the European Federation of International Association for the Study of Pain (IASP) Chapters, American Pain Society, and IASP. Nonpharmacologic interventions included any treatment that was nonpharmacologic and noninterventional (i.e., noninvasive). Examples include massage, cognitive behavioral therapy, yoga, laser treatments, and acupuncture. We focused on articles published between 2006 and 2013 to evaluate articles published as recently as possible but still include a reasonably large sample size. The selected articles reported trials that were randomized and that compared at least two treatments. Studies including only a “waitlist” control or “standard of care” group that was not standardized in any way by the study protocol were excluded because these studies do not control for the time and attention that participants receive in a clinical trial and therefore were not considered of sufficiently high design quality to be included. All selected articles focused on pain outcome variables (i.e., a patient-reported outcome measuring pain intensity, relief, qualities, area, or any composite patient-reported outcome including pain) [27]. We chose RCTs because they eliminate confounding of treatment effects due to participant characteristics but did not require blinding because nonpharmacologic treatments are often impossible to blind.
2.2. Data extraction
A coding manual was developed to investigate the reporting of the primary outcome variable and primary statistical analysis plan, as well as efforts made to prevent missing data, methods used to accommodate missing data, and adjustment for multiple testing. When a primary analysis was identified, questions regarding missing data and multiplicity were answered only in reference to the stated primary analysis. The coding manual was developed using multiple rounds of pretesting and modification for clarity as described previously [16,22]. The manual instructed coders to search the Methods and Results sections of the articles, as well as the final paragraph of the Introduction section. The order of the articles was randomized in two separate lists and each article was double coded, with one person (J.D.D.) coding all articles in the first list and two people (J.S.G. and A.M.) each coding half of the articles in the second list. After the articles were coded independently, J.S.G. reviewed the data for discrepancies. Discrepancies due to oversight were corrected, and discrepancies due to alternative interpretations were discussed to arrive at a consensus. Journal, study design, year, and number of randomized participants were also extracted for each trial. In addition, information regarding trial type (experimental pain model vs. clinical pain condition), trial size (≥ or <100 subjects/arm included in the primary analyses), and sponsor (industry, government, professional organization, not reported) was collected to determine whether these variables were associated with the quality of reporting. The cutoff of 100 subjects per arm was used because of literature that suggests that bias in clinical trial conclusions decreases when more than 200 subjects are included in the study [28].
2.3. Statistical analysis
Descriptive statistics were used to determine (1) the number and percentage of articles specifying (a) the number of subjects who were randomized and who completed the trial; (b) a primary outcome variable; (c) a primary analysis, including a primary outcome variable, statistical test, and time point of assessment; (d) multiple primary analyses; (e) a multiplicity adjustment for all primary analyses, when appropriate; (f) a multiplicity adjustment for any or all analyses when a primary analysis was not identified; (g) methods to prevent missing data and increase retention; and (h) a method to accommodate missing data; and (2) the frequency with which trials used each method for (a) adjusting for all multiplicity in the primary analysis; (b) preventing missing data; and (c) accommodating missing data.
For the purposes of this study, a so-called “complete case” analysis (i.e., when authors clearly stated that the analysis omitted cases that did not have complete data for the outcome variable) was considered as one possible method to accommodate missing data. Fisher exact tests were used to examine the relationships between trial type and identification of a primary outcome variable, a primary analysis, a method to accommodate missing data, and, if applicable, adjustment for multiplicity. Planned comparisons of large (≥100 subjects/arm) versus small (<100 subjects/arm) trials and trials that were sponsored by industry versus any other funding source were not carried out because so few trials were large (n = 6) or supported by industry (n = 4). The Holm method was prespecified to adjust for multiple comparisons of experimental pain model vs. clinical pain condition trials. In the initial step of the Holm method, the smallest P-value from all the planned comparisons is compared to a significance level of 0.05/k, where k is the number of comparisons to be made. If the null hypothesis is rejected, the next smallest P-value is compared to a significance level of 0.05/(k − 1), and so on. The process stops when a null hypothesis can no longer be rejected [9]. In this case, a total of four comparisons were made; therefore, the significance level in the initial step was set to α = 0.0125.
3. Results
3.1. Coder discrepancies
A total of 3,684 original articles were screened; 120 articles reporting results of RCTs were identified. The active treatment was only compared to a waitlist or no treatment control in 13 articles, and in 6, the active treatment(s) was only compared to a nondefined standard of care. Thus, 101 RCTs of nonpharmacologic treatments investigating pain outcomes met the eligibility criteria (Fig. 1, Supplemental File 1/Appendix C at www.jclinepi.com). The number of items coded differed among articles depending on the types of analyses conducted and the quality of their reporting. The total number of items coded was 2,540, and 218 coding discrepancies occurred (8.6%). Of these discrepancies, 204 were mistakes or oversights by one of the coders, and 14 were due to differences in interpretation. The coders discussed the different interpretations to arrive at a consensus.
3.2. Trial characteristics
Most studies were published in Pain (51%), with 24% in the European Journal of Pain, and 25% in the Journal of Pain. Most trials used a parallel group design (87%); 13% used a crossover design. Seventy-six (75%) of the studies investigated clinical pain conditions, with the remainder examining experimental pain models (25%). The articles covered a variety of interventions, with the most common being psychological treatments (34%), transcutaneous electrical nerve stimulation (TENs; 18%), and acupuncture (13%). The median total number of subjects randomized was 83 (interquartile range: 47–137), and five (5%) of the trials randomized more than 100 subjects to each treatment group that was included in the primary comparison. Only 3% of the studies were supported by industry (Table 2).
3.3. Primary outcome variable and analysis
Of the 101 trials, 63 (62%) identified a primary outcome variable, and 46 (46%) identified a primary analysis, including a primary outcome variable, the statistical test used, and the time point at which the primary assessments were made (Table 3).
Two studies were described as exploratory, one of which cited the exploratory nature of the study as the reason for not specifying a primary outcome variable and analysis. Statements of prespecification of these trial elements were rare: only 4 (6%) of the trials that identified a primary outcome variable and 2 (4%) of the trials that identified a primary analysis reported that these were prespecified.
Fig. 1. PRISMA diagram. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
3.4. Multiple outcome analyses
Of the 46 articles that identified a primary analysis, 29 (63%) identified multiple primary analyses. Twenty-two (76%) of these trials had multiple primary outcome variables, 13 (45%) compared multiple treatment groups, and 6 (21%) carried out multiple analyses of one primary outcome variable. Of the 29 articles that identified multiple primary analyses, only 6 (21%) adjusted for multiplicity for all the primary analyses that were conducted, and only one of these six trials stated that the multiplicity adjustment was prespecified. Additionally, none of the trials that failed to adjust for multiple primary analyses acknowledged this fact in the Methods, Results, or Discussion sections. Three methods were used to adjust for multiplicity: Bonferroni correction, gatekeeping, and Sidak adjustment (Table 3). Of the 55 studies that did not specify a primary analysis, 25 (45%) mentioned adjusting for some of the multiple analyses presented, but none adjusted for all of the analyses presented in the article.
Studies involving experimental pain models were significantly less likely to identify a primary outcome variable than those involving clinical pain conditions (9/25 [36%] vs. 54/76 [71%], P = 0.003). After adjustment for multiple comparisons, statistically significant differences were not found between these two groups in terms of the proportions of studies that identified a primary analysis or adjusted for multiple primary analyses when appropriate (primary analysis: 7/25 [28%] vs. 39/75 [52%], P = 0.04; multiplicity adjustment: 1/5 [20%] vs. 5/24 [21%], P = 1.0).
3.5. Preventing missing data
Thirty-seven (37%) of the articles reported using at least one of the methods we identified for increasing retention or limiting missing data (see Appendix 1/Appendix A at www.jclinepi.com). The two most commonly used methods were allowance of concomitant analgesic medications other than acetaminophen and/or nonsteroidal anti-inflammatory drugs (16%) and compensating participants for their time (10%), whereas the rest of the methods were each used by 4% or fewer of trials (Table 4).
3.6. Accommodating missing data
Fifty-six (55%) trials reported using at least one statistical method to accommodate missing data. Of these 56 articles, 40 articles used only one method, and 6 identified one method as primary. The remaining 10 articles did not specify any of the multiple methods of accommodating missing data as primary or secondary. Only one article reported that the method to accommodate missing data was prespecified. The method most commonly used as the primary or only method for accommodating missing data was last observation carried forward (LOCF), with 10 (22%) of the 46 trials that identified a primary or singular method using this strategy. Other methods included a clearly stated complete case analysis (i.e., analysis of only observed, nonimputed data) (15%), linear mixed models (21%), baseline observation carried forward (BOCF) (11%), other single imputation methods (11%), multiple imputation (9%), and generalized estimating equations (9%) (Table 5). Of the six articles that had specified pairings of primary and secondary (sensitivity) analyses, four used LOCF as at least one sensitivity analysis. All the pairings are listed in Supplemental Table 1/Appendix B at www.jclinepi.com. Methods for accommodating missing data were identified significantly more often in trials of clinical pain conditions than in those of experimental pain models (51/76 [67%] vs. 3/25 [12%], P < 0.0001).

Table 2. Clinical trial characteristics

Characteristic: N (%) or median (IQR)
Journal
  European Journal of Pain: 24 (24)
  Journal of Pain: 25 (25)
  Pain: 52 (51)
Year published
  2006: 12 (12)
  2007: 8 (8)
  2008: 17 (17)
  2009: 10 (10)
  2010: 13 (13)
  2011: 17 (17)
  2012: 15 (15)
  2013: 9 (9)
Study design
  Parallel group: 88 (87)
  Crossover: 13 (13)
Trial type
  Experimental pain model: 25 (25)
  Clinical pain condition: 76 (75)
Total number randomized (a): 83 (47–137)
Trial size
  ≥100 subjects/arm: 5 (5)
  <100 subjects/arm: 96 (95)
Number randomized per arm: 36 (20–60)
Sponsor
  Industry: 3 (3)
  Other: 98 (97)
Nonplacebo intervention type
  Psychological (e.g., CBT, distraction, pain coping skills): 34 (34)
  TENs: 18 (18)
  Acupuncture: 13 (13)
  Transcranial stimulation (e.g., electrical, magnetic): 8 (8)
  Exercise (e.g., strength training, yoga, qigong): 6 (6)
  Massage/muscle relaxation: 4 (4)
  Physical therapy: 3 (3)
  Laser: 2 (2)
  ≥2 types of treatment: 10 (10)
  Other: 3 (3)
Placebo controlled (b): 55 (54)

Abbreviations: IQR, interquartile range; TENs, transcutaneous electrical nerve stimulation.
(a) If number randomized was not specified, the best guess for the number included in the trial was coded.
(b) Trials were counted as placebo controlled if they included treatments such as sham TENs, acupuncture, laser, transcranial stimulation, or an attention control for psychological treatments. If no nonactive treatment was included, the trial was not considered placebo controlled.

Table 3. Specification of primary outcome variable, primary analysis, and multiplicity adjustment

Reporting outcome: Frequency (%)
Specified a primary outcome variable (of total, n = 101): 63 (62)
Specified a primary analysis (a) (of total, n = 101): 46 (46)
Primary analysis included multiple analyses (of those that identified a primary analysis, n = 46): 29 (63)
Types of multiple analyses included (of those with multiple primary analyses, n = 29)
  Multiple outcome variables: 22 (76)
  Multiple treatment groups: 13 (45)
  Multiple analyses of the same outcome variable: 6 (21)
Adjusted for all multiple comparisons (of those with multiple analyses, n = 29): 6 (21)
Methods to adjust for multiplicity in primary analysis (total n = 6)
  Bonferroni: 4 (67)
  Gatekeeping or hierarchical testing: 1 (17)
  Sidak adjustment: 1 (17)

(a) Requires primary outcome variable, statistical test, and time of analysis.

Table 4. Strategies to prevent missing data reported in analgesic clinical trials

Strategy to prevent missing data: Frequency N (% of articles using the strategy)
Concomitant medications allowed (other than acetaminophen and/or NSAIDs): 16 (16)
Participants paid for their time: 10 (10)
Patients were able to continue routine treatment: 4 (4)
Data recorded for subjects who discontinued intervention: 4 (4)
Rescue medications allowed (other than acetaminophen and/or NSAIDs): 2 (2)
Rescue acetaminophen or NSAIDs only allowed: 2 (2)
Reminder e-mails/letters: 2 (2)
Concomitant nonpharmacological treatments allowed: 2 (2)
Outcome assessment mailed to subjects and filled out at home: 2 (2)
Other (a): 4 (4)

Abbreviation: NSAIDs, nonsteroidal anti-inflammatory drugs.
(a) Other strategies included designing the study with favorable odds of being in the active treatment group, reminder phone calls, designing the protocol to minimize the number of study visits, and giving subjects the option to switch to the control arm for conventional treatment.

Table 5. Missing data methods used in analgesic clinical trials

Missing data methods used (primary or only method): Frequency (%) of the 46 articles using each method as a primary or only method to accommodate missing data
Single imputation: 20 (43)
  Last observation carried forward: 10 (22)
  Baseline observation carried forward: 5 (11)
  Other single imputation: 5 (11)
Complete case: 7 (15)
Linear mixed models: 10 (21)
  Mixed-effects models for repeated measures: 1 (2)
  Linear or generalized linear mixed model: 6 (13)
  Other mixed/multilevel modeling: 3 (7)
Generalized estimating equations: 4 (9)
Multiple imputation: 4 (9)
EM algorithm (in SPSS software): 1 (2)
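The Fisher exact comparisons reported in Sections 3.4 and 3.6 can be approximated with a short sketch. The Python code below is a minimal, standard-library implementation of a two-sided Fisher exact test, applied to the counts for identification of a primary outcome variable (9/25 vs. 54/76). The function name `fisher_exact_two_sided` and the "sum of tables no more probable than the observed one" convention are assumptions made for illustration; the article does not state which software or two-sided convention was used, so small differences from the reported P-value are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Counts from Section 3.4: 9 of 25 experimental pain model trials and
# 54 of 76 clinical pain condition trials identified a primary outcome variable.
p = fisher_exact_two_sided(9, 25 - 9, 54, 76 - 54)
print(f"two-sided Fisher exact P = {p:.4f}")
```

The result should be of the same order as the reported P = 0.003 and can be compared against the Holm thresholds described in Section 2.3.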
4. Discussion
The results of this study demonstrate clear deficiencies in the reporting of statistical methods in nonpharmacologic trials published in three major pain journals. Only 46% of trials identified a primary analysis, and only 4% of these trials stated that the primary analysis was prespecified. Although the absence of a statement reporting prespecification does not necessarily indicate an actual lack of prespecification, the lack of information about prespecification makes it impossible for readers to accurately assess the rigor of the methods used and the validity of the results and conclusions presented. If a trial is registered on clinicaltrials.gov, a reader could investigate whether the primary analysis was prespecified and whether it matches the analysis reported in the article; however, having to do so is burdensome, and clinicaltrials.gov is not always updated appropriately when protocols are amended during the study and before unblinding. The identification of a primary analysis is less critical in exploratory trials, in which the goal is usually to generate hypotheses for future research as opposed to demonstrating treatment efficacy. Hence, the exploratory nature of a trial should be clearly reported so that readers understand the goal of the study. It is important to emphasize that only 2 of the 101 trials examined were declared exploratory by the authors and therefore can be justified in not reporting a primary analysis. Although only 1 of the 25 trials testing experimental pain models was identified as being exploratory, these studies are often intended to be exploratory. This may explain why only 28% of experimental pain trials identified a primary analysis, whereas 52% of clinical pain condition trials did so, although after adjusting for multiple comparisons this difference was not statistically significant. Although secondary or unspecified analyses can be important for determining the focus of future research, we believe it would be beneficial for investigators conducting experimental pain studies to consider, on a case-by-case basis, whether identifying primary analyses in their trials would provide a more compelling evidence base for future research. The fact that only 52% of the clinical pain condition trials, which are less likely to be exploratory, identified a primary analysis demonstrates a substantial need for improvement.
Of the articles that reported multiple primary analyses, fewer than 25% included an adjustment for multiplicity. None of the articles that failed to report a multiplicity adjustment stated that this was a limitation of the study. Although including an adjustment for multiple comparisons in primary analyses should be standard [7], acknowledgment of the absence of one would inform readers of the limitation so they could better interpret the study results. When an adjustment is reported, it is important that the authors state whether it was prespecified, which only occurred in one article.
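The dependence of a trial's conclusion on the prespecified decision rule can be made concrete with a small Python sketch. It assumes two primary analyses with P-values of 0.03 and 0.04, the values used in the example discussed in this section, and contrasts a coprimary rule (every P-value must be below 0.05) with an at-least-one rule under a Bonferroni correction (each P-value is compared with 0.05/2). The function names are hypothetical and chosen only for illustration.

```python
def all_coprimary_significant(p_values, alpha=0.05):
    """Coprimary rule: success requires every primary P-value to be below alpha."""
    return all(p < alpha for p in p_values)


def any_bonferroni_significant(p_values, alpha=0.05):
    """At-least-one rule: each P-value is compared with alpha / k (Bonferroni)."""
    threshold = alpha / len(p_values)
    return any(p < threshold for p in p_values)


p_values = [0.03, 0.04]  # the two P-values used in the Discussion's example
coprimary = all_coprimary_significant(p_values)    # both are below 0.05
bonferroni = any_bonferroni_significant(p_values)  # neither is below 0.025
print("coprimary rule declares a significant effect:", coprimary)
print("Bonferroni rule declares a significant effect:", bonferroni)
```

The same pair of P-values supports opposite conclusions under the two rules, which is why the rule needs to be fixed before the data are examined.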
In a situation where two primary analyses using two primary outcome variables yield P-values of 0.03 and 0.04 and the group difference for both analyses is in the same direction, prespecification that both P-values must be less than 0.05 to reject the null hypothesis would support a conclusion that the treatment has a significant effect. However, if it was prespecified that the treatment effect was required to be statistically significant for only one outcome variable and the adjustment chosen was a Bonferroni correction, then the conclusion would be that the treatment does not have a significant effect. Thus, prespecification is essential to ensure that authors do not choose an adjustment method that best supports the desired conclusion. The results of this study also revealed deficiencies in reporting methods to accommodate missing data and inconsistencies between the methods reported and current methodological recommendations for accommodating missing data [4,19]. Of the 101 trials, only 56 reported using any method to accommodate missing data, and 7 of those trials used a complete case analysis, which simply excludes participants who did not provide complete data, as a primary or only method. Although common and simple, analyzing complete cases is insufficient as a primary method because it makes the unlikely assumption that the data are missing completely at random (Table 1) and does
not include partial data from incomplete cases in the analysis [21]. Single imputation methods (e.g., LOCF, BOCF) were also commonly used. Although they are straightforward to implement, single imputation methods are not recommended as primary methods for accommodating missing data. This is largely due to the unrealistic assumptions these methods make about the missing data mechanism. Additionally, single imputation methods do not incorporate the uncertainty associated with the data imputation process. This leads to underestimation of standard errors and increases the risk of spuriously declaring significant treatment effects [3,4,18,21]. Even when more realistic assumptions are made about the missing data mechanism (e.g., missing at random [3,4,18,21]) (Table 1), it is impossible to verify the validity of these assumptions for a given set of data. Therefore, statisticians recommend performing both a primary analysis and a secondary, or sensitivity, analysis that each use different assumptions [3,4,18,19]. Seven articles that identified a primary method did use a sensitivity analysis, but in three of these seven trials, a complete case analysis was used as the primary method and LOCF was used as the sensitivity analysis, neither of which is based on realistic assumptions.

Comparisons of reporting quality between large and small studies were not conducted because only six trials randomized 100 or more subjects to each arm. Additionally, the median total number of patients randomized was 83. Because only 5% of the 101 trials examined included more than 100 subjects per arm, it appears that much of the recent literature on nonpharmacologic treatments consists of smaller studies, which are more likely to be misinterpreted than larger studies and also subject to greater bias due to nonpublication of negative results [28–31].
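Returning to the point above that single imputation ignores the uncertainty of the imputation process, the effect on the standard error can be shown with a small Python sketch. The scores are hypothetical and the function names are invented for this illustration. Mean imputation is used here as a stand-in for single imputation in general because its arithmetic is easy to check; LOCF and BOCF likewise treat filled-in values as if they had been observed. The observed-only figure is shown purely as a reference for how much information the data contain, not as a recommended analysis.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical pain-change scores; None marks a participant with missing data.
scores = [2, 4, 6, 8, None, None, None, None]

observed_only = [v for v in scores if v is not None]
se_observed = standard_error(observed_only)       # based on 4 real observations
se_imputed = standard_error(mean_impute(scores))  # treats all 8 values as observed

print(f"SE from observed data only: {se_observed:.2f}")
print(f"SE after single imputation: {se_imputed:.2f}")
```

In this toy data set the standard error after imputation is less than half of the observed-data value even though no new information was collected, which is the mechanism behind the increased risk of spuriously significant results.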
Although small sample sizes are acceptable in studies meant to help shape the direction of future trials, it is difficult to determine the efficacy of a treatment using only the results of such studies. A recent article discussing cognitive behavioral therapy clinical trials in chronic pain patients even went so far as to state that studies with small sample sizes are of “low quality” and “should be ignored” [31]. However, it should be noted that 13% (n = 13) of the trials in the sample used a randomized crossover design, which typically achieves adequate power with smaller sample sizes than other designs. These studies were generally smaller than the randomized parallel group trials, with a median of 36 subjects per arm and none exceeding 50 subjects.

Because nonpharmacologic treatments are generally not regulated by governmental agencies and may be perceived to have less harm potential than pharmacologic or invasive interventions, it is possible that less rigorous methodology and statistical reporting characterizes publications of trials of these treatments. On the basis of a comparison of adverse event reporting in nonpharmacologic and pharmacologic trials that supports this notion [27,32], we hypothesized that reporting of statistical details related to efficacy might be
poorer in nonpharmacologic trials as well. However, although there are clear deficiencies in the statistical reporting of nonpharmacologic trials, these deficiencies do not seem to be appreciably different from those found in trials of pharmacologic and interventional treatments for pain [16,22]. Similar percentages of nonpharmacologic and pharmacologic/interventional trials reported a primary outcome variable and a primary analysis (primary outcome variable: 62% vs. 63%; primary analysis: 46% vs. 52%) [22]. The overall proportions of studies reporting a method for accommodating missing data were also comparable, with 56% of nonpharmacologic trials and 45% of pharmacologic/interventional trials reporting a method [16]. Fewer nonpharmacologic studies (20%) reported a method to adjust for multiplicity than pharmacologic and interventional trials (45%) [22]. This difference could be due to several factors; for example, trials for nonpharmacologic treatments are typically not reviewed by regulatory agencies and nonpharmacologic trials could be seen as more exploratory in nature by investigators.

This study has several limitations that should be acknowledged when interpreting the results. The first is that the analyses examined the quality of reporting various statistical methods, but not the quality of the practices actually used. Although this makes it difficult to draw conclusions regarding the methodology actually used in these trials, at the very least, our results reveal a distinct lack of comprehensive and rigorous reporting. Furthermore, unless many articles failed to comprehensibly report the statistical methods used, our results suggest that deficiencies in multiplicity adjustment and missing data accommodation are common.
Comprehensive and clear reporting of these statistical methods is essential to ensure that readers are able to accurately interpret the results and evaluate the validity of the conclusions presented in clinical trial publications. Another limitation is that only the Methods and Results sections, as well as the last paragraph of the Introduction section, were searched for identification of the primary outcome variable, primary analysis, multiplicity adjustment, and missing data accommodation. This could lead to somewhat inflated estimates of reporting deficiencies if they were reported in other areas, but it is unlikely that these statistical issues would be covered in other locations without being mentioned in the Methods or Results sections. Nevertheless, it is important that these aspects of the trial methodology be discussed in the proper sections of the article, and authors should be encouraged to include information pertaining to statistical analyses in the Methods and Results sections to maximize the clarity of these analyses for readers. Trials were only counted as having adjusted for multiplicity in the primary analysis if the authors stated that they adjusted for all instances of multiplicity. If a study adjusted for some of the multiplicity present in the primary analysis, but not for all of the multiple analyses, then it was classified as not adjusting for multiplicity. Therefore, it is possible
that some of the trials classified as not including an adjustment did make an attempt to account for multiplicity but either did not adequately correct for the multiple comparisons or did not report any priority that the authors may have had on some of the analyses performed. Although we chose to only evaluate RCTs, the issues of multiplicity and accommodation of missing data are also important in nonrandomized or uncontrolled studies. The results presented here are not likely generalizable to these other types of studies. Future research should evaluate reporting issues in observational studies of nonpharmacologic treatments for pain. Furthermore, we did not evaluate methods of blinding or randomization, other issues that the Cochrane Handbook highlights that can contribute to biased conclusions in RCTs [33]. Finally, our analyses were conducted using trials chosen from three pain journals (i.e., European Journal of Pain, Journal of Pain, Pain), and thus, our results may not apply to clinical trials of nonpharmacologic treatments for pain published in other journals. In future research, it would be important to compare the reporting of the trials reviewed here, which were published in pain journals, with reporting in trials published in general medical journals and in journals specializing in psychological, rehabilitation, and CAM studies.
5. Conclusions

Based on the results of this systematic review, we recommend that investigators conducting RCTs of pain treatments prespecify primary outcome variables, primary analyses (including primary outcome variables, statistical methods, and time points of assessment), methods for multiplicity adjustment, and methods for accommodating missing data. We encourage investigators to consult IMMPACT recommendations for guidelines on multiplicity adjustment [7], and the National Research Council report [4] and other discussions of missing data when selecting methods for missing data accommodation. These analyses should be comprehensively described in the Methods and Results sections of publications, and whether these were prespecified should be clearly identified. In the case of exploratory studies, authors should state that the study is exploratory and acknowledge any lack of identification or prespecification of these methodologic factors and the limitations that this may present. Finally, we recommend that editors and reviewers closely examine the methods described in submitted articles reporting results of nonpharmacologic trials of pain treatments and encourage comprehensive and rigorous reporting of the issues discussed previously.
Acknowledgments

This article was reviewed and approved by the Executive Committee of the ACTTION public-private partnership
with the U.S. Food and Drug Administration. The authors thank Sharon H. Hertz, MD, and Allison H. Lin, PharmD, PhD, from the U.S. Food and Drug Administration for their numerous contributions to ACTTION.

Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jclinepi.2015.10.019.

References

[1] Fleming TR. Clinical trials: discerning hype from substance. Ann Intern Med 2010;153:400–6.
[2] Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:e124.
[3] Little RJ, D’Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med 2012;367:1355–60.
[4] National Research Council. The prevention and treatment of missing data in clinical trials 2010. Available at http://www.nap.edu/catalog/12955.html. Accessed November 30, 2015.
[5] O’Neill RT. Secondary endpoints cannot be validly analyzed if the primary endpoint does not demonstrate clear statistical significance. Control Clin Trials 1997;18:550–6.
[6] Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med 1987;317:426–32.
[7] Turk DC, Dworkin RH, McDermott MP, Bellamy N, Burke LB, Chandler JM, et al. Analyzing multiple endpoints in clinical trials of pain treatments: IMMPACT recommendations. Pain 2008;139:485–93.
[8] Sankoh AJ, Huque MF, Dubey SD. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med 1997;16:2529–42.
[9] Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65–70.
[10] Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 1988;75:800–2.
[11] Hommel G. A stepwise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 1988;75:383–6.
[12] Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley; 1977.
[13] Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc 1955;50:1096–121.
[14] Dmitrienko A, Wiens BL, Tamhane AC, Wang X. Tree-structured gatekeeping tests in clinical trials with hierarchically ordered multiple objectives. Stat Med 2007;26:2465–78.
[15] Westfall P, Krishen A. Optimal, weighted, fixed sequence, and gatekeeping multiple testing procedures. J Stat Plan Infer 2001;99:25–40.
[16] Gewandter JS, McDermott MP, McKeown A, Smith SM, Farrar J, Hunsinger M, et al. Methods used to address missing data in recent analgesic clinical trials: ACTTION systematic review and recommendations. Pain 2014;155:1871–7.
[17] Fleming TR. Addressing missing data in clinical trials. Ann Intern Med 2011;154:113–7.
[18] Little RJ, Rubin D. Statistical analysis with missing data. 2nd ed. New York: John Wiley & Sons; 2002.
[19] Molenberghs G, Kenward MG. Missing data in clinical studies. Chichester, UK: John Wiley & Sons; 2007.
[20] O’Neill RT, Temple R. The prevention and treatment of missing data in clinical trials: an FDA perspective on the importance of dealing with it. Clin Pharmacol Ther 2012;91:550–4.
[21] European Medicines Agency. Guideline on Missing Data in Confirmatory Clinical Trials. Available at: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2010/09/WC500096793.pdf. Accessed November 30, 2015.
[22] Gewandter JS, Smith SM, McKeown A, Burke LB, Hertz SH, Hunsinger M, et al. Reporting of primary analyses and multiplicity adjustment in recent analgesic clinical trials: ACTTION systematic review and recommendations. Pain 2014;155:461–6.
[23] Chou R, Qaseem A, Snow V, Casey D, Cross T Jr, Shekelle P, et al, for the Clinical Efficacy Assessment Subcommittee of the American College of Physicians and the American College of Physicians/American Pain Society Low Back Pain Guidance Panel. Diagnosis and treatment of low back pain: a joint clinical practice guideline from the American College of Physicians and the American Pain Society. Ann Intern Med 2007;147:478–91.
[24] Hauser W, Thieme K, Turk DC. Guidelines on the management of fibromyalgia syndrome – a systematic review. Eur J Pain 2010;14:5–10.
[25] Institute of Medicine. Relieving pain in America: a blueprint for transforming prevention, care, education, and research. Washington, DC: The National Academies Press; 2011.
[26] Bonnett MI, Closs SJ. Methodological issues in nonpharmacological trials for chronic pain. Anaesth Pain Intensive Care 2010;14:49–55.
[27] Hunsinger M, Smith SM, Rothstein D, McKeown A, Parkhurst M, Hertz S, et al. Adverse event reporting in non-pharmacologic, non-interventional pain clinical trials: ACTTION systematic review. Pain 2014;155:2252–62.
[28] Moore RA, Eccleston C, Derry S, Wiffen P, Bell R, Straube S, et al, for the ACTINPAIN writing group of the IASP Special Interest Group on Systematic Reviews in Pain Relief and the Cochrane Pain, Palliative and Supportive Care Systematic Review Group editors. “Evidence” in chronic pain – establishing best practice in the reporting of systematic reviews. Pain 2010;150:386–9.
[29] Moore RA, Derry S, Wiffen PJ. Challenges in design and interpretation of chronic pain trials. Br J Anaesth 2013;111:38–45.
[30] Moore RA, Gavaghan D, Tramer MR, Collins SL, McQuay HJ. Size is everything – large amounts of information are needed to overcome random effects in estimating direction and magnitude of treatment effects. Pain 1998;78:209–16.
[31] Morley S, Williams A, Eccleston C. Examining the evidence about psychological treatments for chronic pain: time for a paradigm shift? Pain 2013;154:1929–31.
[32] Smith SM, Chang RD, Pereira A, Shah N, Gilron I, Katz NP, et al. Adherence to CONSORT harms-reporting recommendations in publications of recent analgesic clinical trials: an ACTTION systematic review. Pain 2012;153:2415–21.
[33] Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series. Cambridge, UK: MRC Biostatistics Unit; 2008.