Pitfalls in reporting sample size calculation in randomized controlled trials published in leading anaesthesia journals: a systematic review


British Journal of Anaesthesia, 2015, 1–9. doi: 10.1093/bja/aev166 (Advance Access published June 3, 2015)

Review Article

M. Abdulatif*, A. Mukhtar, and G. Obayah Anaesthetic Department, Faculty of Medicine, Cairo University, Giza, Egypt *Corresponding author: 1 Al-Saray Street, Al-Manial, Cairo Postal Code 11559, Egypt. E-mail: [email protected]

Abstract

We have evaluated the pitfalls in reporting sample size calculation in randomized controlled trials (RCTs) published in the 10 highest impact factor anaesthesia journals. Superiority RCTs published in 2013 were identified and checked for the basic components required for sample size calculation and replication. The difference between the reported and replicated sample size was estimated. The sources used for estimating the expected effect size (Δ) were identified, and the difference between the expected and observed effect sizes (Δ gap) was estimated. We enrolled 194 RCTs. Sample size calculation was reported in 91.7% of studies. Replication of sample size calculation was possible in 80.3% of studies. The original and replicated sample sizes were identical in 67.8% of studies. The difference between the replicated and reported sample sizes exceeded 10% in 28.7% of studies. The expected and observed effect sizes were comparable in RCTs with positive outcomes (P=0.1). Studies with negative outcome tended to overestimate the effect size (Δ gap 42%, 95% confidence interval 32–51%), P<0.001. Post hoc power of negative studies was 20.2% (95% confidence interval 13.4–27.1%). Studies using data derived from pilot studies for sample size calculation were associated with the smallest Δ gaps (P=0.008). Sample size calculation is frequently reported in anaesthesia journals, but the details of basic elements for calculation are not consistently provided. In almost one-third of RCTs, the reported and replicated sample sizes were not identical and the assumptions for the expected effect size and variance were not supported by relevant literature or pilot studies.

Key words: research hypothesis, effect size; statistical power, sample size; study design, superiority trials

Editor’s key points

• In this systematic review, the authors identify errors in sample size calculation in the leading anaesthesia journals.
• Frequently, there were differences between the reported and the replicated sample sizes, with the assumptions for the expected effect size not supported by relevant data.

A randomized controlled trial (RCT) is the most frequently used, valid method to evaluate clinical interventions in medical subspecialties.1 The Consolidated Standards of Reporting Trials (CONSORT) group developed a set of evidence-based recommendations to improve the quality of design, analysis, and interpretation of RCTs.2 Sample size calculation contributes to the quality of RCTs3 4 and is an important item in the CONSORT checklist.2 Adequate reporting of sample size calculation should normally include four main components: the expected minimal clinically relevant difference between the study groups, the standard deviation (SD) of measurements for continuous primary outcomes, the power of the study (generally set between 80 and 90%), and the type I error (usually 5%).5


Charles and colleagues6 demonstrated that 43% of RCTs published in the six highest impact factor medical journals did not report all the required parameters for sample size calculation. Moreover, in 30% of the reports that gave enough data to recalculate the sample size, the variation in the replicated sample size was greater than 10%.6 Major deficiencies in sample size calculations have also been reported in other specialties.7–11 Previous reports indicated significant improvement in the frequency of reporting of sample size calculation, from 52 to 86%, in anaesthesia journals.12 13 However, the integrity of sample size reporting for RCTs published in the anaesthesia literature has not previously been evaluated. This systematic review evaluated the pitfalls in reporting sample size calculation in parallel-group superiority RCTs published in the 10 highest impact factor anaesthesia journals during a 1 year period.
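To show how these four components fit together, the following minimal sketch (not part of the original article; the differences, SDs, and proportions are hypothetical) implements the standard normal-approximation formulae for a two-group superiority comparison.

```python
# Illustrative sketch (not from the article): standard normal-approximation
# formulae linking the four reporting components -- expected difference (delta),
# SD, power, and type I error -- to the per-group sample size.
from math import ceil
from scipy.stats import norm

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # e.g. 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical example: detect a 10 mm difference on a 0-100 mm pain scale,
# assuming SD = 20 mm, alpha = 0.05, power = 80% -> 63 patients per group.
print(n_per_group_means(delta=10, sd=20))
print(n_per_group_proportions(p1=0.40, p2=0.20))
```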

Methods

The Medline database was searched via ‘PubMed’ using the search terms ‘randomized controlled trials’, ‘controlled clinical trial’, and ‘randomized controlled trial’ for the period from January to December 2013. The tables of contents and the abstracts of clinical reports published in all issues of 10 of the leading anaesthesia journals were also screened to identify the relevant RCTs (Fig. 1). The choice of the included journals was based on their impact factor ranking at the time of conceptualization of this systematic review. All superiority RCTs with parallel-group designs were included in the analysis. Randomized controlled trials with crossover or factorial designs, those involving animals, and simulation or manikin studies were excluded.

Two of the authors (M.A. and A.M.) independently extracted data relating to the design and sample size calculation, in duplicate, from the full text of the selected articles. Any disagreement was resolved by discussion. We created a checklist to assess the frequency and adequacy of reporting of sample size estimation (Table 1). Sample size calculation was considered to have been performed if the authors explicitly stated this in the methodology section. Furthermore, we checked for the availability of a clearly specified primary outcome measure, the basic components of sample size calculation,15 and the source used for estimating the minimal expected effect size. An effect size (Δ) is a standardized measure that describes the magnitude of the difference between study groups.16 The effect size was calculated using the formulae outlined in Appendix I, according to the type of primary outcome and the number of groups in each RCT.16–18

Fig 1 Flow chart of the included and excluded studies and details of sample size replication. *Journals are listed in alphabetical order. †The authors of this review contributed to this excluded study.14

Included journals*: Acta Anaesthesiologica Scandinavica; Anaesthesia; Anesthesiology; Anesthesia and Analgesia; British Journal of Anaesthesia; Canadian Journal of Anesthesia; European Journal of Anaesthesiology; Journal of Neurosurgical Anesthesiology; Pediatric Anesthesia; Regional Anesthesia and Pain Medicine.

Studies assessed for eligibility: 231. Excluded studies: 37 (crossover design, 14; non-inferiority, 8; equivalence, 3; manikin/simulation, 4; non-randomized, 3; factorial design, 1; step-up/step-down dose finding, 1; premature termination, 1; French language, 1; conflict of interest, 1†). Included studies: 194. Sample size calculation reported: 178; not reported: 16. Sample size calculation replicated: 143; not replicated: 35.



Table 1 Data extraction checklist

• Primary outcome measure clearly specified
• Type of primary outcome measure
• Sample size estimation was described in the methodology section
• Source for justification and magnitude of the expected effect size (Δ)
• Power of the study and α level
• The expected and reported dropout rates
• Number of study groups
• Type of statistical test used for sample size calculation
• Hypothesis testing: one-tailed, two-tailed, or unspecified
• Compensation for multiple comparisons in studies including more than two groups
• Software used for sample size calculation
• The final number of patients analysed in each of the study groups
• Difference between reported and replicated sample size
• Results of the primary outcome: positive vs negative

The number of patients randomized and the observed effect sizes were extracted from the results section. We also noted whether the results of the trial were statistically significant for the primary outcome. The reviewers were not blinded to the journal name and authors.

Sample size replication was primarily done using the same software and statistical test specified by the authors. In studies not reporting sample size estimation software, the Power Analysis and Sample Size software (PASS 13; NCSS, LLC, Kaysville, UT, USA) was used for replication. Replicate estimations were done for studies providing all the required components for sample size estimation. For RCTs missing the details of the statistical test, we used the formulae adapted for the χ2 test and the two-tailed Student’s t-test for binary and continuous end points, respectively.19–21 When adjustments for multiple testing were not reported, we assumed no adjustment had been applied. Sample size replication was not possible for studies that did not report the expected effect size.

The difference between the reported and replicated sample size was defined as the reported sample size minus the recalculated sample size, divided by the reported sample size.6 A difference of 10% or more indicated an important deviation from the reported sample size,6 as confirmed by expert statistical consultation.6 8 The magnitude of the expected and observed dropout rates was recorded. The difference between the expected minimal effect size and the observed effect size in the primary outcome was considered the Δ gap.22 The magnitude of the Δ gap was compared between studies with positive and negative outcomes. Furthermore, the impact of the source of the assumption for the expected effect size on the Δ gap was evaluated. Post hoc power calculation was performed using standardized formulae for each study that reported a non-significant difference in the primary outcome.23
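The quantities defined above can be illustrated with a short, hedged sketch (our reconstruction with hypothetical numbers, not the authors' code); expressing the Δ gap relative to the expected effect and using a normal-approximation post hoc power are assumptions made for illustration.

```python
# Hedged sketch (our reconstruction, not the authors' code) of the three
# quantities defined above: the relative difference between reported and
# replicated sample sizes, the delta gap, and an approximate post hoc power.
from scipy.stats import norm

def relative_difference(reported_n, replicated_n):
    """(reported - replicated) / reported; an absolute value >= 10% flags an
    important deviation from the reported sample size."""
    return (reported_n - replicated_n) / reported_n

def delta_gap(expected_effect, observed_effect):
    """Difference between the expected and observed effect size, expressed here
    (as an assumption) relative to the expected effect; positive values
    indicate overestimation of the expected effect."""
    return (expected_effect - observed_effect) / expected_effect

def post_hoc_power(observed_effect, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, evaluated
    at the observed standardized effect size."""
    z_alpha = norm.ppf(1 - alpha / 2)
    noncentrality = observed_effect * (n_per_group / 2) ** 0.5
    return norm.cdf(noncentrality - z_alpha)

# Hypothetical study: 60 patients per group reported, 75 recalculated,
# expected effect size 0.8 but observed 0.4.
print(relative_difference(60, 75))        # -0.25 -> absolute deviation > 10%
print(delta_gap(0.8, 0.4))                # 0.5  -> 50% delta gap
print(post_hoc_power(0.4, n_per_group=60))
```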

Table 2 Reported parameters required for sample size calculation in the 178 included studies. Values are expressed as number (percentage)

Expected effect size in studies with continuous primary outcome: reported, 111 (62.4%); not reported, 4 (2.2%)
Expected effect size in studies with binary primary outcome: reported, 56 (31.5%); not reported, 7 (3.9%)
Expected variance in studies with continuous primary outcome: reported, 87 (48.9%); not reported, 28 (15.7%)
Source for justification of the expected effect size: relevant published studies, 92 (51.7%); pilot study, 29 (16.3%); unspecified, 57 (32.0%)
Hypothesis testing: two tailed, 50 (28.1%); one tailed, 1 (0.6%); unspecified, 127 (71.3%)
Type I error (α): 0.05, 173 (97.2%); 0.01, 5 (2.8%)
Number of studies including more than two groups: 36 (20.2%)
Number of studies that compensated for multiple comparisons: 3 (1.7%)
Power of study: 0.7, 1 (0.6%); 0.8, 131 (73.6%); 0.85, 3 (1.7%); 0.87, 1 (0.6%); 0.9, 37 (20.7%); 0.95, 5 (2.8%)

Statistical analysis

Continuous variables are expressed as means (SD) or medians (interquartile range; IQR) and discrete variables as counts (percentage), unless otherwise stated. We calculated the expected and observed Δ values and dropout rates across all trials and compared them using Student’s paired t-test. Differences among the groups were analysed using the Mann–Whitney U-test or Kruskal–Wallis test, as appropriate. The software SPSS v15.0 for Windows (SPSS, Inc., Chicago, IL, USA) was used for statistical analysis.
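As a simple illustration of the tests named above (hypothetical numbers, not the study dataset), the corresponding SciPy calls are sketched below.

```python
# Illustrative sketch (hypothetical data, not the study dataset) of the tests
# named above: paired t-test for expected vs observed values across trials,
# and Mann-Whitney U / Kruskal-Wallis tests for group comparisons.
from scipy.stats import ttest_rel, mannwhitneyu, kruskal

expected_delta = [0.8, 0.5, 0.6, 0.9, 0.7]   # expected effect sizes (hypothetical)
observed_delta = [0.7, 0.2, 0.6, 0.4, 0.5]   # observed effect sizes (hypothetical)
print(ttest_rel(expected_delta, observed_delta))

# Delta gaps (%) grouped by the source used to justify the expected effect size.
pilot       = [-5.0, 2.0, 8.0]
literature  = [10.0, 25.0, 15.0]
unspecified = [30.0, 45.0, 20.0]
print(mannwhitneyu(pilot, literature))
print(kruskal(pilot, literature, unspecified))
```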

Results

A total of 231 RCTs were identified during the literature search. Of these reports, 194 studies were enrolled in the analysis and 37 did not fulfil the prespecified inclusion criteria.14 Sixteen studies (8.3%) did not report any calculation for sample size, while 178 (91.7%) reported one or more of the essential parameters of sample size estimation (Fig. 1). Most of the included studies did not specify the statistical test or whether a one-tailed or two-tailed hypothesis was used for sample size calculation. The α error was most frequently set at 5%, and the reported power of the studies ranged between 70 and 95%. Thirty-four of the enrolled studies included more than two groups, but only three studies used Bonferroni correction to adjust the level of α error to compensate for multiple comparisons (Table 2).


Replication of sample size calculation was possible in 143 (80.3%) studies. In 35 (19.7%) studies, replication was not possible because of an unspecified expected effect size (Fig. 1). Our analysis indicated that the replicated and reported sample sizes were different in 46 (32.2%) studies; this difference exceeded 10% in 41 (28.7%) studies. Ninety-two studies reported a compensation for a possible dropout rate. However, the observed dropout rate at the conclusion of the studies was significantly lower than the expected rate (Table 3).

The median expected effect sizes for dichotomous and continuous outcomes were 0.5 (IQR 0.4–0.63) and 0.8 (IQR 0.6–1.0), respectively. The median observed effect sizes for dichotomous and continuous outcomes were 0.42 (IQR 0.2–0.6) and 0.7 (IQR 0.3–1.0), respectively. The overall median Δ gaps for dichotomous and continuous outcomes were 9% (IQR −5 to 30%) and 9% (IQR −30 to 30%), respectively. In studies with a positive outcome (73.7%), the expected and observed effect sizes were not significantly different [Δ gap −6.0%, 95% confidence interval (CI) −14 to 2.0%; P=0.1]. In studies with negative outcomes (26.3%), the expected effect size was significantly higher than the observed effect size (Δ gap 42%, 95% CI 32–51%; P<0.001). The mean (95% CI) post hoc power of studies with negative outcome was 20.2% (13.4–27.1%).

The assumption for the expected difference and variance between the control and intervention groups was based on relevant published reports in 92 (51.7%) studies and on pilot studies in 29 (16.3%) studies, while 57 (32.0%) studies did not specify the source of the expected difference. The Δ gaps were smaller for RCTs using pilot studies for estimating the magnitude of the expected effect size (P=0.008; Fig. 2).

Table 3 Details of the reported and replicated sample sizes and dropout rates (143 studies). Values are expressed as number (percentage) or median [interquartile range (IQR)]. *The expected dropout rate was significantly higher than the observed rate (P<0.0001)

Reported sample size: not finally achieved, 26 (18.2%); studies compensating for possible dropouts, 92 (64.3%); expected dropout rate, 16.8% (IQR 9.5–20.3%); observed dropout rate, 4.6% (IQR 0–6.6%)*
Replicated sample size: identical to reported, 97 (67.8%); more than reported, 29 (20.3%); less than reported, 17 (11.9%); difference exceeded 10%, 41 (28.0%); difference less than 10%, 5 (3.5%)

Fig 2 The impact of the source of prediction of the magnitude of effect size on the calculated Δ gaps (the difference between the observed and expected effect size), shown as box plots of the Δ gap (%) for the unspecified, previous literature, and pilot study groups. Boxes represent median observations (horizontal rule), with the 25th and 75th percentiles of observed data (top and bottom of box). Whiskers indicate the maximal and minimal values. *Denotes significance relative to the other two groups, P=0.008.

Discussion

The results of this systematic review indicated that sample size calculation is frequently reported, in 91.7% of RCTs published in leading anaesthesia journals. However, the details of the basic elements required for appropriate sample size calculation or replication are not consistently reported. In almost one-third of the enrolled RCTs, the variation in the replicated sample size was greater than 10%, and the assumption for the expected minimal effect size was not supported by relevant published literature or pilot studies. Studies with a negative outcome were associated with significant overestimation of the expected effect size. In contrast, the least variation between the expected and observed effect sizes was encountered in RCTs using pilot studies as the source for estimating the minimal expected effect size.

In line with the findings of this systematic review, variations between the reported and replicated sample sizes have been reported at the protocol development stage,24 25 in trial registration databases,26 and after publication in high-impact medical journals.6 This finding is difficult to explain and could be related to suboptimal cooperation between expert statisticians and clinical investigators. Involvement of a statistician in sample size calculation has been recommended15 27 and was used in the present systematic review. Using multivariate logistic regression, Koletsi and colleagues8 recently reported that the involvement of a methodologist in planning statistical analysis and trial design is associated with two-fold higher odds of adequate reporting of sample size calculation (odds ratio 1.97, 95% CI 1.10–3.53). However, the final outcome of sample size estimation cannot be guaranteed without a clear estimate and justification of the expected clinically important effect size by the investigator.28 29

The rationale for choosing a predetermined magnitude for the expected effect size was not adequately justified in the RCTs included in this systematic review. A recent evaluation of research protocols submitted to UK ethics committees indicated that only 3% provided a reasoned explanation for the chosen effect size.24 The effect size is one of the most important indicators of clinical significance. Unfortunately, this fact is frequently overlooked, and many researchers mistakenly equate statistically significant outcomes with clinical relevance.30 The concept of clinical relevance of the primary outcome is identified in the literature by what is called the minimal clinically important difference (MCID), defined as ‘the smallest change that patients perceive as beneficial or detrimental, and is useful to aid the clinical interpretation of health status data, particularly in response to intervention’.31–33 The use of the MCID in the calculation of an appropriate sample size for RCTs is expected to improve their clinical relevance and offers a threshold above which the outcome is experienced as relevant by the patient.34


The concept of the MCID was not addressed in the RCTs included in this systematic review, but the source used for estimating the expected effect size and variance was identified in 68% of studies and was mostly derived from relevant published literature and pilot studies. Our finding is in line with a recent review by Clark and colleagues,24 who reported that only 59% of the protocols for pain and anaesthesia RCTs submitted to UK research ethics committees reported the source for the value of the expected effect size. A recent comprehensive systematic review and survey identified seven potential methods for specifying the target expected effect size in sample size calculation, with wide variability in awareness and use among expert triallists in the UK and Ireland.35 The use of pilot studies was recognized by 90% of responders as a source for identifying the minimal expected effect size required for sample size calculation.35

Clinical trials using data derived from pilot studies for sample size calculation were associated with the smallest Δ gaps between the expected and observed effect sizes. However, pilot studies are generally underpowered and could yield an imprecise estimate of the difference in the population.36 37 The use of the upper limit of the 95% CI of the variance derived from internal pilot studies has been shown to improve the precision of sample size estimation.38 For RCTs using a variance derived from an external pilot study or published literature, the use of the 60% upper confidence limit of the SD has been recommended.39 An additional strategy to optimize sample size calculation is the use of a prespecified adaptive approach that allows recalculation of the sample size during the course of a clinical trial, with subsequent adjustment of the initially planned size.40 This approach is based on revision of the event rate, the variance, or the treatment effect.41

The expected and observed effect sizes were significantly different only in studies with negative outcomes. This might indicate overestimation of the expected treatment effect in the intervention group. Consistent with our findings, Aberegg and colleagues22 investigated possible bias in the design of RCTs reporting negative results for mortality in adult critical care settings. The mean expected and observed differences in mortality rates were 10.1 and 1.4%, respectively (P<0.0001), with a Δ gap of 8.7%.22 Two possible explanations were suggested to account for the increased Δ gap in negative trials: first, the investigators might adjust the expected effect size to reduce the sample size (the ‘sample size samba’);8 27 and second, unrealistic optimism about the efficacy of treatment in the intervention study arm.36 42

This systematic review elicited several factors that might be associated with unrecognized underpowered RCTs, including inappropriate estimation of the expected effect size, incorrect mathematical estimation of the sample size, failure to achieve the minimal target sample size, and waiving of sample size calculation. An earlier study assessing the quality of RCTs published in the anaesthesia literature reported that 72% of studies with negative results did not consider the possibility of an underlying type II error and inadequate a priori sample size estimation.12 It has been suggested that many published anaesthesia studies are underpowered.3 43 This is supported by the results of our post hoc power calculation for RCTs with negative outcomes.

It should be noted, however, that combining the MCID with the 95% CI has been suggested as a valuable strategy for a more appropriate clinical interpretation of the outcome of negative RCTs.30 44 45 If the MCID lies within the 95% CI of the observed effect size, the treatment may be clinically effective regardless of the point estimate.30 Randomized controlled trials that are too small can be misleading, either by missing modest treatment effects that would be clinically important or by overestimating the effect of treatment and finding it significant merely by chance.46 47
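Two of the quantitative strategies discussed above can be sketched as follows (our own illustration with hypothetical values, not the authors' method): inflating a pilot-derived SD to a one-sided upper confidence limit before it enters a sample size calculation, and checking whether the MCID lies within the 95% CI of the observed effect.

```python
# Hedged sketch (our illustration, not the authors' method) of two of the
# strategies discussed above: inflating a pilot-derived SD to a one-sided
# upper confidence limit (UCL) before the sample size calculation, and
# checking whether the MCID lies within the 95% CI of the observed effect.
from scipy.stats import chi2

def sd_upper_confidence_limit(sd, n, confidence):
    """One-sided upper confidence limit for the SD from a pilot of size n,
    based on the chi-squared distribution of (n - 1) * s^2 / sigma^2."""
    df = n - 1
    return sd * (df / chi2.ppf(1 - confidence, df)) ** 0.5

def mcid_within_ci(mcid, ci_lower, ci_upper):
    """True if the MCID falls inside the 95% CI of the observed effect,
    i.e. a clinically relevant effect cannot be ruled out."""
    return ci_lower <= mcid <= ci_upper

# Hypothetical internal pilot: SD = 20 with n = 15 -> 95% UCL of the SD.
print(sd_upper_confidence_limit(sd=20.0, n=15, confidence=0.95))
# SD taken from an external source (hypothetical n): the 60% UCL.39
print(sd_upper_confidence_limit(sd=20.0, n=15, confidence=0.60))
# Observed difference 3 mm (95% CI -1 to 7 mm) against an MCID of 5 mm.
print(mcid_within_ci(mcid=5.0, ci_lower=-1.0, ci_upper=7.0))   # True
```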

Furthermore, it is generally considered unethical to enrol participants in underpowered trials with low potential to provide a reliable answer to the research question.47 The same ethical concern extends to exposing patients to unjustified risks in large clinical trials with an overestimated sample size.48

The dropout rate during the conduct of RCTs may complicate statistical analysis by introducing bias in the findings and reducing statistical power.49 Long-term follow-up has been associated with an increased dropout rate.50 Apart from the relatively high dropout rate that might be encountered in chronic pain studies,51 RCTs evaluating anaesthetic interventions tend to have short follow-up periods. This could account for the relatively low (4.6%) observed attrition rate in the present systematic review. This finding could be used as a guide in optimizing sample size calculation for future RCTs in anaesthesia.

Appropriate sample size estimation commonly integrates scientific precision and available resources.52 There is currently an emerging approach incorporating study-specific optimal α and β errors that considers the clinically relevant effect size and cost, rather than statistical significance alone, in sample size estimation.53 54 The ultimate goals of this approach are to improve the feasibility of the study and to minimize the consequences of type I and II errors on clinical decision making. In most of the RCTs included in this review, the α and β errors were set by convention at 5 and 20%, respectively.

Studies that compared more than two groups did not indicate that adjustment of the α error was considered to compensate for multiple comparisons. A similar observation was reported in a recent review of the integrity of sample size determination in original research protocols.24 Anaesthesia trials frequently use a longitudinal design to assess the change in group outcomes at multiple time points.55 Multiple end points, multiple treatment arms, or interim analyses of the data require α error adjustment.56 It has been estimated that reducing the α error from 0.05 to 0.01 to compensate for five multiple tests will almost double the minimal sample size at a study power of 90%.37 Furthermore, estimation of sample size for studies with repeated measures mandates identification of the variance at each measurement time and the pattern of correlation among pairs of repeated measurements.57 Therefore, for a multi-arm study or a study with repeated measures, it would be useful to conduct a pilot study to provide sufficient details about the pattern of means, mean differences, and variances of the response variable at each time point.58

Inadequate description of sample size calculation continues to be reported7–11 despite the recommendations of the CONSORT2 and the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)59 groups. Adherence to these recommendations has been reported to be suboptimal in anaesthesia,60 critical care,61 and other medical journals.62 63 Furthermore, the instruction and obligation to adhere to all the elements in the CONSORT checklist in journal author guides has been reported to be heterogeneous.64–67 We therefore developed a simple structured algorithm based on our results and on a review of the literature to guide investigators in conducting and reporting sample size estimation appropriately (Fig. 3).

We also suggest that transparency could be improved by adopting an obligatory short checklist of all the basic elements of sample size calculation at the different stages of designing and reporting RCTs. Our suggestion is in line with the recommendation of the World Health Organization Minimal Registration Data Set68 and is supported by the finding that 100% compliance was reported for obligatory data, while variable adherence was observed for optional data, in one of the most popular trial registration databases.69


Fig 3 Algorithm to improve detailed transparent reporting of sample size estimation for parallel-group superiority randomized controlled clinical trials. *Estimation of the expected variance is required for continuous primary outcomes. †Adjustment of α for multiple comparisons is required for studies including more than two groups. ANOVA, analysis of variance; UCL, upper confidence limit.

1. Identify the primary outcome measure (continuous or binary).
2. Justify the expected effect size and variance* using a pilot study or relevant published literature.
3. Improve the precision of the variance estimate (pilot study: 95% UCL of the SD; literature: 60% UCL of the SD).
4. Choose the appropriate statistical test (continuous, two groups: t-test; continuous, more than two groups: ANOVA; binary, two groups: Z/χ2 test; binary, more than two groups: χ2 test).
5. Specify the study power (80–90%), taking into account the magnitude of the expected effect size and the cost, logistics, and resources.
6. Specify the α error (significance level)†: conventional, 0.05; to support critical decisions, 0.01.
7. Compensate for possible dropouts (short follow-up: 5–10%; long follow-up: based on pilot data or literature).
8. Estimate the minimum sample size.
9. Adequately report all details.

Our review has some limitations. First, we restricted our reappraisal of sample size estimation to parallel-group superiority RCTs to ensure uniform conclusions; therefore, our findings cannot be extended to other types of RCTs or other study designs. Second, our literature search covered all issues of the 10 highest impact factor anaesthesia journals published in one year. A wider search would probably have revealed additional shortcomings in sample size calculation; however, we aimed to capture the most up-to-date status of this issue by including only RCTs published in 2013. Third, we extracted the relevant sample size data from the published RCTs; comparisons with the original study protocols of registered trials were not considered. Registration of interventional clinical trials is increasingly required in anaesthesia70 and other medical specialties.71 However, at the time of preparation of this systematic review, some of the selected journals had not yet adopted a compulsory prior trial registration policy.72

In conclusion, the results of this systematic review indicated that despite a high frequency (91.7%) of sample size reporting, some of the required basic assumptions for calculation were deficient or not supported by plausible reasoning in 19.7 and 32% of studies, respectively. This could explain the large differences between the design assumptions and the observed data. The continued suboptimal reporting of sample size estimation calls for stricter guidelines at the different stages of designing and reporting RCTs. We suggest that the use of our proposed simple algorithmic approach at the design stage and an obligatory checklist in journals’ author guides could improve the integrity of reporting of sample size calculation. A future follow-up evaluation will be required to validate this recommendation.

Authors’ contributions M.A. and A.M. contributed to the design of the review, data extraction, analysis of the results, writing, revision, and final approval of the manuscript. G.O. contributed to tabulation of data, analysis, critical revision, and final approval of the manuscript.

Acknowledgements We would like to acknowledge the support and advice of Professor Jeffrey S. Kane for data analysis and optimization of sample size replication.

Declaration of interest M.A. is a member of the associate editorial board of the British Journal of Anaesthesia. A.M. and G.O. have no conflict of interest to declare.

Funding Departmental resources.


References


1. Sessler DI, Devereaux PJ. Emerging trends in clinical trial design. Anesth Analg 2013; 116: 258–61 2. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010; 63: e1–37 3. Pua HL, Lerman J, Crawford MW, Wright JG. An evaluation of the quality of clinical trials in anesthesia. Anesthesiology 2001; 95: 1068–73 4. Todd MM. Clinical research manuscripts in Anesthesiology. Anesthesiology 2001; 95: 1051–3 5. Whitley E, Ball J. Statistics review 4: sample size calculations. Crit Care 2002; 6: 335–41 6. Charles P, Giraudeau B, Dechartres A, Baron G, Ravaud P. Reporting of sample size calculation in randomised controlled trials: review. Br Med J 2009; 338: b1732 7. Nikolakopoulos S, Roes KC, van der Lee JH, van der Tweel I. Sample size calculation in pediatric clinical trials conducted in an ICU: a systematic review. Trials 2014; 15: 274 8. Koletsi D, Fleming PS, Seehra J, Bagos PG, Pandis N. Are sample sizes clear and justified in RCTs published in dental journals? PLoS One 2014; 9: e85949 9. Ayeni O, Dickson L, Ignacy TA, Thoma A. A systematic review of power and sample size reporting in randomized controlled trials within plastic surgery. Plast Reconstr Surg 2012; 130: 78e–86e 10. Abdul Latif L, Daud Amadera JE, Pimentel D, Pimentel T, Fregni F. Sample size calculation in physical medicine and rehabilitation: a systematic review of reporting, characteristics, and results in randomized controlled trials. Arch Phys Med Rehabil 2011; 92: 306–15 11. Weaver CS, Leonardi-Bee J, Bath-Hextall FJ, Bath PM. Sample size calculations in acute stroke trials: a systematic review of their reporting characteristics and relationship with outcome. Stroke 2004; 35: 1216–24 12. Greenfield ML, Rosenberg AL, O’Reilly M, Shanks AM, Sliwinski MJ, Nauss MD. The quality of randomized controlled trials in major anesthesiology journals. Anesth Analg 2005; 100: 1759–64 13. Greenfield ML, Mhyre JM, Mashour GA, Blum JM, Yen EC, Rosenberg AL. Improvement in the quality of randomized controlled trials among general anesthesiology journals 2000 to 2006: a 6-year follow-up. Anesth Analg 2009; 108: 1916–21 14. Abdulatif M, Ahmed A, Mukhtar A, Badawy S. The effect of magnesium sulphate infusion on the incidence and severity of emergence agitation in children undergoing adenotonsillectomy using sevoflurane anaesthesia. Anaesthesia 2013; 68: 1045–52 15. Noordzij M, Tripepi G, Dekker FW, Zoccali C, Tanck MW, Jager KJ. Sample size calculations: basic principles and common pitfalls. Nephrol Dial Transplant 2010; 25: 1388–93 16. Fritz CO, Morris PE, Richler JJ. Effect size estimates: current use, calculations, and interpretation. J Exp Psychol Gen 2012; 141: 2–18 17. Ellis PD. The Essential Guide to Effect Sizes: Statistical Power, MetaAnalysis, and the Interpretation of Research Results. Cambridge, UK: Cambridge University Press, 2010 18. Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd Edn. New Jersey, USA: Lawrence Erlbaum Associates, 1988 19. Jones SR, Carley S, Harrison M. An introduction to power and sample size estimation. Emerg Med J 2003; 20: 453–8

20. Moyé LA, Tita AT. Defending the rationale for the two-tailed test in clinical research. Circulation 2002; 105: 3062–5 21. Columb MO. One-sided P values, repeated measures analysis of variance: two sides of the same story! Anesth Analg 1997; 84: 701–2 22. Aberegg SK, Richards DR, O’Brien JM. Delta inflation: a bias in the design of randomized controlled trials in critical care medicine. Crit Care 2010; 14: R77 23. Bausell RB, Li Y-F. Power Analysis for Experimental Research. A Practical Guide for the Biological, Medical and Social Sciences. Cambridge, UK: Cambridge University Press, 2002 24. Clark T, Berger U, Mansmann U. Sample size determinations in original research protocols for randomised clinical trials submitted to UK research ethics committees: review. Br Med J 2013; 346: f1135 25. Chan AW, Hróbjartsson A, Jørgensen KJ, Gøtzsche PC, Altman DG. Discrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocols. Br Med J 2008; 337: a2299 26. Huić M, Marušić M, Marušić A. Completeness and changes in registered data and reporting bias of randomized controlled trials in ICMJE journals after trial registration policy. PLoS One 2011; 6: e25258 27. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005; 365: 1348–53 28. Adams-Huet B, Ahn C. Bridging clinical investigators and statisticians: writing the statistical methodology for a research proposal. J Investig Med 2009; 57: 818–24 29. Gibbs NM, Weightman WM. Beyond effect size: consideration of the minimum effect size of interest in anesthesia trials. Anesth Analg 2012; 114: 471–5 30. Page P. Beyond statistical significance: clinical interpretation of rehabilitation research literature. Int J Sports Phys Ther 2014; 9: 726–36 31. Jones PW, Beeh KM, Chapman KR, Decramer M, Mahler DA, Wedzicha JA. Minimal clinically important differences in pharmacological trials. Am J Respir Crit Care Med 2014; 189: 250–5 32. Kvien TK, Heiberg T, Hagen KB. Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): what do these concepts mean? Ann Rheum Dis 2007; 66 Suppl 3: iii40–1 33. Kon SS, Canavan JL, Jones SE, et al. Minimum clinically important difference for the COPD assessment test: a prospective analysis. Lancet Respir Med 2014; 2: 195–203 34. Clement ND, Macdonald D, Burnett R. Predicting patient satisfaction using the Oxford knee score: where do we draw the line? Arch Orthop Trauma Surg 2013; 133: 689–94 35. Cook JA, Hislop J, Adewuyi TE, et al. Assessing methods to specify the target difference for randomised controlled trial: DELTA (Difference ELicitation in TriAls) review. Health Technol Assess 2014; 18: 1–175 36. Vickers AJ. Underpowering in randomized trials reporting a sample size calculation. J Clin Epidemiol 2003; 56: 717–20 37. Wittes J. Sample size calculations for randomized controlled trials. Epidemiol Rev 2002; 24: 39–53 38. Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to consideration of precision and efficiency. J Clin Epidemiol 2012; 65: 301–8 39. Chen H, Zhang N, Lu X, Chen S. Caution regarding the choice of standard deviations to guide sample size calculations in clinical trials. Clin Trials 2013; 10: 522–9

40. Friede T, Kieser M. Sample size recalculation in internal pilot study designs: a review. Biom J 2006; 48: 537–55 41. Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990; 9: 65–71 42. Chalmers I, Matthews R. What are the implications of optimism bias in clinical research? Lancet 2006; 367: 449–50 43. Myles PS. Stopping trials early. Br J Anaesth 2013; 111: 133–5 44. Levine M, Ensom MH. Post hoc power analysis: an idea whose time has passed? Pharmacotherapy 2001; 21: 405–9 45. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994; 121: 200–6 46. Avidan MS, Wildes TS. Power of negative thinking. Br J Anaesth 2015; 114: 3–5 47. Guyatt GH, Mills EJ, Elbourne D. In the era of systematic reviews, does the size of an individual trial still matter? PLoS Med 2008; 5: e4 48. Altman DG. Statistics and ethics in medical research: III How large a sample? Br Med J 1980; 281: 1336–8 49. Bell ML, Kenward MG, Fairclough DL, Horton NJ. Differential dropout and bias in randomised controlled trials: when it matters and when it may not. Br Med J 2013; 346: e8668 50. Fewtrell MS, Kennedy K, Singhal A, et al. How much loss to follow-up is acceptable in long-term randomised trials and prospective studies? Arch Dis Child 2008; 93: 458–61 51. Gehling M, Hermann B, Tryba M. Meta-analysis of dropout rates in randomized controlled clinical trials: opioid analgesia for osteoarthritis pain. Schmerz 2011; 25: 296–305 52. Kopman AF, Lien CA, Naguib M. Neuromuscular dose–response studies: determining sample size. Br J Anaesth 2011; 106: 194–8 53. Mudge JF, Baker LF, Edge CB, Houlahan JE. Setting an optimal α that minimizes errors in null hypothesis significance tests. PLoS One 2012; 7: e32734 54. Bacchetti P. Current sample size conventions: flaws, harms, and alternatives. BMC Med 2010; 8: 17 55. Ma Y, Mazumdar M, Memtsoudis SG. Beyond repeated-measures analysis of variance: advanced statistical methods for the analysis of longitudinal data in anesthesia research. Reg Anesth Pain Med 2012; 37: 99–105 56. Baron G, Perrodeau E, Boutron I, Ravaud P. Reporting of analyses from randomized controlled trials with multiple arms: a systematic review. BMC Med 2013; 11: 84 57. Guo Y, Logan HL, Glueck DH, Muller KE. Selecting a sample size for studies with repeated measures. BMC Med Res Methodol 2013; 13: 100 58. Brooks GP, Johanson GA. Sample size consideration for multiple comparison procedures in ANOVA. JMASM 2011; 10: 97–109 59. Chan AW, Tetzlaff JM, Gøtzsche PC, et al. SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials. Br Med J 2013; 346: e7586 60. Can OS, Yilmaz AA, Hasdogan M, et al. Has the quality of abstracts for randomised controlled trials improved since the release of Consolidated Standards of Reporting Trial guideline for abstract reporting? A survey of four high-profile anaesthesia journals. Eur J Anaesthesiol 2011; 28: 485–92 61. Latronico N, Metelli M, Turin M, Piva S, Rasulo FA, Minelli C. Quality of reporting of randomized controlled trials published in Intensive Care Medicine from 2001 to 2010. Intensive Care Med 2013; 39: 1386–95 62. Turner L, Shamseer L, Altman DG, et al. Consolidated standards of reporting trials (CONSORT) and the completeness of reporting of randomised controlled trials (RCTs) published in medical journals. Cochrane Database of Systematic Reviews 2012; 11: MR000030 63. Strech D, Soltmann B, Weikert B, Bauer M, Pfennig A. Quality of reporting of randomized controlled trials of pharmacologic treatment of bipolar disorders: a systematic review. J Clin Psychiatry 2011; 72: 1214–21 64. Folkes A, Urquhart R, Grunfeld E. Are leading medical journals following their own policies on CONSORT reporting? Contemp Clin Trials 2008; 29: 843–6 65. Hopewell S, Altman DG, Moher D, Schulz KF. Endorsement of the CONSORT statement by high impact factor medical journals: a survey of journal editors and journal ‘Instructions to Authors’. Trials 2008; 9: 20 66. Schriger DL, Arora S, Altman DG. The content of medical journal instructions for authors. Ann Emerg Med 2006; 48: 743–9 67. Altman DG. Endorsement of the CONSORT statement by high impact medical journals: survey of instructions for authors. Br Med J 2005; 330: 1056–7 68. Moja LP, Moschetti I, Nurbhai M, et al. Compliance of clinical trial registries with the World Health Organization minimum data set: a survey. Trials 2009; 10: 56 69. Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med 2009; 6: 1–9 70. Myles PS. Trial registration for anaesthesia studies. Br J Anaesth 2013; 110: 2–3 71. Huser V, Cimino JJ. Evaluating adherence to the International Committee of Medical Journal Editors’ policy of mandatory, timely clinical trial registration. J Am Med Inform Assoc 2013; 20: e169–74 72. Wager E, Elia N. Why should clinical trials be registered? Eur J Anaesthesiol 2014; 31: 397–400

Appendix I: Formulae used for effect size calculations

Continuous primary outcome

• For studies including two groups, the effect size was calculated by taking the difference in means and dividing it by the SD of the control group (Glass's Δ).16 17
• For studies including three or more groups, Cohen's d was first calculated as

\[ d = \frac{m_{\max} - m_{\min}}{\sigma} \]

where \(m_{\max}\) and \(m_{\min}\) are the highest and lowest group means, respectively, and σ is the within-population SD. Cohen's d was then transformed to Cohen's f using the following formula:

\[ f = d \sqrt{\frac{1}{2k}} \]

where k is the number of population means.18


Binary primary outcome

• For studies including two groups, the effect size was calculated using the following formula:

\[ \frac{p_1 - p_2}{\sqrt{\bar{P}\,(1 - \bar{P})}} \]

where \(p_1\) and \(p_2\) are the proportions in the two groups and \(\bar{P} = (p_1 + p_2)/2\) is the mean of the two values.5
• For studies with more than two proportions, the χ2 statistic was used to estimate the effect size:18

\[ \omega = \sqrt{\sum_{i=1}^{m} \frac{(O_{ik} - E_{ik})^2}{E_{ik}}} \]

where ω is the effect size, \(O_{ik}\) and \(E_{ik}\) denote the observed and expected proportions, respectively, m is the total number of cells, and i is the cell number.
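For illustration, a minimal sketch of the effect-size formulae in this appendix (our reconstruction with hypothetical values, not the authors' code):

```python
# Illustrative sketch (our reconstruction of the Appendix I formulae, not the
# authors' code): effect sizes for continuous and binary primary outcomes.
from math import sqrt

def glass_delta(mean_treatment, mean_control, sd_control):
    """Two-group continuous outcome: Glass's delta."""
    return (mean_treatment - mean_control) / sd_control

def cohen_f(group_means, sd_within):
    """Three or more groups: Cohen's d from the extreme means, then Cohen's f."""
    k = len(group_means)
    d = (max(group_means) - min(group_means)) / sd_within
    return d * sqrt(1 / (2 * k))

def binary_effect_size(p1, p2):
    """Two-group binary outcome: standardized difference in proportions."""
    p_bar = (p1 + p2) / 2
    return (p1 - p2) / sqrt(p_bar * (1 - p_bar))

def omega_effect_size(observed, expected):
    """More than two proportions: chi-squared based effect size (omega)."""
    return sqrt(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))

# Hypothetical values for illustration only.
print(glass_delta(12.0, 18.0, sd_control=8.0))       # -0.75
print(cohen_f([12.0, 15.0, 18.0], sd_within=8.0))    # ~0.31
print(binary_effect_size(0.40, 0.20))                 # ~0.44
print(omega_effect_size([0.25, 0.35, 0.40], [1/3, 1/3, 1/3]))
```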

Handling editor: J. G. Hardman
