Randomized Controlled Trials in Pediatric Surgery: Could We Do Better?
By Joe I. Curry, Barnaby Reeves, and Mark D. Stringer
London, England
Background/Purpose: Randomized controlled trials (RCTs) are accepted as the gold standard for assessing the effectiveness of clinical interventions but are rarely reported in pediatric surgery. Have RCTs submitted to the British Association of Paediatric Surgeons (BAPS) Annual Congress during the last 5 years been adequately designed and large enough to produce a valid result?
Methods: Abstracts accepted by the Annual BAPS Congress meetings between 1996 and 2000 were examined in collaboration with a senior health services researcher. The quality of the design, methodology, statistical analysis and conclusions, and the adequacy of the sample size were assessed for all identifiable clinical RCTs.
Results: From 760 accepted abstracts, there were only 9 RCTs (1%) of clinical interventions. In only 4 trials was the relevant primary end-point specified at the outset of the study, and none documented the method of randomization. Only one abstract mentioned blinding with respect to the intervention or outcome measure. Sample sizes were inadequate to detect even large clinical differences. To date, only one of these RCTs has been published in an English-language, peer-reviewed journal.
Conclusions: Clear guidelines exist for the conduct of RCTs, yet compliance with these standards was rarely documented in abstracts of pediatric surgical RCTs presented at BAPS. Sample sizes were inadequate. RCTs in pediatric surgery are difficult to perform, but the specialty would benefit from well-designed, carefully conducted, multicentre, clinical RCTs to advance evidence-based practice.
J Pediatr Surg 38:556-559. Copyright 2003, Elsevier Science (USA). All rights reserved.
INDEX WORDS: Randomized controlled trial.
From the BAPS Multicentre Research Office, British Association of Paediatric Surgeons and Health Services Research Unit, London School of Hygiene and Tropical Medicine, London, England.
Address reprint requests to Joe I. Curry, 104 Mill Court, Ashford, Kent, England TN24 8DP.
0022-3468/03/3804-0007$30.00/0
doi:10.1053/jpsu.2003.50121
RANDOMIZED controlled trials (RCTs) of health care were first reported in the 1940s.¹ The RCT is recognized as the gold standard study design for the evaluation of the effectiveness of clinical interventions and treatments. It is the only study design that allows truly valid inferences to be made about cause and effect. There is evidence to suggest that RCTs are underused in both adult general²,³ and pediatric surgery,⁴⁻⁶ with other study designs such as uncontrolled case studies and series or historically controlled trials predominating. There also is evidence within both general and specialty pediatrics showing a lack of their use in clinical research.⁷⁻¹⁰ In contrast, studies in pediatric oncology have shown the positive effects of such trials in clinical practice. Well-designed large-scale studies both in the United States and England have achieved high levels of recruitment.¹¹,¹² It is likely that such trials have contributed to a dramatic increase in the survival rate from many pediatric malignancies during the last 2 decades.¹³
What evidence is there that RCTs are being used to advance evidence-based pediatric surgical practice? To investigate this question, we undertook an analysis of abstracts accepted by the British Association of Paediatric Surgeons (BAPS) for their Annual International Congresses. Abstracts reporting RCTs were evaluated against accepted criteria for the reporting of RCTs, including an assessment of the adequacy of the power of each study.
MATERIALS AND METHODS
Abstracts accepted for presentation at the BAPS Annual Congresses between 1996 and 2000 were analyzed. All clinical RCTs were identified, and each abstract was critically evaluated in conjunction with a senior health services researcher. The quality of each abstract was checked against accepted criteria for the design¹⁴ and reporting of RCTs.¹⁵ Specific features, including the reporting of sample size justification, process of randomization, blinding, and patient group comparability in the component arms of the study, were appraised. To judge the adequacy of the power of each RCT, the number of subjects that would be needed to detect a halving or doubling in the frequency of the binary primary outcome (depending on whether the outcome was harmful or beneficial) was calculated. In a single study that reported a continuous primary outcome, the number of subjects needed to detect a difference between groups equivalent to one standard deviation was calculated instead. It was considered that differences of these magnitudes would represent clinically important treatment effects.
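The power assessment described above can be illustrated with a short calculation. The following is a minimal sketch in Python, not the authors' own software: it assumes the standard two-group approximation and the constant 7.85 (80% power, 5% two-tailed significance) that this section quotes from its sample size reference. The function names are hypothetical, and rounding the total up to the next whole subject is an assumption made here for illustration.

```python
import math

# Constant quoted in the Methods: 80% power at a 5% two-tailed significance level.
POWER_CONSTANT = 7.85


def n_per_group_binary(p_control: float, p_intervention: float) -> float:
    """Approximate subjects needed in ONE arm for a binary outcome."""
    variance_sum = (p_control * (1 - p_control)
                    + p_intervention * (1 - p_intervention))
    return variance_sum * POWER_CONSTANT / (p_control - p_intervention) ** 2


def n_per_group_continuous(sd: float, target_difference: float) -> float:
    """Approximate subjects needed in ONE arm for a continuous outcome."""
    return 2 * sd ** 2 * POWER_CONSTANT / target_difference ** 2


def total_sample_size(n_per_group: float) -> int:
    """Total for both arms (2n), rounded up to a whole subject.

    Rounding to 6 decimals first guards against floating-point noise when
    2n is (numerically almost) a whole number. The round-up rule itself is
    an assumption of this sketch.
    """
    return math.ceil(round(2 * n_per_group, 6))


if __name__ == "__main__":
    # Halving a harmful outcome: 16% in controls versus 8% with intervention.
    print(total_sample_size(n_per_group_binary(0.16, 0.08)))
    # Continuous outcome with a target difference of one standard deviation.
    print(total_sample_size(n_per_group_continuous(1.0, 1.0)))
```

Under these assumptions, 16% versus 8% gives a total of 511 subjects, 20% versus 40% gives 157, and 10% versus 5% gives 864, which agree with the corresponding entries in Table 2. A target difference of one standard deviation gives a total of 32.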
Journal of Pediatric Surgery, Vol 38, No 4 (April), 2003: pp 556-559
Calculations of sample size were performed using a standard statistical formula for calculating the approximate number of subjects required in a trial, assuming that subjects have an equal chance of being randomly allocated to either group (ie, sample sizes in the 2 arms of the trial are similar)¹⁶:

\[ n \text{ (for one group)} = \frac{\left[\pi_c(1-\pi_c) + \pi_i(1-\pi_i)\right] \times 7.85}{(\pi_c - \pi_i)^2} \]

where $\pi_c$ = expected proportion experiencing the outcome in the control group, $\pi_i$ = expected proportion experiencing the outcome in the intervention group, and 7.85 is a constant, determined by the assumption that a study should have 80% power to detect the target difference at a 5% significance level (2-tailed). $\pi_c$ was taken to be the frequency of outcome reported for the control group. $\pi_i$ was set to be half (or double) $\pi_c$, and $\pi_c - \pi_i$ was calculated accordingly.
Where the outcome was not dichotomous, but continuous, the following formula was used:

\[ n \text{ (for one group)} = \frac{2\sigma^2 \times 7.85}{\delta^2} \]

where $\sigma$ = standard deviation of the primary outcome measure, $\delta$ = target difference in raw units of the primary outcome measure, and 7.85 is the same constant, determined by the assumption that a study should have 80% power to detect the target difference at a 5% significance level. Setting the target difference to be one standard deviation meant that this formula simplified to $n = 2 \times 7.85 \approx 16$. For both types of calculation, the total sample size required was $2n$, assuming an equal chance of random allocation to either group.

Table 1. Clinical RCTs at BAPS Congresses 1996-2000

Study No. | Condition | Alternative Interventions | Outcome
1 | Hypospadias | No dressing versus dressing | Complication rate
2 | Anorectal anomaly | Psychological input versus none | Fecal continence
3 | Hypospadias | No stent versus stent | Complication rate
4 | Abdominal pain | 4 d versus 7 d treatment for eradication of Helicobacter pylori | Failure to eradicate
5 | Necrotizing enterocolitis | Stoma versus anastomosis | Morbidity and mortality
6 | Postoperative ileus | Prokinetic drug versus placebo | Establishing enteral feeding
7 | Caustic ingestion | Steroids versus placebo | Stricture rate
8 | Acute appendicitis | Laparoscopic versus open surgery | Feasibility and complications
9 | Circumcision | Glue versus suture | Feasibility and complications

RESULTS
For the 5 BAPS Congresses between 1996 and 2000 there were 760 accepted submissions. Of these, there
were only 9 clinical RCTs. The subject of each investigation, intervention, and outcome under analysis is given in Table 1. Of the 9 abstracts evaluated, the following specific points were noted: only 4 of the 9 abstracts specified the relevant primary end-point from the outset of the study, no abstract reported the method of randomization, no abstract justified the size of the study population, only one abstract specified blinding to the intervention or outcome measure, and only 4 abstracts stated the comparability of the trial groups.
Table 2 lists the same abstracts together with the actual number of patients recruited into each study and the total number of patients that would need to be recruited to detect a halving or doubling of the outcome variable. A marked difference can be seen between the numbers needed to detect this size of effect and the number actually recruited into each study. Differences smaller than the target difference may be clinically significant but would have required even more participants. The final column in this table shows the differences between arms that the studies had 80% power to detect at a 5% significance level, given the actual sample sizes recruited. These differences certainly would be clinically important but are implausible, illustrating the high likelihood that these studies would lead to "type 2 errors," ie, a conclusion of no difference between arms when a clinically important difference might exist. An author search of the MEDLINE database found that only one study (No. 6) has subsequently been published in an English-language, peer-reviewed journal.¹⁷

Table 2. Clinical RCTs at BAPS Congresses 1996-2000: Actual Versus Calculated Sample Sizes to Detect Halving or Doubling of Outcome Variable

Study No. | Condition | No. Recruited | Sample Size to Detect Halving or Doubling in Outcome Frequency (Probability of Outcome in Control v Intervention Groups) | Difference Between Groups That Studies Had 80% Power to Detect
1 | Hypospadias | 100 | 511 (16% v 8%) | 16% (control) versus <1% (intervention)
2 | Anorectal anomaly | 24 | 157 (20% v 40%) | 20% (control) versus 75% (intervention)
3 | Hypospadias | 100 | 864 (10% v 5%) | 10% (control) versus <1% (intervention)
4 | Abdominal pain | 200 | 864 (10% v 5%) | 10% (control) versus 1% (intervention)
5 | Necrotizing enterocolitis | 39 | 377 (mortality) (20.7% v 10.4%) | mortality: 20.7% (control) versus <1% (intervention)
  |  |  | 170 (mortality & morbidity) (38% v 19%) | mortality & morbidity: 38% versus 3%
6 | Postoperative ileus | 22 | 32 (to detect 1 SD difference) | 32 (to detect 1 SD difference)
7 | Caustic ingestion | 90 | 451 (17.8% v 8.9%) | 18% (control) versus <1% (intervention)
8 | Acute appendicitis | 204 | 1021 (8.6% v 4.3%) | 9% (control) versus <1% (intervention)
9 | Circumcision | 65 | 832 (10.3% v 5.2%) | 10% (control) versus <1% (intervention)

DISCUSSION
Any analysis of abstract data is inherently limited by the fact that only a small amount of information can be incorporated into a restricted summary. Abstracts submitted to BAPS conferences are limited by space, but no guidance is given specifically for reporting RCTs. However, because the abstract is the only information available to a conference selection committee, a basic level of quality needs to be assured. Quality is not a trivial issue. For example, Schulz et al¹⁸ showed that, on average, RCTs that used flawed methods of randomization exaggerated the effect of the intervention by about 40%. This simple analysis has highlighted some methodologic inadequacies that may cast doubt on, and possibly negate, any clinical conclusions drawn from some of these studies. It also has shown that sample sizes were inadequate to detect even large clinical differences should they exist. Smaller differences may still be clinically relevant but would require even larger study populations. These factors may have contributed to the poor publication rate in peer-reviewed journals indexed in MEDLINE.
We acknowledge that our findings are based on only a small sample. However, the abstracts reviewed were the only RCTs identified over a period of 5 years; we were reluctant to search for older abstracts because these would have been less likely to reflect the current situation.
At a more fundamental level, our analysis indicates that there are very few clinical RCTs being conducted in pediatric surgery. Of course, it may be that well-designed clinical RCTs of sufficient power are being performed in England but are not submitted for presentation to the BAPS, but this is not supported by the pediatric surgical literature. Because the BAPS Annual Congresses are international (over 40 countries were represented at the 2001 Congress), this probably reflects the global situation in pediatric surgery.
Why is there a dearth of clinical RCTs in pediatric surgery?
Clinical RCTs are difficult to perform in surgical sciences¹⁹,²⁰: surgical techniques are difficult to standardize; some operations are associated with a variable learning curve; uncommon conditions attract small patient numbers; there are concerns about patient choice; and many conditions, particularly in pediatric surgery, require long-term follow-up during which time other aspects of management evolve that affect the original condition. Adjunctive therapies are likely to be easier to evaluate than surgical procedures,²¹ and it is interesting to note that only 2 of the 9 abstracts reported comparisons of the latter kind.
Occasionally, a nonrandomized trial design may provide the required clinical information. However, it is a myth that such studies are easier to do; they merely are easier to do badly. Most of the quality criteria for RCTs are still relevant to nonrandomized studies and should be complied with by researchers, including justification of an adequate sample size. Nonrandomized studies usually require more, rather than less, information to be collected to take account of the potential lack of comparability between the groups being compared.
Research in pediatrics has its own unique issues and difficulties in relation to consent, nontherapeutic trialing, and subjective outcome variables.²²,²³ Partly because of this, there still is a disappointing lack of basic clinical information about commonly prescribed medication in children.²⁴ Trial design needs to be meticulous and, by necessity, should involve the skills of a senior medical statistician from initial design through to analysis.²⁵ Because of the reputation of RCTs, and the likelihood that their results subsequently will be included in meta-analyses, it is particularly important that all possible measures are taken to ensure their quality. RCTs may fail to recruit sufficient numbers of patients for a wide variety of reasons, thus limiting their interpretation, but this should not deter us from attempting to conduct high-quality studies of sufficient power to generate clinically robust outcomes. Checklists for reporting the results of RCTs are now freely available.²⁶
In 1996 the BAPS Multi-Centre Research Committee was formed to encourage cooperation and collaboration in surgical research between surgeons and centres in England. It is hoped that this will enable sufficient patient numbers to be recruited to address important clinical questions. The first BAPS multicentre study currently is nearing completion, and further studies are planned. The importance of RCTs needs to be promoted among both clinicians and parents, who often find the issue of equipoise difficult to grasp. Good clinical trials will attract funding, an especially important issue in a relatively small specialty such as pediatric surgery. We should aim to include more of our patients in clinical trials.
REFERENCES
1. Medical Research Council: Streptomycin treatment of pulmonary tuberculosis. BMJ ii:769-782, 1948
2. Horton R: Surgical research or comic opera: Questions, but few answers. Lancet 347:984-985, 1996
3. Solomon MJ, McLeod RS: Clinical studies in surgical journals: Have we improved? Dis Colon Rectum 36:43-48, 1993
4. Kenny SE, Shankar KR, Rintala R, et al: Evidence-based surgery: Interventions in a regional paediatric surgical unit. Arch Dis Child 76:50-53, 1997
5. Moss RL, Henry MCW, Dimmitt RA, et al: The role of prospective randomized clinical trials in pediatric surgery: State of the art? J Pediatr Surg 36:1182-1186, 2001
6. Thakur A, Wang EC, Chiu TT, et al: Methodological standards associated with quality reporting in clinical studies in pediatric surgery journals. J Pediatr Surg 36:1160-1164, 2001
7. Campbell H, Surry SAM, Royle EM: A review of randomised controlled trials published in Archives of Disease in Childhood from 1982-96. Arch Dis Child 79:192-197, 1998
8. Cheng K, Smyth RL, Motley J, et al: Randomised controlled trials in cystic fibrosis, 1966-97, categorised by time, design and intervention. Pediatr Pulmonol 29:1-7, 2000
9. Feldman BF, Giannini EH: Where's the evidence? Putting the clinical science into pediatric rheumatology. J Rheumatol 23:1502-1504, 1996
10. Polnay L: Research in community child health. Arch Dis Child 64:981-983, 1989
11. Chessells JM, Bailey C, Richards SM: Intensification of treatment and survival in all children with lymphoblastic leukaemia: Results of UK Medical Research Council trial UKALL X. Lancet 345:143-148, 1995
12. Bleyer WA: The U.S. pediatric cancer clinical trial programmes: International implications and the way forward. Eur J Cancer 33:1439-1447, 1997
13. Taub JW: Factors in improved survival from paediatric cancer. Drugs 56:757-765, 1998
14. Guyatt GH, Sackett DL, Cook DJ, for the Evidence-Based Medicine Working Group: Users' Guides to the Medical Literature II. How to use an article about therapy or prevention A. Are the results of the study valid? JAMA 270:2598-2601, 1993
15. Begg C, Cho M, Eastwood S, et al: Improving the quality of reporting of randomized controlled trials: The CONSORT statement. JAMA 276:637-639, 1996
16. Machin D, Campbell M, Fayers P, et al: Sample Size Tables for Clinical Studies (ed 2). Oxford, England, Blackwell Science, 1997
17. Lander A, Redkar R, Nicholls G, et al: Cisapride reduces neonatal postoperative ileus: Randomized placebo controlled trial. Arch Dis Child 77:F119-122, 1997
18. Schulz KF, Chalmers I, Hayes RJ, et al: Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273:408-412, 1995
19. Black N: Why we need observational studies to evaluate the effectiveness of health care. BMJ 312:1215-1218, 1996
20. Reeves BC: Principles of research: Design and analysis of clinical trials. Surgery 18:101-104, 2000
21. Reeves BC: Health technology assessment in surgery. Lancet 353:SI3-5, 1999 (suppl)
22. Chambers TL: Seven questions about paediatric research. J Roy Soc Med 93:320-321, 2000
23. Smyth RL, Weindling AM: Research in children: Ethical and scientific aspects. Lancet 354:SII21-24, 1999
24. Choonara I: Clinical trials of medicines in children. BMJ 321:1093-1094, 2000
25. Murray GM: Essential statistics, in Schein M, Farndon JR, Fingerhut A (eds): A Surgeon's Guide to Writing and Publishing. Shrewsbury, Shropshire, UK, Kemberton Publishing, 2001, pp 95-106
26. Gardner MJ, Machin D, Campbell MJ: The use of check lists in assessing the statistical content of medical studies. BMJ 292:810-812, 1986