Journal of Hepatology 44 (2006) 607–615 www.elsevier.com/locate/jhep
Review
The culture of designing hepato-biliary randomised trials

Christian Gluud*

The Cochrane Hepato-Biliary Group, Copenhagen Trial Unit, Department 7102, Centre for Clinical Intervention Research, Rigshospitalet, Copenhagen University Hospital, DK-2100 Copenhagen, Denmark

* Tel.: +45 35 45 71 75; fax: +45 35 45 71 01. E-mail address: [email protected] (C. Gluud).
Research evidence may assist in identifying the best prevention, diagnostic method, or treatment. At the top of the evidence hierarchy you find the randomised clinical trial with low bias risk and systematic reviews of such trials. Over 8,500 articles on hepato-biliary randomised trials have been published in over 1,000 journals. Currently, over 500 articles on hepato-biliary randomised trials are published each year. When designing a randomised clinical trial you have to decide which participants and data to include, the experimental intervention (explanatory or pragmatic), the comparator (placebo or active), the basic design (parallel-group, cross-over, factorial, cluster), and the goal (superiority, equivalence, or non-inferiority). You also have to secure the internal validity (reliability of the results) of the trial. Internal validity may be jeopardised by random errors. The median number of participants per intervention arm in hepato-biliary trials is only about 23, giving ample room for random errors. Internal validity may also be jeopardised by systematic errors, or bias. Only 48% of hepato-biliary randomised trials report adequate generation of the allocation sequence, 38% adequate allocation concealment, and 34% adequate 'double blinding'. Only randomised trials with adequacy of all of these components are considered to be at low risk of bias. Hence, more than 90% of hepato-biliary trials may be biased, overestimating intervention effects. By conducting more multi-centre trials, hepato-biliary investigators can include more participants and improve quality. Further, multi-centre trials have better external validity (generalisability of the results) than single-centre trials.
1. Introduction

The Scottish naval surgeon James Lind started his controlled trial of 12 scurvy-ridden sailors on 20th May 1747 [1]. Lind divided them into pairs: two got oranges and lemons, two cider, two vinegar, two elixir of vitriol, two a concoction of spices, garlic, and mustard seeds, and two sea water. Within 6 days, the two sailors given oranges and lemons became well. The others did not. Lind was intelligent. His trial marks a major breakthrough, and the 20th May is now the International Clinical Trials' Day [2]. Lind was also lucky. We seldom see such dramatic intervention effects. We are usually looking for smaller, but still important, effects. Such effects, however, may be blurred by random and systematic errors. Scientists have, therefore, developed larger trials using central randomisation, blinding, and intention-to-treat analyses, aiming to reduce random errors and systematic errors to a minimum [1,3–6].

Although randomised trials provide the fairest way to test the effects of interventions [1,3–6], over 200 years passed before the first hepato-biliary randomised trial was published [7]. Thomas C Chalmers and co-workers conducted their two factorial-designed trials on diet, rest, and physical reconditioning in 460 patients with acute infectious hepatitis in 1955 [7]. Other trials in liver diseases followed [8], and hepato-biliary randomised trials appeared regularly from the 1970s (Fig. 1) [9]. Currently, over 500 publications on hepato-biliary randomised trials appear each year (Fig. 1) [9].

Here, I describe some of the issues one has to consider when assessing or designing a randomised clinical trial. Further, I contrast the culture of hepato-biliary randomised trials with that of randomised trials from other medical fields.
2. Why is it important to randomise?

The hierarchy of evidence is well-established [10–13]. It is based on the risks of bias in the different study designs. Randomised trials are internationally considered the gold standard for intervention comparisons [1,3–6,10–13]. The results from randomised trials form the basis for determining which diagnostics, drugs, drills, or devices are effective.
Fig. 1. Number of publications on randomised clinical trials (RCTs) from 1955 to the present according to The Cochrane Hepato-Biliary Group Controlled Trials Register [9]. The decline since 2001 is due to a backlog in identification and registration.
Randomisation forms the basis for making fair comparisons [6]. Historically controlled studies, cohort studies, and case-control studies are often unreliable designs unless the intervention effect is dramatic [10–13]. Dramatic intervention effects are exceptional. When exceptionally effective interventions do occur in observational studies, the interventions need confirmation in randomised trials [14]. There is, therefore, much wisdom in Thomas C Chalmers' 1975 statement: 'Always randomise the first patient' [15]. Lowest in the evidence hierarchy you find expert committee reports, expert opinions based on clinical experience, case reports, and experimental models [10–13].

Designs other than randomised trials remain important for diagnostic [13,16] and prognostic [17,18] studies and for assessing rare adverse events [19,20]. However, these designs cannot replace randomised trials in assessing beneficial effects of interventions. Fears of entering or conducting randomised trials are not based on evidence.
Outcomes of patients who participate in randomised trials are just as good as those of similar patients receiving the same treatments outside trials [21]. Interventions tested in trials may harm. However, the vast majority of trials find no significant differences between the assessed interventions. Further, it is much better to identify any harm in a randomised trial than to have interventions disseminated in clinical practice without proper and fair testing. Such interventions (e.g. hormones for postmenopausal women [22], antioxidant vitamins for preventing gastrointestinal cancers [23], clarithromycin for patients with stable coronary heart disease [24], methotrexate for primary biliary cirrhosis [25]) may cause more harm if introduced into clinical practice on the basis of insufficient clinical research. Randomised trials assessing interventions that may harm should have an independent data monitoring and safety committee [26]. Too few randomised trials are conducted with supervision from such committees.
Table 1
Methodological components used to assess the risk of bias in randomised clinical trials

Component | Adequate (low bias risk) | Inadequate (high bias risk)
Generation of the allocation sequence | Computer generated random numbers, table of random numbers, or similar | Not described or inadequate methods
Allocation concealment | Central randomisation, sealed envelopes, or similar | Not described or inadequate (e.g. by an open table or similar)
Double blinding (a) | Identical placebo tablets or similar | Inadequate (e.g. tablets versus injection), not described, or no double blinding

(a) For a number of interventions it may be hard or impossible to obtain 'double blinding'. However, it is almost always possible to obtain blinded outcome assessment.
Randomised trials are increasingly being used to guide evidence-based clinical practice [27]. You need to address a central question before you consider using trial results for patient care: are the results valid? The external validity of a trial depends on its internal validity (the reliability of the results). The internal validity of a trial depends on the risks of random errors [3,4] and the risks of systematic errors (i.e. bias) [28–34]. Conducting large randomised trials with many participants having many outcomes decreases the risks of random errors. Conducting randomised trials with 'high methodological quality', avoiding selection, performance, assessment, attrition, and other biases, decreases the risks of systematic errors (bias) [28–34]. Methodological quality has been defined as 'the confidence that the trial design, conduct, and analysis has minimised or avoided biases in its treatment comparison' [29]. The risk of bias in a trial can be assessed as described in Table 1. Only with good internal validity of a trial (large numbers of participants with outcomes and low bias risk) will it be relevant to consider the external validity (generalisability of the results). If there are problems with the internal validity, the question about external validity becomes irrelevant [35–37].

3. What kind of participants to include and which data to collect?

The participants to be included in a trial should be clearly defined. You should be able to list few entry criteria and few exclusion criteria. The reason for stressing few is that we often see trials with so many inclusion and exclusion criteria that it becomes difficult to identify such patients in clinical practice. Such trials may have adequate internal validity, but are less valuable due to lack of external validity. When designing a new trial you want to include patients with a known prognosis regarding your primary outcome. You should select a primary outcome that is prevalent and clinically relevant. Otherwise, you will get too few outcome data and hence too broad confidence intervals.

The data you need to collect should be given much thought. The more data you collect, the more studies (and hence publications) you can produce. On the other hand, the more data you request, the more difficult it will be to conduct the trial. Accordingly, complex trials need more resources and run a larger risk of not being finalised. Although the 'molecular-genetic revolution' progresses slowly [38], I recommend establishing a blood bank with samples from all participants in the majority of trials. This is especially advisable if genomics and proteomics may interact with intervention effects.

4. Which experimental intervention?

Apart from questions about which diagnostic method, drug dosage, endoscopic technique, or surgical technique to test, it is essential to decide whether you want to conduct an explanatory trial or a pragmatic trial.
Explanatory trials test whether an intervention is efficacious, that is, whether the intervention has a beneficial effect in an ideal situation. The explanatory trial seeks to maximise internal validity by assuring rigorous control of all variables. Explanatory trials often have a large number of participant inclusion and exclusion criteria. Such trials often assess surrogate outcomes. The more money or personal interest you have in an intervention, the more you tend to make your trial explanatory. Seen from the patients' and clinicians' point of view, such trials may be less meaningful.

Pragmatic trials measure effectiveness. These trials seek a balance between internal validity and external validity. The pragmatic trial seeks to maximise external validity to ensure that the results can be generalised. Pragmatic trials assess the effect of an intervention and of the 'things' applied together with this intervention in clinical practice. Some interventions can only be assessed in pragmatic trials. If you compare the benefits of upper gastrointestinal endoscopic examination plus banding of bleeding varices versus terlipressin infusion, you will never get a comparison of banding alone versus terlipressin infusion. Patients and clinicians would generally show greatest interest in the results of pragmatic trials.

The development phase of the intervention and the question you pose drive the choice between explanatory and pragmatic trials. Nobody would embark on a large pragmatic trial of a new intervention without first assessing the potential benefits of the intervention in a small explanatory trial. On the other hand, we too often witness that many explanatory trials are conducted on the same topic without a single pragmatic trial being carried out. There are ways in which one can try to combine explanatory and pragmatic randomised trials [39].
5. Which comparator: placebo or active?

If there is no evidence-based intervention offered in clinical practice for the potential trial participants, then placebo or a 'sham' procedure is the right comparator choice. Claims that the Food and Drug Administration and the European Medicines Agency require placebo-controlled trials are wrong. If a systematic review of low-bias trials or other convincing evidence shows that the potential participants should be offered an intervention, that intervention must be offered. There are then three solutions. First, you can compare the experimental intervention with the control intervention (e.g. ribavirin versus interferon for chronic hepatitis C [40]). Second, you can add the experimental intervention to the evidence-based intervention and compare it with placebo plus the evidence-based intervention (e.g. ribavirin plus interferon versus placebo plus interferon [41]). Third, you may find patients who will not accept or who have contraindications to the evidence-based intervention and randomise them to the experimental intervention versus placebo (e.g. ribavirin versus placebo [40]). In the latter case, the patients would not have received the evidence-based intervention anyhow.
6. Parallel-group or cross-over randomised trial?

Whether you read a report on a trial or you are going to design a trial, one of the questions you have to answer is: should this trial be a parallel-group or a cross-over trial? Both parallel-group and cross-over trials offer the opportunity to randomise to experimental intervention and comparator. It is, however, a delicate decision when to use one design instead of the other [42–45].

In parallel-group trials one randomises consecutive participants fulfilling the entry criteria and no exclusion criteria to the experimental intervention or the control. Parallel-group trials offer a number of advantages: there are no requirements regarding disease stability, irreversible interventions may be studied, both benefits and harms (adverse events) can readily be connected with the intervention given, and their design is easier to understand and explain [42,43]. The problem with parallel-group trials is that they require more participants, which often necessitates multi-centre trials. But multi-centre trials have lower bias risk than single-centre trials, so this may in fact not be so bad [34].

In cross-over trials, each participant receives both the experimental and the control intervention in a randomised sequence. These trials reduce the between-participant variability in the intervention comparison. Hence, fewer participants are needed. However, cross-over trials require that you are examining a stable condition and a reversible intervention. Further, there are inherent deficiencies in the logic of cross-over trials that may invalidate them, like failure to return participants to their baseline state before the cross-over, non-uniform pharmacologic and psychologic carry-over effects, time-dependent outcome measures, and negative correlation between intervention responses. Accordingly, benefits and harms (adverse events) are less readily connected with the intervention given. Only 288/8698 (3%) of the randomised trials in The Cochrane Hepato-Biliary Group Controlled Trials Register are cross-over trials [9] compared to 116/519 (22%) of PubMed-indexed randomised trials from all medical fields published in December 2000 [46].
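To give a rough feel for why cross-over trials need fewer participants, the sketch below (my own illustration, not a calculation from the article) compares the total number of participants required by a parallel-group and a two-period cross-over trial for a continuous outcome, using the usual normal approximation and ignoring the carry-over and period effects discussed above. The standard deviation, the minimal relevant difference, and the within-participant correlation are hypothetical numbers chosen for the example.

```python
from math import ceil
from scipy.stats import norm

def total_n_parallel(delta, sigma, alpha=0.05, beta=0.20):
    """Total participants for a two-arm parallel-group trial with a
    continuous outcome (two-sided alpha, power 1 - beta)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return 2 * ceil(2 * (sigma * z / delta) ** 2)

def total_n_crossover(delta, sigma, rho, alpha=0.05, beta=0.20):
    """Total participants for a two-period cross-over trial analysed as
    paired within-participant differences, with correlation rho between
    the two periods."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    sigma_diff_squared = 2 * sigma ** 2 * (1 - rho)
    return ceil(sigma_diff_squared * (z / delta) ** 2)

# Hypothetical example: difference of 0.5 standard deviations,
# within-participant correlation 0.5, alpha 0.05, power 80%.
print(total_n_parallel(delta=0.5, sigma=1.0))            # about 126 participants
print(total_n_crossover(delta=0.5, sigma=1.0, rho=0.5))  # about 32 participants
```

The saving comes entirely from the (1 - rho) term: the stronger the within-participant correlation, the fewer participants the cross-over design needs, provided its assumptions hold.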
7. Multiple promising interventions: the factorial design

Randomised trials may create plenty of problems even when you have just one experimental intervention and one comparator. What should you do if you have two experimental interventions that both look promising? You can of course conduct a three-armed randomised trial (experimental A versus experimental B versus control C). If the interventions do not interact, you are far better off conducting a 2 × 2 factorial trial. You obtain the same information with fewer patients and at the same time you can assess any interaction between the interventions [47]. There is no doubt that factorial trials are underused within hepatology.

8. Cluster randomised trials

If you ask a clinician to offer an intervention to half of the patients, you run the risk of contamination in the other half. In such situations you may want to apply your intervention at a higher level than the individual participant, e.g. the individual clinician, a group of clinicians, hospital wards, cities, regions, or countries. You thereby randomise trial participants in clusters [48]. Because the responses of participants within a cluster can be expected to be more similar than the responses of participants belonging to different clusters, the sample size calculation has to be adjusted upwards, as sketched below. Cluster randomised trials are very complex [48].
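The upward adjustment is usually made with the so-called design effect. The sketch below is a minimal illustration (not from the article) using the standard design-effect formula; the individually randomised sample size, the cluster size, and the intracluster correlation coefficient are hypothetical numbers.

```python
from math import ceil

def cluster_adjusted_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomised sample size by the design effect
    1 + (m - 1) * ICC, where m is the average cluster size and ICC is the
    intracluster correlation coefficient."""
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_individual * design_effect)

# Hypothetical example: 300 participants needed under individual randomisation,
# clusters of 20 participants, ICC 0.05 -> design effect 1.95, so the cluster
# randomised trial needs nearly twice as many participants.
print(cluster_adjusted_sample_size(n_individual=300, cluster_size=20, icc=0.05))
```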
9. What is the goal of the trial?

To define the goal of a trial you have to answer one question: do you want to show that your experimental intervention is superior, equivalent, or non-inferior to your comparator?

Superiority trials are the usual trials (Fig. 2). You want to establish whether your experimental intervention is superior to your control. If you do not have a convincing evidence-based intervention that works, the choice of a superiority trial is straightforward. Thirty years ago there were variable approaches to whether such trials ought to be analysed one-tailed (P≤0.025 for experimental better than control) or two-tailed (P≤0.05, testing whether the experimental intervention may be either superior or inferior to the control) [49]. The two-tailed analysis is now the norm. This gives you the chance to analyse
and conclude on your data even when your experimental intervention turns out to be more detrimental than your control.

Say that you have started out with a superiority trial and find no significant difference between the experimental and control interventions. Does this allow you to conclude equivalence? Of course not. Most superiority trials would have much too wide confidence intervals to allow a conclusion of equivalence when you do not find the experimental intervention to be significantly better than the control, or vice versa. However, superiority trials ending up concluding 'equivalence' are the norm. This practice needs to be stopped.

The equivalence trial starts from another perspective: the aim is to demonstrate that there is no relevant difference between the experimental and control interventions. In an equivalence trial you should set realistic borders a priori for what you consider 'equivalent' or an 'irrelevant difference' (Fig. 2). If a difference larger than this quantity is found, then the interventions are not equivalent. If the difference between the interventions is within your borders of equivalence, then they seem equivalent. The problem with equivalence trials is that your experimental intervention may turn out either better or worse than your control (two-sided P≤0.05). Hence, you need a large number of participants to demonstrate equivalence.

Therefore, the non-inferiority trial seems 'handy'. Here, you have an evidence-based control intervention that works and you want to test whether an experimental intervention (which causes fewer adverse events, is cheaper, or is easier to administer) is not inferior. By employing the one-sided P≤0.025 you can randomise fewer participants than in an equivalence trial. Problems with non-inferiority trials arise when the experimental intervention seems to be more effective regarding the primary outcome measure. What do you conclude then?

Fig. 2. Relation between confidence interval, line of no effect, and thresholds for important differences (from P. Alderson, BMJ 2004;328:476–477).
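Fig. 2 can be summarised as a simple decision rule on the confidence interval for the difference between experimental and control. The sketch below is my own illustration with hypothetical numbers, not the article's method: it assumes a positive difference favours the experimental intervention and, for simplicity, uses a single margin for both non-inferiority and equivalence. Real trials pre-specify the margin and the analysis populations in the protocol.

```python
def interpret_trial(ci_lower, ci_upper, margin):
    """Classify a 95% confidence interval for the treatment difference
    (positive values favour the experimental intervention) against a
    pre-specified margin, in the spirit of Fig. 2."""
    conclusions = []
    if ci_lower > 0:
        conclusions.append("superiority demonstrated")
    if ci_lower > -margin:
        conclusions.append("non-inferiority demonstrated")
    if -margin < ci_lower and ci_upper < margin:
        conclusions.append("equivalence demonstrated")
    if not conclusions:
        conclusions.append("neither superiority, non-inferiority, nor equivalence demonstrated")
    return conclusions

# Hypothetical example: estimated difference 0.02 with a 95% CI from -0.04 to 0.08
# and a margin of 0.05: non-inferior, but neither superior nor equivalent.
print(interpret_trial(ci_lower=-0.04, ci_upper=0.08, margin=0.05))
```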
10. Sample size estimation in randomised trials

Your sample size estimation depends on the goal of the trial (superiority, equivalence, or non-inferiority) and the type of the primary outcome measure (dichotomous or continuous). In a superiority trial with a dichotomous primary outcome, the sample size is determined from four pieces of information based on the primary outcome measure [4]:
• The expected proportion of patients with the primary outcome during the trial in the control arm. Very often this variable is grossly overestimated. The increased availability of valid clinical databases should alleviate this problem in the future.
• An a priori estimate of the intervention effect, i.e. the expected minimal relevant difference. Very often this variable is also grossly overestimated.
• Alpha, the risk of committing a type I error (usually set to ≤0.05).
• Beta, the risk of committing a type II error (usually set to ≤0.20 or ≤0.10).

It is important to know the targeted sample size when we evaluate the internal validity of a randomised trial. Otherwise, we do not know whether the data of the trial are reported before, at, or after the targeted sample size was reached [36,50]. Depending on the journal, only 7–26% of hepato-biliary randomised trials report a sample size calculation [34,51,52] (Table 3). According to Chan and Altman, the figure was 27% in PubMed-indexed randomised trials published in December 2000 from all disease areas [46]. The sample size in a trial with a continuous outcome measure is determined from the mean difference and standard deviation of the outcome, using the appropriate formulas [53].
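Returning to the dichotomous case, the following minimal sketch (my own illustration, not taken from the article) shows how the four pieces of information listed above combine in one widely used normal-approximation formula; the control event proportion and the anticipated intervention effect are hypothetical numbers chosen for the example.

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(p_control, p_experimental, alpha=0.05, beta=0.20):
    """Approximate sample size per arm for comparing two proportions
    (two-sided alpha, power = 1 - beta), using the common normal
    approximation without continuity correction."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    variance = p_control * (1 - p_control) + p_experimental * (1 - p_experimental)
    delta = p_control - p_experimental
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical example: 30% control event rate, hoped-for reduction to 20%,
# two-sided alpha 0.05, power 80% -> roughly 290 participants per arm.
print(n_per_arm(p_control=0.30, p_experimental=0.20))
```

Note how sensitive the result is to the inputs: overestimating the control event proportion or the minimal relevant difference, as warned above, shrinks the computed sample size and leaves the trial underpowered.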
11. Sample size of randomised trials

Most hepato-biliary randomised trials are too small [9,34,37,40,51,52,54–56] (Tables 2 and 3). The number of patients included in hepato-biliary randomised trials varied only a little depending on the journal in which they were published [34,37,51,52] (Table 3). The median number of participants per intervention arm was 23 (10th–90th percentiles 7 to 102) in hepato-biliary trials published in 12 journals during 1985–1996 [54] (Table 2). In PubMed-indexed randomised trials from all disease areas, the median number was 32 participants per intervention arm (10th–90th percentiles 12 to 159) considering all designs and 80 participants per intervention arm (10th–90th percentiles 25 to 369) considering parallel-group trials only [46] (Table 2). Small sample sizes are worrying since they are connected with large risks of type I and type II errors [4,56].
Table 2
Comparison of 616 hepato-biliary randomised trials from 12 journals on MEDLINE [54] and 519 randomised trials from PubMed [46] regarding sample size and adequacy of methodological components

Variable | Randomised hepato-biliary clinical trials published from 1985 to 1996 [54] | Randomised trials from all disease areas published in December 2000 [46]
Median number of participants per intervention arm (10th–90th percentiles) | 23 (7–102) | 32 (12–159)
Proportion with adequate generation of the allocation sequence | 48% | 21%
Proportion with adequate allocation concealment | 38% | 18%
Proportion with adequate double blinding | 34% | 38%
Table 3
Number of randomised trials, the proportion of randomised trials reporting sample size calculations, and number of participants per intervention arm in four journals publishing many hepato-biliary trials

Variable | Liver [51] | Journal of Hepatology [52] | Hepatology [34] | Gastroenterology (a) [37]
Number of trials | 32 | 171 | 235 | 383
Sample size calculations | 7% | 19% | 26% | ND (b)
Participants per intervention arm, median | 18 | 19 | 26 | 23
Participants per intervention arm, interquartile range | 10–36 | 11–31 | 14–44 | 10–50
Participants per intervention arm, range | 2–169 | 5–519 | 3–542 | 1–1107

(a) Includes trials on both hepato-biliary and other gastroenterology topics. There were no major differences between hepato-biliary trials and trials on other gastroenterology topics regarding sample size, but sample size varied significantly between the different disease areas examined (Kjaergard LL et al., unpublished observations).
(b) ND, not determined.
With a small sample size, important prognostic variables may be unevenly distributed. This could lead to observation of significant 'intervention effects' simply due to the distribution of prognostic variables. A two-group comparison with 23 patients in each arm has 26% power to detect a difference between event rates of 30% in the control group and 10% in the experimental group at the 0.05 level. This difference in intervention effect corresponds to a relative risk of 0.33 or a relative risk reduction of 67%. Such intervention effects are rarely observed [9]. The power to detect smaller differences is even less than 26%. The problem with random errors can only be overcome by developing more effective interventions (the molecular-genetic 'revolution' may give some hope [38]) or by clinical investigators realising that being a small part of a large trial is more important than being a large part of a small trial.
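To see for yourself how modest such power is, the simulation sketch below (my own illustration, not part of the article) estimates the power of a two-sided Fisher's exact test with 23 participants per arm and true event rates of 30% versus 10%. The figure it produces depends on the chosen test and approximation, so it should be broadly in line with, but not necessarily identical to, the roughly 26% quoted above.

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)

def simulated_power(n_per_arm=23, p_control=0.30, p_experimental=0.10,
                    alpha=0.05, n_simulations=10_000):
    """Monte Carlo estimate of the power of a two-sided Fisher's exact test
    for a two-group comparison of event rates."""
    significant = 0
    for _ in range(n_simulations):
        events_control = rng.binomial(n_per_arm, p_control)
        events_experimental = rng.binomial(n_per_arm, p_experimental)
        table = [[events_control, n_per_arm - events_control],
                 [events_experimental, n_per_arm - events_experimental]]
        _, p_value = fisher_exact(table, alternative='two-sided')
        significant += p_value < alpha
    return significant / n_simulations

print(simulated_power())  # far below the conventional 80% target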
12. Methodological quality: the risk of bias

Conducting randomised trials with high methodological quality (i.e. avoiding selection, performance, assessment, attrition, and other biases) decreases the risks of bias [28–33]. We have examined the methodological quality of hepato-biliary randomised trials (Tables 2 and 4). Most trials have one or more methodological deficiencies [9,34,37,51,52,54–56]. The low methodological quality raises the question of whether biased estimates of intervention effects have occurred. Only
a systematic review of the evidence may answer this question [9,56]. The methodological quality of a trial is related to the number of centres involved [34], the therapeutic area [34,37,52,54], and whether the trial was sponsored [54]. We found no significant difference in the quality of trials sponsored by for-profit and not-for-profit organisations [54].

12.1. Generation of the allocation sequence

The proportion of hepato-biliary randomised trials with adequate generation of the allocation sequence varies from 21 to 52%, depending on the journal (Table 4). About every second hepato-biliary trial published in 12 journals during 1985–1996 reported adequate generation of the allocation sequence [54], compared to 21% of PubMed-indexed randomised trials from all disease areas published in December 2000 [46] (Table 2). Trials with unclear or inadequate generation of the allocation sequence are associated with a 12% (95% confidence interval 1–21%) exaggeration of the intervention effect [33].

12.2. Allocation concealment

The proportion of hepato-biliary randomised trials with adequate allocation concealment varies from 5 to 39%, depending on the journal (Table 4).
Table 4
Number of randomised trials and the proportion of randomised trials with adequate generation of the allocation sequence, allocation concealment, and double blinding in four journals publishing many hepato-biliary trials

Variable | Liver [51] | Journal of Hepatology [52] | Hepatology [34] | Gastroenterology (a) [37]
Number of trials | 32 | 171 | 235 | 383
Adequate generation of the allocation sequence | 21% | 28% | 52% | 42%
Adequate allocation concealment | 5% | 13% | 34% | 39%
Adequate double blinding | 28% | 30% | 34% | 62%

(a) Includes trials on both hepato-biliary and other gastroenterology topics. There were no major differences between hepato-biliary randomised trials and randomised trials on other gastroenterology topics regarding methodological quality, but methodological quality varied significantly between the different disease areas examined (Kjaergard LL et al., unpublished observations).
A total of 38% of hepato-biliary trials published in 12 journals during 1985–1996 reported adequate allocation concealment [54], compared to 18% of PubMed-indexed randomised trials from all disease areas published in December 2000 [46] (Table 2). The proportion was higher in some areas of hepatology (e.g. primary biliary cirrhosis) and lower in others (e.g. hepatitis B and C) [54]. Trials with unclear or inadequate allocation concealment are associated with a 21% (95% confidence interval 5–34%) exaggeration of the intervention effect [33].

12.3. Blinding

Due to the nature of many interventions (e.g. endoscopy for portal hypertension, gallbladder surgery), 'double blinding' (i.e. blinding of both patients and caregivers) may not be feasible. Only blinding of all persons involved in a trial can ensure that such bias does not occur. In trials where the control intervention cannot be blinded with a placebo or a sham procedure, you can always use blinded outcome assessment. This may reduce assessment bias. The proportion of hepato-biliary trials with adequate double blinding varies from 28 to 62%, depending on the journal (Table 4). A total of 34% of hepato-biliary randomised trials published in 12 journals during 1985–1996 were double blind [54], compared to 38% of PubMed-indexed randomised trials from all disease areas published in December 2000 [46] (Table 2). Trials with unclear or inadequate double blinding are associated with an 18% exaggeration of the intervention effect [33].

12.4. Statistical analyses of entry data

Many randomised trials are presented with statistical tests for differences between experimental and control entry data. This is not meaningful [57]. In small trials, important prognostic factors will often be non-significant even if imbalances have occurred. In large trials, small differences without prognostic importance will often become significant. If you test 20 variables, at least one may become significant by chance. If you fear that randomisation may not be able to secure an equal distribution of prognostic variables, then you should conduct stratified randomisation regarding these factors [4,36]. Such stratified randomisation requires that you know which variables contain prognostic or therapeutic-prognostic information and that you intend to include fewer than 300–500 participants. In larger multi-centre trials it is always advisable to stratify for centre.

12.5. Statistical analyses of outcome data

Having freely floating outcome measures opens up the possibility of always being able to 'prove' that the experimental intervention works better than the control.
You just have to test enough outcome measures. Sooner or later one will turn out significantly 'favouring' the experimental intervention. Chan and collaborators have shown that trialists keep changing the primary outcomes in randomised trials [58,59]. This practice is unscientific. It leaves us unable to evaluate the results of randomised trials. Public registration of all trials before inclusion of the first participant can solve this problem [60–62].

The intention-to-treat analysis is generally recommended to minimise bias in the analyses of both benefits and harms [42,43]. One should never accept 'per-protocol' analyses alone, but such analyses can of course provide additional insight. Too often, trials are stopped too early for benefit [50,55]. Such trials show implausibly large intervention effects and should be viewed with scepticism [50,55].
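Both the warning above about testing 20 baseline variables and the point about freely floating outcome measures come down to simple multiplicity arithmetic. A minimal sketch, assuming independent tests at the 5% significance level (an idealisation, but it makes the point):

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Probability of at least one spuriously 'significant' result among
    n_tests independent tests when no true differences exist."""
    return 1 - (1 - alpha) ** n_tests

# 20 baseline variables or outcome measures: about a 64% chance of at least
# one 'significant' finding by chance alone.
print(prob_at_least_one_false_positive(20))
```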
13. Conflicting interests

Conflicts of interest may have profound effects on the results of trials as well as on how the results are interpreted [63–66]. It is clear to many that the influence of the drug and device industry has become too large [67].
14. Discussion

During the last 50 years we have witnessed a very positive increase in the number of randomised trials being conducted (Fig. 1). Compared to randomised trials in general, hepato-biliary trials are less often cross-over trials and more often conducted with adequate generation of the allocation sequence and adequate allocation concealment. These are very positive observations. On the other hand, the size, bias risks, analyses, and interpretation of hepato-biliary trials still leave a lot to be desired. Progress regarding these aspects has been slow or absent [34,37,52,54].

We need to pay more attention to adequate statistical power, design, analyses, and interpretation of randomised trials. The recommendations of the CONSORT Statement (www.consort-statement.org) [42,43,48] and The Cochrane Collaboration [68] may guide future research. We need more research into how to organise large randomised trials and how to reduce drop-outs and too short follow-up. We need more research into the analyses of randomised clinical trials. For example, logistic regression analyses seem to dramatically increase rather than decrease the risks of over- and underestimation of intervention effects [69]. We also need more independent evaluation of interventions, free of commercial and other vested interests [67].

We have to face the fact that most significant P-values are false [70]. We need to take this into consideration when we evaluate the individual randomised trial as well as when we assess meta-analyses of several trials [71]. We, therefore, need additional research into methods for systematic reviewing, including how best to conduct trial sequential analysis
with trial monitoring boundaries in order to reduce the risk of committing type I errors [72–75] and how to combine frequentist and Bayesian methods [71]. We also need to bridge the gaps between clinical research and clinical practice [76,77]. These tasks may be achieved with investments and dedicated collaboration.

Conducting meta-analyses will increase power and precision [56,68,78,79]. Systematic reviews with meta-analyses of several randomised trials have become an important tool for clinical decision-making ([56,68,78,79], www.cochrane.org). We need to work hard in the present millennium in order not to repeat the mistakes of the last [80].
References

[1] James Lind Library. Available from http://www.jameslindlibrary.org/.
[2] European Clinical Research Infrastructures Network (ECRIN), May 20th 2005, the first International Clinical Trials' Day. Available from http://www.ecrin.org/ecrin_files/home.php?level=1.
[3] Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984;3:409–422.
[4] Pocock SJ. Clinical trials–a practical approach. Chichester: Wiley; 1996.
[5] Gluud C, Sørensen TIA. New developments in the conduct and management of multi-center trials: an international review of clinical trial units. Fundam Clin Pharmacol 1995;9:284–289.
[6] Chalmers I. Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments. Int J Epidemiol 2001;30:1156–1164.
[7] Chalmers TC, Eckhardt RD, Reynolds WE, Cigarroa JG, Deane N, Reifenstein RW, et al. The treatment of acute infectious hepatitis. Controlled studies of the effects of diet, rest, and physical reconditioning on the acute course of the disease and on the incidence of relapses and residual abnormalities. J Clin Invest 1955;34:1163–1235.
[8] Chalmers TC. Randomised controlled clinical trials in diseases of the liver. Prog Liver Dis 1976;5:450–456.
[9] Gluud C, Als-Nielsen B, D'Amico G, Gluud LL, Khan S, Klingenberg SL, et al. Cochrane Hepato-Biliary Group. About The Cochrane Collaboration (Collaborative Review Groups (CRGs)). The Cochrane Library, Issue 4, 2005. Art. No.: LIVER.
[10] Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine, how to practise and teach EBM. 2nd ed. Edinburgh: Churchill Livingstone; 2000.
[11] Guyatt G, Rennie D. Users' guides to the medical literature: a manual of evidence-based clinical practice. Chicago, Ill: AMA Press; 2002.
[12] Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Med Assoc 2001;285:1987–1991.
[13] Gluud C, Gluud LL. Evidence based diagnostics. BMJ 2005;330:724–726.
[14] Sanborn RE, Blake CD. Gastrointestinal stromal tumors and the evolution of targeted therapy. Clin Adv Hematol Oncol 2005;3:647–657.
[15] Chalmers TC. Randomization of the first patient. Med Clin North Am 1975;59:1035–1038.
[16] Tatsioni A, Deborah AZ, Aronson N, Samson DJ, Flamm CR, Schmid C, et al. Challenges in systematic reviews of diagnostic technologies. Ann Intern Med 2005;142:1048–1055.
[17] D'Amico G, Morabito A, Pagliaro L, Marubini E. Survival and prognostic indicators in compensated and decompensated cirrhosis. Dig Dis Sci 1986;31:468–475.
[18] Christensen E. Prognostic models including the Child-Pugh, MELD and Mayo risk scores—where are we and where should we go? J Hepatol 2004;41:344–350.
[19] Ioannidis JP, Evans SJ, Gøtzsche PC, O'Neill RT, Altman DG, Schulz K, et al. Better reporting of harms in randomised trials: an extension of the CONSORT statement. Ann Intern Med 2004;141:781–788.
[20] Chou R, Helfand M. Challenges in systematic reviews that assess treatment harms. Ann Intern Med 2005;142:1090–1099.
[21] Vist GE, Hagen KB, Devereaux PJ, Bryant D, Kristoffersen DT, Oxman AD. Outcomes of patients who participate in randomised controlled trials compared to similar patients receiving similar interventions who do not participate. The Cochrane Database of Methodology Reviews, Issue 4, 2004. Art. No.: MR000009. DOI: 10.1002/14651858.MR000009.pub2.
[22] Beral V, Banks E, Reeves G. Evidence from randomised trials on the long-term effects of hormone replacement therapy. Lancet 2002;360:942–944.
[23] Bjelakovic G, Nikolova D, Simonetti RG, Gluud C. Antioxidants for preventing gastrointestinal cancers: a systematic Cochrane review and meta-analysis. Lancet 2004;364:1219–1228.
[24] Jespersen CM, Als-Nielsen B, Damgaard M, Fischer Hansen J, Hansen S, Helø OH, et al. A randomised, placebo controlled, multicentre trial to assess short term clarithromycin for patients with stable coronary heart disease: CLARICOR trial. BMJ 2005. DOI: 10.1136/bmj.38666.653600.55.
[25] Gong Y, Gluud C. Methotrexate for primary biliary cirrhosis. The Cochrane Database of Systematic Reviews, Issue 3, 2005. Art. No.: CD004385. DOI: 10.1002/14651858.CD004385.pub2.
[26] Ellenberg SS, Fleming TR, DeMets DL. Data monitoring committees in clinical trials. A practical perspective. London: Wiley; 2003. p. 1–191.
[27] Young C, Horton R. Putting clinical trials into context. Lancet 2005;366:107–108.
[28] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. J Am Med Assoc 1995;273:408–412.
[29] Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998;352:609–613.
[30] Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomised trials in meta-analyses. Ann Intern Med 2001;135:982–989.
[31] Balk EM, Boris PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, et al. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomised controlled trials. J Am Med Assoc 2002;287:2973–2982.
[32] Als-Nielsen B, Chen W, Gluud LL, Siersma V, Hilden J, Gluud C. Are trial size and quality associated with treatment effects in randomised trials? Observational study of 523 randomised trials. 12th International Cochrane Colloquium, Ottawa; 2004. p. 102–3.
[33] Als-Nielsen B, Gluud LL, Gluud C. Methodological quality and treatment effects in randomised trials—a review of six empirical studies. 12th International Cochrane Colloquium, Ottawa; 2004. p. 88–9.
[34] Kjaergard LL, Nikolova D, Gluud C. Randomised trials in Hepatology: predictors of quality. Hepatology 1999;30:1134–1138.
[35] Becker U, Burroughs AK, Calés P, Gluud C, Liberati A, Morabito A, et al. Trials in portal hypertension: valid meta-analyses and valid randomized clinical trials. In: de Francis R, editor. Portal hypertension II. Proceedings of the second Baveno international consensus workshop on definitions, methodology and therapeutic strategies. Oxford: Blackwell Science Ltd; 1996. p. 180–209.
[36] Gluud C, Kjaergard LL. Quality of trials in portal hypertension and other fields of hepatology. Third Baveno international consensus workshop. Portal hypertension into the third millennium. Definition, methodology and therapeutic strategies in portal hypertension. Oxford: Blackwell Science; 2001. p. 204–18.
[37] Kjaergard LL, Frederiksen S, Gluud C. Validity of randomized clinical trials in gastroenterology from 1964–2000. Gastroenterology 2002;122:1157–1160.
[38] Royal Society of Research. Personalised medicines: hopes and realities; 2005. p. 1–56.
[39] Banarjee SN, Raskob G, Hull RD, Brandstater M, Guyatt GH, Sackett DL. A new design to permit the simultaneous performance of explanatory and management randomised clinical trials. Clin Res 1984;32:543A.
[40] Brok J, Gluud LL, Gluud C. Ribavirin monotherapy for chronic hepatitis C. The Cochrane Database of Systematic Reviews, Issue 4, 2005. Art. No.: CD005527. DOI: 10.1002/14651858.CD005527.
[41] Brok J, Gluud LL, Gluud C. Ribavirin plus interferon versus interferon for chronic hepatitis C. The Cochrane Database of Systematic Reviews, Issue 2, 2005. Art. No.: CD005445. DOI: 10.1002/14651858.CD005445.
[42] Moher D, Schulz KF, Altman D, for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. J Am Med Assoc 2001;285:1987–1991.
[43] Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663–694.
[44] Woods JR, Williams JG, Tavel M. The two-period crossover design in medical research. Ann Intern Med 1989;110:560–566.
[45] Senn SJ. Cross-over trials in clinical research. Chichester: Wiley; 2002.
[46] Chan A-W, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet 2005;365:1159–1162.
[47] McAlister FA, Straus SE, Sackett DL, Altman DG. Analysis and reporting of factorial trials: a systematic review. J Am Med Assoc 2003;289:2545–2553.
[48] Campbell MK, Elbourne DR, Altman DG, CONSORT Group. CONSORT statement: extension to cluster randomised trials. BMJ 2004;328:702–708.
[49] McKinney WP, Young MJ, Hartz A, Lee MB. The inexact use of Fisher's exact test in six major medical journals. J Am Med Assoc 1989;261:3430–3433.
[50] Montori VM, Devereaux PJ, Adhikari NKJ, Burns KEA, Eggert CH, Briel M, et al. Randomized trials stopped early for benefit. A systematic review. J Am Med Assoc 2005;294:2203–2209.
[51] Gluud C. Evidence based medicine in Liver. Liver 1999;19:1–2.
[52] Gluud C, Nikolova D. Quality assessment of reports on clinical trials in the Journal of Hepatology. J Hepatol 1998;29:321–327.
[53] Julious SA. Tutorial in biostatistics. Sample sizes for clinical trials with normal data. Stat Med 2004;23:1921–1986.
[54] Kjaergard LL, Gluud C. Funding, disease area, and internal validity of hepato-biliary randomised trials. Am J Gastroenterol 2002;97:2708–2713.
[55] Kjaergard LL, Liu J, Als-Nielsen B, Gluud C. Artificial and bioartificial support systems for acute and acute-on-chronic liver failure: a systematic review. J Am Med Assoc 2003;289:217–222.
[56] Gluud LL. Bias in clinical intervention research. Methodological studies of systematic errors in randomised trials and observational studies (Doctoral Dissertation). Faculty of Health Sciences, University of Copenhagen; 2005. p. 1–32.
[57] Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064–1069.
[58] Chan A-W, Hrobjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomised trials: comparison of protocols to published articles. J Am Med Assoc 2004;291:2457–2465.
[59] Chan A-W, Krleza-Jeric K, Schmid I, Altman DG. Outcome reporting bias in randomised trials funded by the Canadian Institutes of Health Research. Can Med Assoc J 2004;171:735–740.
[60] Gluud C. 'Negative trials' are positive! J Hepatol 1998;28:731–733.
[61] Dickersin K, Rennie D. Registering clinical trials. J Am Med Assoc 2003;290:516–523.
[62] Krleza-Jeric K, Chan A-W, Dickersin K, Sim I, Grimshaw J, Gluud C, et al. Principles for international registration of protocol information and results from human trials of health related interventions: Ottawa statement (part 1). BMJ 2005;330:956–958.
[63] Kjaergard LL, Als-Nielsen B. Association between competing interests and authors' conclusions: epidemiological study of randomised trials published in the BMJ. BMJ 2002;325:249.
[64] Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomised drug trials: a reflection of treatment effect or adverse events? J Am Med Assoc 2003;290:921–928.
[65] Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. J Am Med Assoc 2003;289:454–465.
[66] Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ 2003;326:1167–1170.
[67] House of Commons Health Committee. The influence of the pharmaceutical industry. Fourth report of session 2004–05, vol. I. Available from http://www.publications.parliament.uk/pa/cm200405/cmselect/cmhealth/42/4202.htm.
[68] Cochrane Handbook for Systematic Reviews of Interventions 4.2.4 [updated March 2005]. In: Higgins JPT, Green S, editors. The Cochrane Library. Chichester, UK: Wiley; 2005 [Issue 2].
[69] Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7:1–173.
[70] Ioannidis JA. Why most published research findings are false. PLoS Med 2005;2:e124 [Epub 2005 Aug 30].
[71] Diamond GA, Kaul S. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. J Am Coll Cardiol 2004;43:1929–1939.
[72] Devereaux PJ, Beattie WS, Choi PT, Badner NH, Guyatt GH, Villar JC, et al. How strong is the evidence for the use of perioperative beta blockers in non-cardiac surgery? Systematic review and meta-analysis of randomised controlled trials. BMJ 2005;331:313–321 [Epub 2005 Jul 4].
[73] Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analyses of six Cochrane Neonatal Review Group meta-analyses using actual information size (I). Clin Trial 2005;2:32–33.
[74] Brok J, Thorlund K, Gluud C, Wetterslev J. Trial sequential analyses of six Cochrane Neonatal Review Group meta-analyses considering adequacy of allocation concealment (II). Clin Trial 2005;2:61–62.
[75] Thorlund K, Wetterslev J, Brok J, Gluud C. Trial sequential analyses of six Cochrane Neonatal Review Group meta-analyses considering heterogeneity and trial weight (III). Clin Trial 2005;2:62.
[76] Gluud C, Afroudakis AP, Caballeria J, Laskus T, Morgan M, Rueff B, et al. Diagnosis and treatment of alcoholic liver disease in Europe. First report. Gastroenterol Int 1993;6:221–230.
[77] Kürstein P, Gluud LL, Willemann M, Olsen KR, Kjellberg J, Sogaard J, et al. Agreement between reported use of interventions for liver diseases and research evidence in Cochrane systematic reviews. J Hepatol 2005;43:984–989.
[78] Maynard A, Chalmers I, editors. Non-random Reflections on Health Services Research: on the 25th anniversary of Archie Cochrane's Effectiveness and Efficiency. BMJ Publishing Group; 1997. p. 1–303.
[79] Wang J, Gluud C, editors. Evidence-based medicine and clinical practice (in Chinese). Beijing: Science Publisher; 2002. p. 1–339.
[80] Gluud C. Trials and errors in clinical research. Lancet 1999;354:SIV59.