PRINCIPLES OF ORTHOPAEDICS
Experimental design and statistics in orthopaedic research
Timothy Hardwick, Alex Vaughan, Julian Gaskin, Stephen Bendall

Timothy Hardwick MBChB(Hons) BSc(Hons) MSc(Mrt) MRCS Specialist Trainee in Orthopaedic Surgery, Department of Orthopaedics, Brighton and Sussex University Hospitals NHS Trust, Royal County Hospital, Brighton, UK. Conflicts of interest: none declared.

Alex Vaughan MBBS BSc MEd Specialist Trainee in Orthopaedic Surgery, Department of Orthopaedics, Brighton and Sussex University Hospitals NHS Trust, Royal County Hospital, Brighton, UK. Conflicts of interest: none declared.

Julian Gaskin BSc(Hons) MBBS MRCS DipSEM(UK) MSc(SEM) FEBOT DipSICOT Senior Fellow in Orthopaedic Surgery, Department of Orthopaedics, Brighton and Sussex University Hospitals NHS Trust, Royal County Hospital, Brighton, UK. Conflicts of interest: none declared.

Stephen Bendall MBBS FRCS FRCS(Orth) Consultant in Orthopaedic Surgery, Department of Orthopaedics, Brighton and Sussex University Hospitals NHS Trust, Royal County Hospital, Brighton, UK. Conflicts of interest: none declared.

Abstract
An understanding of the fundamentals of experimental design and statistical analysis is essential to the undertaking of any successful clinical research. There is no substitute for methodological quality, but even well-designed studies can be let down by a poor examination of the data and the incorrect application of statistical methods. The purpose of this article is to provide an overview of the basic principles used in the design of clinical research. It covers both observational and experimental studies, including case-control studies, cohort studies and the gold standard randomized controlled trial (RCT). It highlights the importance of sample power, type I and type II error, study randomization and blinding, and the effects of bias on outcome measures. The article uses examples from the published orthopaedic literature to stress the importance of these variables and the need for caution when interpreting results. As a reader, the most important thing is to maintain a critical eye, with particular focus on the experimental design and study power; any interpretation of the data must be coupled with an awareness of the potential pitfalls.

Keywords: case-control study; cohort study; orthopaedic research; RCT; statistics; study design

Introduction
The undertaking of a research project is a process. Although the statistical analysis of data is a fundamental part, it is important to remember the other essential elements involved in successful clinical research. An appropriate question must be formulated and the study designed to answer it effectively. There also needs to be an efficient and effective way of collecting and assimilating the data: no statistical algorithm or technique will make up for a poorly designed study. The application of statistical methods to the analysis of biological and medical data appears to be poorly understood by many clinical researchers. This is for the most part due to a lack of awareness of the appropriate methods and analyses rather than mistakes in the techniques used.1 As a result, well-designed studies can be let down by a poor examination of the data and the incorrect application of statistical methods. If statistics are ineffectively applied to clinical and medical research, this can have significant healthcare and cost implications. The demands generated by the ever-increasing development of new technologies and implants cannot be met by finite healthcare resources alone; statistical science will always remain an essential step in the appropriate utilization of resources alongside improvement in patient care.2 The analysis should therefore be carefully performed and clearly described, with the authors presenting enough information to allow readers to evaluate the validity of the study results.3,4 It is crucial that the statistical analysis is considered at an early stage in the design of a study, as this can help avoid potential problems once the data have been collected. It is very useful to enlist the expertise of a statistician from the outset, but this should not be a substitute for a basic grounding in statistical methods. The purpose of this article is to provide a brief overview of the basic principles of study design and statistical analysis used in orthopaedic research.

Statistics in biological systems
Qureshi et al. used Heisenberg's uncertainty principle as an example to help explain dependent and independent variables. The theory proposed that an electron's spatial location is best understood as existing within a cloud of probability, where a precise location is 'uncertain'. This uncertainty arises from an understanding that countless factors exert a varying magnitude of influence on the behaviour of the electron. In the same way, it is important to understand that when biological systems are subjected to observation, the one intended true measure of a variable is rarely observed, owing to variation in the phenomenon of interest as a result of a complex interplay of competing influences.2

Dependent and independent variables
The dependent variable, which represents the output or outcome whose variation is being studied, will be affected by independent variables, which represent inputs or causes and potential reasons for variation. For example, is it possible to quantify the observed variation in the survival of a particular orthopaedic implant, and which independent variables (e.g. age, co-morbidity, infection) have the greatest effect on survivorship? By knowing the potential effect of these independent variables, is it then possible to predict the length of survivorship of the implant in a given cohort of patients? Statistics attempts to address such questions, which in turn helps guide clinical decision-making.
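As an illustrative sketch (not taken from the article), the relationship between independent variables and a dependent outcome can be written as a simple logistic model. The variable names and coefficients below are entirely hypothetical, chosen only to show how the dependent variable (implant survival) responds to the independent variables (age, infection):

```python
import math

def survival_probability(age, infected, intercept=4.0,
                         beta_age=-0.03, beta_infection=-1.5):
    """Hypothetical logistic model: probability that an implant survives
    10 years, as a function of two independent variables.
    All coefficients are invented for illustration only."""
    logit = intercept + beta_age * age + beta_infection * (1 if infected else 0)
    return 1.0 / (1.0 + math.exp(-logit))

# The dependent variable (survival) varies with the independent variables:
p_young_clean = survival_probability(age=55, infected=False)
p_old_infected = survival_probability(age=80, infected=True)
```

In a real study the coefficients would be estimated from patient data by regression; here they simply illustrate that each independent variable contributes its own influence to the observed outcome.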
ORTHOPAEDICS AND TRAUMA --:-
Crown Copyright © 2017 Published by Elsevier Ltd. All rights reserved.
Please cite this article in press as: Hardwick T, et al., Experimental design and statistics in orthopaedic research, Orthopaedics and Trauma (2017), http://dx.doi.org/10.1016/j.mporth.2017.07.006
Study design

The purpose of a study is to address a scientific hypothesis while minimizing bias. Studies can be broadly divided into observational or experimental. In observational studies the independent variable is not under the control of the researcher, and the outcomes of interest are observed along with the factors that contribute to them. If an attempt is made to assess a relationship between the two, the study is termed epidemiological. In contrast, in experimental studies the investigator intervenes in some way to affect the outcome. Such studies are longitudinal and prospective, with the investigator applying an intervention and observing the outcome at a future time point.1 Bias refers to the tendency of a measurement process to over- or underestimate the value of a population parameter. Bias can be minimized by using a control group, prospective enrolment, randomization of patients, and/or blinding.5

Observational studies

There are different types of observational study.

Cross-sectional study: all observations are made at a single point in time, showing the incidence or prevalence of an event in a specified population. More often, studies are longitudinal, with individuals followed over a period of time, either prospectively or retrospectively.1

Case-control study: a case-control study is an example of a retrospective observational study in which a control group is added. As the number of patients is increased there is a reduction in the chance of observing a random result, so such a study is less prone to bias than one without a control group.5 In examining the occurrence of disease or complications after surgery, the investigators do not have the luxury of randomization. For example, patients cannot be randomly assigned to have a dislocation following a total hip replacement (THR) or to develop a deep infection. Instead the study is designed to look at those patients with a given complication and compare them to those without it, looking for risk factors. Causal relationships are much more difficult to tease out from case-control studies than from randomized controlled trials (RCTs). The things to look for in case-control studies are the inclusion and exclusion criteria, whether the groups were matched to account for any extraneous variables, and the overall population from which the sample was taken.6 Did the dislocation group consist overwhelmingly of patients with a high body mass index?

Cohort study: in a cohort study information is collected on individuals and comparisons are made to determine whether a particular factor (risk factor) occurs more or less frequently in those who develop a given condition or complication. The studies can be either prospective or retrospective.6 A longitudinal cohort study is a prospective study in which a group of patients is followed longitudinally while the baseline parameters and their evolution are recorded. The measurement tools are chosen before the patients are included in the study. For example, one could study the quality of life and range of movement in shoulder patients prior to and after arthroscopic rotator cuff repair.

However, even in this prospective design there is still a potential selection bias: investigators can choose which patients to enrol unless all patients with a given diagnosis are included (i.e. a consecutive series). Even if all patients are included, there can still be other forms of bias, such as referral bias, which frequently arises when cases are selected in a hospital whose activity is linked to the studied exposure, or diagnostic bias, in which specific criteria are required for diagnosis but potentially exclude other patients.5 It is also possible to carry out retrospective cohort studies relying on databases and hospital records. These studies can be limited by the quality of the records, and it is often difficult to have retrospective access to longitudinal data sets. The exception would be large databases in countries with centralized healthcare systems; as patients are followed up over time, associations between risk factors and outcomes can, with caution, be interpreted as causative.6 In such settings there is always the possibility that data for particular patients are incomplete. Often the missing data are random, but sometimes patterns of missing data vary within the data set itself. In these situations it is important to consider how the data are analysed. In some cases the patients with missing data are removed in their entirety, leaving a single homogeneous population for the rest of the analysis. This is referred to as complete case analysis. Another option is to remove patients from consideration only for those variables for which they are missing data. This option is referred to as available case analysis and is relatively simple to do. It does, however, leave different subpopulations for each analysis undertaken and can introduce bias, particularly if the missing data within the population are not random.

A third option is conditional or unconditional mean imputation, where the missing values are replaced with a mean of the remaining non-missing values: either the actual mean or an estimate derived from a regression, with some random variation added to the estimate.13 Whatever method is used, care must always be taken when analysing data sets with missing values to ensure that the validity of the study is maintained. The relative risk, which measures the magnitude of an association between an exposed group and a non-exposed group, is commonly used to assess the effect of a risk factor in a cohort study.1 If the relative risk is 1, then the risk is the same in the exposed and non-exposed groups. If in our hip example the relative risk is 3, then an individual is three times more likely to develop a hip dislocation postoperatively if the factor is present. The relative risk must not be considered in isolation but in relation to the absolute risk: if the absolute risk is very low, then even if the relative risk is high the overall risk is still very small. In case-control studies it is not possible to estimate the relative risk directly, because individuals are selected on the basis of their outcome and so their risk of developing it cannot be evaluated. In this case the odds ratio is used, a measure of the association between an exposure and an outcome: the odds of developing the condition in those exposed to the risk factor divided by the odds in those not exposed.
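The relative risk and odds ratio discussed above can be computed directly from a 2 × 2 table of exposure against outcome. The counts below are hypothetical, chosen so that the relative risk comes out at 3, matching the hip example:

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the outcome in the exposed group divided by the risk
    in the unexposed group."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

def odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Odds of the outcome in the exposed group divided by the odds
    in the unexposed group."""
    odds_exposed = exposed_events / (exposed_total - exposed_events)
    odds_unexposed = unexposed_events / (unexposed_total - unexposed_events)
    return odds_exposed / odds_unexposed

# Hypothetical cohort: 15/100 dislocations with the risk factor, 5/100 without.
rr = relative_risk(15, 100, 5, 100)   # 3.0: three times more likely
or_ = odds_ratio(15, 100, 5, 100)
```

Note that with these counts the odds ratio (about 3.35) is larger than the relative risk; the two only approximate each other when the outcome is rare.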
Experimental studies including clinical trials

A clinical trial should be comparative; in statistical terminology, if it is comparative it is controlled. If, for example, you are investigating the effect of a new implant and have nothing to compare it
to, the investigator cannot be sure that any of the observed outcomes are not down to chance alone or to other influencing factors. The control group may be either positive, in which the individual receives active treatment, or negative, with no active treatment or a placebo. The way in which comparable groups are achieved at baseline is to allocate individuals to the different treatment groups via randomization, a method based on chance. It is also important to design the trial so that it is free from systematic subjectivity and treatment bias. This may arise from the patient or investigator believing that one treatment arm is superior (assessment bias), or the surgeon may decide to withdraw a patient from the study with prior knowledge of the treatment group (attrition bias). The way to avoid this is by introducing blinding.5 A single-blinded study has only one of the patient, the surgeon or the researcher blinded. If two of the three are blinded the study is double-blinded, and likewise if all three are blinded it is triple-blinded. The blinding should be maintained until the end of the study, after the statistical analysis, to achieve the best level of confidence in the data recorded.7 Triple-blinded RCTs therefore provide the highest level of reliable evidence and are considered the gold standard design for clinical research. This is because randomization remains the only way to minimize selection bias. The control of all potential confounding factors is enhanced and, given adequate sample size (power), even unknown factors will be distributed more evenly across the two groups of patients under study. It is important to remember, however, that an RCT is not always better than other types of observational study.

If an RCT is not blinded, has a small sample size or has many protocol violations, the quality of the study will be low.5 Although RCTs represent a small proportion of original research published in surgical journals, they are an important component of the literature and a high level of evidence.14 The literature, however, appears to indicate that surgical RCTs lag behind the general literature in terms of methodological quality, mainly in the formal aspects of study design, performance and analysis. For example, one study found that only 33% of RCTs published in surgical journals were of high quality, compared with 75% in medical journals.15 RCTs in orthopaedic surgery appear to be no better, with over 50% of the RCTs in one study lacking proper concealment of randomization, blinding of outcome assessors and reporting of reasons for excluding patients.16 Chess et al.14 assessed the quality of the methodology in orthopaedics-related RCTs against ten criteria, which included randomization methods, participant blinding, outcome assessor blinding and outcome measurements. A total of 232 RCTs from top orthopaedic journals (based on impact factor) were included; only 49% of the criteria were fulfilled across these journals, with 42% of the criteria not amenable to assessment due to inadequate reporting. This highlighted obvious flaws in methodology which can lead to biased estimates of potential effect. It is important to remember that just because a study is classed as level 1 evidence, it does not follow that the design is without flaws or that it is not subject to biased reporting of outcome measures.14 A recent article published in the Journal of Bone and Joint Surgery hypothesized that a substantial proportion of RCTs in the orthopaedic literature are inadequately powered.8 The underlying conclusions of rejecting or retaining the null hypothesis are particularly dependent on the statistical power. A power calculation is usually carried out prior to data collection to establish the number of patients required to detect a clinically significant treatment effect and to minimize the chance of type I and type II errors.
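Randomization, as discussed above, can be implemented in several ways; one common scheme is block randomization, which keeps the two arms balanced throughout enrolment. A minimal sketch follows, in which the block size and seed are arbitrary illustrative choices, not values from the article:

```python
import random

def block_randomize(n_patients, block_size=4, seed=None):
    """Allocate patients to arms 'A' and 'B' in shuffled balanced blocks,
    so the group sizes never drift far apart during enrolment."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        # Each block contains an equal number of A and B in random order.
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]

arms = block_randomize(20, seed=1)
```

In practice the allocation sequence would be concealed from the enrolling surgeon, since predictable allocation reintroduces exactly the selection bias randomization is meant to remove.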
Type I error: a type I error occurs when a treatment effect has been found that does not actually exist (a false positive). This probability is denoted α and is typically set at 0.05, meaning there is a 5% chance of a significant effect arising from chance alone.

Type II error: a type II error occurs when no treatment effect is found when in fact such an effect does exist (a false negative). The probability of a type II error occurring is known as β. The probability of avoiding a type II error, 1 − β, is known as the power of the study. Adequate power has conventionally been defined as 80%.18 When a treatment effect is statistically significant (p < 0.05) the result implies a sufficient sample size. However, studies that fail to show a difference may suffer from type II error because the sample size is not large enough for smaller treatment effects to be detected.17 Abdullah et al.8 found that 27.9% of 215 RCTs in the orthopaedic literature with negative findings for the primary outcomes (i.e. that failed to reject the null hypothesis) were underpowered. If an RCT lacks adequate statistical power, there is an unacceptable risk of inappropriately failing to reject the null hypothesis when a clinically meaningful difference between the two groups exists. Statistically significant findings in small trials can occur as a consequence of very large differences between treatments (treatment effects). It is not uncommon for RCTs to report relative risk reductions larger than 50% when comparing one treatment with another.9,10 Sung et al.12 found this to be the case in a review of RCTs in the orthopaedic trauma literature, with the average study in the review having a sample size of 81 but reporting a large beneficial treatment effect (a 61% relative risk reduction).
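A power calculation of the kind described above can be sketched with the standard normal-approximation formula for comparing two proportions. The 20% and 10% event rates below are purely illustrative, not figures from the article:

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    two event rates, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I error
    z_beta = NormalDist().inv_cdf(power)           # guards against type II error
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# e.g. detecting a drop in revision rate from 20% to 10% (illustrative):
n_per_arm = sample_size_two_proportions(0.20, 0.10)
```

Note how the required sample size grows rapidly as the expected treatment effect shrinks, which is why small trials so often lack the power to detect modest but clinically meaningful differences.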
Surgeons should consider the plausibility of the magnitude of the treatment effect, because chance effects do occur and statistical simulation studies have shown that RCTs can overestimate the magnitude of a treatment effect.11 Authors should interpret the positive findings of studies cautiously when sample sizes and the number of reported outcome events are small. As the number of outcome events increases, the surgeon can have greater confidence in the reported magnitude of the treatment effect. For example, a trial claiming that reamed intramedullary tibial nails reduce the risk of revision surgery by 50% in 1000 patients with 200 outcome events is far less likely to be influenced by random chance than a similar study of 100 patients with 20 outcome events.12 In this respect multicentre trials are usually required, as single-centre studies will rarely be able to enrol sufficient numbers of patients.
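The tendency of small "positive" trials to overstate treatment effects can be demonstrated with a short Monte Carlo sketch. All the event rates, arm sizes and the seed below are invented for illustration; the point is that among small simulated trials that happen to reach significance, the observed effect systematically exceeds the true one:

```python
import random
from statistics import NormalDist

def simulate_small_trials(p_control=0.20, p_treat=0.15, n_per_arm=100,
                          n_trials=3000, alpha=0.05, seed=42):
    """Simulate many small two-arm trials with a fixed true effect and
    return the observed relative risk reduction (RRR) of each trial that
    reached statistical significance in favour of treatment."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant_rrr = []
    for _ in range(n_trials):
        events_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        events_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
        pc, pt = events_c / n_per_arm, events_t / n_per_arm
        pooled = (events_c + events_t) / (2 * n_per_arm)
        se = (pooled * (1 - pooled) * 2 / n_per_arm) ** 0.5
        if se > 0 and pc > 0 and (pc - pt) / se > z_crit:
            significant_rrr.append((pc - pt) / pc)  # observed RRR
    return significant_rrr

rrrs = simulate_small_trials()
true_rrr = 1 - 0.15 / 0.20  # the true relative risk reduction is 25%
```

With only 100 patients per arm, a trial can only reach significance when the play of chance exaggerates the difference, so the average reported effect among the "winners" is well above the true 25% reduction.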
Meta-analysis and systematic reviews

In contrast to the other types of studies mentioned earlier, meta-analyses and systematic reviews do not explore new data. These
reviews are useful for synthesizing the results of multiple primary investigations with the use of strategies to limit bias and random error.19 A quantitative systematic review, or meta-analysis, is a review in which statistical methods are used to combine the results of two or more studies. All systematic reviews are retrospective and observational, and are therefore subject to systematic and random error. Thus the quality of a systematic review, and accordingly its validity, depends on the scientific methods used to minimize error and bias.21 A well-conducted meta-analysis is invaluable for surgeons, since it is unusual for a single trial to provide definitive answers to clinical questions. Moreover, a well-conducted quantitative review may resolve discrepancies between studies with conflicting results. The guiding principles in the conduct of a meta-analysis include the use of a specific healthcare question, a comprehensive search strategy, assessment of the reproducibility of study selection, assessment of study validity, evaluation of heterogeneity (differences in effect across studies), inclusion of all relevant and clinically useful measures of treatment effect, and tests of the robustness of the results relative to the features of the primary studies (sensitivity analysis).20 Orthopaedic surgeons must be aware of the limitations and risks of meta-analyses and must strive to limit bias and ensure accepted scientific methodology. Given the increased use of meta-analyses in the orthopaedic literature, Bhandari et al.21 identified meta-analyses on orthopaedic surgery-related topics with a view to determining the methodological quality of the published literature. They concluded that the majority of meta-analyses have methodological limitations. Bias can be limited, and validity improved, through adherence to strict scientific methodology. However, the ultimate quality of a meta-analysis depends on the quality of the primary studies on which it is based, and it is most persuasive when data from high-quality RCTs are pooled.21
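The statistical combination at the heart of a quantitative review can be sketched as inverse-variance fixed-effect pooling. The study estimates below are invented log odds ratios, not data from any real trials:

```python
def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of study effect estimates
    (e.g. log odds ratios): each study is weighted by the reciprocal of
    its variance, so more precise studies count for more."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w, _e := None) if False else \
        sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Three hypothetical studies reporting log odds ratios with their SEs:
pooled, pooled_se = pool_fixed_effect([-0.40, -0.25, -0.55], [0.20, 0.15, 0.30])
```

The pooled standard error is smaller than that of any individual study, which is the statistical payoff of pooling; a real meta-analysis would additionally examine heterogeneity before trusting a fixed-effect summary.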
Conclusion

Most journals require a description of published studies by level of evidence. The triple-blinded RCT remains the gold standard as a reliable and objective method to eliminate bias and produce solid evidence. Care must always be taken, however, when interpreting the results of such studies: particular focus on the study design and power is crucial in the interpretation of the outcome measures. It is also recognized that an RCT is not always practical in orthopaedic clinical research. An RCT would be the choice to investigate a novel treatment, but to identify risk factors for a disease outcome a cohort or case-control study would be more appropriate. The case report that informs surgeons about a rare but devastating complication would be low down in terms of level of evidence but still essential from a clinical standpoint. All study designs are useful, the best being those that can provide the most appropriate evidence to answer the research question. As the reader, the most important thing is to have a critical eye and an awareness of the potential pitfalls associated with the experimental design.

REFERENCES
1 Petrie A. Statistics in orthopaedic papers. J Bone Joint Surg Br 2006; 88-B: 1121–36.
2 Qureshi AA, Ibrahim T. Statistical tests in orthopaedic research. Orthop Trauma 2010; 24: 463–72.
3 Leopold SS, Porcher R. Reporting statistics in abstracts in clinical orthopaedics and related research. Clin Orthop Relat Res 2013; 471: 1739–40.
4 Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the "Statistical Analyses and Methods in the Published Literature" or the SAMPL guidelines. In: Smart P, Maisonneuve H, Polderman A, eds. Science Editors' handbook. European Association of Science Editors, 2013.
5 Jolles BM, Martin E. In brief: statistics in brief: study designs in orthopaedic clinical research. Clin Orthop Relat Res 2011; 469: 909–13.
6 Jupiter DC. Made to measure: designs tailored to your study needs. J Foot Ankle Surg 2015; 54: 1001–2.
7 Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. J Am Med Assoc 1995; 273: 408–12.
8 Abdullah L, Davis DE, Fabricant PD, Baldwin K, Namdari S. Is there truly "no significant difference"? Underpowered randomised controlled trials in the orthopaedic literature. J Bone Joint Surg Am 2015; 97: 2068–73.
9 Khan RJ, Fick D, Brammar TJ, Crawford J, Parker MJ. Interventions for treating acute Achilles tendon ruptures. Cochrane Database Syst Rev 2004; CD003674.
10 Bhandari M, Devereaux PJ, Swiontkowski MF, et al. Internal fixation compared with arthroplasty for displaced fractures of the femoral neck: a meta-analysis. J Bone Joint Surg Am 2003; 85-A: 1673–81.
11 Pocock S, Hughes MD. Practical problems in interim analyses with particular regard to estimation. Control Clin Trials 1989; 10(suppl 4): 209S–221S.
12 Sung J, Siegel J, Tornetta P, Bhandari M. The orthopaedic trauma literature: an evaluation of statistically significant findings in orthopaedic trauma randomised trials. BMC Musculoskelet Disord 2008; 9: 14.
13 Jupiter DC. Fill in the blanks: a tale of data gone missing. J Foot Ankle Surg 2016; 55: 437–8.
14 Chess LE, Gagnier J. Risk of bias of randomized controlled trials published in orthopaedic journals. BMC Med Res Methodol 2013; 13: 76.
15 Karanicolas PJ, Bhandari M, Teromi B, et al. Blinding of outcomes in trials of orthopaedic trauma: an opportunity to enhance the validity of clinical trials. J Bone Joint Surg Am 2008; 90: 1026–33.
16 Chan S, Bhandari M. The quality of reporting of orthopaedic randomised trials with the use of a checklist for nonpharmacological therapies. J Bone Joint Surg Am 2007; 89: 1970–8.
17 Sabharwal S, Patel NK, Holloway I, Athanasiou T. Sample size calculations in orthopaedics randomised controlled trials: revisiting research practices. Acta Orthop Belg 2015; 81: 115–22.
18 Freedman KB, Bernstein J. Sample size and statistical power in clinical orthopaedic research. J Bone Joint Surg Am 1999; 81: 1454–60.
19 Cook DJ, Mulrow CD, Haynes RB. In: Mulrow C, Cook D, eds. Systematic reviews: synthesis of best evidence for health care decisions. Philadelphia: American College of Physicians, 1998; 5–12.
20 Cook DJ, Sackett DL, Spitzer WO. Methodological guidelines for systematic reviews of randomised control trials in health care from the Potsdam Consultation on Meta-analysis. J Clin Epidemiol 1995; 48: 167–71.
21 Bhandari M, Morrow F, Kulkarni AV, Tornetta P. Meta-analyses in orthopaedic surgery. J Bone Joint Surg 2001; 83-A: 15–24.