Chapter 116
Systematic literature review and meta-analysis: The case of medical devices and medical locations

Luis Montesinos (School of Engineering and Sciences, Tecnologico de Monterrey, Mexico City, Mexico) and Leandro Pecchia (School of Engineering, University of Warwick, Coventry, United Kingdom)
Introduction

Health technology assessment has been defined as the "systematic evaluation of the properties, effects, and/or impacts of health technology" (International Network of Agencies for Health Technology Assessment and Health Technology Assessment International, n.d.). Participation by clinical engineers in the assessment of health technology can directly contribute to and affect the quality of patient care and patient outcomes. Innovative health technologies (e.g., medical devices, systems, and procedures) are typically tested more than once, often by different research teams in different sites. The results of multiple tests are often divergent and even conflicting, which makes health technology assessment a challenging endeavor (Haidich, 2010). Systematic reviews of the literature and meta-analyses are two useful tools for tackling this multiplicity and divergence of results. They make it possible to derive evidence-based conclusions about a new health technology from the body of research produced by different studies. A systematic review has been defined as "a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies that are included in the review" (Moher et al., 2009). Additionally, the results from multiple studies can be analyzed and combined using statistical models and methods, a procedure called meta-analysis (Moher et al., 2009). A meta-analysis may produce, for instance, a more precise estimate of the effect of an innovative surgical procedure (e.g., robotically assisted surgery) than
any individual study contributing to the pooled analysis (Haidich, 2010). Preferably, systematic reviews and meta-analyses are based on randomized controlled trials. Nevertheless, early-stage health technology assessment may require the systematic review and meta-analysis of observational studies. A detailed description of the process, methods, and tools to perform a systematic review and meta-analysis is beyond the scope of this chapter; our aim is only to familiarize the reader with the main concepts. For more details, the reader is referred to other books, journal articles, and websites.
Methodology for conducting a systematic review and meta-analysis Conducting a systematic review and meta-analysis of the literature entails a series of steps that are similar to any other research endeavor: problem definition, data collection and analysis, and results interpretation. Similarly, a detailed protocol which clearly describes the research question, the subgroups of interest, and the methods and criteria to be employed for identifying and selecting relevant studies and extracting and analyzing information should be written in advance. More specifically, performing a systematic review involves seven steps, which are depicted in Fig. 1. The Cochrane Handbook for Systematic Reviews of Interventions provides detailed guidance for the preparation of systematic reviews (Higgins and Green, 2011). This section summarizes the contents of Part 2 of that handbook. The reader is invited to refer to the original source for more information.
FIG. 1 Workflow for conducting a systematic review and meta-analysis.
Defining the review question and the eligibility criteria The first and most important step in preparing a systematic review is to define the review question. In a systematic review, the research question is broadly expressed through a main “Objective” of the review and detailed through a set of “Eligibility criteria” for including studies in the review. Ideally, the main “Objective” of the review should be a single sentence of the form: To [assess OR compare] the effects of [intervention(s)] for [health problem] in [types of subjects OR disease AND setting (if specified)].
Eligibility criteria, for their part, combine the elements of the research question with a specification of the types of studies that will be considered for inclusion in the review and the types of effect measures (outcomes) that will be analyzed. Namely, eligibility criteria include:

● Types of participants. The types of participants are firstly defined by the disease or condition of interest. In addition, other traits such as age group, sex, race, or educational status are included in the definition of the population of interest. These criteria should be sufficiently broad to encompass as many studies as possible, but sufficiently narrow to ensure that the aggregate population has an acceptable level of homogeneity.
● Types of interventions. Both the intervention of interest and the intervention against which it will be compared must be clearly defined (e.g., laparoscopic versus open surgery). Comparators commonly found in clinical trial reports are placebo, standard care, and a different variant of the same intervention.
● Types of outcomes. Outcomes are measures of the effect of an intervention and can include both beneficial and adverse effects. They may be measured objectively (e.g., blood pressure) or subjectively, as rated by a clinician, patient, or carer (e.g., pain scales). Outcomes include survival (mortality), clinical events (e.g., strokes or myocardial infarction), patient-reported outcomes (e.g., symptoms, quality of life), adverse events (e.g., falls), burdens (e.g., demands on caregivers, frequency of tests, restrictions on lifestyle), and economic outcomes (e.g., cost and resource use). A systematic review can include one or more primary outcomes plus one or more secondary outcomes.
● Types of studies. Ideally, systematic reviews should focus on randomized controlled trials (RCTs), as they represent the most reliable study design in terms of prevention of confounding. However, RCTs are not always available. In this case, cohort studies can be included in the review, but between-study heterogeneity must be appropriately accounted for during the analysis of data and the interpretation of results. More details about how to do this are given in subsequent sections.
Searching for studies This step involves defining the databases where the search for relevant studies will be conducted, as well as a search strategy. Searches of pertinent bibliographic databases, such as MEDLINE and EMBASE, are the easiest way to identify an initial set of potentially relevant reports of studies. MEDLINE is compiled by the United States National Library of Medicine and is freely available on the Internet via PubMed. EMBASE is compiled by Elsevier and is available to individuals for a fee, with several universities, hospitals, and professional organizations offering access to affiliated members. It is never a good idea to search for study reports in a single database, as the overlap of indexed journals between them is far below 100%: MEDLINE indexes journals mostly published in the United States, whereas EMBASE indexes journals mostly published in Europe. Additionally, if RCTs are the focus of the review, searching in the Cochrane Central Register of Controlled Trials (CENTRAL) is a must.
The second decision to be made when searching for studies is the choice of a search strategy. A search strategy consists of a set of search terms and the way in which these will be entered and combined during the actual search, using wildcards and logical operators. In general, a search strategy will contain terms drawn from the objective and the eligibility criteria of the review, for instance, the health condition of interest, the intervention(s) evaluated (compared), and the targeted population. Search terms can be either free text (i.e., any word selected by the reviewers) or standardized indexing terms developed by some databases. When free text is used, only those specific words, as specified by the user, will be searched for in the title and abstract of the database records. When standard terms are used, a list of predefined terms is also searched. MEDLINE and EMBASE have their own taxonomies of standardized indexing terms, called MeSH and EMTREE, respectively. When designing a search strategy, wildcards and logical operators allow the search to be as comprehensive as possible by considering synonyms, related terms, and variant spellings. For instance, reviewers might be interested in retrieving bibliographic records containing either "randomized" or "randomised," which can be achieved using the term "random*." Likewise, they might be interested in study reports containing related terms such as "brain" and "head," in which case the search "brain" OR "head" would do the job.
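As an illustration, the following Python sketch runs a strategy of this kind against MEDLINE through PubMed's E-utilities, using Biopython's Entrez module. The query terms are invented for the example and are not a validated strategy; a real review would document and peer-review the full strategy for each database.

```python
# Illustrative only: executing a PubMed (MEDLINE) search with Biopython.
# The wildcard "random*" matches randomized/randomised/randomly, OR broadens
# the synonym sets, and AND intersects the concepts, as described above.
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI requests a contact address

query = (
    '("accidental falls"[MeSH Terms] OR "fall risk"[Title/Abstract]) '
    'AND ("wearable" OR "inertial sensor" OR accelerometer) '
    'AND random*'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=100)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records found")
print(record["IdList"][:10])  # first 10 PubMed identifiers
```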
Additional potentially relevant study reports can be identified by looking at the lists of references provided in other reports (called linear search) or by manually searching key journals.
Selecting studies and extracting the data

The selection of studies for inclusion in the review also implies a sequence of steps:

1. Merging the results of searches from different databases and removing duplicate records of the same report (see the sketch below).
2. Screening titles and abstracts to remove clearly irrelevant reports.
3. Retrieving the full text of potentially relevant studies and linking together multiple reports of the same study.
4. Scrutinizing full-text reports to verify the agreement of studies with the eligibility criteria.
5. Making a final decision on study inclusion.

It is highly advisable that the selection of studies is carried out by at least two reviewers, in order to reduce the risk of missing a relevant study due to human misjudgment or error. A third reviewer may resolve controversies regarding the eligibility of specific studies. Reviewers must keep a record of the selection process, as they will be expected to provide details of the number of studies included and excluded at each stage, with the reasons for exclusion, as recommended by Moher et al. in the PRISMA statement (Moher et al., 2009) (Fig. 2).
FIG. 2 Flow diagram of information through the different phases of a systematic review. (Adapted from Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., 2009. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6 (7), 6.)
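As a rough illustration of step 1 of the selection process, the sketch below merges records from two hypothetical database exports and removes duplicates. The record fields ("doi", "title") and the matching rule are simplifying assumptions; real MEDLINE and EMBASE exports must first be mapped to a common format, and near-duplicate matching is usually fuzzier than this.

```python
# Merge search results from several databases and drop duplicate records,
# keying on the DOI when present and on a normalized title otherwise.
def deduplicate(records):
    seen = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

medline = [{"doi": "10.1000/j.1", "title": "Trial A"},
           {"doi": None, "title": "Trial  B"}]
embase = [{"doi": "10.1000/j.1", "title": "Trial A"},
          {"doi": None, "title": "trial b"}]

print(len(deduplicate(medline + embase)))  # -> 2
```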
Reviewers must decide in advance which data they are going to extract from the selected studies and design the forms to be used for data collection (e.g., a spreadsheet with fields for the relevant pieces of data). Data can include details of methods, participants, setting, interventions, outcomes, results, and potential sources of bias, among many others. Most probably, one form will be needed to collect the general characteristics of the included studies and one or more forms to collect the results (i.e., one form per outcome).
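As a minimal illustration, the sketch below writes the header row of one such data collection form; the fields listed are typical examples rather than a prescribed standard.

```python
# Create an empty data extraction form as a CSV file with one column per
# item to be extracted from each included study (fields are illustrative).
import csv

fields = [
    "study_id", "first_author", "year", "country", "study_design",
    "population", "intervention", "comparator", "outcome",
    "n_intervention", "n_control", "effect_estimate", "variance",
    "risk_of_bias_notes",
]

with open("extraction_form.csv", "w", newline="") as f:
    csv.writer(f).writerow(fields)
```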
Assessing the risk of bias in selected studies

A bias is a deviation of a study's results from the true intervention effect, mainly due to flaws in the design, implementation, or analysis of the study. This deviation can be very small or substantial in magnitude and is entirely different from the random error (imprecision) expected in any experiment due to sampling variation. Many tools have been proposed for assessing the risk of bias of studies when performing a systematic review. Some of them are scales, in which various components of a study are scored and combined to give a summary score. Others are checklists, in which open questions are asked of the reviewer. For systematic reviews focusing exclusively on RCTs, the recommended tool is the Cochrane Collaboration's tool for assessing risk of bias (Higgins et al., 2011), which addresses seven domains: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective outcome reporting, and "other issues." For systematic reviews including both randomized and nonrandomized studies, the authors have found the checklist developed by Downs and Black (1998) useful.

Two actions are expected from the review authors as a result of the risk of bias assessment. Firstly, they should make their findings on the risk of bias, both within and across studies, explicit in the review report and consider these findings in the interpretation of the results. Secondly, they should take the level of risk of bias of individual studies into account when deciding whether or not to include them in the meta-analysis of outcomes. A trade-off between bias and precision must be made: including all eligible studies in the meta-analysis regardless of their risk of bias will produce results with high precision (a narrow confidence interval) but potentially a large bias. In contrast, including only those studies with evidence of a low risk of bias will produce unbiased but imprecise results.
Analyzing data and undertaking meta-analyses

Analyses may be narrative, such as a structured summary and discussion of the studies' characteristics and findings, or quantitative, that is, involving statistical analysis. Meta-analysis, the statistical combination of results from two or more separate studies, is the most commonly used statistical technique. Studies comparing healthcare interventions, notably randomized trials, use the outcomes of participants to compare the effects of different interventions. Meta-analyses focus on pair-wise comparisons of interventions, such as an experimental intervention versus a control intervention, or the comparison of two experimental interventions. The contrast between the outcomes of two groups treated differently is known as the "effect," the "treatment effect," or the "intervention effect."

Whether the analysis of included studies is narrative or quantitative, a general framework for synthesis may be provided by considering four questions:

1. What is the direction of the effect?
2. What is the size of the effect?
3. Is the effect consistent across studies?
4. What is the strength of the evidence for the effect?

Meta-analysis provides a statistical method for questions 1–3. Assessment of question 4 relies additionally on judgments based on assessments of study design and risk of bias, as well as on statistical measures of uncertainty. Narrative synthesis uses subjective (rather than statistical) methods to address questions 1–4, for reviews where meta-analysis is either not feasible or not sensible. In a narrative synthesis, the method used for each stage should be prespecified, justified, and followed systematically. Bias may be introduced if the results of one study are inappropriately stressed over those of another.
Presenting results

The Results section of a review should summarize the findings in a clear and logical order and should explicitly address the objectives of the review. Review authors can use a variety of tables and figures to present information in a more convenient format:

● "Characteristics of included studies" tables (including "Risk of bias" tables)
● "Data and analyses" (the full set of data tables and forest plots)
● Figures (a selection of study flow diagrams, forest plots, funnel plots, "Risk of bias" plots, and other figures)
● "Summary of findings" tables
● Additional tables
"Characteristics of included studies" tables present information on individual studies; "Data and analyses" tables and forest plots present outcome data from individual studies and may additionally include meta-analyses; "Summary of findings" tables present the cumulative information, data, and quality of evidence for the most important outcomes. The findings of a review must also be summarized in an abstract and in a plain language summary.
Interpreting results and drawing conclusions

The purpose of systematic reviews in the context of health technology assessment is to facilitate decision-making by clinicians, administrators, and clinical engineers. A clear statement of findings, a considered discussion, and a clear presentation of the conclusions are essential parts of the review. In particular, the following can help people make better-informed decisions:

● Information on all critical outcomes, including adverse outcomes.
● The quality of the evidence for each of these outcomes.
● Clarification of how particular values and preferences may bear on the balance of benefits, harms, burden, and costs of the intervention.
Understanding outcome measures

This section introduces the most common outcome measures in systematic reviews and meta-analyses, but does not represent an exhaustive list. More details can be found elsewhere (Sutton, 2000).
Binary outcomes

Binary outcomes are computed from data contained in a 2 × 2 table such as the one presented in Fig. 3.
Odds ratio

The odds ratio (OR) is a relative measure of the chance of the event of interest, in the form of the ratio of the odds of the event in the two groups (i.e., intervention and control groups). The OR can be calculated through the formula

OR = ad / bc

where a, b, c, and d relate to the cells of Fig. 3A and B.

FIG. 3 Outcome data for a single (A) RCT and (B) case-control study.

(A)                Failure/dead    Success/alive
New treatment           a                b
Control                 c                d

(B)                Diseased (cases)    Nondiseased (controls)
Exposed                 a                      b
Not exposed             c                      d

In an RCT, for an outcome considered desirable, an OR < 1, when comparing a new intervention to the control, would indicate that the intervention was less effective than the intervention received by the control group, while an OR > 1 would imply an improvement with the new intervention (the converse is true for undesirable outcomes). It is recommended to transform the data by taking the natural logarithm of the OR, as this provides a measure whose distribution is closer to normal. The variance of the log OR is

var(ln OR) = 1/a + 1/b + 1/c + 1/d

If we assume ln(OR) is normally distributed, then the 95% confidence interval is given by

ln(OR) ± 1.96 √var(ln OR)

Relative risk

The relative risk (RR) is defined as the probability of an event in the treatment group divided by the probability of an event in the control group. Thus

RR = [a / (a + b)] / [c / (c + d)]

It is also usual to use a logarithmic scale when combining studies; thus, the variance of the log RR is given by

var(ln RR) = 1/a − 1/(a + b) + 1/c − 1/(c + d)

And, assuming a normally distributed ln(RR), the 95% confidence interval is given by

ln(RR) ± 1.96 √var(ln RR)
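The following Python sketch works through these formulas for a single, invented 2 × 2 table, only to make the arithmetic concrete; the counts follow the layout of Fig. 3A.

```python
# OR and RR with 95% confidence intervals from one 2 x 2 table (invented
# counts): a/b = failures/successes in the treatment arm, c/d = failures/
# successes in the control arm, with "failure" as the event of interest.
import math

a, b, c, d = 15, 85, 30, 70

# Odds ratio, with the CI built on the log scale
odds_ratio = (a * d) / (b * c)
var_ln_or = 1/a + 1/b + 1/c + 1/d
lo, hi = (math.exp(math.log(odds_ratio) + z * math.sqrt(var_ln_or))
          for z in (-1.96, 1.96))
print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

# Relative risk, likewise on the log scale
rel_risk = (a / (a + b)) / (c / (c + d))
var_ln_rr = 1/a - 1/(a + b) + 1/c - 1/(c + d)
lo, hi = (math.exp(math.log(rel_risk) + z * math.sqrt(var_ln_rr))
          for z in (-1.96, 1.96))
print(f"RR = {rel_risk:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```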
Continuous outcomes
There are as many different continuous scales as there are outcome measures in the healthcare literature (e.g., blood glucose level, heart rate, and weight). In an RCT, the parameter of interest is usually the difference in average effect between the treatment and control groups. When all the studies measure the parameter on the same scale, the studies can be combined directly on the original scale used to measure the outcome variable. Thus, the measure of the treatment effect, T, is given by

T = μt − μc

where μt and μc are the mean responses in the treatment and control groups, respectively. The variance of this treatment difference is

var(T) = σ²(1/nt + 1/nc)

where nt is the sample size for the treatment group, nc is the sample size for the control group, and σ² is the variance, assumed common to both groups and estimated from the data (denoted s²).
When different studies measure the outcome on different scales, the data need to be transformed onto a single standardized scale before being combined. Thus, the effect size of a study, d, is defined as

d = (μt − μc) / s*

where μt and μc are the sample means of the treated and control groups, respectively, and s* is an estimate of the standard deviation of the study. If the data can be assumed to be normal, the variance of d can be estimated as

var(d) = (nt + nc) / (nt nc) + d² / (2(nt + nc))

For s*, a pooled standard deviation can be estimated as

s = √[((nt − 1)st² + (nc − 1)sc²) / (nt + nc − 2)]

where st and sc are the sample standard deviations of the treatment and control groups.
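The short sketch below applies these formulas to invented summary statistics from a single study, using the pooled standard deviation as the estimate s*.

```python
# Standardized effect size d and its large-sample variance for one study
# (all summary statistics are invented for illustration).
import math

mu_t, mu_c = 72.0, 78.0   # group means (e.g., resting heart rate, bpm)
s_t, s_c = 9.5, 10.5      # group standard deviations
n_t, n_c = 40, 35         # group sample sizes

# Pooled standard deviation across the two groups
s_pooled = math.sqrt(((n_t - 1) * s_t**2 + (n_c - 1) * s_c**2)
                     / (n_t + n_c - 2))

d = (mu_t - mu_c) / s_pooled
var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

print(f"d = {d:.2f}, var(d) = {var_d:.4f}")
```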
Assessing between-study heterogeneity

In any meta-analysis, the estimates of the effect size from the different studies being considered will almost always differ to some extent. This is to be expected and is at least partly due to the sampling error present in every estimate. When effect sizes vary only because of sampling error, the differences between estimates are random variation rather than systematic differences between studies (i.e., the true effect is the same in each study). In this case, the effect estimates are considered homogeneous, and this source of variation can be dealt with in the meta-analysis by using a fixed-effect model. Otherwise, the differences between studies are considered heterogeneous, and a random-effects model is more suitable (see "An introduction to fixed-effect and random-effects models for meta-analysis" section).

Variation between individual study estimates can be examined graphically using forest plots. Fig. 4 shows an example forest plot for the meta-analysis of the mean difference in step time (outcome) between older adults without a history of falls (control) and with a history of falls (case), as measured using wearable inertial sensors (Montesinos et al., 2018). Forest plots are a very popular choice for reporting the results of a meta-analysis (see "Reporting systematic reviews and meta-analysis" section).

FIG. 4 A forest plot.

The Q-statistic and the I² statistic are widely used to assess heterogeneity in meta-analysis. The Q-statistic is based on a modified version of the chi-square test. It tests the null hypothesis that the true treatment effects are the same in all included studies (H0: T1 = T2 = … = Tk, where Ti are the treatment effects of the i = 1, …, k studies in the meta-analysis) against the alternative hypothesis that at least one of the effect sizes differs from the others. The Q-statistic is computed as

Q = Σᵢ wᵢ (Tᵢ − T̄)²,  i = 1, …, k

where k is the number of studies being pooled, Tᵢ is the effect size estimate for the ith study,

T̄ = Σᵢ wᵢ Tᵢ / Σᵢ wᵢ

is the weighted estimator of the effect size, and wᵢ is the weight attached to the ith study in the meta-analysis (usually the reciprocal of the variance, vᵢ, of the outcome estimate from that study). Under H0, the Q-statistic approximately follows a chi-squared distribution on k − 1 degrees of freedom, so statistical tables can be used to obtain the corresponding p-value. A significant Q-statistic indicates dissimilar effect sizes across studies; a threshold significance level of 0.1 has been suggested, rather than the conventional level of 0.05 (Higgins and Green, 2011; Sutton, 2000). Additionally, the I² statistic has been developed to quantify the inconsistency across studies, expressed as the percentage of the variability in effect sizes that is due to heterogeneity across studies rather than to sampling error within studies; it can be computed as I² = max(0, 100% × (Q − (k − 1)) / Q). An I² from 30% to 60%, 50% to 90%, and 75% to 100% represents moderate, substantial, and considerable heterogeneity, respectively (Higgins and Green, 2011).
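As a concrete illustration, the sketch below computes Q, its p-value, and I² as defined above for a small set of invented effect estimates and variances.

```python
# Q and I^2 statistics for k studies, given per-study effect estimates T_i
# and variances v_i (numbers invented for illustration).
from scipy.stats import chi2

T = [0.42, 0.31, 0.55, 0.12, 0.48]       # effect size estimates
v = [0.020, 0.015, 0.040, 0.010, 0.025]  # their variances
k = len(T)

w = [1 / vi for vi in v]                               # inverse-variance weights
T_bar = sum(wi * ti for wi, ti in zip(w, T)) / sum(w)  # weighted mean effect
Q = sum(wi * (ti - T_bar) ** 2 for wi, ti in zip(w, T))

p_value = chi2.sf(Q, df=k - 1)            # upper tail of chi-squared on k-1 df
I2 = max(0.0, 100 * (Q - (k - 1)) / Q)    # percentage of variability due to
                                          # heterogeneity rather than chance
print(f"Q = {Q:.2f}, p = {p_value:.3f}, I2 = {I2:.1f}%")
```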
If the variation is trivial, then we would focus on reporting the mean and its confidence interval. If the variation is nontrivial, then we might want to address the substantive implications of the variation, but the mean might still be useful as a summary measure. By contrast, if the variation is substantial, then we might want to shift our focus away from the mean and toward the dispersion itself. For example, if a treatment reduces the risk of mortality in some studies while increasing the risk of mortality in others (and the difference does not appear to be due to estimation error), then the focus of the analysis should not be on the mean effect. Rather, it should be on the fact that the treatment effect differs from study to study. Hopefully, it would be possible to identify reasons (differences in the study populations or methods) that might explain the dispersion.
An introduction to fixed-effect and random-effects models for meta-analysis
There are two statistical models widely used in meta-analysis: the fixed-effect model and the random-effects model. A detailed exposition of these models is beyond the scope of this chapter. In this section, we aim to familiarize the reader with the basic principles underlying each model, as well as with basic guidance for choosing the adequate model for their specific situation. For a basic yet rigorous introduction to the topic, the reader is referred to the paper by Borenstein et al. (2010).

Under the fixed-effect model, the assumption is that there is one true effect size underlying all the studies included in the meta-analysis, and that effect size estimates differ across studies only because of within-study (sampling) estimation error. It makes sense to use the fixed-effect model if two conditions are met: first, there is confidence that all the studies are virtually identical; second, the goal is to compute the common effect size, which would not be generalized beyond the population included in the analysis. By contrast, under the random-effects model, true effect sizes are allowed to differ due to differences across studies in the mixes of participants and in the implementation of the interventions. Therefore, under the random-effects model, the goal is to estimate the mean of a distribution of effects, not to determine one true effect. Since each study provides information about a different effect size, the relative weights assigned to each study are more balanced under the random-effects model than they are under the fixed-effect model.
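To make the contrast concrete, the sketch below pools a set of invented estimates under both models. The between-study variance tau² is estimated with the DerSimonian and Laird method, one common choice that the chapter does not prescribe.

```python
# Fixed-effect vs. random-effects pooling (inputs invented for illustration).
import math

T = [0.42, 0.31, 0.55, 0.12, 0.48]       # per-study effect estimates
v = [0.020, 0.015, 0.040, 0.010, 0.025]  # within-study variances
k = len(T)

# Fixed-effect model: weights are 1/v_i
w = [1 / vi for vi in v]
T_fixed = sum(wi * ti for wi, ti in zip(w, T)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2
Q = sum(wi * (ti - T_fixed) ** 2 for wi, ti in zip(w, T))
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / C)

# Random-effects model: weights 1/(v_i + tau^2) are more balanced across
# studies than the fixed-effect weights, as noted above
w_re = [1 / (vi + tau2) for vi in v]
T_random = sum(wi * ti for wi, ti in zip(w_re, T)) / sum(w_re)
se_random = math.sqrt(1 / sum(w_re))

print(f"fixed-effect pooled estimate:   {T_fixed:.3f}")
print(f"random-effects pooled estimate: {T_random:.3f} "
      f"(95% CI {T_random - 1.96 * se_random:.3f} "
      f"to {T_random + 1.96 * se_random:.3f})")
```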
Reporting systematic reviews and meta-analysis

As with all research, the value of a systematic review depends on what was done, what was found, and the clarity of reporting. The PRISMA statement is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses (Moher et al., 2009). PRISMA focuses on the reporting of reviews evaluating randomized trials, but it can also be used as a basis for reporting systematic reviews of other types of research, particularly evaluations of interventions. It consists of a 27-item checklist and a four-phase flow diagram (i.e., identification, screening, eligibility, and inclusion of studies) (Fig. 2). The aim of the PRISMA statement is to help authors improve the reporting of systematic reviews and meta-analyses.
Additional topics in meta-analysis

Subgroup analyses

Subgroup analysis is a method that can be used to explore and potentially explain heterogeneity in an outcome measure across a group of studies. Subgroup analyses investigate subsets of studies defined by study or patient characteristics (e.g., treatments applied, control groups, patient eligibility criteria, quality control, and study conduct). A detailed description of the methods underlying subgroup analyses can be found in Borenstein et al. (2009a).
Meta-regression

In primary studies, simple and multiple linear regressions are used to assess the relationship between one or more independent variables and a dependent variable at the subject level. Essentially the same approach can be used in meta-analysis, except that the covariates are at the level of the study and the dependent variable is the effect size in the studies. Meta-regression is the term used to refer to these procedures when they are used in a meta-analysis. A detailed description of meta-regression can be found in Borenstein et al. (2009b).
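As a simple illustration, the sketch below fits a meta-regression by weighted least squares using statsmodels, regressing invented per-study effect sizes on an invented study-level covariate (mean participant age). This is a fixed-effect-style sketch; dedicated meta-regression routines additionally estimate the between-study variance and adjust the standard errors accordingly.

```python
# Weighted least-squares meta-regression of effect size on a study-level
# covariate, weighting each study by the inverse of its variance.
import numpy as np
import statsmodels.api as sm

T = np.array([0.42, 0.31, 0.55, 0.12, 0.48])         # effect sizes
v = np.array([0.020, 0.015, 0.040, 0.010, 0.025])    # their variances
mean_age = np.array([61.0, 58.0, 72.0, 55.0, 68.0])  # covariate per study

X = sm.add_constant(mean_age)            # design matrix: intercept + covariate
fit = sm.WLS(T, X, weights=1 / v).fit()

print(fit.params)   # intercept and slope of the meta-regression line
print(fit.pvalues)
```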
References

Borenstein, M., Hedges, L.V., Higgins, J.P.T., Rothstein, H.R., 2009a. Subgroup analyses. In: Introduction to Meta-Analysis. John Wiley & Sons, Chichester, UK.
Borenstein, M., Hedges, L.V., Higgins, J.P.T., Rothstein, H.R., 2009b. Meta-regression. In: Introduction to Meta-Analysis. John Wiley & Sons, Chichester, UK.
Borenstein, M., Hedges, L.V., Higgins, J.P.T., Rothstein, H.R., 2010. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res. Synth. Methods 1 (2), 97–111.
Downs, S.H., Black, N., 1998. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J. Epidemiol. Community Health 52 (6), 377–384.
Haidich, A.B., 2010. Meta-analysis in medical research. Hippokratia 14 (Suppl. 1), 29–37.
Higgins, J., Green, S. (Eds.), 2011. Cochrane Handbook for Systematic Reviews of Interventions. The Cochrane Collaboration.
Higgins, J.P.T., et al., 2011. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 343, d5928.
International Network of Agencies for Health Technology Assessment and Health Technology Assessment International, n.d. HTA Glossary. [Online]. Available: http://htaglossary.net (accessed 25/10/2019).
Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., 2009. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6 (7), e1000097.
Montesinos, L., Castaldo, R., Pecchia, L., 2018. Wearable inertial sensors for fall risk assessment and prediction in older adults: a systematic review and meta-analysis. IEEE Trans. Neural Syst. Rehabil. Eng. 26 (3), 573–582.
Sutton, A.J. (Ed.), 2000. Methods for Meta-Analysis in Medical Research. Wiley, Chichester.
Further reading Cleophas, T.J., Zwinderman, A.H., 2017. Network meta-analysis. In: Modern Meta-Analysis. Springer International Publishing, Cham, pp. 145–155.