Journal of Clinical Epidemiology 68 (2015) 725–726
EDITORIAL
The new paradigm: from 'bench to bedside' to the translational research 'valleys of death'

The old paradigms of B2 ('Bench to Bedside') and C3 ('Cell to Clinic to Community') have been replaced by the 'valleys of death' model, developed with a varying number of valleys. The US National Institutes of Health (NIH) [1] proposed one valley (between basic science and clinical science); the Canadian Institutes of Health Research [2] proposed two (between basic biomedical research, clinical science, and clinical practice and health decision making); and Meslin et al. [3] proposed four (between discovery, candidate health application, evidence-based guidelines, health practice, and population health impact).

In this issue of JCE, Yoon et al. use the Meslin model to analyse the proportion of systematic reviews published in 2012-2013 in the Cochrane Library and in DARE (Database of Abstracts of Reviews of Effects) that falls within each of these four valleys, categorising the translational studies that cross them as T1-T4: T1 includes reviews of basic science experiments; T2, reviews of human trials leading to guideline development; T3, reviews examining how to move guidelines into policy and practice; and T4, reviews describing the impact of changing health practices on population outcomes. Of a sample of 1,000 reviews from 2012-2013, over 90% were T2 reviews of human trials. The authors comment that the absence of T1 (basic science) reviews is not surprising, but they call for more emphasis on T3 and T4 research to ensure that evidence-based interventions reach, and benefit, the patients and populations for which they are intended. This is perhaps to be expected given that only 2 of the 52 Cochrane Review Groups (Effective Practice and Organisation of Care, and Public Health) are primarily focused on T3 and T4. The Cochrane Strategy to 2020 [4] indeed identifies impact on policy and practice as a priority and is developing metrics for quarterly feedback, which may increase the proportion of reviews in the T3 and T4 phases. Other groups, such as AHRQ, the Campbell Collaboration, and the EPPI-Centre (Evidence for Policy and Practice Information and Co-ordinating Centre), are also moving in this direction.
Guidelines and their relationship to the strength of the underlying evidence are the focus of the paper by Djulbegovic et al., who studied 10 questions addressed in the clinical practice guideline development process of the American Association of Blood Banks (AABB) [6] relating to the use of prophylactic versus therapeutic platelet transfusion in patients with thrombocytopenia. In this guideline development process, it is reassuring that quality of evidence (confidence in the intervention's effect) proved the key determinant of the strength of recommendations.

There are two articles in this issue on publication bias. Thaler et al. conducted a careful review and lay out a useful systematic 'chain of evidence' tracing the process from the conception of a clinical trial to the dissemination of its results, showing where eight interventions should act to reduce publication bias. Trial registration helps but is not enough, even if rigorously adopted, so evidence is needed on combining it with the other strategies identified by these authors all along this chain of evidence, such as ensuring electronic publication of negative results.

A review by Scherer et al. looks at another major contributor to publication bias: the failure of conference abstracts to be published as full papers. They found 27 studies across many specialties; these showed that only half of the abstracts resulted in a full paper. Lack of time and the low priority given to writing the full paper were the dominant reasons. The investigators suggest that authors and funding agencies negotiate protected time within the grant funding period to allow full publication of the abstracts; alternatively, abstract authors should deposit study results to give systematic reviewers access to the study data.

Head-to-head trials are needed for optimal assessment of comparative effectiveness for both practice and policy. That bias may well be present in industry-sponsored head-to-head trials is suggested by the study of Flacco et al., who assessed a random sample of 319 head-to-head trials of drugs or biologics listed in PubMed in 2011. Approximately half were funded by pharmaceutical companies; 73% (233) of the trials were designed as superiority trials, 23% (73) as non-inferiority trials, and 4% (12) as equivalence trials. Fifty-five of the 57 (96.5%) industry-funded non-inferiority or equivalence trials obtained the desired 'favorable' results. To allay this concern about bias, the authors call for more large trials of comparative effectiveness and safety conducted under the control of non-profit entities.
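For readers less familiar with these designs, a non-inferiority conclusion typically rests on comparing a one-sided confidence bound for the treatment difference against a pre-specified margin, so the choice of margin is where much of the room for a 'favorable' result lies. The following is a minimal sketch of that standard confidence-interval approach; the function name, data, and margin are illustrative and not taken from Flacco et al.

```python
from math import sqrt
from scipy.stats import norm


def noninferiority_wald(x_new, n_new, x_std, n_std, margin, alpha=0.025):
    """Declare non-inferiority for a binary 'success' outcome when the
    one-sided lower confidence bound for the difference in success
    proportions (new minus standard) lies above -margin."""
    p_new, p_std = x_new / n_new, x_std / n_std
    diff = p_new - p_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower_bound = diff - norm.ppf(1 - alpha) * se
    return lower_bound, lower_bound > -margin


# Illustrative numbers only: 78% vs 80% success, 10-percentage-point margin.
bound, ni = noninferiority_wald(312, 400, 320, 400, margin=0.10)
print(f"lower 97.5% bound = {bound:.3f}, non-inferior: {ni}")
```

With these illustrative numbers, non-inferiority would be declared even though the new treatment performed slightly worse, which is precisely why generous margins in sponsored trials deserve scrutiny.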
JCE receives quite a number of manuscripts on clinical prediction rules, but it is still under-appreciated that such rules must be tested in at least one population different from the one in which they were derived. Haskins et al. provide evidence for this in their report of a systematic review of prognostic clinical prediction rules for the non-surgical management of back pain. They found 30 clinical prediction rules, but only three had been adequately validated, based on the following criteria: (a) development of the clinical prediction rule was initiated by a formal derivation process in which a larger pool of candidate predictor variables was refined to a smaller set of variables based on their identified independent predictive value using formal multivariate statistical procedures; and (b) the tool is presented clearly and in sufficient detail that a clinician may apply it to predict a prognostic outcome or likelihood of treatment response in an individual patient.

Surrogate outcomes continue to be widely used as primary outcomes in trials. Ciani et al., in a systematic review of surrogate outcomes in 101 trials in colorectal cancer (progression-free survival (PFS), time to progression (TTP), and tumor response rate (TR)), show how important it is to ensure that surrogates accurately predict the patient-important outcomes, in this case survival. The surrogates tended to magnify treatment effects and, using the levels of evidence criteria proposed by Lassere [5], none of these surrogates performed adequately.

Aligning the method used to estimate sample size with the planned analytic method helps to ensure that the sample size achieves the planned power. Borkhoff et al. report on their experience with different methods for estimating sample size when designing a study of how patient gender affects physicians' treatment recommendations regarding total knee arthroplasty. They challenge the sample size estimation methods recommended by many and recommend the asymptotic unconditional McNemar test when using generalized estimating equations (GEE) to analyze a paired binary primary outcome with no covariates.
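As a concrete illustration of matching the sample size calculation to a paired binary analysis, the widely used closed-form approximation based on the asymptotic McNemar test can be sketched as follows. This is a simplified sketch, not necessarily the exact procedure Borkhoff et al. evaluated; the function name and discordance probabilities are illustrative.

```python
from math import ceil, sqrt
from scipy.stats import norm


def mcnemar_pairs(p10, p01, alpha=0.05, power=0.80):
    """Approximate number of matched pairs needed to detect a difference
    between two paired proportions with McNemar's test.

    p10, p01: anticipated probabilities of the two discordant cells,
    e.g., 'recommend surgery for the male but not the female patient'
    and vice versa.
    """
    psi = p10 + p01                  # total discordant proportion
    delta = abs(p10 - p01)           # paired difference to detect
    z_a = norm.ppf(1 - alpha / 2)    # two-sided significance level
    z_b = norm.ppf(power)
    n = (z_a * sqrt(psi) + z_b * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return ceil(n)


# Illustrative discordance pattern, 25% vs 10%: about 120 pairs.
print(mcnemar_pairs(0.25, 0.10))
```

Note that the answer depends entirely on the anticipated discordant proportions, which is exactly the information an unpaired calculation ignores.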
In meta-analyses within systematic reviews of the effectiveness of therapy, the objective is to provide the single best estimate of the magnitude of benefit or harm. In almost all specialties, both continuous and categorical scales are used to measure patient-important symptoms and physical, emotional, and social function. Meister et al. review the different methods used in 26 trials of psychotherapeutic, pharmacological, and combined treatments for chronic depression. The results are reassuring in that odds ratios of treatment response were well approximated from continuous rating scale scores for meta-analysis.

Subtil et al. propose a major modification to the use of receiver operating characteristic (ROC) curves when determining the optimal threshold for dichotomisation of a continuous diagnostic test. ROC curves are widely used for tests on continuous scales in screening, diagnosis, and prognosis. The authors challenge the practice of basing the dichotomisation on the point farthest from the diagonal of the graph, arguing that it fails to adequately account for the costs of misclassification and for disease prevalence. They propose adding a 'sensitivity line' and a 'specificity line' to the graph; the point on the ROC curve farthest from the specificity line is shown to be the optimal threshold in terms of expected utility. They demonstrate the approach in two examples: (1) the threshold for the ratio of specific immunoglobulin G (IgG) to total IgG for the diagnosis of congenital toxoplasmosis, and (2) markers for the diagnosis of left ventricular hypertrophy in hypertensive subjects. This needs confirmation with other examples.
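Subtil et al.'s geometric construction is best read in the original paper, but the underlying decision-theoretic idea is classical: in ROC space, iso-utility lines have a slope determined by prevalence and misclassification costs, and the familiar farthest-from-diagonal (Youden) rule is the special case in which that slope equals 1. A minimal sketch of that general idea follows, with illustrative function names and data; it is not the authors' specific method.

```python
import numpy as np


def utility_optimal_threshold(thresholds, sens, spec, prevalence,
                              cost_fp, cost_fn):
    """Pick the ROC operating point that maximizes expected utility.

    Iso-utility lines in ROC space have slope
        m = ((1 - prevalence) / prevalence) * (cost_fp / cost_fn),
    so maximizing sens - m * (1 - spec) selects the utility-optimal
    point; with m == 1 this reduces to Youden's farthest-from-diagonal
    criterion."""
    m = (1 - prevalence) / prevalence * (cost_fp / cost_fn)
    score = sens - m * (1 - spec)
    return thresholds[np.argmax(score)]


# Illustrative data: uncommon disease, false negatives 5x as costly
# as false positives.
t = np.array([0.2, 0.4, 0.6, 0.8])
se = np.array([0.98, 0.90, 0.75, 0.50])
sp = np.array([0.60, 0.80, 0.92, 0.99])
print(utility_optimal_threshold(t, se, sp, prevalence=0.10,
                                cost_fp=1.0, cost_fn=5.0))
```

Varying the prevalence or the cost ratio shifts the selected threshold, which is the substance of the authors' objection to the fixed farthest-from-diagonal rule.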
[email protected] (P. Tugwell) References [1] Butler D. Translational research: crossing the valley of death Nature 2008;453:840e2. [2] Mapping the translational science policy ‘valley of death’. Clin Transl Med 2013;2:14. http://www.cihr-irsc.gc.ca/e/41204.html. [3] Meslin EM. Genet Med 2007;9:665e74. [4] Available at http://community.cochrane.org/sites/default/files/uploads/ Cochrane_Strategy_2020_2015_Targets.pdf. [5] Lassere MN. The Biomarker-Surrogacy Evaluation Schema: a review of the biomarker-surrogate literature and a proposal for a criterionbased, quantitative, multidimensional hierarchical levels of evidence schema for evaluating the status of biomarkers as surrogate endpoints. Stat Methods Med Res 2008;17(3):303e40.