Evaluating the Quality of Evidence From the ASPIRE and ENDEAVOR Clinical Trials Using the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Approach

VALUE IN HEALTH 19 (2016) A347–A766

PRM5 Man Versus Machine - Is Accurate Automation of Data Abstraction Achievable? Kiff C1, Scott DA1, Thompson JC1, Quigley JM2, Eaton J1; 1ICON Health Economics and Epidemiology, Abingdon, UK; 2ICON Health Economics & Epidemiology, Abingdon, UK

Objectives: Systematic reviews are a core, albeit labour-intensive, component of evidence-based medicine. Recently, a number of mainly commercial automation tools have been developed to aid the systematic review process, focusing on study selection, article review and data extraction. We sought to develop an open-source, freely available program using the statistical package R to undertake data extraction. To test our program we selected the Cochrane risk of bias tool for RCTs, an established short questionnaire answered with yes/no/unclear responses. Our aim was to accurately automate these responses using a bespoke R program.  Methods: The Cochrane risk of bias tool comprises 5 components: (1) randomisation, (2) allocation concealment, (3) blinding, (4) incomplete data and (5) selective reporting. We developed an algorithm programmed in R using the ‘tm’ package to data-mine publications, responding yes/no/unclear to each of the 5 components; results were generated both in a tabular format and as a risk of bias summary figure. Results were compared with abstraction conducted independently by an experienced systematic reviewer. A selection of multiple myeloma RCTs provided our case study.  Results: Abstraction conducted by the reviewer was used to check the accuracy of the response to each risk of bias question. The accuracy of abstraction generated by our R program varied considerably between questions. For the first 3 questions, accuracy was high, as these items are typically reported using expected stock phrases. Lower accuracy was seen for questions 4 and 5; the language around these items is more nuanced and requires higher-level interpretation across multiple criteria.  Conclusions: Accurate automation of data extraction is achievable for ‘box-ticking’ aspects of systematic review, but still requires human validation to ensure that data are interpreted correctly. A potential use at this early stage is validation of human data extraction. Future work will widen the scope of our program to other aspects of systematic review.
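The program itself is not reproduced in the abstract. As a minimal sketch of the phrase-matching idea described above, assuming plain-text copies of the publications in a local folder and using invented stock-phrase lists (the folder name, phrases and function names below are illustrative, not the authors'), an approach with the ‘tm’ package might look like this:

    library(tm)

    # Illustrative stock phrases for the first three Cochrane risk-of-bias domains;
    # real phrase lists would need curation and validation against reviewer judgements.
    rob_phrases <- list(
      randomisation = c("computer-generated", "random number table", "block randomi"),
      allocation    = c("central allocation", "sealed opaque envelope"),
      blinding      = c("double-blind", "placebo-controlled", "assessor-blinded")
    )

    assess_rob <- function(doc_text) {
      # Collapse the document to one lower-case string before phrase matching
      txt <- tolower(stripWhitespace(paste(doc_text, collapse = " ")))
      sapply(rob_phrases, function(phrases) {
        # Returning "no" (rather than "unclear") would need negation handling
        if (any(sapply(phrases, grepl, x = txt, fixed = TRUE))) "yes" else "unclear"
      })
    }

    # Hypothetical folder of plain-text publications
    corpus  <- VCorpus(DirSource("publications_txt"))
    results <- t(sapply(corpus, function(d) assess_rob(content(d))))
    print(results)  # tabular yes/unclear responses, one row per publication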

PRM6 Evaluating the Quality of Evidence From the ASPIRE and ENDEAVOR Clinical Trials Using the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Approach Biedermann P1, Esposito B2, Glen C3, Pippo L2, Avallone G2, Gonzalez-McQuire S1, Gatta F1; 1Amgen (Europe) GmbH, Zug, Switzerland; 2Amgen GmbH, Milan, Italy; 3Oxford PharmaGenesis Ltd, Tubney, UK

Objectives: To assess the quality and credibility of efficacy and safety data from clinical trials conducted in patients with relapsed and/or refractory multiple myeloma (RRMM).  Methods: A systematic literature review was conducted between August and November 2015 to identify relevant studies based on the following inclusion criteria: adult (> 18 years) patients with RRMM; interventions containing carfilzomib with lenalidomide and dexamethasone (KRd), or carfilzomib with dexamethasone alone (Kd), and comparator regimens; randomised controlled trials. In total, 44 studies were identified, and the phase 3 studies ASPIRE (KRd) and ENDEAVOR (Kd) were selected as meeting all criteria. The GRADE approach, a systematic and explicit method for judging quality, was used to link the evidence to a quality rating for the following outcomes: progression-free survival, overall survival (OS), complete response, overall response rate, adverse events (AEs) ≥ grade 3, serious AEs, cardiac failure ≥ grade 3, ischaemic heart disease ≥ grade 3, peripheral neuropathy and treatment-related death. The quality assessment involved consideration of within-study risk of bias, directness, heterogeneity, precision of effect estimates and publication bias.  Results: Both trials were categorised as providing high-quality evidence for the study design and for all patient outcomes other than OS. OS data were graded as moderate- and low-quality, reflecting serious and very serious data imprecision in the ASPIRE and ENDEAVOR trials, respectively.  Conclusions: Using GRADE, both ASPIRE and ENDEAVOR provided high-quality efficacy and safety evidence. Imprecision in the OS analysis was due to the immaturity of the OS data presented at the time of primary analysis (60% of required events for ASPIRE and 33% of events for ENDEAVOR); cross-over was not allowed in either trial and follow-up for OS is ongoing. This objective analysis of the attributes and outcomes of the carfilzomib trials showed high credibility of evidence and should encourage healthcare decision makers to be confident in the meaningful benefits of the KRd and Kd regimens in patients with RRMM.
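As a rough illustration of the grading logic (not the authors' actual assessment), GRADE conventionally starts randomised-trial evidence at "high" and downgrades it by one level for each serious concern and by two levels for each very serious concern across the domains listed above; a toy tabulation of that rule, consistent with the OS grades reported here, might look like this:

    grade_levels <- c("very low", "low", "moderate", "high")

    # Downgrade from "high" by 1 level per serious concern and 2 per very serious
    # concern across the GRADE domains (risk of bias, directness, heterogeneity,
    # imprecision, publication bias); floor at "very low".
    grade_quality <- function(serious = 0, very_serious = 0) {
      grade_levels[max(1, 4 - serious - 2 * very_serious)]
    }

    grade_quality()                  # "high"     e.g. PFS and safety outcomes in both trials
    grade_quality(serious = 1)       # "moderate" e.g. OS in ASPIRE (serious imprecision)
    grade_quality(very_serious = 1)  # "low"      e.g. OS in ENDEAVOR (very serious imprecision)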

PRM7 Psychometric Analyses of Clinical Outcome Assessment (COA) Questionnaires Designed to Evaluate Satisfaction With Pharmacological Stress Agents (PSAs) Used in Echocardiography Spalding J1, Hudgens S2, Litcher-Kelly L3, Shields A3, Kitt T1, Ojo O3, Kristy R1, Lee E1, Mathias A3, Morrissey L3, Houle C1; 1Astellas Pharma Global Development, Northbrook, IL, USA; 2Clinical Outcomes Solutions, Tucson, AZ, USA; 3Adelphi Values, Boston, MA, USA

Objectives: This study examined the psychometric properties of scores from three COAs assessing satisfaction with dobutamine for echocardiography. Previous work documented the psychometric properties of these COAs for assessing satisfaction with PSAs used with single-photon emission computed tomography myocardial perfusion imaging.  Methods: This evaluation used data from 95 patients and 20 clinicians. Patients completed two questionnaires within 24 hours after the procedure: the Patient Satisfaction and Preference Questionnaire (PSPQ), assessing PSA satisfaction, and the Patient Satisfaction Questionnaire (PSQ), assessing satisfaction with the site (for concurrent validity). Physicians completed the Clinician Satisfaction and Preference Questionnaire (CSPQ), and nurses/technicians completed the modified CSPQ (mCSPQ), within 2 weeks of site initiation.  Results: Although responses sometimes clustered at the upper or lower end of the item scales, the results suggest that patients and clinicians used the range of available responses. For the PSPQ, floor effects for the “Preparation” and “Reaction to Agent” domains were > 40%. Ceiling effects for the “Administration”, “Effects” and “Overall Satisfaction” domains were also > 40%, implying high levels of satisfaction in these areas. For the CSPQ, the ceiling effect for “Image Quality” was 33.3%. Internal consistency was strong in most instances [PSPQ: “Preparation” (α = 0.93), “Reaction to Agent” (α = 0.40); CSPQ and mCSPQ: “Preparation” (α = 0.99, 0.99) and “Administration” (α = 0.93, 0.92)]. Inter-item correlations ranged from -0.01 to 0.96 (small to strong) for the PSPQ, from 0.15 to 0.96 (small to strong) for the CSPQ, and from 0.81 to 0.99 (strong) for the mCSPQ.  Conclusions: The results suggest that, when administered in real-world clinical settings, the PSPQ, CSPQ and mCSPQ are capable of generating scores across the distribution of available responses (although clustering at the upper and lower bounds was observed in some instances) that are sufficiently reliable for most research purposes.
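For context on the statistics reported above (this is not the study's analysis code), floor and ceiling effects are typically computed as the percentage of respondents scoring at the minimum or maximum possible domain score, and internal consistency as Cronbach's alpha; a generic sketch with invented item-level data is shown below.

    # Toy item-level data for one domain: 5 respondents answering 5 items scored 1-5
    items <- data.frame(i1 = c(5, 5, 4, 1, 5), i2 = c(5, 4, 4, 1, 5),
                        i3 = c(5, 5, 5, 1, 4), i4 = c(5, 5, 4, 1, 5),
                        i5 = c(5, 5, 4, 1, 5))
    domain_score <- rowSums(items)

    # Floor/ceiling effects: % of respondents at the minimum or maximum possible score
    floor_pct   <- mean(domain_score == 1 * ncol(items)) * 100
    ceiling_pct <- mean(domain_score == 5 * ncol(items)) * 100

    # Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total score)
    k     <- ncol(items)
    alpha <- (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(domain_score))

    # Inter-item correlations and summary statistics
    round(cor(items), 2)
    c(floor = floor_pct, ceiling = ceiling_pct, alpha = round(alpha, 2))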

PRM8 Single-Arm Trials in Systematic Reviews: Are They Adequately Captured by Existing Search Filters? O’Rourke JM1, Ward KE2, Thompson JC2; 1ICON Health Economics & Epidemiology, Abingdon, UK; 2ICON Health Economics and Epidemiology, Abingdon, UK

Objectives: To enable earlier access, licensing of new treatments is sometimes based on evidence from Phase II single-arm trials. For systematic reviews of clinical evidence, there are many established filters for both randomized controlled trials (RCTs) and observational studies. This study aimed to determine whether single-arm trials are being adequately captured by these filters, given their increasing importance in reimbursement processes.  Methods: A targeted search of publicly available filters for clinical evidence was performed, with a focus on MEDLINE. Searches were subsequently carried out in Epub Ahead of Print, In-Process & Other Non-Indexed Citations, Ovid MEDLINE(R) Daily and Ovid MEDLINE(R). Selected RCT and observational study filters were combined in turn, and studies containing “single arm” in the title or abstract (“single arm.ti,ab”) that were not retrieved by these filter combinations were identified. From these studies, a list of free-text terms commonly used to describe single-arm designs was compiled. These terms were evaluated to determine whether they improved search sensitivity for such trials.  Results: Targeted searches identified filters from several organisations, including Cochrane, CADTH, the BMJ, the University of Texas School of Public Health and SIGN. All filters incorporated both MeSH terms and free-text terms. Combinations of selected published RCT and observational study filters consistently failed to identify many single-arm trials, and different filter combinations consistently missed the same studies. By reviewing a selection of abstracts of studies not captured by the published filters, a list of free-text terms commonly used to describe this trial design was compiled, including: single-arm, Phase II/Phase 2 and prospective. The addition of a free-text line incorporating these terms to a search strategy improved search sensitivity for this trial design.  Conclusions: Current filters do not adequately capture single-arm trials. We identified additional free-text terms that could be used in conjunction with established filters to capture these trials more comprehensively.
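As an illustrative sketch of the term-evaluation step (not the authors' workflow; the record strings and exact term list below are invented for illustration), one could check how many known single-arm records the candidate free-text terms would flag:

    # Candidate free-text terms for single-arm designs (from the abstract, plus simple variants)
    single_arm_terms <- c("single-arm", "single arm", "phase 2", "phase ii", "prospective")

    retrieved <- function(title_abstract) {
      # TRUE if any candidate term appears in the lower-cased title/abstract text
      any(sapply(single_arm_terms, grepl, x = tolower(title_abstract), fixed = TRUE))
    }

    # Hypothetical title/abstract strings for records missed by standard RCT filters
    missed_records <- c(
      "A phase 2, single-arm, open-label study of drug X in relapsed disease",
      "Prospective evaluation of regimen Y in previously treated patients"
    )

    # Proportion of the missed records that the free-text term line would now capture
    mean(sapply(missed_records, retrieved)) * 100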

PRM9 Guidelines for Translation and Cross-Cultural Adaptation of Non-Technical Skills Rating Instruments Repo JP1, Rosqvist E2; 1University of Helsinki and Helsinki University Hospital, Helsinki, Finland; 2Central Finland Health Care District, Jyväskylä, Finland

Objectives: Non-technical skills (NTS) refer to social, cognitive, interpersonal and emotional skills, and are seen as an important contributor to reducing adverse events and improving medical management in healthcare teams. The objectives of the present study were to review the existing literature on translation guidelines and to establish translation and cross-cultural adaptation guidelines for NTS rating scales.  Methods: The authors conducted a systematic online database search of Embase, PubMed and the Cochrane Database of Systematic Reviews. First, the keywords “non-technical skills” were used alone and in combination with “translation”, “evaluation methods”, “cross-cultural adaptation”, “guidelines”, “translation guidelines” and “assessment”, with no time limitations. Second, the keywords “translation, cross-cultural adaptation” were used to find guidelines on the translation and cross-cultural adaptation of rating scales published in the preceding 10 years. Finally, the results were synthesized into translation and cross-cultural adaptation guidelines for NTS rating scales.  Results: Together, the searches found 12 distinct guidelines, and the most widely used of these were combined. The proposed guidelines for the translation and cross-cultural adaptation of NTS rating scales consist of two forward translations by two independent translators who are native speakers of the language of the target population, a synthesis of these two versions, and a back-translation by at least one translator fluent in the language of the original instrument and familiar with the culture of the target language. Thereafter, a back-translation panel reviews the back-translation, and a multidisciplinary committee reviews the overall process. This latter group then harmonizes the produced version with other cross-culturally adapted versions of the instrument. The pre-final version then undergoes pilot testing, including cognitive debriefing, with at least five evaluators. Finally, the committee reviews the outcomes and has the final version proofread.  Conclusions: The present study produced structured guidelines for the translation and cross-cultural adaptation of NTS rating scales, thereby filling a methodological gap.

PRM10 Expected Impact of the European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS) on Health Technology Assessment Rémuzat C1, Chouaid C2, Auquier P3, Borget I4, Kornfeld A1, Toumi M5; 1Creativ-Ceutical, Paris, France; 2Centre Hospitalier Intercommunal, DHU-ATVB, Département de Pneumologie et Pathologie Professionnelle, Créteil, France; 3Aix-Marseille Université, Université