Development of a Quality Checklist Using Delphi Methods for Prescriptive Clinical Prediction Rules: The QUADCPR


Chad Cook, PT, PhD, MBA,a Jean-Michel Brismée, PT, ScD,b Ricardo Pietrobon, MD, PhD, MBA,c,d Philip Sizer Jr., PT, PhD,e Eric Hegedus, PT, DPT, MHSc,a and Daniel L. Riddle, PT, PhDf

ABSTRACT

Objective: Clinical prediction rules (CPRs) are clinician decision-making aids designed to improve the accuracy of a variety of decisions made during patient care. To our knowledge, there are no formally developed consensus-based guidelines designed to provide standards for the creation of CPRs.

Methods: The study used a 3-round Delphi method to reach consensus on a quality checklist initially developed from recommendations derived from the literature. The 9 Delphi participants were randomly selected from the authors of peer-reviewed publications of prescriptive CPRs.

Results: During the 3 rounds, the Delphi participants modified the originally derived checklist and, based on a consensus standard, agreed upon a final 23-item checklist involving 4 constructs: (1) sample and participants, (2) outcome measures, (3) quality of tests and measures, and (4) statistical assumptions.

Conclusions: Use of the checklist has potential for improving the design and reporting of future prescriptive CPRs. (J Manipulative Physiol Ther 2010;33:29-41)

Key Indexing Terms: Practice Guidelines as Topic; Delphi Technique; Mandatory Reporting

Clinical prediction rules (CPRs), also known as clinical decision rules, are clinician decision-making aids designed to improve the accuracy of a variety of

a Associate Professor, Department of Community and Family Medicine, Duke University Health System, Durham, NC. b Associate Professor, Doctor of Science Program in Physical Therapy, Texas Tech University Health Sciences Center, Lubbock, TX. c Associate Professor and Associate Vice-Chair of Surgery, Duke University Health System, Durham, NC. d Associate Professor, Duke/NUS, Singapore. e Professor and Program Director, Doctor of Science Program in Physical Therapy; Director, Clinical Musculoskeletal Research Laboratory, Texas Tech University Health Sciences Center, Lubbock, TX. f Professor and Assistant Department Chair, Department of Physical Therapy, Virginia Commonwealth University, Richmond, VA. Submit requests for reprints to: Chad Cook, PT, PhD, MBA, Associate Professor, Department of Community and Family Medicine, Department of Surgery, Duke University Health System, 2200 W Main St, Suite A-210, DUMC Box 104002, Durham, NC 27708 (e-mail: [email protected]). Paper submitted July 15, 2009; in revised form July 28, 2009; accepted July 28, 2009. 0161-4754/$36.00 Copyright © 2010 by National University of Health Sciences. doi:10.1016/j.jmpt.2009.11.010

decisions made during patient care. A CPR typically comprises a combination of medical history and clinical examination findings that has demonstrated statistically meaningful prediction for diagnosis, prognosis, or selection of the most effective treatment.1 The number of derived or validated CPRs is increasing,2 especially in rehabilitative medicine, where intervention-based rules have been developed for low back pain,3-6 neck pain or cervicogenic/tension-type headaches,7-9 knee dysfunction,10-13 and elbow pain.14 Others have used intervention-based CPRs in nonrehabilitative medicine as well.15-19

Clinical prediction rules may best be classified into diagnostic, prognostic, or prescriptive groups. Studies that focus on clinical variables related to a specific diagnosis are known as diagnostic CPRs. Studies designed to develop a CPR to predict an outcome such as success or failure are considered prognostic. Clinical prediction rules designed to target the most effective intervention for patients with selective clinical characteristics are identified as prescriptive.1 Differences between diagnostic, prognostic, and prescriptive studies are subtle but notable. When designing a diagnostic study, the key elements for consideration include appropriate blinding for tests and measures and a true mechanism for appropriate diagnosis. Because prognostic studies often examine the success of an approach (eg, surgery vs no surgery), the study focus should center on explicit clinical findings and an appropriate dichotomizing outcome for success or failure. Prescriptive studies are designed to identify which clinical findings are associated with the outcome of a treatment approach; thus, the focus of the methodology should concern both explicit clinical findings and the unbiased administration of a treatment process.

Clinical prediction rules are generally developed using a 3-step method.20 First, CPRs are derived prospectively using multivariable statistical methods to examine the predictive ability of selected groupings of clinical variables.21 This derivation phase is intended to be the first step toward assisting clinicians in making rapid clinical decisions that may normally be subject to underlying biases.22 The rules are algorithmic in nature and involve condensed information identifying the smallest number of indicators that are statistically diagnostic or are associated with the targeted condition or outcome.2 Findings at the derivation level are not appropriate for clinical use until validated. The second step involves validation of the CPR in a randomized controlled fashion to reduce the risk that the predictive factors developed during the derivation phase were selected by statistical chance.20 The third step involves conducting an impact analysis to determine the extent to which the CPR improves care, reduces costs, and accurately defines the targeted objective.20

The focus of the current study was on the derivation phase of prescriptive CPRs, those designed to guide treatment decisions. We chose to study the methodological approach to derivation studies of prescriptive CPRs because this type of study has become very common, particularly in the rehabilitation literature3-14 and, to a lesser extent, in the nonrehabilitative literature.15-19 More importantly, we found no consensus-based guidelines for designing these types of studies or for judging their quality.
Finally, derivation studies are much more common than validation or impact studies of prescriptive CPRs,23 which suggests a more urgent need for study in this area.

Guidelines or checklists have been used to improve reporting standards and quality of data collection in a number of study designs24,25 and are generally used to reduce design and reporting biases.25,26 Guidelines have been developed to improve the reporting of randomized controlled trials (Consolidated Standards of Reporting Trials)27 and the reporting of diagnostic accuracy studies (Standards for Reporting of Diagnostic Accuracy [STARD]),28 and to ascertain the quality of diagnostic accuracy studies included in a meta-analysis (Quality Assessment of Diagnostic Accuracy Studies [QUADAS]),29 systematic literature reviews and meta-analyses (Quality of Reporting of Meta-analyses),30 observational studies (Meta-analysis of Observational Studies in Epidemiology),31 observational studies in epidemiology (Strengthening the Reporting of Observational Studies in Epidemiology),32 and online survey studies (Checklist for Reporting Results of Internet E-Surveys).33 Each guideline was developed to improve reporting of data to improve transparency and the publication of data used in clinical practice.25

Recently, 2 systematic literature reviews have been published that retrospectively evaluated derivation and derivation-validation studies of prescriptive CPRs.34,35 Both identified a notable lack of methodological consensus for creation of CPRs, and both identified significant methodological weaknesses in a number of the prescriptive-based studies. Although both articles used a novel quality checklist to score the articles within the study, to our knowledge, there are no consensus-based guidelines or reporting checklists available for the creation of CPRs for diagnosis, treatment prescription, or prognosis at any phase in the development process (derivation, validation, or clinical impact). Therefore, the purpose of this study was to develop a consensus-based checklist for prospective use during the creation of prescriptive, derivation-based CPRs. We selected derivation-based CPRs as an attempt to provide the most significant impact in reporting processes, with the assumption that improvements in these forms of CPRs will lead to subsequent improvements in validation studies and impact analyses.

METHODS

Development of the CPR Checklist

We developed the CPR checklist following the first 3 procedures outlined by Streiner and Norman36 for development of quality assessment tools. Their process outlines 5 critical steps, as follows: (1) preliminary conceptual decisions; (2) item generation; (3) assessment of face validity; (4) field trials; and (5) generation of a refined instrument. This process allows for the generation, refinement, and potential adaptation of a tool during a continuous improvement process. For our study, our work team took responsibility for the generation of an initial checklist, whereas a Delphi group of experts was responsible for adaptation and final approval of the checklist.

Preliminary Conceptual Decisions

We determined the following 5 requirements for the checklist for prescriptive CPRs. The checklist had to (1) be multifaceted; (2) be easy to use; (3) allow for 3 possible selections for each item in the checklist: “present,” “not present,” or “unclear”; (4) have internally consistent concepts among clinicians and researchers; and (5) be able to be completed in an expedient fashion. Initially, we outlined 8 major sections represented by the checklist that we considered critical to a well-designed prescriptive CPR, as follows: (1) sample and participants,22,37,38 (2) outcome measures,22,28,29,38-42 (3) quality of tests and measures,22,28,29,37,38 (4) statistical assumptions,20,22,38,43 (5) responsiveness and validity,1,20,37,38,44-46 (6) potential impact,38,44,47 (7) sensibility of use,38,44,47 and (8) validation.1,20,38,47

We concluded that the section for “sample and participants” should reflect a generalizable population that should receive this intervention. The “outcome measures” section should accurately measure true change in patient outcome; the selected tool should exhibit appropriate validation for the population assessed. Similar to the QUADAS29 and STARD28 tools, the “quality of tests and measures” section should reflect appropriate safeguards against bias within the study design. The “statistical assumptions” section was selected because of the multivariable nature of the data required to identify optimal predictor variables. Additional measurement requirements such as “responsiveness and validity” would ensure that the measures used in the CPR have acceptable psychometric properties. “Potential impact” required that the “idea” for the study was one that addressed a costly or debilitating problem within health care. The “sensibility of use” section required that the CPR be easy to use in a clinical setting and not require a sophisticated measure to identify appropriate candidates. Lastly, “validation” reflected the maturation of a derivation tool using a prospective study involving patients outside the original group.

An underlying assumption of our checklist was that it should appropriately discriminate aspects of a well- and a poorly designed prescriptive CPR study. Although we did not assume that the tool would involve overall quality scoring, we concurred that a general assessment of each eventual line item would be discriminative and useful in the prospective stage of designing a study.

Item Generation

One author (CC) produced an initial list of items for each section based on previous checklists (STARD28 and QUADAS29) and suggestions derived from previous studies reflecting requirements for CPRs.1,20,22,37-47 The decision to provide the Delphi team with a preexisting checklist to rework was based on efficiency and the desire to improve construct validity. At the time of the study, there were no preexisting prospective or retrospective tools in the literature to evaluate any form of CPR, specifically a derivation prescriptive study. It was the decision of the authorship team that the unique aim of this study, to capture derivation prescriptive item criteria, demanded a preliminary tool from which to work. Language for each item was generated to reflect a finding of “present,” “not present,” or “unclear.” Items were generated to match the 8 major sections of (1) sample and participants; (2) outcome measures; (3) quality of tests and measures; (4) statistical assumptions; (5) responsiveness and validity; (6) potential impact; (7) sensibility of use; and (8) validation. Upon completion, the document was forwarded to the other authors of this manuscript (JB, RP, PS, EH, DLR), who assisted in revising the initial checklist. Using consensus among the investigators, selected items were discarded, others were condensed or combined, and those accepted were refined to reflect the nature of prescriptive studies. Only items that met consensus by all members of the team were included in the final version of the initial checklist (Table 1).

Assessment of Face Validity and Consensus Assessment

Assessment of face validity and consensus assessment was performed using a 3-round Delphi survey. A Delphi survey is a series of sequential questionnaires designed to distill and obtain the most reliable consensus from a group of experts.48 This method is useful in situations where frequent clinical or practical judgments are encountered, yet incomplete published research exists to provide evidence-based decision making.48-50 Because identification of content experts is essential, we targeted first or last authors of published studies that examined or developed prescriptive CPRs. We reasoned that, if an individual had published one or more prescriptive CPR articles in the peer-reviewed literature, this person could provide useful feedback in a consensus-based format. This strategy has been used successfully in other Delphi studies.51,52 Study selection was initiated with the aid of computer-based searches of Medline (February 1966 to October 2007, via PubMed) and Cinahl (February 1982 to October 2007). The search included the terms clinical rule, treatment, intervention, outcome, and rehab. The most recent 75 authors (chronologically) were listed in alphabetical order and assigned a number. A random number generator then selected 25 potential participants. No optimal number for Delphi participants exists53; thus, we concluded that 25 respondents would be the maximum number for processing within our electronic medium. In addition, based on past experiences,54,55 we expected a smaller, but sufficient, number of responses. All 25 authors, none of whom were authors of this manuscript, were contacted using the contact information provided on each publication (the e-mail address) and were invited to participate in the Delphi study.
Following Dillman's56 and Lopopolo's57 guidelines, 2 consecutive follow-up reminders were delivered at 10 and 20 days after the initial invitation to participate. After completion of the invitation series, 9 participants (36%) responded and agreed to participate in the study. In 6 cases, a return message indicating an outdated or incorrect e-mail address was received through our e-mail system. The remaining 10 participants did not answer the e-mail requests. All 9 participants' names and contact information remained confidential to all study organizers throughout the study, with the exception of the primary contact person (CC), who initiated each wave of the Delphi rounds.
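The selection procedure described above (the 75 most recent authors alphabetized and numbered, with 25 drawn by a random number generator) can be sketched as follows; the author names and the fixed seed are illustrative placeholders, not data from the study.

```python
import random

def select_panel(authors, k=25, seed=None):
    """Alphabetize the candidate authors and draw k of them at random,
    without replacement, via a random number generator."""
    numbered = sorted(authors)      # alphabetical order, implicitly numbered 1..n
    rng = random.Random(seed)       # seed used here only to make the sketch reproducible
    return rng.sample(numbered, k)

# 75 placeholder names stand in for the real author list.
candidates = [f"Author_{i:02d}" for i in range(1, 76)]
panel = select_panel(candidates, k=25, seed=42)
print(len(panel))  # 25 invited participants
```

Drawing without replacement guarantees that no author is invited twice, matching the one-number-per-author assignment described above.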


Table 1. Original quality checklist for prescriptive, derivation-based CPRs



Delphi Procedure

A typical Delphi survey approach consists of 3 rounds of questionnaires that respondents consecutively answer in a timely fashion.58 At the end of the third round of most Delphi studies, consensus decisions and conclusions are generated among respondents. A flowchart identifying our 3-round sequence is presented in Figure 1. Expedited approval of the study was granted by the Institutional Review Board at Duke University in Durham, NC.

Data were collected using DADOS-Survey, which allows online access to all survey questions for targeted respondents. DADOS-Survey59 is a Web-based, open-source survey application developed at Duke University Medical Center. A description of the development, design, technical features, and usability testing of DADOS-Survey can be found in a previously published report.27 Compared with most Web survey software applications, one advantage of using DADOS-Survey is that the program completely satisfies the Checklist for Reporting Results of Internet E-Surveys guidelines for reporting results from Web-based surveys.33 Another advantage is that data collected in DADOS-Survey are easily transferred to a data management system while maintaining the confidentiality of the respondents.

Round I provided Delphi participants the opportunity to evaluate and modify, in an open-ended fashion, the items developed by the investigators. The participants were provided a series of open-ended questions specific to the overall survey topic.60 More specifically, round I consisted of a definition of the purpose of the study and an operational definition of a prescriptive, derivation-based CPR, followed by the presentation of the checklist and the opportunity to remove items, modify the language of the work team's checklist, merge selected items, and suggest new items for each subsection. Respondents were given open-text opportunities to voice concerns, make suggestions, or revise current language. Data obtained from round I were summarized into a modified checklist by CC, JM, and EH.

Round II allowed the Delphi participants to view the newly created modified checklist from round I, including the clarification and correction of terminology, and to vote (score) on each of the items of the checklist.57 Scoring choices included (1) very important, (2) important, (3) unimportant, and (4) very unimportant. Each Delphi participant was given the opportunity to comment on and score all items (new and old). Upon completion, the scores were tallied by CC and graphically analyzed and formatted in DADOS-Survey for round III.

Round III of the survey comprised the same list and grading scale as round II, with additional tables, graphs, and Delphi participant comments for each checklist item. The graphic information portrayed the percentage of total respondents that selected each possible score for the given item in round II. Consequently, each respondent had a second opportunity to score each item, this time after viewing the summarized scores of the other respondents from round II.
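The round II to round III feedback step, in which the percentage of respondents selecting each scoring choice was tallied and fed back to the panel, can be illustrated with a minimal sketch; the votes below are hypothetical.

```python
from collections import Counter

SCALE = ["very important", "important", "unimportant", "very unimportant"]

def score_distribution(votes):
    """Percentage of respondents selecting each scale choice for one item;
    this is the summary shown back to panelists in round III."""
    counts = Counter(votes)
    return {choice: 100.0 * counts[choice] / len(votes) for choice in SCALE}

# Hypothetical round II votes from the 9 panelists on a single checklist item.
votes = ["very important"] * 4 + ["important"] * 3 + ["unimportant"] * 2
dist = score_distribution(votes)
print(dist)
```

Iterating over the full scale (rather than only observed choices) ensures that unselected options appear with 0%, as they would in the round III graphs.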

Data Analysis

For each descriptive identifier, the scores from round III were divided into 2 categories: (1) important and (2) unimportant. For this study, the tally of the scores “very important” and “important” represented “important,” whereas the tally of the scores “unimportant” and “very unimportant” represented “unimportant.” We established a priori that consensus was obtained and an item would be retained for the final checklist if 75% or greater of the respondents58 scored the item as “important,” a scoring mechanism that has been used previously in a number of studies.54,55,58 Upon termination of round III, a final checklist based exclusively on the Delphi participants' consensus assessment was completed. The Delphi work team, which consisted of the authors of this study who were not members of the Delphi panel, then defined each item for better understanding. The definitions of each item are available in Appendix A.

Fig 1. Flowchart of Delphi process.
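The a priori consensus rule can be expressed directly: with 9 respondents, an item needed at least 7 "important"-side votes (7/9, roughly 78%) to be retained, since 6/9 (roughly 67%) falls below the 75% threshold. A minimal sketch, with hypothetical votes:

```python
def retain_item(votes, threshold=0.75):
    """Collapse the 4-point scale into important vs unimportant and retain
    the item if at least `threshold` of respondents scored it on the
    important side ("very important" or "important")."""
    important = sum(v in ("very important", "important") for v in votes)
    return important / len(votes) >= threshold

print(retain_item(["very important"] * 5 + ["important"] * 2 + ["unimportant"] * 2))  # 7/9 -> retained
print(retain_item(["important"] * 6 + ["very unimportant"] * 3))                      # 6/9 -> eliminated
```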

RESULTS

All 9 randomly selected Delphi participants completed rounds I, II, and III (100%). All but 2 participated in clinical practice (mean, 11 years; SD = 5.9), and the group had participated in research for an average of 8.3 years (SD = 3.6). Participant backgrounds included 4 physicians, 2 epidemiologists, and 3 physical therapists, and the participants resided on 4 different continents. Most (6) practiced in academic settings, but research interests and practice backgrounds varied among orthopedics (4), cardiopulmonary medicine (1), rehabilitation (1), pharmacology (1), and pain (1) (1 was not reported). In total, the group was responsible for 31 independent publications on CPRs (1 not reported), with a range of 1 to 7, reflecting diagnostic, prescriptive, and prognostic CPRs. None of the members corresponded with one another or knew of one another's participation during the Delphi process.

For round I, the checklist was condensed by the participants from 34 items (originally drafted by the investigative team) to 25 items. To accomplish this, Delphi participants made recommendations for removal or merger of 10 items, and the work team used this information to modify the existing checklist.
The 10 items included the following: item 2—multiple settings, characteristic of all environments of appropriate care, are represented in the study (eg, teaching, rural, urban); item 4—the sample size and heterogeneity were large enough to allow appropriate subclassifications or groupings of homogenous patients from the heterogeneous sample; item 7—the outcome measure has a reasonable and well-constructed cutoff score; item 10—the outcome measure used a global rating of patient status assessment that, if long term, is limited by retrospective recall bias and, if short term, provides a well-defined and consistent minimally detectable important clinical change that is not variable among populations; item 13—the tests and measures were independent of one another (limited collinearity); item 25—the predictors were reflective and demonstrated face validity toward the construct of the dysfunction targeted; item 28—accuracy of the CPR leads to a minimum +LR of 5 or greater or an error or misclassification rate of less than 20%; item 29—CPR derivations that examine critical or high-risk populations provide findings that have higher sensitivity than specificity and higher levels of true positives than false negatives (allowing a greater number of correct treatment interventions than incorrect interventions); and the validity-oriented items 33—a prospective follow-up randomized controlled validation study has determined that the CPR does lead to demonstrable improvements in patient outcomes and 34—the prospective follow-up randomized controlled validation study was performed on a population that was different from the derivation study and was generalizable. One new item was added: first-order interactions were assessed and reported. Selected items were further altered to clarify meaning and content.

Round II resulted in scoring each item using the 4 scoring choices of very important, important, unimportant, and very unimportant. Based on the a priori consensus requirement (the choices very important or important selected 75% or greater of the time), only 2 items would have been removed had round II been the termination of the study: (1) confidence intervals of the regression analyses were reported, and (2) the CPR was designed to aid decision making in a topic that has high impact and can reduce the risk of unnecessary treatments or failure to use the correct treatment. Because the purpose of round II was not to terminate items, these items and the subsequent scores were retained for round III.

The final round (round III) generated summated scores in which 5 of the items failed to meet the a priori consensus requirements. In response, the following items were eliminated (Table 1): (1) an R2 strength of association of the accepted predictor variables used in the model was reported; (2) the outcome variables demonstrated appropriate responsiveness to measure change in patient status; (3) the CPR was designed to aid decision making in a topic that has high impact and can reduce the risk of unnecessary treatments or failure to use the correct treatment; (4) the selected predictive variables used within the CPR demonstrated clinical sensibility (ie, each seemed to make sense in its relationship to the outcome variable of interest); and (5) the calculation of the CPR was not overly complex, so that a clinician could quickly and efficiently use it during practice.

In total, the revised CPR checklist (QUADCPR) for prescriptive, derivation-based studies includes 23 items associated with the domains of sample and participants, outcome measures, quality of tests and measures, and statistical assumptions. The revised CPR checklist is outlined in Table 2. Definitions of the items are presented in Appendix A.
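Discarded item 28 referenced benchmarks of a positive likelihood ratio (+LR) of 5 or greater, or a misclassification rate below 20%. As a quick illustration of how those quantities are computed (the sensitivity, specificity, and counts below are hypothetical, not study data):

```python
def positive_lr(sensitivity, specificity):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

def misclassification_rate(tp, fp, fn, tn):
    """Proportion of all subjects the rule classifies incorrectly."""
    return (fp + fn) / (tp + fp + fn + tn)

# A hypothetical rule with 80% sensitivity and 90% specificity:
print(positive_lr(0.80, 0.90))                             # approximately 8, clearing the +LR >= 5 benchmark
print(misclassification_rate(tp=40, fp=10, fn=10, tn=40))  # 20 errors in 100 subjects, at the 20% boundary
```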

DISCUSSION

The goal of this project was to develop a consensus-based quality checklist for the creation of prescriptive, derivation-based CPRs. The final checklist (QUADCPR) is now available for researchers during the development of prescriptive, derivation-based CPR studies. The final checklist comprises 23 items that encompass 4 disparate constructs (sections), where each item is scored as “yes,” “no,” or “unclear.”

At present, 2 quality checklists have been used during systematic assessment of CPR derivation34 and derivation-validation studies.35 Both studies indicated that the checklists were not developed specifically for CPRs through consensus methods and had potential limitations as specific quality measures. Both studies used their quality checklists retrospectively to evaluate the overall quality of the CPR studies. In contrast to the methods used by the checklists from the systematic reviews,34,35 we do not advocate the use of the QUADCPR as a retrospective quality checklist. For example, we did not determine the importance of weighting each of the various items and therefore cannot recommend generating a single quantitative quality score with the QUADCPR. Rather, we were interested in identifying all the key components of prescriptive, derivation-based CPRs for prospective use during the developmental phase vs the retrospective use witnessed in previous articles.

The 4 constructs of our checklist that emerged from the Delphi process are (1) sample and participants, (2) outcome measures, (3) quality of tests and measures, and (4) statistical assumptions.
Each of these constructs has been identified in past publications as a necessity for appropriate CPR development.20,22,28,35-41 Consistent with our checklist, others have identified the necessity for prospectively selected consecutive patients22,38 chosen from a generalizable population37,38 to allow spectrum transferability of findings.60 Outcome measures and quality of tests and measures have also been included in other checklists such as QUADAS29 and the Consolidated Standards of Reporting Trials.27 Novel items within the QUADCPR include item 15—the predictor tests that were used were reliable and item 8—the outcome measure accurately defined a positive or negative outcome of care and was not influenced by poorly definable sociologic or behavioral characteristics, the opinion of the outcome measurer, or specific factors that are indicative of an institution or particular setting. These items will improve the likelihood that predictors are repeatable among clinicians and that outcome measures are patient based and not heavily influenced by environmental factors.

The construct of statistical assumptions is of interest and has been identified as an area of frequent weakness in prescriptive CPR methodology.23 Derivation-based studies use multivariable regression modeling to identify potential variables for a CPR. In traditional regression analyses, basic requirements include adequate power, reporting of confidence intervals, reporting of first-order interactions, and reporting of the significance of the model. Many of these basic requirements have been included in the QUADCPR and, if followed, should reduce the risk of spurious items or items that have a high degree of unreported interaction with other variables in the model.
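The power requirement above is often operationalized in multivariable modeling with the common events-per-variable rule of thumb of roughly 10 outcome events per candidate predictor. This heuristic is not part of the QUADCPR itself, but a minimal sketch shows the arithmetic:

```python
def max_candidate_predictors(n_events, events_per_variable=10):
    """Heuristic ceiling on candidate predictors for a regression model:
    roughly one predictor per 10 outcome events (a rule of thumb,
    not a formal power analysis)."""
    return n_events // events_per_variable

# A hypothetical derivation cohort of 150 subjects in which 60 achieve
# the "success" outcome would support about 6 candidate predictors.
print(max_candidate_predictors(60))  # 6
```

Screening a candidate list against such a ceiling before modeling is one concrete way a prospective checklist item on power could be satisfied.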


A recent perspective has questioned the use of a derivation phase in identifying variables for a CPR, citing the risk of using prognostic variables vs treatment effect modifiers.61 Hancock et al61 have suggested that treatment effect modifiers, more specifically process variables, should only be captured during a randomized controlled trial. Traditionally, most CPRs have been designed for use during 3 stages of development (derivation, validation using a randomized controlled trial, and impact analysis) and involve a derivation phase that includes a regression analysis. Regression analyses allow assessment of a large group of potential baseline findings to further define appropriate variable selections. As stated earlier, variables captured in a CPR during the derivation phase should not be used in clinical practice until validated in one or more randomized clinical trials.

Although mentioned in previous studies,1,20,37,38,44-47 4 constructs were either merged or eliminated during the Delphi process. Responsiveness and validity items were either absorbed into statistical assumptions and outcome measures or failed to reach consensus. The single item for potential impact and the 2 items for sensibility of use failed to meet consensus in the final round. Validation of the CPR was removed after round I, when most Delphi participants indicated this concept did not fit in a derivation-based checklist.

Our study used a 3-round Delphi survey to establish consensus and incorporated a number of experts who have published previously on CPRs. Several studies have used the Delphi technique to create standards in quality assessment,53,62-65 including the QUADAS29 and Quality of Reporting of Meta-analyses30 guidelines. The expertise of our group was demonstrated by the variety of backgrounds and the publication record of the group. The QUADCPR is a reflection of the usefulness of the Delphi method, as significant modification occurred during the 3 rounds, including the addition of 1 item and the removal of 11 of the original items.

Limitations

The primary limitation of the QUADCPR is its exclusive focus on prescriptive, derivation-based studies. We do not feel the checklist is appropriate for nonprescriptive studies, nor do we feel that the tool is optimal for the design and reporting of diagnostic or prognostic CPRs. Another possible limitation is the size and the identification of the Delphi panel. There is a risk of self-selection bias and the possibility that the tool is influenced by those who agreed to participate in the study. Although the Delphi instrument is a qualitative analysis and is exempt from the sampling requirements of a randomized design, including a representative sample size,53 a larger sample of experts would have provided a broader perspective and may have led to a different QUADCPR. Ethics and institutional review board guidelines restricted use of the Delphi panel participants' names and required confidentiality at all phases, including

Journal of Manipulative and Physiological Therapeutics Volume 33, Number 1

Cook et al Checklist for Reporting CPRs

Table 2. Revised quality checklist for prescriptive, derivation-based CPRs (QUADCPR)

publication. At present, 2 additional checklists have been used to assess the quality of CPRs,34,35 which contain a number of criteria different from those found on the QUADCPR, although it is important to note that the QUADCPR checklist is designed for prospective use during the developmental phase of a derivation study rather than for retrospective appraisal. These tools were

not used in the initial checklist, as each was published after the completion of the Delphi rounds. Furthermore, the current QUADCPR requires further investigation of its reliability and validity and may undergo a number of revisions as additional experts in CPR design review and potentially revise the instrument. Future studies designed to refine the tool should consider a larger, more diverse Delphi group and, potentially, additional comments from the scientific community.

CONCLUSION
The QUADCPR is designed to improve the design and reporting standards for prescriptive, derivation-based CPRs. Future studies should address inter- and intraobserver variability, validation of the checklist, and further refinement of items and scoring to allow for quantitative ratings of quality. In addition, based on the recent perspective involving selection of variables from a randomized controlled trial, future studies should investigate the sensitivity of using a derivation phase vs no derivation phase.

Practical Applications
• Most recent rehabilitation-based CPRs have been associated with prescriptive studies.
• The recent proliferation of CPRs has occurred without guidelines for reporting.
• Checklists for reporting standards are typically developed using Delphi methodology.
• This study used a Delphi method to outline appropriate reporting standards for prescriptive CPRs.
• The checklist (QUADCPR) is designed to be used prospectively during development of CPRs.

FUNDING SOURCES AND POTENTIAL CONFLICTS OF INTEREST
No funding sources or conflicts of interest were reported for this study.

REFERENCES
1. Beattie P, Nelson R. Clinical prediction rules: what are they and what do they tell us? Aust J Physiother 2006;52:157-63.
2. Brehaut JC, Stiell IG, Visentin L, Graham ID. Clinical decision rules “in the real world”: how a widely disseminated rule is used in everyday practice. Acad Emerg Med 2005;12:948-56.
3. Flynn T, Fritz JM, Whitman J, et al. A clinical prediction rule for classifying patients with low back pain who demonstrate short-term improvement with spinal manipulation. Spine 2002;27:2835-43.
4. Childs JD, Fritz JM, Flynn TW, et al. A clinical prediction rule to identify patients with low back pain most likely to benefit from spinal manipulation: a validation study. Ann Intern Med 2004;141:920-8.

Journal of Manipulative and Physiological Therapeutics January 2010

5. Hicks GE, Fritz JM, Delitto A, McGill SM. Preliminary development of a clinical prediction rule for determining which patients with low back pain will respond to a stabilization exercise program. Arch Phys Med Rehabil 2005;86:1753-62.
6. Hancock MJ, Maher CG, Latimer J, Herbert RD, McAuley JH. Independent evaluation of a clinical prediction rule for spinal manipulative therapy: a randomised controlled trial. Eur Spine J 2008;17:936-43.
7. Cleland JA, Childs JD, Fritz JM, Whitman JM, Eberhart SL. Development of a clinical prediction rule for guiding treatment of a subgroup of patients with neck pain: use of thoracic spine manipulation, exercise, and patient education. Phys Ther 2007;87:9-23.
8. Tseng YL, Wang WT, Chen WY, Hou TJ, Chen TC, Lieu FK. Predictors for the immediate responders to cervical manipulation in patients with neck pain. Man Ther 2006;11:306-15.
9. Fernandez-de-las-Penas C, Cleland JA, Caudrado ML, Pareja JA. Predictor variables for identifying patients with chronic tension-type headache who are likely to achieve short term success with muscle trigger point therapy. Cephalalgia 2008;38:264-75.
10. Lesher JD, Sutlive TG, Miller GA, Chine NJ, Garber MB, Wainner RS. Development of a clinical prediction rule for classifying patients with patellofemoral pain syndrome who respond to patellar taping. J Orthop Sports Phys Ther 2006;36:854-66.
11. Currier LL, Froehlich PJ, Carow SD, et al. Development of a clinical prediction rule to identify patients with knee pain and clinical evidence of knee osteoarthritis who demonstrate a favorable short-term response to hip mobilization. Phys Ther 2007;87:1106-19.
12. Iverson CA, Sutlive TG, Crowell MS. Lumbopelvic manipulation for the treatment of patients with patellofemoral pain syndrome: development of a clinical prediction rule. J Orthop Sports Phys Ther 2008;38:297-309 [discussion 309-12].
13. Vicenzino B, Smith D, Cleland J, Bisset L. Development of a clinical prediction rule to identify initial responders to mobilization with movement and exercise for lateral epicondylalgia. Man Ther 2008 [Epub ahead of print].
14. Vicenzino B, Collins N, Cleland J, McPoil T. A clinical prediction rule for identifying patients with patellofemoral pain. Br J Sports Med 2008 [Epub ahead of print].
15. Alghamdi AA, Davis A, Brister S, Corey P, Logan A. Development and validation of Transfusion Risk Understanding Scoring Tool (TRUST) to stratify cardiac surgery patients according to their blood transfusion needs. Transfusion 2006;46:1072-4.
16. Van Belle A, Buller HR, Huisman MV, et al; Christopher Study Investigators. Effectiveness of managing suspected pulmonary embolism using an algorithm combining clinical probability, D-dimer testing, and computed tomography. JAMA 2006;295:172-9.
17. Dunning J, Au J, Kalkat M, Levine A. A validated rule for predicting patients who require prolonged ventilation post cardiac surgery. Eur J Cardiothorac Surg 2003;24:270-6.
18. Fischer Walker CL, Rimoin AW, Hamza HS, Steinhoff MC. Comparison of clinical prediction rules for management of pharyngitis in setting with limited resources. J Pediatr 2006;149:64-71.
19. Fischer JE, Steiner F, Zucol F, et al. Use of simple heuristics to target macrolide prescription in children with community-acquired pneumonia. Arch Pediatr Adolesc Med 2002;156:1005-8.
20. Childs JD, Cleland JA. Development and application of clinical prediction rules to improve decision making in physical therapist practice. Phys Ther 2006;86:122-31.

21. Hier DB, Edlestein G. Deriving clinical prediction rules from stroke outcome research. Stroke 1991;22:1431-6.
22. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules: applications and methodological standards. N Engl J Med 1985;313:793-9.
23. Toll DB, Janssen LJM, Vergouwe Y, Moons KGM. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol 2008;61:1085-94.
24. Casali PG, Licitra L, Bruzzi P. QUOROM and the search for an updated ‘clinical method’ in the era of evidence-based medicine. Ann Oncol 2000;11:923-5.
25. Smidt N, Rutjes AW, van der Windt DA. Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies. BMC Med Res Methodol 2006;6:12.
26. Altman DG, Schulz KF. Statistics notes: concealing treatment allocation in randomised trials. BMJ 2001;323:446-7.
27. Moher D, Schulz KF, Altman DG; CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized controlled trials. Ann Intern Med 2001;134:657-62.
28. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC; Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Ann Intern Med 2003;138:40-4.
29. Whiting P, Rutjes AV, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
30. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomized controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999;354:1896-900.
31. Stroup DF, Berlin JA, Morton SC, et al; Meta-analysis of Observational Studies in Epidemiology (MOOSE) Group. JAMA 2000;283:2008-12.
32. Vandenbroucke JP, von Elm E, Altman DG, et al; STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Ann Intern Med 2007;147:163-94.
33. Eysenbach G. Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res 2004;6:e34.
34. Beneciuk JM, Bishop M, George SZ. Clinical prediction rules for physical therapy interventions: a systematic review. Phys Ther 2009;89:114-24.
35. May S, Rosedale R. Prescriptive clinical prediction rules in back pain research: a systematic review. J Man Manip Ther 2009;17:36-45.
36. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. Oxford: Oxford University Press; 2008.
37. Knottnerus JA. Diagnostic prediction rules: principles, requirements, and pitfalls. Prim Care 1995;22:341-63.
38. Laupacis A, Sekar M, Stiell IG. Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA 1997;277:488-94.
39. McConnochie KM, Roghmann KJ, Pasternack J. Developing clinical prediction rules and evaluating observational patterns using categorical clinical markers. Med Decis Making 1993;13:30-42.
40. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997;50:869-79.

41. Schmitt JC, Di Fabio RP. Reliable change and minimum important difference (MID) proportions facilitated group responsiveness comparisons using individual threshold criteria. J Clin Epidemiol 2004;57:1008-18.
42. Schmitt JC, Di Fabio RP. The validity of prospective and retrospective global change criterion measures. Arch Phys Med Rehabil 2005;86:2270-6.
43. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariate methods. Ann Intern Med 1993;118:201-10.
44. Blackmore CC. Clinical prediction rules in trauma imaging: who, how, and why. Radiology 2005;235:371-4.
45. Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. What are the results and will they help me? JAMA 1994;271:703-7.
46. McGinn TG, Guyatt GH, Wyer PC, et al; Evidence-Based Medicine Working Group. Users' guides to the medical literature. XXII. How to use articles about clinical decision rules. JAMA 2000;284:79-84.
47. Reilly BM, Evans AT. Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med 2006;144:201-9.
48. Powell C. The Delphi technique: myths and realities. J Adv Nurs 2003;41:376-82.
49. Fink A, Kosecoff J, Chassin M, Brook R. Consensus methods: characteristics and guidelines for use. Am J Public Health 1984;74:979-83.
50. Dawson S, Barker J. Hospice and palliative care: a Delphi survey of occupational therapists' roles and training needs. Aust Occup Ther J 1995;2:119-27.
51. Distler O, Behrens F, Pittrow D, et al; EPOSS-OMERACT Group. Defining appropriate outcome measures in pulmonary arterial hypertension related to systemic sclerosis: a Delphi consensus study with cluster analysis. Arthritis Rheum 2008;59:867-75 [Erratum in: Arthritis Rheum 2008;59:1202].
52. Wilson MR, Lee PP, Weinreb RN, Lee BL, Singh K; Glaucoma Modified RAND-Like Methodology Group. A panel assessment of glaucoma management: modification of existing RAND-like methodology for consensus in ophthalmology. Part I: methodology and design. Am J Ophthalmol 2008;145:570-4.
53. Adler M, Ziglio E. Gazing into the oracle. Bristol (Pa): Jessica Kingsley Publishers; 1996.
54. Cook C, Brismée JM, Fleming R, Sizer PS. Identifiers suggestive of clinical cervical spine instability: a Delphi study of physical therapists. Phys Ther 2005;85:895-906.
55. Cook C, Brismée JM, Sizer PS. Subjective and objective descriptors of clinical lumbar spine instability: a Delphi study. Man Ther 2006;11:11-21.
56. Dillman D. Mail and Internet surveys: the tailored design method. 2nd ed. New York: John Wiley and Sons; 2000.
57. Lopopolo R. Hospital restructuring and the changing nature of the physical therapist's role. Phys Ther 1999;79:171-85.
58. Binkley J, Finch E, Hall J, Black T, Gowland C. Diagnostic classification of patients with low back pain: report on a survey of physical therapy experts. Phys Ther 1993;73:138-50.
59. Shah A, Jacobs DO, Martins H, Harker M, Menezes A, McCready M, Pietrobon R. DADOS-Survey: an open-source application for CHERRIES-compliant Web surveys. BMC Med Inform Decis Mak 2006;6:34.
60. Cleary K. Using the Delphi process to reach consensus. J Cardiopulm Phys Ther 2001;1:20-3.
61. Hancock M, Herbert RD, Maher CG. A guide to interpretation of studies investigating subgroups of responders to physical therapy interventions. Phys Ther 2009;89:698-704.

62. Stheeman S, van't Hof M, Mileman P, van der Stelt P. Use of the Delphi technique to determine standards for quality assessment in diagnostic radiology. Community Dent Health 1995;12:194-9.
63. Begg C, Metz C. Consensus diagnoses and “gold standards”. Med Decis Making 1990;10:24-9.
64. Altman R. Criteria for the classification of osteoarthritis of the knee and hip. Scand J Rheumatol Suppl 1987;65:31-9.
65. Grahm B, Regehr G, Wright J. Delphi as a method to establish consensus for diagnostic criteria. J Clin Epidemiol 2003;56:1150-6.

APPENDIX A. EXPLANATORY CRITERIA FOR EACH ITEM OF THE QUADCPR. THE DESCRIPTORS WERE CREATED BY THE DELPHI WORK TEAM AFTER THE DELPHI PANEL OUTLINED EACH ITEM CRITERION

Sample and Participants

1. The setting and location were reflective of a typical population that would receive this intervention.
Description: The item reflects the necessity to report the specifics involving the setting and location (eg, acute inpatient hospital or primary orthopedic outpatient practice) in which patients from the study were sampled. Any unique element of the environment (eg, high percentage of patients participating in sports or a military population) should be reported.

2. The investigators clearly described how they estimated the proportion of eligible participants (in terms of specific inclusion and exclusion criteria) that were entered into the study to improve the transferability of the findings.
Description: The item reflects the importance of reporting the percentage of the typical population that met the targeted requirements, including the percentages that were excluded or were not enrolled because they did not meet the specific requirements.

3. The sample was appropriately described (age, sex, education, etc) and was reflective of a typical and appropriate patient group that would receive this intervention (limited ascertainment bias).
Description: This item reflects the need to report specific descriptive characteristics of the patients so that the reader can ascertain whether these characteristics are similar to a population normally seen with a similar condition.

4. The sampling was prospective and consecutive (retrospective, in cases of rare conditions).
Description: The item reflects the need to prospectively and consecutively recruit a targeted population to reduce the bias associated with sampling.

Outcome Measure

5. The outcome measure(s) was (were) operationally defined.
Description: The item reflects the need to appropriately describe the latent construct of the outcome measure and the specific cutoff score used for the study.

6. The outcome measure has demonstrated reliability, validity, and sensitivity to change.
Description: The item reflects the need to describe the psychometric characteristics of the outcome measure using previously reported studies.

7. The outcome measure was administered in a blinded fashion.
Description: The item reflects the need to report the mode of administration of the outcome measures (eg, verbal administration, patient filled out in absence of the clinician, telephonically) and whether the therapist applying the intervention was blinded to the measures.

8. The outcome measure accurately defined a positive or negative outcome of care and was not influenced by poorly defined sociologic or behavioral characteristics, opinion of the outcome measurer, or specific factors that are indicative of an institution or particular setting.
Description: The item reflects the need to select an outcome measure that is patient driven and not altered significantly by issues that are sociologic or behavioral in nature. Examples include length of stay (which is affected by hospital and physician tendencies), costs (affected by insurance type and provider tendencies), or surgery (often driven by surgeon decision).

Quality of Tests and Measures

9. A logical rationale was presented for why the predictor test was selected and how the predictor could identify responders to treatment.
Description: The item reflects the need to support the use of the predictor tests by including appropriate reasoning behind the selection of the procedure and to decrease the potential for inappropriate burdening of the patients in the study.

10. The predictor test was performed before the clinical outcome was determined (prospectively).
Description: The item reflects the need to prospectively perform the predictor test to reduce the risk of bias.

11. The procedures of the tests and measures were explained in detail.
Description: The item reflects the need to describe the procedures in detail for follow-up studies or for use in clinical practice.

12. The procedures of the tests and measures were performed using a method that is clinically consistent, acceptable, and appropriate.
Description: The item reflects the need to appropriately perform the test and measure in a fashion that is clinically acceptable and in a manner that was previously reported and validated.

13. The clinicians who performed the predictor tests and measures were blinded to the outcome measure scores.
Description: The item reflects the need for clinician blinding to the outcome measures. Knowing the scores may alter the application of the tests and measures, and this item is intended to reduce this risk for bias.

14. The clinicians who performed the initial and final outcome measures were not the same clinicians who provided the treatment.
Description: The item is designed to reduce the risk of internal bias. Clinicians who obtain the outcome measures should not have provided treatment to the patients in the study.

15. The predictor tests that were used were reliable (κ ≥ .60 and/or intraclass correlation coefficient [ICC] ≥ .70), either through previous report or through report within the findings of the study.
Description: The item reflects the need to use tests and measures that have acceptable reliability in clinical practice.

16. The time interval between the testing or treatment and the outcome measures was appropriate.
Description: The item reflects the need to perform the testing and outcome measures within a reasonable timeframe so that the patient's condition does not unduly deteriorate or improve secondary to temporal changes as compared with changes likely attributable to the intervention.

17. Equivocal or indeterminable results were reported.
Description: The item reflects the need to report indeterminable test results that a clinician is unable to use for decision making. For example, an indeterminable test result (ie, diagnostic test or outcome measure) is a finding that cannot be clearly interpreted as either negative or positive.

Statistical Assumptions

18. Power of the model was prospectively calculated using all preselected predictor variables and targeting 10 to 15 subjects per predictor variable (avoidance of overfitting if likelihood ratios are targeted); or, if recursive partitioning analysis is used, the sample is dictated by the required precision around the number of positive cases.
Description: The item reflects the need for appropriate a priori power calculations, involving a minimum of 10 to 15 subjects per predictor variable included in the multivariate analyses. This item applies whether or not univariate analyses were conducted.

19. First-order interactions were assessed and reported.
Description: The item reflects the need to report the potential of a first-order interaction, which is a “new” variable created from the product of 2 predictor variables that have interacted in the regression model. The risk of a first-order interaction is very high in the initial stages of regression modeling but is reduced with stepwise methods.

20. The statistical significance of the model or “fit” was reported.
Description: The item reflects the need to report whether the predictor variables used were appropriate in predicting the outcome of interest. If the model was not significant, then the predictor variables may not influence the actual outcome.

21. Confidence intervals of the regression analyses were reported.
Description: The item reflects the need to report the width of the confidence intervals that reflect the accuracy of the point estimate (eg, odds ratio) provided for each individual predictor variable.

22. After univariate assessment of predictor variables (or through tree classification), irrelevant predictors were removed before multivariate modeling.
Description: The item reflects the need to report the variables that were not associated with the outcome (by an a priori targeted P value) and were therefore not used in further modeling.

23. Statistical results of the CPR (probability scores, likelihood ratios, or risk ratios) were reported using 95% confidence intervals.
Description: The item reflects the need to report the width of the confidence intervals that reflect the accuracy of the point estimate (eg, odds ratio) provided for the combined cluster used in the CPR.
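The arithmetic behind items 18, 21, and 23 can be sketched in a few lines: the subjects-per-predictor rule of thumb for a priori sample-size planning, and the conversion of a logistic-regression coefficient and its standard error into an odds ratio with a 95% confidence interval. This is an illustrative sketch only; the function names and the input values (5 candidate predictors, a coefficient of 0.8 with standard error 0.3) are hypothetical and not drawn from the article.

```python
import math

def min_sample_size(n_predictors, subjects_per_predictor=10):
    """Rule-of-thumb minimum sample for a derivation study (item 18):
    10 to 15 subjects per preselected predictor variable."""
    return n_predictors * subjects_per_predictor

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI from a logistic-regression coefficient
    and its standard error (items 21 and 23): exponentiate the
    coefficient and the endpoints beta +/- 1.96 * SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical example: 5 candidate predictors for the regression model
print(min_sample_size(5), "to", min_sample_size(5, 15))  # 50 to 75 subjects

# Hypothetical fitted coefficient and standard error for one predictor
or_, lo, hi = odds_ratio_ci(beta=0.8, se=0.3)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # OR = 2.23, 95% CI 1.24 to 4.01
```

A wide interval (here, roughly 1.2 to 4.0) signals an imprecise point estimate, which is exactly what items 21 and 23 ask authors to disclose for each predictor and for the combined cluster.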
