Applying evidence on outcome measures to hand therapy practice

Applying Evidence on Outcome Measures to Hand Therapy Practice Joy C. MacDermid, BScPT, PhD School of Rehabilitation Science McMaster University Hamilton, Ontario, Canada Clinical Research Lab Hand and Upper Limb Centre St. Joseph’s Health Centre London, Ontario, Canada Career Scientist of the Ontario Ministry of Health Health Research Personnel Development Program

ABSTRACT: Standardized outcome measures can enhance clinical decision making in hand therapy. Using evidence to make decisions about an individual patient's level of impairment and disability, and about the significance of any changes observed after intervention, is consistent with an evidence-based approach. Evidence can enhance clinical decision making and provide objective criteria for goal setting and evaluation. The authors review the concepts and approaches needed to apply evidence on outcome measures, using a vignette that describes a patient with rotator cuff pathology who has provided a Disability of the Arm, Shoulder, and Hand (DASH) score during clinical assessment. J HAND THER. 2004;17:165–173.

Paul Stratford, PT, MSc School of Rehabilitation Science McMaster University Hamilton, Ontario, Canada

Using tests and measures in clinical decision making for patients with upper extremity injuries is an important component of hand therapy practice. The most recent practice analysis indicated that this activity comprises 20% of a hand therapist's time and is ranked as the most critical domain of practice.1,2 Practice analysis respondents indicated that, before they specialized in hand therapy, their clinical judgment and reasoning were at the level of collecting basic data. After two years of practice, hand therapists reported competencies that required them to collect complex clinical data and to synthesize and interpret multiple, sometimes conflicting, sources of data.3 Principles of evidence-based practice can help hand therapists take their clinical reasoning to a higher level and provide concrete skills for dealing with complex and conflicting information.1 Tests and measures are performed for one or more of three reasons: to determine a patient's status at the time of assessment, to predict a subsequent event, and to detect change over time.4 Effective outcome measures must be adept at determining a patient's status at a given point in time and at assessing change over time. Using reliable and valid outcome measures, in particular those that allow patients to self-report their own pain and disability experiences, is a well-accepted principle in hand therapy. Unfortunately, implementation is low.5,6 Furthermore, hand therapists wanting to understand how to use upper extremity self-report scales in clinical decision making find relatively little attention directed toward this process in either professional textbooks or continuing education workshops.6 In this article, we review the terminology associated with outcome measures, suggest criteria for evaluating outcome measures, and illustrate how information from outcome measures can be used to enhance clinical decision making within an evidence-based framework. We begin by introducing a vignette that we will refer to throughout the article.

Correspondence and reprint requests to Joy C. MacDermid, BScPT, PhD, School of Rehabilitation Science, IAHS, 1400 Main Street West, 4th Floor, Hamilton, Ontario, Canada L8S 1C7; e-mail: . doi:10.1197/j.jht.2004.02.005

VIGNETTE

A male patient, 55 years of age, attends your clinic for the first time, referred by a general practitioner with a diagnosis of right shoulder pain. He is currently unable to work because of his injury and has not had previous treatment for his present problem. The patient has an ultrasound report indicating a partial-thickness tear of the supraspinatus tendon. His range of motion is within normal limits and equal bilaterally. You measure his isometric external rotation strength as 17 Nm on the affected side and 27 Nm on the unaffected side. Recently, your clinic added the Disability of the Arm, Shoulder, and Hand (DASH) self-report outcome measure to its assessment protocol. The patient's score is 44 at this visit. You know the DASH is a region-specific measure designed to assess upper extremity functional status, but you are not sure how to apply the results to your patient. You are experienced with writing short- and long-term measurable goals for outcome measures such as range of motion, but DASH scores do not have the same meaning to you.

After reading the vignette, consider the following five questions that arise during clinical decision making: 1) What is the patient's status today? 2) What is the error associated with the measured value? 3) How much will the score need to change on a subsequent assessment to demonstrate that a real change has occurred? 4) How much will the score need to change on a subsequent assessment to demonstrate an important amount of change? 5) What DASH score is required to meet your long-term treatment goal? Like the clinician in the vignette, many of our colleagues have difficulty providing confident answers to these five questions. As a result, clinicians do not maximize the potential of self-report measures to assist with clinical decision making, rationalization of therapy needs, or justification of the beneficial effects of therapy.

THE JARGON OF CHANGE

One of the primary reasons for hand therapists to use outcome instruments is to demonstrate changes in clinical status. Thus, it is important to understand the terminology of clinical change. Although much has been written about assessing change, no single agreed-on terminology has prevailed. In this section, we provide an overview of jargon frequently encountered in the outcome literature. The terms “sensitivity to change” and “responsiveness” play a prominent role in the outcome literature. Some authors give the terms equivalent meanings; others suggest the terms have two distinct interpretations. For example, Liang defines sensitivity to change as “the ability of an instrument to measure change in the state regardless of whether it is relevant or meaningful to the decision maker. Sensitivity to change is a necessary but insufficient condition for responsiveness.”7 Liang defines responsiveness as “the ability of an instrument to measure a meaningful or clinically important change in a clinical state . . . It implies a change that is noticeably, appreciably different that is of value to the patient (or physician). The change may allow the individual to perform some essential task or to perform tasks more efficiently or with less pain or difficulty. These changes also should exceed variation that can be attributed to chance.” We find the clarity of Liang's statements appealing and will apply his terminology throughout our article.

In addition to the conflicting use of “sensitivity to change” and “responsiveness” in previous writings, many authors have not explicitly declared the type of change they were considering and to whom the change or difference applied. In a landmark monograph, Beaton and colleagues8 proposed a taxonomy for classifying the results of change studies. In brief, the taxonomy considers three dimensions: who, which, and what. The “who” dimension considers whether the results apply at an individual patient level or a group level. The “which” dimension classifies the scores being contrasted. There are three levels: 1) between-person differences; 2) within-person change over time; and 3) between-person differences of within-person change. The “what” dimension considers the type of change being quantified. There are five levels: 1) minimum potential change; 2) minimum change detectable given the measurement error of the instrument; 3) observed change measured by the instrument in a given population; 4) observed change in a population deemed to have improved; and 5) observed change in those deemed to have had an important improvement. The importance of considering these categories cannot be overstated when applying the results of outcome measure studies to clinical practice. Highlighting the need to be specific when considering change is the work of Goldsmith et al.,9 who demonstrated that the magnitude of change required to be important at the level of an individual patient is substantially greater than the minimal clinically important between-group difference.

FIVE-STEP PROCESS IN EVIDENCE-BASED PRACTICE

The same five-step process used in evaluating evidence for treatment effectiveness or prognosis can be used when evaluating outcome measures.10 In practice, search strategies are often more difficult because the information required may be embedded in a variety of study designs. Nevertheless, the basic evidence-based process provides a framework on which clinical decision making can be made more evidence-based. The five steps are:
1. Ask an answerable question
2. Search the literature (to obtain the appropriate evidence)
3. Assess the validity of the current research
4. Assess the importance of the evidence obtained
5. Apply the results to the patient

ASK AN ANSWERABLE QUESTION

Selecting the best outcome measure for a specific task begins with forming an answerable question. When addressing generic questions about the measurement properties of outcome measures in the context of a specific clinical problem such as our vignette, we need to consider the criteria that define our clinical problem. These key features may include the type of pathology, intervention, patient, and outcome measures that are the focus of our problem. Based on the vignette and the five generic questions listed previously, we formed more specific questions that we could then use as a basis for searching the literature.

1. What Does the DASH Tell Us about the Patient's Status Today?

We need to understand what a DASH score tells us about a patient presenting with shoulder pain, specifically due to a rotator cuff tear. For example, knowing the scores we would expect from a typical patient with an acute rotator cuff tear would help us understand the severity of this particular patient's problem. We also need to understand whether the DASH score provides a valid estimate of the concept we want to measure (disability related to a cuff lesion). Therefore, we need to find information on the validity of DASH scores for patients with rotator cuff pathology. We also require descriptive information on the scores that patients with rotator cuff pathology typically report, for comparison.

2. What Is the Error Associated with the Measured DASH Value?

To be confident in a measured DASH value, we need to know that the measurement error is sufficiently small to allow a meaningful interpretation of the score.

3. How Many Points Does a DASH Score Need to Change on a Subsequent Evaluation before We Could Be Confident That Our Patient Has Experienced a Real Change in Upper Extremity Disability?

One reason for reassessing patients is to determine whether they have truly changed. To be reasonably certain that a true change has occurred, the measured change must exceed the error associated with the measurement process (level 2 of Beaton's “what” dimension). Accordingly, when formulating a measurable short-term goal, clinicians must consider the error associated with the measurement. The term minimal detectable change (MDC) is often used to specify this value. The short-term goal would state that the patient will change by an amount equal to or greater than the MDC. The ideal reassessment interval is the time it takes a typical patient with the characteristics of the patient of interest to change by an amount equal to or greater than the MDC.

4. How Many Points Does a DASH Score Need to Change before We Could Confidently Say That an Important Change in Upper Extremity Disability Has Occurred for This Patient?

In addition to determining whether a true change has occurred, clinicians and patients are also interested in clinically important change. The considerations that apply to true change also apply to clinically important change. However, a clinically important change implies confidence that the patient has made a meaningful change, not merely a detectable one.

5. What DASH Score Is Required to Meet Our Long-term Treatment Goals for This Patient?

Realistic long-term goals are shaped by the patient's preferences and by customary or normative values for persons with characteristics similar to the patient of interest. Again, data on patients with varying levels of disability will help us set realistic goals.

SEARCH THE LITERATURE

The second step in selecting or applying an outcome measure is to perform an effective literature search. In contrast to prognosis or effectiveness searches, in which it is usually possible to define a single clinical question, evaluating outcome measures according to the five questions above requires multiple considerations. Ideally, one would obtain studies with data on reliability, validity, and responsiveness specific to patients with rotator cuff pathology. Also, well-designed cohort studies would provide information on the scores for specific outcomes in patients with rotator cuff pathology at various points in their illness experience. If this information were complete and synthesized in a user's manual, it would assist us greatly. In reality, such comprehensive information will rarely be available, even for the most commonly used outcome measures. Therefore, it is often necessary to use information obtained from a variety of sources and, in many cases, to extrapolate from patient populations that are somewhat dissimilar to our patient. Fortunately, the DASH has more supporting evidence than many measures, and much of it has been reported in a user's manual, so in this case we are able to find the necessary data with relative ease and certainty.

We designed a two-stage search strategy. In the first stage, we searched for studies that used the DASH in patients with rotator cuff pathology. Our first search, using PubMed, applied the following terms: DASH AND (“rotator cuff” OR shoulder AND [tendinitis OR tendonitis]). Three articles were identified.11–13 When the specific information required to answer our questions was not obtained through this search, we conducted a broader search using terms that focused on measurement terminology. Our second search, again using PubMed, applied the following terms: DASH AND disability AND (reliability OR validity OR responsiveness OR sensitivity). This search yielded 14 articles.14–27 In total, 17 different articles were identified and reviewed by both coauthors, and relevant data were extracted. In addition to these articles, the authors' personal reference lists on outcome measures28–30 and the reference lists of relevant publications were used as sources for additional articles. We also accessed the DASH manual31 and the American Academy of Orthopaedic Surgeons' web site (www.aaos.org/research/normstdy/main.cfm),32 which contained normative DASH data.
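Writing the two PubMed queries out explicitly makes their Boolean grouping unambiguous. The minimal sketch below only builds the query strings and an NCBI E-utilities ESearch request URL; it does not send a request. Two assumptions are labeled: the square brackets in the first query are read as ordinary grouping (in PubMed syntax, square brackets normally denote field tags), and "shoulder AND (tendinitis OR tendonitis)" is treated as one unit. The helper names (`any_of`, `all_of`, `esearch_url`) are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; db=pubmed selects the PubMed database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def any_of(*terms: str) -> str:
    """Combine alternative terms with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Combine required terms with AND, wrapped in parentheses."""
    return "(" + " AND ".join(terms) + ")"


# Stage 1: DASH in rotator cuff or shoulder tendinitis populations.
# Assumption: "shoulder AND (tendinitis OR tendonitis)" was meant as one unit.
STAGE_1 = "DASH AND " + any_of(
    '"rotator cuff"', all_of("shoulder", any_of("tendinitis", "tendonitis"))
)

# Stage 2: DASH combined with measurement-property terms.
STAGE_2 = "DASH AND disability AND " + any_of(
    "reliability", "validity", "responsiveness", "sensitivity"
)


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build (but do not send) an ESearch request URL for a PubMed query."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


if __name__ == "__main__":
    for query in (STAGE_1, STAGE_2):
        print(query)
        print(esearch_url(query))
```

Rerunning these queries today would not reproduce the 3 and 14 articles reported here, because PubMed has grown since 2004; the bare term DASH also matches unrelated uses (for example, the DASH diet), so results still need screening.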

ASSESS THE VALIDITY OF THE RESEARCH

Messick states, “. . . validity is an evolving property and validation is a continuing process. Because evidence is always incomplete, validation is essentially a matter of making the most reasonable case to guide current use of the test and to guide future research that will advance understanding of what the test scores mean.”33 When evaluating the validity of an outcome measure, information obtained from multiple studies must be considered. The validity of each study's research design is assessed, and the results from methodologically sound studies are then synthesized into a comprehensive body of evidence concerning the measure. Thus, although studies that comment on an aspect of reliability or validity are necessary, individually they provide insufficient information for judging the merits of a specific outcome measure. To assist in evaluating the validity of a single study, we suggest that the following questions be considered.

Were Subjects Sampled in an Unbiased Manner?

The magnitude of reliability and validity coefficients is influenced by the variability among the subjects composing the sample. As subject variability increases, the magnitude of the reliability coefficient and, in most instances, of validity coefficients also increases. A decrease in subject variability has the opposite effect. For this reason, it is essential that the subjects in a study be sampled in an unbiased manner. When an abundance of subjects is available at the time of the investigation, random sampling is the best method of protecting against sampling bias. When subjects will be entered into the study over a period of time, consecutive patient sampling provides the best strategy for ensuring an unbiased sample.

If Rater Bias Is a Potential Issue, Were Assessments Obtained in an Independent Manner?

An awareness of other raters' responses may influence a specific rater's interpretation of a test result. For this reason, steps should be taken to blind raters to the responses of other raters.

Was There a Reasonable Comparison Standard?

The validity of a measure is established by comparing the measure's results with those of the truth, or gold standard. When a true gold standard exists, this comparison is referred to as criterion validity. Unfortunately, for many attributes assessed in clinical practice (pain, functional status, and health-related quality of life), no gold standard exists. In the absence of a gold standard, the validation of a measure draws heavily on construct validity. A construct validation process involves forming theories about the attribute of interest and evaluating the extent to which the test or measure of interest provides results consistent with those theories. For example, an investigator may form the following theory concerning upper extremity functional status: the functional status of persons who are off work because of their problem is lower than the functional status of persons who are able to work with their problem. The outcome measure would be administered and the results analyzed to determine the extent to which the findings supported the theory. The validity of a measure is strengthened as the number and diversity of comparison standards supporting its application grow.

Has the Measure Been Shown to Be Superior to Competing Outcome Measures?

All else being equal, it is reasonable to believe that clinicians are interested in using the best measure from a number of competing measures. Identifying the best measure requires a head-to-head comparison study of the measure of interest and the relevant competing measures. In addition to presenting the relevant validity coefficients, the head-to-head comparison study should formally test whether the observed differences among the competing measures are statistically significant and clinically important.
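The point made under "Were Subjects Sampled in an Unbiased Manner?" can be made concrete with classical test theory, in which reliability equals 1 − (SEM/SD)². A minimal sketch, assuming an instrument whose measurement error (SEM) stays fixed at a hypothetical 4 points while only the spread of the sampled subjects changes; the function name is illustrative, not from the article.

```python
def reliability_from_sem(sem: float, sd: float) -> float:
    """Reliability implied by a fixed SEM and a sample standard deviation.

    Rearranges SEM = SD * sqrt(1 - r) into r = 1 - (SEM / SD) ** 2.
    """
    if sd <= sem:
        raise ValueError("sample SD must exceed the SEM")
    return 1.0 - (sem / sd) ** 2


# Hypothetical: one instrument with a fixed 4-point SEM, applied to samples
# that differ only in how spread out the subjects are.
SEM_POINTS = 4.0
for sample_sd in (8.0, 12.0, 20.0):
    r = reliability_from_sem(SEM_POINTS, sample_sd)
    print(f"sample SD {sample_sd:4.1f} -> reliability {r:.2f}")
```

This prints reliabilities of 0.75, 0.89, and 0.96: the same instrument looks more reliable simply because the sample is more heterogeneous, which is why a biased (for example, unusually varied) sample can inflate published coefficients.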

RELIABILITY AND VALIDITY COEFFICIENT QUESTIONS

In addition to the generic questions posed previously, answers to the following specific questions are needed to judge the rigor and clinical usefulness of outcome measures. After each question, we briefly summarize the relevant measurement property of the DASH, obtained in most instances from a synthesis of the investigations retrieved by the searches described above.

To What Extent Does the Measure Display Convergent Validity?

Convergent validity examines the extent to which a measure's result is consistent with the result of another measure that is believed to assess the same attribute. Typically, correlation coefficients are applied to quantify convergent validity. DASH scores correlated highly with other upper extremity functional status measures (0.77–0.89), the ability to work (0.77), and ratings of pain (0.65–0.72). Also, DASH change scores demonstrated high correlations with self-rated change (0.76), change in function (0.69), and change in pain (0.65).14,17,26,30,31

To What Extent Does the Measure Display Internal Consistency?

Internal consistency is relevant when tests or measures contain multiple items that are summed to form a total score. Conceptually, an internal consistency coefficient (the most frequently reported is coefficient alpha) can be thought of as the average, corrected for test length, of all possible split-half correlation coefficients. A split-half correlation coefficient is obtained by dividing the test items into halves and calculating the correlation between the halves. The internal consistency coefficient is used to estimate the confidence in a patient's score at a single point in time, as illustrated in the “Importance” section. The internal consistency of the DASH is on the order of 0.96, which exceeds the recommended level of internal consistency (0.90–0.94) for decisions concerning individual patients.31
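A minimal sketch of coefficient alpha for a respondents-by-items score matrix. Rather than literally averaging every split-half correlation, it uses the usual variance-based computing formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The response matrix is hypothetical (5 respondents, 4 items scored 1 to 5); the real DASH has 30 items.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])                      # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))              # one tuple of scores per item
    totals = [sum(row) for row in responses]   # summed score per respondent
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical responses: 5 respondents x 4 items scored 1-5 (DASH-style).
demo = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
print(f"alpha = {cronbach_alpha(demo):.2f}")
```

Alpha rises when items covary strongly relative to their individual variances; if every item gave identical answers, the formula returns exactly 1.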

To What Extent Does the Measure Display Test–Retest Reliability?

Test–retest reliability studies provide information about the stability of responses over time in persons who truly remain unchanged. The intraclass correlation coefficient is frequently used to quantify test–retest reliability.34 The test–retest reliability coefficient is used to define the minimal level of detectable change in scale points, as illustrated in the “Importance” section. Estimates of test–retest reliability for the DASH typically exceed 0.92.14,31
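The article does not state which intraclass correlation form the DASH studies used. The sketch below computes one common choice, Shrout and Fleiss's ICC(2,1) (two-way random effects, absolute agreement, single measurement), assuming a complete subjects-by-occasions matrix with no missing values. The test–retest pairs are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is subject i's score on occasion j (e.g. test, retest).
    """
    n, k = len(scores), len(scores[0])      # subjects, occasions
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_occasions

    bms = ss_subjects / (n - 1)               # between-subjects mean square
    jms = ss_occasions / (k - 1)              # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical DASH test-retest pairs from patients believed to be stable.
test_retest = [[44, 41], [20, 23], [63, 60], [35, 37], [12, 10], [51, 55]]
print(f"ICC(2,1) = {icc_2_1(test_retest):.3f}")
```

The absolute-agreement form penalizes a systematic shift between test and retest (the occasions term), which matters when the coefficient is later turned into a detectable change in scale points.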

To What Extent Does the Measure Display Interrater Reliability?

Some outcome measures require the involvement of a rater. In such cases, information concerning interrater reliability is needed. The intraclass correlation coefficient is used to quantify interrater reliability.34 Because the DASH is a self-report measure, interrater reliability does not come into play.

To What Extent Does the Measure Display Known Group Validity?

Known group validity explores the extent to which a measure can distinguish groups known to possess different levels of the attribute of interest. Analysis of variance and t tests are usually applied to evaluate known group validity. The DASH has been shown to discriminate between the following groups with upper extremity problems: persons who are able to perform all their activities of daily living and those who are not; persons who are able to work and persons who are not; and persons working without restrictions and persons working with restrictions.31
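A minimal known-group sketch, assuming two hypothetical groups whose DASH scores are expected to differ (persons off work versus persons working, echoing the construct example earlier in the article). It computes Welch's t statistic, a t test variant that does not assume equal group variances; this choice is an assumption, since the article names t tests only generically.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va = variance(group_a) / len(group_a)   # squared standard error, group A
    vb = variance(group_b) / len(group_b)   # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical DASH scores (higher = more disability).
off_work = [58, 47, 52, 61, 44, 55]
working = [30, 22, 35, 27, 19, 31]
t, df = welch_t(off_work, working)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t supports the theory that the off-work group reports more disability. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.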

To What Extent Does the Measure Display Discriminant Validity?

Measures are designed to fulfill a specific purpose; the DASH was conceived to assess upper extremity functional status. Discriminant validity examines the extent to which a measure selectively assesses the attribute of interest rather than a general concept. To establish discriminant validity, a contrast is required. For example, one would expect to find a higher correlation between the DASH and a competing upper extremity functional status measure than between the DASH and a measure of mental health or general well-being. DASH scores correlated more highly with 36-item short form health survey (SF-36) physical function (0.65–0.73) and pain scores (0.56–0.70) than with SF-36 mental health scores (0.29–0.38).14,17,26,30,31
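Both the convergent and the discriminant checks described above come down to comparing correlation coefficients. A minimal sketch with hypothetical paired scores for five patients: the DASH should correlate more strongly with another upper extremity disability scale than with a general mental health score. The data and variable names are illustrative only.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for five patients.
dash = [10, 25, 40, 55, 70]            # DASH (higher = more disability)
other_ue = [12, 30, 38, 60, 72]        # another upper extremity disability scale
mental_health = [50, 40, 55, 45, 52]   # a general mental health score

r_convergent = pearson_r(dash, other_ue)
r_discriminant = pearson_r(dash, mental_health)
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
print("expected pattern (convergent > discriminant):", r_convergent > r_discriminant)
```

With these numbers the convergent correlation is about 0.99 and the discriminant one about 0.24. This is only a descriptive contrast; formally testing whether two correlations that share the DASH differ would need a test for dependent correlations, which is not shown. Python 3.10 and later also provide `statistics.correlation` for the same coefficient.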

To What Extent Does the Measure Display Factorial Validity?

Some measures contain multiple subscales. For example, the Western Ontario Rotator Cuff scale (WORC)29,30,35 contains five subscales that address different domains: physical symptoms, lifestyle, recreation/sports, work, and emotions. A factor analysis would support these subscales as distinct factors if the items grouped into factors consistent with the subscales. The DASH was designed to measure a single concept, upper extremity disability, and therefore its items should “load” on one factor. Exploratory factor analysis has shown that all DASH items load on a single factor (factor loadings >0.40).

ASSESS THE IMPORTANCE OF THE FINDINGS

Given that a measure has been shown to have adequate cross-sectional and longitudinal validity indexes, the next step is to consider the importance of the findings. To judge the importance of an outcome measure's ability to achieve its desired goal, the reliability and validity coefficients presented in the previous section must be translated into scale points. Referring to the vignette, we frame the five questions posed previously in terms of DASH points: 1) What does the patient's initial DASH score of 44 mean in terms of the extent of upper extremity disability? 2) What is the error associated with today's DASH score? 3) How much must the patient's DASH score change on a future evaluation for us to be confident a real change has occurred? 4) How much must the patient's DASH score change on a future evaluation for us to be confident a clinically important change has occurred? 5) How much must the patient's DASH score change on a future evaluation for us to be confident that the patient has achieved a level of disability consistent with his long-term goal of return to work?

What Does the Patient's Score Mean?

For a measure to be useful, its value must have meaning to us. This meaning develops with experience, just as it did for our more traditional, impairment-based measures such as grip strength, sensory thresholds, or range of motion. Certainly, experience using self-report measures increases our insight into score expectations and progress through the same experiential methods we used to develop skill in interpreting impairment measures. However, in an evidence-based practice approach, we also use evidence to inform the interpretation of scores. Methods of defining the meaning of a patient's score from data include comparing the score with normative data or with data obtained from groups with specific problems and levels of disability. Normative data for the DASH are published28 and presented on the American Academy of Orthopaedic Surgeons' web site32 and within the DASH user's manual.31 The data are presented as standardized scores with a mean of 50 and a standard deviation of 10. The standard score for a 55-year-old is 51, which equals a DASH score of 12/100. The interpretation is that the average 55-year-old in the general population reports a DASH score of 12. Atroshi et al.26 reported a mean DASH score of 43 for patients awaiting shoulder surgery compared with a mean score of 35 for nonsurgical patients. Skutek et al.12 reported a mean preoperative DASH score of 49 for persons with rotator cuff tears awaiting surgery. Beaton et al.14,31 reported a mean DASH score of 50.7 for persons unable to work because of their upper extremity problem compared with a mean score of 26.8 for persons able to work with an upper extremity problem. The vignette patient's DASH score of 44 is consistent with a person reporting substantial disability due to rotator cuff pathology.
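The comparison above can be laid out as a small lookup. A minimal sketch, using only the reference means quoted in this section, that ranks the groups by how far their mean lies from the patient's score. It is a crude orientation aid: the group standard deviations are not given here, so it cannot say how typical a score of 44 is within any group. The function name is hypothetical.

```python
# Reference DASH means quoted in the text (0 = no disability, 100 = most severe).
REFERENCE_MEANS = {
    "general population, age 55 (norms)": 12.0,
    "able to work with an upper extremity problem (Beaton et al.)": 26.8,
    "nonsurgical shoulder patients (Atroshi et al.)": 35.0,
    "awaiting shoulder surgery (Atroshi et al.)": 43.0,
    "rotator cuff tear, preoperative (Skutek et al.)": 49.0,
    "unable to work because of an upper extremity problem (Beaton et al.)": 50.7,
}


def compare_to_references(score, references=REFERENCE_MEANS):
    """List (group, group mean, score minus mean), closest group first."""
    rows = [(group, ref, score - ref) for group, ref in references.items()]
    return sorted(rows, key=lambda row: abs(row[2]))


for group, ref, diff in compare_to_references(44):
    print(f"{group:<70} mean {ref:5.1f}  difference {diff:+6.1f}")
```

For the vignette patient, the closest group means are those of patients awaiting shoulder surgery (43) and preoperative rotator cuff tears (49), and the score sits 32 points above the age-matched population norm.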

How Confident Am I in the Reported Rating for This Patient?

To answer this question, one must know the amount of error associated with a measured value. The standard error of measurement (SEM) of relevance here should not be confused with the standard error of the mean, which has the same abbreviation. Whereas the standard error of the mean reflects the variability of a distribution of sample means (estimated by dividing the sample standard deviation by the square root of the sample size), the SEM of interest to us is a measure of the error associated with a reported value. This SEM reflects measurement error by conveying how much a score is likely to vary with repeated measurements of the same subject. It could be determined by taking a large number of measurements on the same subjects and calculating the standard deviation of the scores. For practical reasons, it is difficult to determine the SEM by this direct method. Therefore, it is usually estimated from the results of reliability studies. Although there are several methods for calculating the SEM, the most popular is

$$\mathrm{SEM} = s\,\sqrt{1 - r}$$

where s is the sample standard deviation and r is the reliability coefficient. When one is interested in estimating the error associated with a score at a single point in time, coefficient alpha is usually the reliability coefficient of interest. One SEM corresponds to the 68% confidence interval. To obtain higher confidence levels, the SEM can be multiplied by the z-values associated with those levels. For example, 1.65 is the z-value associated with the 90% confidence level and 1.96 is the z-value associated with the 95% confidence level.

When considering a patient's true score, it must be realized that the measured value is only a representation of the true score and is subject to measurement error. The patient's true score is most likely to lie within a defined confidence interval, and the clinician's certainty in making a statement about that score is described by a specified confidence level. The SEM is influenced by both the reliability of the measurement (represented by the relevant reliability coefficient) and the variability of the population (reflected by the standard deviation). Although the DASH has been shown to have high reliability coefficients across a broad range of conditions, the standard deviation can be expected to vary between conditions, and even within a condition over time as the level of disability changes. The confidence attached to an interval built from the SEM is a probability statement, and z-values let us express it in terms of the normal curve. The more confident we want to be that the interval contains the true value, the greater the area under the curve we must include, the larger the z-value, and thus the wider the confidence interval. When data on SEMs appropriate to our patient's condition and level of disability are lacking, we extrapolate from more generic data. An $\mathrm{SEM}_{\text{cross-sectional}}$ of 4.4 points has been reported for the DASH. We apply the subscript “cross-sectional” to distinguish the SEM for a patient's score at a single point in time from the SEM derived from a change score. Multiplying the $\mathrm{SEM}_{\text{cross-sectional}}$ of 4.4 points by the z-value of 1.65 associated with a 90% confidence level yields 7.3 points. The interpretation is that at the time of assessment, there is a 90% chance that the patient's true score is within 7.3 points of the measured score. For the vignette patient, with a reported DASH score of 44, the true score is therefore likely to fall between 36.7 and 51.3 (44 ± 7.3).
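The interval arithmetic in this section can be sketched in a few lines: compute an SEM from a standard deviation and reliability coefficient, then widen it by the z-value for the chosen confidence level. With the reported cross-sectional SEM of 4.4 points, a score of 44 gives 44 ± 7.26, roughly 36.7 to 51.3. The pairing of a sample SD of 22 with alpha 0.96 is hypothetical, chosen only to show how an SEM near 4.4 could arise; it is not the data behind the published figure.

```python
import math

# z-values for two-sided confidence levels, as quoted in the text.
Z_VALUES = {68: 1.0, 90: 1.65, 95: 1.96}


def sem(sample_sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sample_sd * math.sqrt(1.0 - reliability)


def true_score_interval(observed: float, sem_points: float, confidence: int = 90):
    """Interval expected to contain the true score at the given confidence level."""
    half_width = Z_VALUES[confidence] * sem_points
    return observed - half_width, observed + half_width


# Reported cross-sectional SEM for the DASH and the vignette patient's score.
low, high = true_score_interval(44, 4.4, confidence=90)
print(f"90% interval for a DASH score of 44: {low:.1f} to {high:.1f}")   # 36.7 to 51.3

# Illustration only: a hypothetical sample SD of 22 with alpha = 0.96
# would produce an SEM of about 4.4 points.
print(f"SEM = {sem(22, 0.96):.1f}")
```

Raising `confidence` to 95 widens the interval (z = 1.96), which is the trade-off described above between certainty and precision.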

What Amount Indicates That the Patient Has Truly Changed?

The method used most frequently for estimating whether a patient has truly changed applies the reliability coefficient obtained from a test–retest reliability study. In this case, the term $\mathrm{SEM}_{\text{test–retest}}$ is applied to distinguish this SEM from the cross-sectional SEM described previously. To obtain the minimal level of detectable change at a specified confidence level ($\mathrm{MDC}_{CL}$), $\mathrm{SEM}_{\text{test–retest}}$ is multiplied by the z-value associated with the confidence level of interest and by the square root of 2; the factor of √2 reflects the fact that a change score combines the measurement error of two assessments. For example, the MDC at a 90% confidence level, designated MDC90, is obtained as follows:

$$\mathrm{MDC}_{90} = \mathrm{SEM}_{\text{test–retest}} \times z \times \sqrt{2}, \qquad z = 1.65$$

Estimates of MDC90 for the DASH vary, with 11 being a typical value. The interpretation of MDC90 is that 90% of stable patients are likely to display a difference on retest less than the value of MDC90, not that one is 90% certain that a patient has truly changed. Thus, for the vignette patient, a change of 11 or more DASH points is required to be reasonably certain that a true change has occurred.
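The interpretation of MDC90 can be checked with a small simulation. The sketch assumes, for illustration only, that the measurement errors on the two occasions are independent, normally distributed, and share an SEM of about 4.71 points (the value implied by an MDC90 of 11); none of these assumptions comes from a specific DASH study, and the function name is hypothetical.

```python
import math
import random


def fraction_within_mdc(sem: float, z: float = 1.65,
                        n: int = 100_000, seed: int = 1) -> float:
    """Share of simulated stable patients whose retest difference is below MDC."""
    rng = random.Random(seed)
    threshold = sem * z * math.sqrt(2.0)  # MDC at the chosen confidence level
    within = 0
    for _ in range(n):
        # A stable patient: the true score does not change, so any difference
        # between the two occasions comes only from independent measurement error.
        first_error = rng.gauss(0.0, sem)
        second_error = rng.gauss(0.0, sem)
        if abs(second_error - first_error) < threshold:
            within += 1
    return within / n


print(round(fraction_within_mdc(4.71), 3))
```

The printed fraction should sit close to 0.90: it describes how often stable patients stay below MDC90, not how certain we are that any one patient who exceeds it has truly changed.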

What Amount Indicates That the Patient Has Changed an Important Amount? In addition to considering true change, clinicians are interested in determining whether a patient has changed an important amount. The term clinically important difference (CID) is often used to describe this quantity. Like beauty, what constitutes the CID may be in the eyes of the beholder. Clearly, many factors affect what is considered important, including cost and the risk and extent of an adverse event. Moreover, estimates of the magnitude of the CID may vary depending on the perspective of the person (e.g., patient, clinician) or group (e.g., society, payer) offering an opinion. Finally, there is evidence to suggest that the magnitude of a clinically important difference varies depending on a patient’s level of disability. What is important to appreciate at this point in time is that the magnitude of the CID is likely to vary depending on the circumstances, and that the investigative methods used to estimate this quantity are in their infancy. Despite this, a reasonable CID for the DASH is estimated as a change of approximately 15 points.31
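Combining the two thresholds gives a simple decision rule for a DASH change score. This sketch uses the typical MDC90 of 11 points and the estimated CID of 15 points; because lower DASH scores mean less disability, improvement is computed as baseline minus follow-up. Applying the same thresholds to worsening is an assumption, and the function name is hypothetical. Given the caveats above, the CID is kept as an adjustable parameter rather than a fixed constant.

```python
def interpret_change(baseline: float, follow_up: float,
                     mdc90: float = 11.0, cid: float = 15.0) -> str:
    """Classify a DASH change; lower DASH scores mean less disability."""
    improvement = baseline - follow_up
    magnitude = abs(improvement)
    direction = "improvement" if improvement > 0 else "worsening"
    # Assumptions: cid >= mdc90, and the same thresholds apply to worsening.
    if magnitude >= cid:
        return f"important {direction}"
    if magnitude >= mdc90:
        return f"detectable {direction}, below the CID"
    return "not distinguishable from measurement error"


for follow_up in (40.0, 31.0, 25.0):
    print(f"44 -> {follow_up:g}: {interpret_change(44.0, follow_up)}")
```

For the vignette patient, a follow-up score of 31 would count as a detectable improvement that has not yet reached the CID.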

What Amount of Change Is Consistent with the Patient’s Goal? Setting a long-term goal for a patient requires familiarity with the outcome measure of interest and with the level of disability that is consistent with reaching this goal. In this case, what level of disability on the DASH is consistent with return to work? Data obtained by Beaton on the scores of working and nonworking patients help us make a reasonable prognosis in this regard.14

INTERPRET THE FINDINGS FOR THIS PATIENT

Using the data collected in our literature search (Table 1), we can use the patient’s DASH score to define his current level of disability and the changes we would expect with our intervention (also based on evidence). We can also use this evidence in prognosis and goal setting. Demonstrating our evidence-based practice approach when communicating with health care payers can enhance our credibility and provide clear criteria for deciding whether hand therapy intervention has made an important difference to the patient’s disability and participation in life. An example of how this information might be incorporated is included in the conclusion to our vignette. Form letters containing these elements and references can easily be produced and then customized to the individual patient, making evidence-based practice more practical to apply.

TABLE 1. Interpretation of DASH Scores
Attribute | DASH score
Normative value (55-year-old) | 12 DASH points
90% confidence interval for a score at a single point in time | ±7.3 DASH points
Minimal detectable change (MDC90) | ±10 to 14 DASH points
Clinically important difference | ≈15 DASH points
DASH = Disability of the Arm, Shoulder, and Hand.

Mr. X is a 55-year-old man with a partial tear of the right rotator cuff documented by both clinical examination and ultrasound. This pathology is associated with substantial impairment and disability. Mr. X has range of motion within normal limits for his age and comparable to that of his left side. His shoulder strength is decreased, most evidently in isometric shoulder external rotation, where he is able to produce only 17 Nm on the right compared with 27 Nm on the left. These strength levels are consistent with the loss of strength resulting from a compromised supraspinatus tendon.36 His upper extremity disability (DASH score = 44) is substantially greater than the normative value (DASH = 12) and is consistent with the level of disability experienced by other patients with rotator cuff pathology.12 Based on these physical findings and on evidence from a Cochrane Collaboration systematic review37 suggesting that exercise and mobilization are effective in the rehabilitation of rotator cuff disease, the following treatment plan is recommended (specific details would be appended):
1. Mobilization of the shoulder to restore shoulder rotation movements.
2. A strengthening program to restore the tensile strength and integrity of the supraspinatus tendon, which will also include general upper extremity reconditioning/endurance. This program will include adjunctive modalities as needed and will progress activity tolerance once short-term goals have been achieved, as Mr. X works toward a goal of return to work.
The short-term treatment goals are:
1. Strength will increase from 17 Nm to 22 Nm.
2. Compliance and proficiency with the exercise program within three sessions.
3. A detectable change in upper extremity disability, as indicated by a DASH score no greater than 31 in three weeks.
The following long-term goals are expected (by 12 weeks):
1. Strength of 27 Nm on the right side.
2. A DASH score indicating an important change in upper extremity disability (>15 points). Note that the 12-week DASH score is expected to fall within the range of 20 to 25, which is consistent with scores obtained by other patients who were able to return to work.
3. Return to work at full duties.
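The vignette’s numeric goals can be written as explicit checks to run at each reassessment. This is a hypothetical sketch: the `Reassessment` record, the function names, and the follow-up values in the example are invented for illustration. The compliance and return-to-work goals are left out because they are not scored on a numeric scale here.

```python
from dataclasses import dataclass

BASELINE_DASH = 44.0  # vignette patient's initial DASH score


@dataclass
class Reassessment:
    week: int
    dash: float            # DASH score, higher = more disability
    er_strength_nm: float  # right isometric shoulder external rotation, Nm


def short_term_goals(visit: Reassessment) -> dict[str, bool]:
    """Three-week goals stated in the vignette."""
    return {
        "strength_22nm": visit.er_strength_nm >= 22.0,
        "dash_at_most_31": visit.dash <= 31.0,
    }


def long_term_goals(visit: Reassessment) -> dict[str, bool]:
    """Twelve-week goals and expected score range stated in the vignette."""
    return {
        "strength_27nm": visit.er_strength_nm >= 27.0,
        "important_change": BASELINE_DASH - visit.dash > 15.0,
        "in_expected_range": 20.0 <= visit.dash <= 25.0,
    }


week3 = Reassessment(week=3, dash=30.0, er_strength_nm=22.5)    # hypothetical values
week12 = Reassessment(week=12, dash=23.0, er_strength_nm=26.0)  # hypothetical values
print(short_term_goals(week3))
print(long_term_goals(week12))
```

Stating each goal as a named check keeps the criteria in a progress report identical to those in the initial plan.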

REFERENCES
1. Muenzen PM, Kasch MC, Greenberg S, Fullenwider L, Taylor PA, Dimick MP. A new practice analysis of hand therapy. J Hand Ther. 2002;15:215–25.
2. Roth LP, Dimick MP, Kasch MC, Fullenwider L, Mullins P. Practice analysis of hand therapy. J Hand Ther. 1996;9:203–12.
3. Kasch MC, Greenberg S, Muenzen PM. Competencies in hand therapy. J Hand Ther. 2003;16:49–58.
4. Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med. 1993;118:622–9.
5. Michlovitz SL, LaStayo PC, Alzner S, Watson E. Distal radius fractures: therapy practice patterns. J Hand Ther. 2001;14:249–57.
6. MacDermid JC. Outcome evaluation in patients with elbow pathology: issues in instrument development and evaluation. J Hand Ther. 2001;14:105–14.
7. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28:632–42.
8. Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy for responsiveness. J Clin Epidemiol. 2001;54:1204–17.
9. Goldsmith CH, Boers M, Bombardier C, Tugwell P. Criteria for clinically important changes in outcomes: development, scoring and evaluation of rheumatoid arthritis patient and trial profiles. OMERACT Committee. J Rheumatol. 1993;20:561–5.
10. Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based Medicine. How to Practice and Teach EBM, 2nd ed. Toronto: Churchill Livingstone, 2000.
11. Skutek M, Zeichen J, Fremerey RW, Bosch U. [Outcome analysis after open reconstruction of rotator cuff ruptures. A comparative assessment of recent evaluation procedures.] Unfallchirurg. 2001;7:480–7.
12. Skutek M, Fremerey RW, Zeichen J, Bosch U. Outcome analysis following open rotator cuff repair. Early effectiveness validated using four different shoulder assessment scales. Arch Orthop Trauma Surg. 2000;120:432–6.
13. Robinson CM, Page RS. Severely impacted valgus proximal humeral fractures. Results of operative treatment. J Bone Joint Surg [Am]. 2003;85:1647–55.
14. Beaton DE, Katz JN, Fossel AG, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand Outcome Measure in different regions of the upper extremity. J Hand Ther. 2001;14:128–46.
15. Gay RE, Amadio PC, Johnson JC. Comparative responsiveness of the disabilities of the arm, shoulder, and hand, the carpal tunnel questionnaire, and the SF-36 to clinical change after carpal tunnel release. J Hand Surg [Am]. 2003;28:250–4.
16. Germann G, Wind G, Harth A. [The DASH (Disability of Arm–Shoulder–Hand) Questionnaire: A New Instrument for Evaluating Upper Extremity Treatment Outcome.] Handchir Mikrochir Plast Chir. 1999;31:149–52.
17. Gummesson C, Atroshi I, Ekdahl C. The Disabilities of the Arm, Shoulder and Hand (DASH) outcome questionnaire: longitudinal construct validity and measuring self-rated health change after surgery. BMC Musculoskel Disord. 2003;4:11.
18. Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med. 1996;29:602–8.
19. MacDermid JC, Tottenham V. Responsiveness of the Disability of the Arm, Shoulder, and Hand (DASH) and Patient-Rated Wrist/Hand Evaluation (PRWHE) in evaluating change after hand therapy. J Hand Ther. 2004;17:18–23.
20. Offenbacher M, Ewert T, Sangha O, Stucki G. Validation of a German version of the ‘Disabilities of Arm, Shoulder and Hand’ questionnaire (DASH-G). Z Rheumatol. 2003;62:168–77.
21. Padua R, Padua L, Ceccarelli E, et al. Italian version of the Disability of the Arm, Shoulder and Hand (DASH) questionnaire. Cross-cultural adaptation and validation. J Hand Surg [Br]. 2003;28:179–86.
22. SooHoo NF, McDonald AP, Seiler JG III, McGillivary GR. Evaluation of the construct validity of the DASH questionnaire by correlation to the SF-36. J Hand Surg [Am]. 2002;27:537–41.
23. Veehof MM, Sleegers EJ, van Veldhoven NH, Schuurman AH, van Meeteren NL. Psychometric qualities of the Dutch language version of the Disabilities of the Arm, Shoulder, and Hand questionnaire (DASH-DLV). J Hand Ther. 2002;15:347–54.
24. Rosales RS, Delgado EB, Diez de la Lastra-Bosch I. Evaluation of the Spanish version of the DASH and carpal tunnel syndrome health-related quality-of-life instruments: cross-cultural adaptation process and reliability. J Hand Surg [Am]. 2002;27:334–43.
25. Dubert T, Voche P, Dumontier C, Dinh A. [The DASH questionnaire French translation of a trans-cultural adaptation.] Chir Main. 2001;20:294–302.

26. Atroshi I, Gummesson C, Andersson B, Dahlgren E, Johansson A. The disabilities of the arm, shoulder and hand (DASH) outcome questionnaire: reliability and validity of the Swedish version evaluated in 176 patients. Acta Orthop Scand. 2000;71:613–8.
27. Navsarikar A, Gladman DD, Husted JA, Cook RJ. Validity assessment of the Disabilities of Arm, Shoulder, and Hand questionnaire (DASH) for patients with psoriatic arthritis. J Rheumatol. 1999;26:2191–4.
28. Hunsaker FG, Cioffi DA, Amadio PC, Wright JG, Caughlin B. The American Academy of Orthopaedic Surgeons outcomes instruments: normative values from the general population. J Bone Joint Surg [Am]. 2002;84:208–15.
29. MacDermid JC, Faber KJ, Drosdowech D. Responsiveness of self-report measures following rotator cuff surgery. J Shoulder Elbow Surg. In press 2004.
30. Getahun T, MacDermid JC, Patterson SD. Concurrent validity of patient rating scales in assessment of outcome after rotator cuff repair. Journal of Musculoskeletal Research. 2000;4:119–27.
31. Solway S, Beaton DE, McConnell S, Bombardier C. The DASH Outcome Measure User’s Manual, 2nd ed. Toronto: Institute for Work and Health, 2002.
32. American Academy of Orthopaedic Surgeons. American Academy of Orthopaedic Surgery. Available at: www.aaos.org/research/normstdy/main.cfm. 2003. Accessed Dec 20, 2003.
33. Messick S. Validity. Educational Measurement. Phoenix, AZ: American Council on Education, Oryx Press, 1993, p 13–104.
34. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8.
35. Kirkley A, Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: the Western Ontario Rotator Cuff Index. Clin J Sports Med. 2003;13:84–92.
36. MacDermid JC, Ramos J, Drosdowech D, Faber K, Patterson S. The impact of rotator cuff pathology on isometric and isokinetic strength, function and quality of life. J Shoulder Elbow Surg. In press 2004.
37. Green S, Buchbinder R, Hetrick S. Physiotherapy interventions for shoulder pain. Cochrane Database Syst Rev. 2003:CD004258.
