Journal of Clinical Epidemiology 55 (2002) 922–928
Comparison of the responsiveness of the Barthel Index and the Motor Component of the Functional Independence Measure in stroke: The impact of using different methods for measuring responsiveness

Dennis Wallace (a), Pamela W. Duncan (b,*), Sue Min Lai (c)

(a) Rho, Inc., 100 Eastown Drive, Chapel Hill, NC, USA
(b) Department of Health Services Administration, Brooks Center for Rehabilitation Studies at the University of Florida, Health Science Center, P.O. Box 100185, Gainesville, Florida 32610, USA; VA Rehabilitation Outcomes Research Center of Excellence, Malcolm Randall VA Medical Center, 1601 SW Archer Road, Gainesville, Florida 32608-1197, USA
(c) Department of Preventive Medicine and Center on Aging, University of Kansas Medical Center, Kansas City, KS, USA

Received 2 January 2000; received in revised form 3 January 2001; accepted 4 January 2002
Abstract
Two disability measures frequently used to assess the effects of interventions on stroke recovery are the Barthel Index (BI) and the motor component of the Functional Independence Measure (FIM® Instrument). This study compared multiple measures of the responsiveness of these instruments to stroke recovery between 1 and 3 months. Data on 1- to 3-month change in the instruments were obtained for 372 subjects who improved or maintained function on the modified Rankin Scale (MRS), a subset of 459 eligible patients with confirmed stroke as defined by WHO criteria who were recruited from 12 participating hospitals in the Greater Kansas City area. Subjects were excluded because of death, early withdrawal from the study, or missing MRS or outcome data (57); decline on the MRS (26); or inability to improve on the MRS (4). Techniques used to assess responsiveness were: area under the ROC curve, Guyatt’s effect size, paired t-statistics, standardized response mean, Kazis effect size, and mixed model adjusted t-statistic. The FIM® Instrument and BI show little difference in responsiveness to change. The different responsiveness measures are generally consistent with this conclusion, with no measure clearly superior to the others. Large differences in the responsiveness measures were obtained within an instrument depending on the population used (changers only, or both changers and those who maintained function). Results also suggest that responsiveness assessments are likely to be affected by the time frame and phase of rehabilitation over which the responsiveness of a measure is determined. © 2002 Elsevier Science Inc.

Keywords: Stroke; Responsiveness measures; Rehabilitation; Outcomes
* Corresponding author. Tel.: 352-392-6507; fax: 352-392-9958. E-mail address: [email protected] (P.W. Duncan).

1. Introduction
Stroke is a major cause of mortality and morbidity in the United States, particularly among persons over 55 years of age. Acute stroke occurs in over 700,000 individuals each year, with over 80% of these persons likely to survive, many with residual neurologic difficulties. About 4,400,000 stroke survivors are alive today [1–4]. As acute care for stroke continues to improve, the number of individuals surviving stroke with residual deficits is likely to increase over the next decade. Older adults strongly value the ability to be independent in activities of daily living, and stroke survivors are usually deprived of this ability. The neurologic impairments resulting from stroke disable the patient in varying degrees with respect to performing the essentials of everyday life, the basic activities of daily living (ADL). Furthermore, individuals are also limited in their ability to perform the instrumental activities of daily living that represent the fundamental skills necessary to live independently in the community [5,6]. One major objective of stroke rehabilitation, as well as a major focus of ongoing research on stroke recovery, is increasing the independence of stroke survivors in ADL. Furthermore, assessment of ADL outcomes in individuals who survive stroke is necessary for appropriate clinical management, for evaluation of outcomes for quality management of rehabilitation services, and for research. The Agency for Health Care Policy and Research Post-Stroke Rehabilitation panel recommended that clinicians use well-validated, standardized instruments to ensure reliable documentation of progress over time in levels of disability and functional independence [7]. Two instruments recommended by the panel for
0895-4356/02/$ – see front matter © 2002 Elsevier Science Inc. All rights reserved. PII: S0895-4356(02)00410-9
assessing disability/activities of daily living are the Barthel Index (BI) and the Motor Component of the Functional Independence Measure (FIM® Instrument). Both instruments have been demonstrated to be valid and reliable measures of functional outcomes in stroke patients [8–11]. Frequently, when the FIM® Instrument and BI are applied in clinical settings, longitudinal measures are obtained on the same subjects over time to assess patterns of recovery in achieving independence in ADL. Although reliability and validity are sufficient to ensure usefulness of instruments for defining cross-sectional differences among persons, Kirshner and Guyatt and Guyatt et al. [12,13] suggest that another property, responsiveness, is essential for instruments designed to measure longitudinal change over time. Responsiveness is defined as the ability of an instrument to detect clinically important changes over time, even if those changes are small [12,13]. Although health outcomes researchers generally agree on the need to measure responsiveness, no general consensus has been reached on the best way to measure it, and multiple measures have been proposed [14–19]. Also, although the reliability and validity of the FIM® Instrument and BI are well established, less information is available on the responsiveness of these instruments [8,11].

The objectives of this study are to assess the responsiveness of the BI and FIM® Instrument for evaluating recovery from stroke over the 1- to 3-month poststroke period, and to assess the impact of different methods for assessing responsiveness on instrument comparison. The data for the study were collected as a part of the Kansas City Stroke Study, an epidemiologic study of a cohort of stroke survivors who lived in the community and were independent in activities of daily living prior to their stroke.
2. Methods

2.1. Participants
The participants in this study are 459 individuals who sustained an eligible stroke and were recruited for the Kansas City Stroke Study. Case ascertainment for the Kansas City Stroke Study started in August of 1995 and ended in September of 1999. The eligible study participants were recruited from any of 12 participating hospitals in the Greater Kansas City area. Eligible stroke patients were identified by (1) a review of daily admission records; (2) referrals from physicians, clinical nurse specialists, or therapists on medical, neurology, and rehabilitation units; and (3) review of discharge codes. To be accepted into this study, the subject had to have a confirmed eligible stroke as defined by WHO criteria. The stroke was confirmed by clinical assessment and/or by a CT/MRI scan. A stroke was defined according to the World Health Organization (WHO) criteria as “rapid onset and of vascular origin reflecting a focal disturbance of cerebral function, excluding isolated impairments of higher function and persisting longer than 24 hours” [20]. Trained nurses/physical therapists reviewed medical records and interviewed both patients and physicians to determine whether the patient was eligible and consented to enrollment.

Subjects were excluded for any of the following: (1) age less than 18 years; (2) stroke onset more than 14 days earlier; (3) stroke due to subarachnoid hemorrhage; (4) hepatic failure; (5) renal failure; (6) NYHA III/IV heart failure (i.e., patients with cardiac disease resulting in inability or marked limitation to carry on any physical activity without discomfort); (7) not expected to live 6 months; (8) residence in a nursing home prior to stroke; (9) inability to take care of own affairs prior to stroke; (10) lethargic, obtunded, or comatose state; and (11) residence more than 70 miles from the participating hospital.

The patients were evaluated using a variety of standardized assessments at enrollment and followed at 1, 3, and 6 months poststroke by a study nurse/physical therapist at home or at a chronic care facility. Each study nurse/physical therapist received at least 2 weeks of training in the administration of the measures. All study nurses and physical therapists received certification in administration of the National Institutes of Health Stroke Scale (NIHSS) [21] and the FIM Instrument [22]. Assessments included in this study are baseline demographics, stroke severity measured by using the Orpington Prognostic Scale [23], the Modified Rankin Scale (MRS) [24], the BI [25], and the FIM® Instrument [22]. The results presented here use only the 1- and 3-month outcomes. Only those 372 subjects having BI and FIM® Instrument measures at both 1 and 3 months and who did not decline in Rankin level between 1 and 3 months are included in the analyses. Excluding subjects whose Rankin level declined between 1 and 3 months allows us to demonstrate the responsiveness of the Barthel and FIM® Instrument more objectively, because meaningful changes are not compromised by a mix of improvement and deterioration between the two time points.

2.2.
Assessment instruments
Stroke severity was measured with the Orpington Prognostic Scale (OPS) within 3 to 14 days of stroke [23]. The OPS is a weighted measure that screens for motor deficits, sensory loss, balance, and cognition. The OPS ranges for stroke severity are: minor stroke, less than 3.2; moderate stroke, 3.2 to 5.2 inclusive; and major stroke, greater than 5.2. The Modified Rankin Scale (MRS) [24] was used to characterize change in outcome associated with stroke recovery. The MRS is a six-point ordinal scale ranging from 0 (no symptoms at all) to 5 (severe disability), as summarized in Table 1. The MRS for each patient was assigned by a research nurse or therapist after a comprehensive assessment of that patient using standardized measures. In this study the MRS served two purposes: as an eligibility criterion for inclusion in these analyses, and as a tool for defining clinically important differences. The BI and the motor component of the Functional Independence Measure (FIM® Instrument) were used as measures of disability characterized by limitations in independence in ADL. The BI is a 10-item instrument that measures a person’s level of functional independence in ADL [25]. Items
Table 1
Modified Rankin scale

Level  Description
0      No symptoms at all
1      No significant disability despite symptoms; able to carry out all usual duties and activities
2      Slight disability; unable to carry out all previous activities but able to look after own affairs without assistance
3      Moderate disability requiring some help, but able to walk without assistance
4      Moderately severe disability; unable to walk without assistance and unable to attend to own bodily needs without assistance
5      Severe disability; bedridden, incontinent, and requiring constant nursing care and attention
carry different weights, with two items rated on a two-point scale of 0 and 5, six items rated on a three-point scale of 0, 5, and 10, and two items rated on a four-point scale of 0, 5, 10, and 15. Item scores are summed to obtain a total score. Consequently, the scale ranges from 0 to 100, but each individual’s score must be a multiple of 5. For all items, higher scores are associated with a greater degree of independence. The FIM® Instrument is an 18-item instrument that measures a person’s disability in terms of burden of care. Each item is rated on a scale ranging from 1 (total assistance) to 7 (complete independence). Again, scores are summed to obtain a total motor score (13 items for a score ranging from 13 to 91) and a total cognitive score (five items for a score ranging from 5 to 35). Only the motor score was used for this study.

2.3. Responsiveness measures
Measures of responsiveness proposed in the literature fall into two major classes. The first class compares two groups: those who change on the underlying outcome of interest, and those who do not. The second class assesses change in a single population. We examined two methods that use two populations: effect size as defined by Guyatt et al. [16], and area under the Receiver Operator Characteristic (ROC) curve as defined by Deyo and Centor [15]. Single-population methods that we examined included the effect size measure proposed by Kazis et al. [17], the paired t-test, the standardized response mean proposed by Liang et al. [14], and the t-test based on the linear mixed model proposed by Wallace et al. [19]. To define the groups needed for the two-population methods, we used the results from the MRS. Subjects who improved by at least one level on the MRS between month 1 and month 3 were defined as changers, while those who did not change were defined as nonchangers. Subjects who worsened on the MRS between month 1 and month 3 were excluded from the analysis.
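The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_mrs_change` and its return labels are hypothetical, lower MRS levels are taken to mean less disability (as in Table 1), and subjects with an MRS of 0 at month 1 are treated as excluded because they cannot improve, as reported in the Results.

```python
from typing import Optional


def classify_mrs_change(mrs_month1: int, mrs_month3: int) -> Optional[str]:
    """Return 'changer', 'nonchanger', or None when the subject is excluded.

    Hypothetical helper: lower MRS means less disability, so improvement
    is a decrease in MRS level between month 1 and month 3.
    """
    if not (0 <= mrs_month1 <= 5 and 0 <= mrs_month3 <= 5):
        raise ValueError("MRS levels must lie between 0 and 5")
    if mrs_month1 == 0:
        return None  # already at the best level, so unable to improve
    if mrs_month3 < mrs_month1:
        return "changer"  # improved by at least one level
    if mrs_month3 == mrs_month1:
        return "nonchanger"  # stable on the MRS
    return None  # declined on the MRS


# Illustrative calls with made-up MRS levels (not study data).
print(classify_mrs_change(3, 2))
print(classify_mrs_change(3, 3))
print(classify_mrs_change(2, 4))
```

Returning `None` for excluded subjects keeps the two analysis groups explicit; an analysis pipeline would simply drop those records before computing any responsiveness measure.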
For the single-population methods, separate analyses were generated for all available subjects and for those subjects defined as changers. Each of the responsiveness measures is briefly summarized below.

Deyo and Centor [15] have suggested that health status outcome measures can be assessed in a manner analogous to diagnostic tests using receiver operator characteristic (ROC) curves. Conceptually, magnitude of change in the outcome measure is related to the probability of correctly assigning a subject to a true dichotomous outcome of improving or not improving on the underlying health condition. For a change of any particular magnitude on the outcome measure (BI or FIM® Instrument in this study), subjects having a change of that magnitude or greater would be classified as changing on the underlying health status parameter, and subjects having a lesser magnitude of change on the BI or FIM® Instrument would be classified as not changing on the underlying health status parameter. For any particular cut point, the population will have true and false positives and true and false negatives. One measure that summarizes the performance of the diagnostic test in terms of its sensitivity and specificity is the area under the ROC curve. Deyo and Centor have suggested that this summary measure can be used to compare the responsiveness of competing outcome measures [15]. For this study, the area under the ROC curve was obtained using a logistic model comparing subjects who improved by one or more levels on the Rankin vs. those who exhibited no change on the Rankin between 1 and 3 months. In comparing multiple measures, the measure with the largest area under the ROC curve is considered to be most responsive.

In developing a measure of responsiveness consistent with the definition outlined earlier, Guyatt et al. [16] suggested that an appropriate measure was the ratio of a minimum clinically meaningful difference in subjects who do change to the standard deviation of change in a stable population of individuals over the same time period. To estimate the magnitude of minimum clinically meaningful change for this study, we identified a level of change based on achieving specificity of 0.8 for predicting change of one level compared to no change in Rankin between 1 and 3 months.
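To make the two ideas above concrete, the following Python sketch computes an area under the empirical ROC curve and a cut point that reaches a target specificity. It is a simplified stand-in, not the study's procedure: it works directly on the change scores rather than on a fitted logistic model (with a single predictor and a positive slope the ranking of subjects, and hence the ROC curve, should be the same), and it uses the pairwise-comparison form of the area, which equals the trapezoidal area under the empirical curve. The function names and the data are hypothetical, and the sketch is not expected to reproduce the study's cut points.

```python
def roc_auc(changer_scores, nonchanger_scores):
    """Area under the empirical ROC curve for a change score.

    Computed as the share of (changer, nonchanger) pairs in which the
    changer has the larger change score, counting ties as one half.
    """
    wins = 0.0
    for c in changer_scores:
        for s in nonchanger_scores:
            if c > s:
                wins += 1.0
            elif c == s:
                wins += 0.5
    return wins / (len(changer_scores) * len(nonchanger_scores))


def cut_point_for_specificity(nonchanger_scores, target=0.8):
    """Smallest candidate threshold whose specificity reaches the target.

    A subject is called a changer when the change score is >= threshold,
    so specificity is the share of nonchangers scoring below it.
    Candidates are the observed nonchanger scores plus one value above
    the maximum, which always gives a specificity of 1.
    """
    candidates = sorted(set(nonchanger_scores))
    candidates.append(candidates[-1] + 1)
    for threshold in candidates:
        below = sum(1 for s in nonchanger_scores if s < threshold)
        if below / len(nonchanger_scores) >= target:
            return threshold


# Made-up change scores, for illustration only (not study data).
changers = [10, 15, 20, 5]
nonchangers = [0, 5, 5, 10, 0]
print(roc_auc(changers, nonchangers))
print(cut_point_for_specificity(nonchangers))
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a cohort of a few hundred; a rank-based formula would be the usual choice for larger samples.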
For the FIM® Instrument, the value was 11 while for the Barthel it was 16. The standard deviation in the denominator was based on subjects who were stable on the Rankin between 1 and 3 months. If multiple measures are examined in a single study, the measure with the largest ratio is considered to be most responsive. One of the simplest measures of the statistical strength of a change in a single population when observations are taken at two time points is the test statistic from the paired t-test. If multiple measures of changes are made on the same set of subjects over the same time frame, the measure that generates the largest test statistic is judged to be most responsive. For this study, we computed the paired t-statistic based on pre/postmeasures of subjects who exhibit clinical change. We generated separate analyses for all subjects and subjects who changed on the Rankin between 1 and 3 months. One limitation of this method is that it does not account well for the score variability (often improvement) that may occur in apparently stable subjects [26]. Liang et al. [14] proposed a measure called the Standardized Response Mean (SRM), which is a slight modification of the paired t-statistic. The SRM is computed as the ratio of
mean change from month 1 to month 3 to standard deviation of change over the same time period. For this study, we generated separate analyses for all subjects and for only subjects who exhibited a positive change on the Rankin between 1 and 3 months.

Kazis et al. [17] have proposed an effect size measure that differs slightly from the two-population effect size measure of Guyatt and the single-population SRM of Liang, but has many of the same characteristics. The effect size is computed as the ratio of mean change from month 1 to month 3 to standard deviation of the month 1 scores. Again, we generated separate analyses for all subjects and for subjects who changed positively on the Rankin between 1 and 3 months.

A modification of the paired t-statistic is the t-statistic from an appropriate linear contrast from a linear mixed model [19]. For this study, we developed a linear mixed model for each outcome with time treated as a categorical variable, controlling for stroke severity and baseline level of the outcome of interest. The responsiveness index is generated as the test statistic for the linear contrast that generates the difference between 1- and 3-month outcome measures averaged across the three severity levels. In comparing two outcome measures, the square of the ratio of these two statistics is used to generate a relative efficiency index. This relative efficiency is a measure of the ratio of sample sizes using the two measures that would be needed to achieve comparable power in a longitudinal study. Again, we computed separate analyses based on all subjects and on subjects who changed positively on the Rankin between 1 and 3 months.
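The verbal definitions in this section can be collected into formulas. The notation is introduced here for convenience and is not the authors': d_i is the month 3 minus month 1 score for subject i, \bar{d} and s_d are the mean and standard deviation of those changes over n subjects, s_1 is the standard deviation of the month 1 scores, \Delta_{\min} is the minimum clinically meaningful change, s_d^{\text{stable}} is the standard deviation of change among subjects stable on the Rankin, and t_A and t_B are the mixed-model contrast statistics for two instruments A and B.

```latex
\begin{align*}
\text{SRM} &= \frac{\bar{d}}{s_d}, &
t_{\text{paired}} &= \frac{\bar{d}}{s_d/\sqrt{n}} = \sqrt{n}\,\text{SRM}, \\
\text{ES}_{\text{Kazis}} &= \frac{\bar{d}}{s_1}, &
\text{ES}_{\text{Guyatt}} &= \frac{\Delta_{\min}}{s_d^{\text{stable}}}, \\
\text{RE}_{A,B} &= \left(\frac{t_A}{t_B}\right)^{2}.
\end{align*}
```

Written this way, the paired t-statistic and the SRM differ only by the factor \sqrt{n}, which is why the t-statistic grows with the number of subjects while the SRM does not.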
2.4. Statistical analyses
Descriptive statistics (means for continuous measures and proportions for categorical measures) were used to characterize the study sample, with separate measures computed for subjects who changed on the Rankin between 1 and 3 months and those who did not. These descriptive statistics were used to compute the Kazis effect size measure and the Liang Standardized Response Mean (SRM). The month 1 to month 3 change in MRS was used to define subjects as either changers or nonchangers, with subjects who changed positively classified as changers and those who were stable on the MRS classified as nonchangers. Separate logistic regression models were developed with change on the FIM® Instrument and change on the BI as the predictors of the dichotomous outcome of change vs. no change. These models were used to generate ROC curves, with the area under the ROC curve obtained by the trapezoidal rule. Two logistic models based on baseline to 1-month Rankin change status as the outcome measure, and change in the FIM® Instrument and change in the BI separately as explanatory variables, were also generated. The output from these models was used to estimate the clinically meaningful difference as described above. Finally, two linear mixed models (one for the FIM® Instrument, and one for the BI) with the structure outlined in the description of the responsiveness measures were used to assess change controlling for stroke severity. The t-statistics from these contrasts were used as the final measure of responsiveness. The test statistic for the linear contrast was based on a pooled effect averaged across the three severity levels.

3. Results
Comparison of responsiveness to change was based on a subset of 372 of the 459 subjects from the Kansas City Stroke Study cohort. These subjects had FIM® Instrument and BI scores at both 1 and 3 months and either improved or remained constant in Rankin score over that time frame. Of the 87 subjects excluded from these analyses, 56 were excluded because they had missing values for the MRS at both months 1 and 3 (19), at month 1 only (1), or at month 3 only (36); 26 were excluded because they declined on the MRS between months 1 and 3; four were excluded because they had an MRS value of 0 at month 1 and thus were unable to improve; and one was excluded because of missing BI and FIM® Instrument values at month 3.

The subjects used in the analyses were primarily Caucasian (78.2%) and balanced between males and females, with 52.4% of the cohort being female. The majority of the subjects experienced mild or moderate strokes as defined by the Orpington Prognostic Scores, with 38.7% having mild stroke, 50.8% having moderate stroke, and 10.5% having severe stroke. Subjects ranged in age from 42 to 103 with a mean of 69.7 years of age and a standard deviation of 11.6. The results shown in Table 2 comparing demographic and clinical characteristics of subjects who improved in Rankin level between months 1 and 3 and those
Table 2
Comparison of demographic, clinical and stroke characteristics of subjects who improve in Rankin score and maintain Rankin score between 1 and 3 months

                                           Rankin change status
Parameter                                  Improve (N = 154)   Maintain (N = 218)   P-value for difference between groups
Age                                        68.4 (11.3)         70.7 (11.6)          .065
Gender, male                               78 (50.6%)          99 (45.4%)           .32
Race, White                                120 (77.9%)         171 (78.4%)          .90
Marital status, married                    114 (52.3%)         96 (62.3%)           .055
Severity:                                                                           .54
  Minor                                    58 (37.5%)          86 (39.5%)
  Moderate                                 81 (52.6%)          108 (49.5%)
  Major                                    15 (9.7%)           24 (11.0%)
One-month FIM instrument                   69.6 (20.7)         67.8 (22.3)          .43
One-month BI                               72.5 (28.7)         71.3 (28.7)          .71
Prior stroke                               31 (20.1%)          50 (13.4%)           .66
Prior TIA                                  27 (21.6%)          20 (13.0%)           .035
High systolic blood pressure (155 mmHg)    77 (50.0%)          103 (47.2%)          .31
High diastolic blood pressure (90 mmHg)    58 (37.7%)          70 (32.1%)           .19
Diabetes mellitus                          50 (32.5%)          82 (37.6%)           .52
Congestive heart failure                   10 (6.5%)           20 (13.3%)           .10
Prior atrial fibrillation                  12 (7.8%)           35 (16.1%)           .052
Myocardial infarction                      30 (19.5%)          39 (17.9%)           .91
Current smoker                             41 (26.6%)          42 (19.3%)           .097
who remained constant suggest that the groups are similar in demographic characteristics, stroke severity, clinical attributes, and month 1 FIM® Instrument and BI levels. Computation of most of the responsiveness measures considered in this study is straightforward. The original method proposed by Guyatt, which is measured as the ratio of a minimal clinically important difference to the standard deviation in subjects who do not change clinically, is easy to understand conceptually. To date, however, health services researchers have not developed a consensus as to the best way to measure the “minimal clinically important difference” in either a generic or disease-specific framework. For this study we used an operational definition that established a minimum clinically important change as a change of one unit on the Rankin between months 1 and 3. To establish levels of minimal clinically important change on the BI and FIM® Instrument, we attempted to establish a level at which a person exhibiting that change or greater would have little probability of failing to have an MRS change of one unit. Consequently, we established a level using the ROC curve from a logistic regression model associated with a specificity of at least 0.8. For the BI, the change was 16 units and for the FIM® Instrument, the change was 11 units. Mean BI and FIM® Instrument scores and standard deviations in those scores at months 1 and 3 as well as scores for changes between months 1 and 3 are presented in Table 3 separately for those subjects who change on the MRS and those who do not. The mean scores at month 1 for both the FIM® Instrument and the BI are comparable for those subjects who improve on the MRS between months 1 and 3 and those who are stable. 
However, subjects who change on the MRS exhibit substantially more change on both the FIM® Instrument and the BI than do those subjects who are stable on the MRS, suggesting that both indices do have the ability to measure change in this stroke population. Consequently, comparing their sensitivity in measuring change is reasonable. Tables 4 and 5 present the responsiveness measures for the FIM® Instrument and BI for the six responsiveness measures examined in this study. Table 4 contains the results for the two methods that explicitly compare two populations—the ROC curve method of Deyo and Centor, and
Table 3
FIM Instrument and BI scores for stable and improving subjects based on Rankin

                                Subjects with Rankin change   Subjects without
                                of 1 or more levels           Rankin change        All subjects
Measure          Time period    Mean     SD                   Mean     SD          Mean     SD
FIM Instrument   1 Month        69.6     20.7                 67.8     22.2        68.1     21.9
                 3 Months       79.1     15.6                 71.3     21.1        73.8     27.1
                 Change         9.55     10.2                 3.55     8.5         5.02     15.9
BI               1 Month        72.5     28.7                 71.4     30.4        71.2     30.0
                 3 Months       86.2     21.5                 77.1     28.2        79.7     27.1
                 Change         13.8     16.0                 5.78     12.4        7.67     15.9
Table 4
Comparison of BI and FIM instrument responsiveness to change: methods based on explicit comparison of two populations (one defined as changing and one defined as stable)

                      Outcome measure
                      BI       FIM Instrument
ROC curve             0.650    0.675
Guyatt effect size    1.29     1.29
Guyatt’s effect size measure. Table 5 contains the results for the four methods based on a single population: the paired t-statistic, the Standardized Response Mean of Liang, Kazis’ Effect Size measure, and the Adjusted t-statistic from the Mixed Model. For these last methods, separate measures are generated based on all subjects, and based only on subjects who changed one or more units on the MRS between months 1 and 3. Although the magnitudes of the different effect size measures are difficult to interpret directly, all are unitless numbers, with larger magnitudes for a given responsiveness measure indicating more responsiveness to change. Generally, the differences between the FIM® Instrument and BI are relatively small. For the two-population measures, the ROC curve indicates a slight difference favoring the FIM® Instrument, and the Guyatt effect size measure indicates that the two are equivalent. For the single-population measures using all subjects, all measures show the BI as being more responsive by a small amount. For those measures using only subjects who change by one level or more on the Rankin, all measures show the FIM® Instrument to be slightly more sensitive.
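As an arithmetic cross-check, several of the tabulated responsiveness values can be recomputed from the published summary statistics. The Python sketch below uses the changer-group means and standard deviations from Table 3, the changer count of 154 from Table 2, and the minimum clinically important changes of 16 (BI) and 11 (FIM® Instrument) stated in the text. The function and variable names are hypothetical, and because the inputs are themselves rounded, agreement is expected only to the precision shown in the tables.

```python
import math

N_CHANGERS = 154  # subjects who improved on the Rankin (Table 2)

# Changer-group summary statistics as reported in Table 3.
changer_stats = {
    "BI": {"mean_change": 13.8, "sd_change": 16.0, "sd_month1": 28.7},
    "FIM": {"mean_change": 9.55, "sd_change": 10.2, "sd_month1": 20.7},
}

# Minimum clinically important change (from the text) and the SD of
# change among subjects stable on the Rankin (Table 3).
guyatt_inputs = {
    "BI": {"mcid": 16, "sd_change_stable": 12.4},
    "FIM": {"mcid": 11, "sd_change_stable": 8.5},
}


def standardized_response_mean(mean_change, sd_change):
    return mean_change / sd_change


def kazis_effect_size(mean_change, sd_month1):
    return mean_change / sd_month1


def paired_t(mean_change, sd_change, n):
    return mean_change / (sd_change / math.sqrt(n))


def guyatt_effect_size(mcid, sd_change_stable):
    return mcid / sd_change_stable


for name, s in changer_stats.items():
    g = guyatt_inputs[name]
    print(
        name,
        round(standardized_response_mean(s["mean_change"], s["sd_change"]), 2),
        round(kazis_effect_size(s["mean_change"], s["sd_month1"]), 2),
        round(paired_t(s["mean_change"], s["sd_change"], N_CHANGERS), 1),
        round(guyatt_effect_size(g["mcid"], g["sd_change_stable"]), 2),
    )
```

Under these inputs the recomputed standardized response mean, Kazis effect size, paired t-statistic, and Guyatt effect size for each instrument match the changer rows of Table 5 and the Guyatt row of Table 4 after rounding.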
4. Discussion Reliable measurement tools are needed to help researchers assess new stroke treatments and clinicians to evaluate patterns of stroke recovery in the treatment setting. Guyatt has argued that scales that are used in an evaluative sense to measure progress of patients over time must also exhibit a property that he called responsiveness or sensitivity to change. A number of articles have addressed the general issue of responsiveness or sensitivity to change [1,27–31], yet
Table 5
Comparison of BI and FIM instrument responsiveness to change: methods based on a single population (changing)

Responsiveness measure              Subjects    BI      FIM instrument
Paired t-test                       All         12.1    12.0
                                    Changers    10.7    11.6
Liang standardized response mean    All         0.63    0.62
                                    Changers    0.86    0.94
Kazis effect size                   All         0.31    0.28
                                    Changers    0.48    0.46
Mixed model adjusted t-statistic    All         10.9    10.6
                                    Changers    10.4    11.0
no consensus has been reached on a standard measure of responsiveness. One objective of this study was to compare the results for five measures that have been widely used in the literature and a method recently used by Wallace et al. [19] to compare responsiveness for measures of functional performance after stroke. Our results indicate that the different responsiveness measures did not agree completely in their responsiveness ranking of the BI and FIM® Instrument. In general, though, both outcome scales were able to demonstrate change based on the different responsiveness measures. Furthermore, differences in the magnitudes of the responsiveness measures were relatively small, suggesting that the BI and the FIM® Instrument motor scale differ little in their responsiveness to recovery in stroke patients between months 1 and 3. These results generally agree with those of Granger et al. [22], who found the FIM® Instrument to be sensitive to change in stroke measures, and van der Putten et al. [11], who found little difference in responsiveness between the BI and FIM® Instrument motor scale. Although the magnitudes of the responsiveness measures for the BI obtained in this study appear to depart somewhat from those found by von Bennekom et al. [30] in an earlier study of stroke patients, one substantive difference between the two studies was the length of the evaluation period. The study by von Bennekom et al. [30] examined an early response period (2 weeks) compared to the 1- to 3-month response period in this study. The ROC measure (0.65 vs. 0.59) and Guyatt effect size (1.29 vs. 1.36) were generally similar in the two studies. However, the Kazis effect size measure (0.48 vs. 1.49) and the Liang effect size measure (0.86 vs. 1.77) were substantially larger in the von Bennekom study, and the paired t-test (10.6 vs. 9.1) was larger in this study. As suggested by von Bennekom et al., these differences highlight the difficulty in assessing responsiveness measures.
They also suggest that the responsiveness of an instrument may be related to the timing of the measurements and phase of rehabilitation. One of the primary uses of responsiveness measures for stroke rehabilitation researchers is in the design of clinical trials for measuring treatment effectiveness. Typically, the magnitude of most responsiveness measures is related to the number of subjects needed in a clinical trial to achieve a particular power to detect differences in change across time between two treatment groups. Measures with greater responsiveness indices provide greater study power, thereby allowing a study to be completed with fewer subjects. One important result of this study is the comparison of responsiveness measures in different groups of subjects. For either the FIM® Instrument or the BI, the results in Table 5 indicate that substantially different conclusions about the power to detect differences in change would have been obtained for the different subject pools used in this study. Typically, substantially fewer subjects would have been used if only the subjects who changed on the MRS were included in the analysis. In general, effect sizes are larger for the subjects who exhibit change than for all subjects combined, with the
differences in the t-tests accounting for the substantially larger number of subjects when the changers and stable subjects are combined. If the incorrect group were selected for assessing which instrument to use and for computing sample sizes for a study, the study could be substantially over- or underpowered. Consequently, these results highlight the importance of choosing an appropriate comparison group when using responsiveness results for study design.

One limitation of this study is that analyses focused only on the responsiveness of the measures to improvement in stroke patients rather than to either improvement or decline. The number of subjects who declined in this study (n = 26) was insufficient to develop reliable estimates of the responsiveness to decline, but responsiveness of an instrument to decline is an important consideration that should be assessed in future studies. The findings of this study may also be limited by the decision to define the minimally important clinical change as a change of one unit on the MRS. Some stroke researchers may feel that smaller changes are clinically important or that this definition may fail to capture important, and possibly large, changes within an MRS level (e.g., from the bottom to the top of MRS level 3). This concern highlights the difficulty in defining a clinically important difference in the absence of a gold standard for recovery from stroke, and suggests that further development of such a standard by the stroke rehabilitation community would be valuable.

In conclusion, this study suggests that the FIM® Instrument and BI have similar responsiveness to change in a group of patients recovering from stroke between 1 and 3 months poststroke. The BI is reasonably easy to use and requires less time to administer, while the FIM requires training and certification and may take slightly longer to administer.
The different responsiveness indices are generally consistent with this conclusion; small differences were found in the different responsiveness measures, but this study provided little evidence that would lead to selection of one of the measures as a “best measure.” More importantly, larger differences in the responsiveness measures were obtained within an instrument, depending on the particular population used in the analysis. For computing a responsiveness measure consistent with the original Guyatt definition, a population based on persons who truly exhibit clinical change is appropriate. However, when using responsiveness as a measure of power for designing clinical trials, the population used to compute responsiveness should resemble the population to be studied during the trial.
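The link between a responsiveness index and trial size described above can be illustrated with a minimal sketch. It is not the analysis code used in this study. The function names and the example scores are hypothetical. The sample size formula is the standard normal approximation for comparing mean change between two equal-sized groups, and it assumes the between-group difference is expressed in units of the standard deviation of change.

```python
import math
from statistics import NormalDist, mean, stdev


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of change in the same subjects."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


def kazis_effect_size(baseline, followup):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def guyatt_index(changer_changes, stable_changes):
    """Mean change in subjects who changed clinically, divided by the
    SD of change in subjects who remained clinically stable."""
    return mean(changer_changes) / stdev(stable_changes)


def n_per_group(standardized_difference, alpha=0.05, power=0.80):
    """Subjects per arm for a two-sided, two-group comparison of mean
    change (normal approximation; an assumption of this sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / standardized_difference ** 2)


if __name__ == "__main__":
    # Hypothetical scores for five subjects at 1 and 3 months.
    month1 = [40, 55, 60, 45, 70]
    month3 = [50, 60, 75, 50, 85]
    print("SRM:", standardized_response_mean(month1, month3))
    print("Kazis effect size:", round(kazis_effect_size(month1, month3), 3))

    # A larger standardized difference needs far fewer subjects per arm.
    for d in (0.25, 0.5, 1.0):
        print("standardized difference", d, "-> n per group", n_per_group(d))
```

Because the required sample size scales with the inverse square of the standardized difference, an index computed in changers only can suggest a much smaller trial than one computed in a mixed population of changers and stable subjects.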
Acknowledgments

This study was funded by the Department of Veterans Affairs Rehabilitation Research and Development and the University of Kansas Claude D. Pepper Older Americans Independence Center, which is funded by the National Institutes of Health (P60 AG 14635-02). Participating facilities in the Kansas City area included: Baptist Medical Center, Department of Veterans Affairs Medical Centers in Kansas City and Leavenworth, Liberty Hospital, Medical Center of Independence, Mid-American Rehabilitation Hospital, Rehabilitation Institute, Research Medical Center, St. Luke’s Hospital, St. Joseph Health Center, Trinity Lutheran Hospital, and University of Kansas Medical Center.
References

[1] Brott T, Bogousslavsky J. Treatment of acute ischemic stroke. N Engl J Med 2000;343:710–22.
[2] Thorvaldsen P, Kuulasmaa K, Rajakangas AM, Rastenyte D, Sarti C, Wilhelmsen L. Stroke trends in the WHO MONICA project. Stroke 1997;28:500–6.
[3] American Heart Association. Heart and stroke facts. Dallas: American Heart Association; 2000.
[4] Broderick J, Brott T, Kothari R, Miller R, Khoury J, Pancioli A, Gebel J, Mills D, Minneci L, Shukla R. The Greater Cincinnati/Northern Kentucky stroke study: preliminary first-ever and total incidence rates of stroke among blacks. Stroke 1998;29:415–21.
[5] Duncan P. Clinical and measurement issues in selecting stroke outcome measures in clinical trials. In: Goldstein L, editor. Restorative neurology. Advances in the pharmacotherapy of recovery after stroke. Mt. Kisco, NY: Futura Publishing; 1998.
[6] Pedersen PM, Jorgensen HS, Nakayama H, Raaschou HO, Olsen TS. Comprehensive assessment of activities of daily living in stroke. The Copenhagen Stroke Study. Arch Phys Med Rehabil 1997;78:161–5.
[7] Gresham GE, Duncan PW, Stason WB, Post-Stroke Rehabilitation Guideline Panel. Post stroke clinical guideline No. 16. (AHCPR Publication No. 95–0662). Rockville, MD: U.S. Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research; 1995.
[8] Kidd D, Stewart G, Baldry J, Johnson J, Rossiter D, Petruckevitch A, Thompson AJ. The functional independence measure: a comparative validity and reliability study. Disabil Rehabil 1995;17:10–4.
[9] Shah S, Vanclay F, Cooper B. Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol 1989;42:703–9.
[10] Duncan PW, Lai SM, van Culin V, Huang L, Clausen D, Wallace D. Development of a comprehensive assessment toolbox for stroke. Clin Geriatr Med 1999;15:885–915.
[11] van der Putten JJMF, Hobart JC, Freeman JA, Thompson AJ. Measuring change in disability after inpatient rehabilitation: comparison of the responsiveness of the Barthel Index and the Functional Independence Measure. J Neurol Neurosurg Psychiatry 1999;66:480–4.
[12] Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38:27–36.
[13] Guyatt GH, Deyo RA, Charlson M, Levine MN, Mitchell A. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol 1989;42:403–8.
[14] Liang MH. Evaluating measurement responsiveness. J Rheumatol 1995;22:1191–2.
[15] Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis 1986;39:897–906.
[16] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987;40:171–8.
[17] Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27:S178–89.
[18] Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239–46.
[19] Wallace D, Duncan PW, Lai SM. Are the FIM Instrument motor scale and Barthel Index responsive to stroke recovery? Stroke 2000;31:301.
[20] World Health Organization. Proposal for the multi national monitoring and determinants of cerebrovascular disease project WHO/MNC82 Rev. 1. Geneva: WHO; 1993.
[21] Brott T, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V. Measurement of acute cerebral infarction: a clinical examination scale. Stroke 1989;20:864–70.
[22] Granger CV, Hamilton BB, Keith RA, Zielezny M, Sherwin FS. Advances in functional assessment for medical rehabilitation. Top Rehabil 1986;1:59–74.
[23] Kalra L, Crome P. The role of prognostic scores in targeting stroke rehabilitation in elderly patients. J Am Geriatr Soc 1993;41:396–400.
[24] van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJA, van Gijn J. Interobserver agreement for assessment of handicap in stroke patients. Stroke 1988;19:604–7.
[25] Mahoney FI, Barthel DW. Functional evaluation: The Barthel Index. Md State Med J 1965;14:61–5.
[26] Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Control Clin Trials 1991;12:142S–58S.
[27] Hagen KB, Smedstad LM, Uhlig T, Kvien TK. The responsiveness of health status measures in patients with rheumatoid arthritis: comparison of disease-specific and generic instruments. J Rheumatol 1999;26:1474–80.
[28] Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol 1997;50:79.
[29] de Bruin AF, Diederiks JPM, de Witte LP, Stevens FCJ, Philipsen H. Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol 1997;50:529–40.
[30] van Bennekom CAM, Jelles F, Lankhorst GJ, Bouter LM. Responsiveness of the Rehabilitation Activities Profile and the Barthel Index. J Clin Epidemiol 1996;49:39–44.
[31] Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments. Med Care 1992;30:917–25.