Predicting patient scores between the functional independence measure and the minimum data set: Development and performance of a fim-mds “crosswalk”


Brent C. Williams, MD, Yi Li, MS, Brad E. Fries, PhD, Reg L. Warren, PhD

ABSTRACT. Williams BC, Li Y, Fries BE, Warren RL. Predicting patient scores between the Functional Independence Measure and the Minimum Data Set: development and performance of a FIM-MDS “crosswalk.” Arch Phys Med Rehabil 1997;78:48-54.

Objective: The functional status of rehabilitation patients is often measured using the Functional Independence Measure (FIM) in acute rehabilitation settings or the Minimum Data Set (MDS) in nursing homes. Because the relationship between the two instruments is unknown, which prevents comparison of rehabilitation patients in different types of settings, a translation formula (“crosswalk”) between items and subscales from the FIM and the MDS was developed and tested.

Design and Outcome Measures: Using definitions recommended by an expert panel, MDS items were chosen and rescaled (termed “Pseudo-FIM(E)” items) to correspond to FIM items. The empiric relationships between Pseudo-FIM(E) and FIM scores were then measured using paired FIM-MDS assessments.

Setting and Patients: 173 rehabilitation patients admitted to six nursing homes.

Results: Pseudo-FIM(E) items could be defined for 12 of the 18 FIM items (8 motor and 4 cognitive items). Mean FIM and Pseudo-FIM(E) scores were not significantly different (p > .30) for 5 of the 12 items. Mean scores for the remaining 7 items and for motor and cognitive subscales were similar but statistically significantly different (p < .05). Intraclass correlation coefficients between the FIM and Pseudo-FIM(E) motor and cognitive subscales were both .81.

Conclusions: FIM and MDS items can be used to predict item and subscale scores between the two instruments with reasonable accuracy. This capability will enhance efforts to compare case-mix between acute rehabilitation and nursing home rehabilitation patients, thus making feasible comparisons of the effectiveness (degree of improvement among similar patients) and efficiency (cost of care to obtain a given degree of improvement) of rehabilitation care in different types of settings.

© 1997 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation

From the Department of Internal Medicine, University of Michigan Medical Center (Dr. Williams); the Geriatric Research, Education, and Clinical Center, Ann Arbor Veterans Affairs Medical Center (Drs. Williams, Fries); the Department of Biostatistics (Mr. Li) and Institute of Gerontology, School of Public Health (Dr. Fries), University of Michigan, Ann Arbor; and The Polaris Group, Boston, MA (Dr. Warren).
Submitted for publication February 9, 1996. Accepted in revised form July 20, 1996.
Supported in part by a grant from NovaCare, Inc.
Reprint requests to Brent C. Williams, MD, University of Michigan Medical Center, 3116 Taubman Center, 1500 East Medical Center Drive, Ann Arbor, MI 48109-0376.
0003-9993/97/7801-3888$3.00/0

Arch Phys Med Rehabil Vol 78, January 1997

IN RECENT YEARS, the provision of rehabilitation services in settings other than acute rehabilitation facilities has grown rapidly, as have concerns over rapidly rising health care costs. As a result, the ability to measure and compare the clinical characteristics of individuals and populations of patients receiving rehabilitation services will become increasingly important. For example, clinical uncertainty regarding the potential benefits of rehabilitation services for individual patients or groups of patients creates incentives to prescribe and provide more services in the hope of maximal attainable improvement for patients. At the same time, third party payors are increasingly likely to require demonstrated clinical benefit at the lowest cost. This tension emphasizes the need to measure objectively the rehabilitation-relevant clinical characteristics of patients and to document clinical improvement (or lack thereof) with treatment, to enable the quantification of both the effectiveness and efficiency of rehabilitation services. Also, cost containment pressures and the persistence of cost-based reimbursement for most rehabilitation services have spawned the rapid growth of rehabilitation services in settings less costly than acute hospitals, such as nursing homes. To compare the effectiveness and efficiency of rehabilitation services in alternative settings, methods of assessing patients’ characteristics across settings are needed.

In acute rehabilitation facilities (hospital-based and freestanding), the Functional Independence Measure (FIM) is the most widely used instrument for determining patients’ rehabilitation-relevant clinical status.1 The FIM is an 18-item assessment instrument that measures patients’ capacities in activities of daily living (ADL; eg, eating, bathing, dressing, toileting), continence, and cognitive functions such as communication and problem solving.
Facilities that demonstrate the ability to assess patients reliably using standardized examples are qualified to report FIM results to the Uniform Data System (UDS), which collects assessment information from more than 400 facilities nationwide. Several studies have demonstrated the interrater reliability of the FIM. Based on analytic methods similar to factor analysis, the FIM items are considered to represent two dimensions, labeled motor (ADL, continence, indoor mobility, climbing stairs) and cognitive (memory, comprehension, expression, problem solving, social interaction) domains.3

In nursing homes in the United States, however, measurement has developed along a different path. Since 1990, virtually all US nursing homes have been required by federal law to implement an extensive, uniform assessment and care planning process for all patients: the Resident Assessment Instrument (RAI). The core of the RAI is the Minimum Data Set (MDS), an instrument with more than 300 items that includes information on ADL, continence, communication, sensory function, cognition, behavioral problems, and other areas. The high interrater reliability of the MDS when used by trained assessors has been demonstrated.4 From combinations of MDS items, scales for ADL5 and cognition6 have been developed and validated. With the recent and projected rapid growth of nursing home rehabilitation services, increasing numbers of rehabilitation patients are being assessed routinely using the MDS. Few nursing homes that provide rehabilitation services have used the FIM; widespread use of the FIM among nursing home rehabilitation patients may be challenged because of the substantial and comprehensive assessment efforts already being carried out through the MDS.

The purpose of our study was to develop a “crosswalk” between items and scales derived from FIM and MDS assessments for nursing home rehabilitation patients. As defined here, a crosswalk is a translation formula that predicts scores for items or scales from one instrument using information for items or scales from the other instrument. A crosswalk between the FIM and the MDS has a variety of useful applications for individual rehabilitation patients and for groups of patients. For example, the long-term progress of patients transferred from acute rehabilitation facilities to nursing homes could be measured by comparing MDS scores obtained in the nursing home with FIM scores obtained in the acute rehabilitation facility. A crosswalk would also allow the comparison of groups of rehabilitation patients in acute rehabilitation and nursing home settings. Thus, the baseline characteristics, amount of services received, and response to therapy could be compared between rehabilitation patients in acute rehabilitation facilities and nursing homes. This type of information will prove useful to payors, patients, and clinicians in defining settings and care plans that maximize the effectiveness (functional improvement) and efficiency (minimal resource utilization) of rehabilitation services.

METHODS

A survey of the definitional comparability of selected FIM and MDS items was completed by seven experts in rehabilitation assessment. Expert panel members were nationally recognized academic researchers in rehabilitation assessment (see Acknowledgment). Individuals were selected based on successful research and publication records in rehabilitation assessment and/or on the recommendations of other expert panel members. All experts had experience with at least one of the assessment instruments in the study. All individuals who were contacted agreed to participate.

These experts were asked which items (or groups of items) from the MDS were most comparable to each of the 18 FIM items. Respondents were then asked to determine, for specific pairs of analogous MDS and FIM items (eating, bathing, transferring, toileting, dressing, grooming, bowel/bladder incontinence, comprehension, expression, problem solving), which levels of corresponding items represented plausibly similar degrees of functional limitation. For example, the survey asked: “Indicate for each performance level of the item from one instrument the level(s) of performance for the other instrument that could describe ANY patients whose functional status is described by the other instrument, INCLUDING THE MOST AND THE LEAST LIMITED INDIVIDUALS.” This last phase was important because the dimensions of measurement and the numbers of response levels differ between analogous items from the two instruments. For example, while most FIM items assess the percent effort provided by the patient to accomplish a task, many analogous items from the MDS measure the number of times during the past 7 days a patient required a given level of assistance to perform a task. All FIM items have seven response levels, whereas the number of levels for MDS items ranges from two to five. Table 1 lists the 18 FIM items and potentially analogous MDS items, and the number of levels for each MDS item.

Table 1: Definitionally Comparable FIM and MDS Items

FIM Item (7 Levels)                 MDS Item(s)                                                   No. Levels for MDS Item(s)
Eating                              Eating                                                        5
Transfers: bed, chair, wheelchair   Transferring                                                  5
Transfers: tub or shower            -                                                             -
Transfers: toilet                   (Included in definition of toileting)                         -
Toileting                           Toileting                                                     5
Bathing                             Bathing                                                       5
Dressing-Upper body                 Dressing                                                      5
Dressing-Lower body                 (Included in dressing)                                        -
Grooming                            Personal hygiene                                              5
Walk/wheelchair                     Locomotion                                                    5
Stairs                              -                                                             -
Bladder management                  Bladder continence                                            4
Bowel management                    Bowel continence                                              4
Memory                              Short-term memory                                             2
                                    Long-term memory                                              2
                                    Recognize staff names/faces                                   2
                                    Location of own room                                          2
Expression                          Making self understood                                        4
Comprehension                       Ability to understand others                                  4
Problem solving                     Cognitive skills for daily decision making                    4
Social interaction                  At ease interacting with others                               2
                                    Pursues involvement in life of facility                       2
                                    Covert/open conflict with and/or repeated criticism of staff  2
                                    Failure to eat or take medications, withdrawn from            2
                                      self-care or leisure activities

For the FIM items “Memory” and “Social Interaction,” survey respondents were asked to assess the degree to which each of several potentially related MDS items was definitionally included in the FIM item (not, possibly, probably, or definitely included; not sure).

To test the relationships proposed by the panel, simultaneous FIM and MDS assessments were gathered on a sample of nursing home rehabilitation patients. Six nursing homes that provide substantial amounts of rehabilitation services were enrolled for the data collection. Within 72 hours of admission to rehabilitation, the FIM was administered by therapists and nurses, with individual items assessed by the appropriate type of health professional. For example, in most cases a nurse evaluated patient function in bowel and bladder management, and physical, occupational, and speech therapists evaluated the remaining FIM items. The MDS was administered by a nurse for each patient, at admission to the nursing home and at specified intervals thereafter. Paired FIM and MDS assessments were included in the study for all patients for whom the two types of assessment occurred within ±2 weeks of each other. In this fashion, a total of 173 paired assessments were collected between September 1994 and March 1995. The study was approved by our University Institutional Review Board.

Raters in the facilities were trained and evaluated on the use of the FIM, according to the same qualification procedure used for participation in the Uniform Data System (Uniform Data System for Medical Rehabilitation FIM Credentialing Examination, 1994). In brief, training included a minimum of 4 hours of classroom training, including videotape review. Evaluation consisted of comparing assessors’ FIM ratings for written patient scenarios with previously determined responses. Eighty percent point-to-point agreement was required for credentialing. Each facility then maintained an 80% credentialing rate across clinicians. Although no measurement of reliability was taken for the MDS, the interrater reliability of the MDS in research and nonresearch settings has been documented. Specifically, the mean (range) Spearman Brown intraclass correlation coefficient for the MDS items used in our study was found to be .82 (.57-.98).
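The pairing rule described above (FIM and MDS assessments no more than 2 weeks apart) can be illustrated with a short sketch. The Python below is an illustration only: the function name `paired_within_window`, the patient identifiers, and the dates are hypothetical, and the study does not describe how the matching was actually implemented.

```python
from datetime import date

MAX_GAP_DAYS = 14  # the 2-week window described in the text


def paired_within_window(fim_dates, mds_dates, max_gap_days=MAX_GAP_DAYS):
    """Return IDs of patients whose FIM and MDS dates differ by at most max_gap_days."""
    paired = []
    for patient_id, fim_date in fim_dates.items():
        mds_date = mds_dates.get(patient_id)
        if mds_date is None:
            continue  # no MDS assessment on file for this patient
        if abs((mds_date - fim_date).days) <= max_gap_days:
            paired.append(patient_id)
    return sorted(paired)


# Hypothetical example data, not taken from the study.
fim_dates = {"p1": date(1994, 9, 5), "p2": date(1994, 10, 1), "p3": date(1994, 11, 2)}
mds_dates = {"p1": date(1994, 9, 14), "p2": date(1994, 10, 20)}
eligible = paired_within_window(fim_dates, mds_dates)
```

In this toy data only the first patient would be retained: the second pair is 19 days apart, and the third patient has no MDS assessment.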


Comparisons of the responses for corresponding FIM and MDS items and scales proceeded systematically. First, the level of correspondence of individual FIM and MDS items was examined. For the FIM items for which there was a single analogous MDS item, the number of patients for whom the FIM and MDS scores fell within a plausible range was determined. The plausible range included all levels considered reasonable by at least two of the seven expert panel respondents. For the two FIM items that included information from more than one MDS item (memory and social interaction), crosstabulations of the responses for the FIM item and each potentially corresponding MDS item, and the sum of the corresponding MDS items, were examined.

Because the FIM uses 7 levels for all items, we first rescaled the levels of MDS items to a range from 1 to 7. MDS level values were rescaled by two alternative methods. In the first method, levels were defined as the mean of the plausible FIM levels provided by the expert panel. For example, if the expert panel suggested that level 3 of a 5-level MDS item (eg, limited assistance in eating) plausibly corresponded to FIM levels 3 or 4 (moderate or minimum assistance), then the MDS level 3 was defined as 3.5. Expert panel opinion was defined as agreement between two or more panel members. On virtually every item, however, the majority of panel members agreed on level assignments. For the FIM memory item, the sum of the corresponding MDS items (all dichotomous) was employed; level values for this item were assigned to result in evenly distributed levels ranging from 1 to 7. Values for MDS items rescaled using this method are shown in the Appendix. In the second method of rescaling MDS items, level values for MDS items were defined as the mean of the observed FIM levels for each MDS level, using the first 76 paired assessments as a development data set.
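The two rescaling methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the panel judgments, and the development-set pairs are hypothetical, and only the entry for MDS level 3 (plausible FIM levels 3 and 4, giving 3.5) mirrors the worked example in the text.

```python
from statistics import mean


def expert_rescale(plausible_fim_levels):
    """Map each MDS level to the mean of the FIM levels the panel judged plausible."""
    return {mds: mean(fim_levels) for mds, fim_levels in plausible_fim_levels.items()}


def empirical_rescale(pairs):
    """Map each MDS level to the mean FIM level observed with it in (mds, fim) pairs."""
    observed = {}
    for mds_level, fim_level in pairs:
        observed.setdefault(mds_level, []).append(fim_level)
    return {mds: mean(levels) for mds, levels in observed.items()}


# Hypothetical panel judgments for one 5-level MDS item (MDS level -> plausible FIM levels).
# Only the entry for MDS level 3 follows the example given in the text.
panel = {1: [7], 2: [5, 6], 3: [3, 4], 4: [2], 5: [1]}
pseudo_fim_e = expert_rescale(panel)

# Hypothetical development-set pairs of (MDS level, observed FIM level).
development = [(3, 3), (3, 4), (3, 5), (5, 1), (5, 2)]
pseudo_fim_o = empirical_rescale(development)
```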
This empirically derived scale was then applied to the last 97 paired assessments (validation set), and the degree of congruity was determined. By definition, this resulted in closer relationships between FIM and MDS scores than for comparisons based on expert panel recommendations. The rescaled MDS items based on the expert panel recommendations and those based on observed relationships in the development data set are referred to as Pseudo-FIM(E) and Pseudo-FIM(O) items, respectively.

The mean scores for individual FIM and analogous Pseudo-FIM items among all patients were compared using Wilcoxon rank sum tests, since the distributions of several items were highly skewed. Items from each instrument were grouped into motor (six ADL and two continence items) and cognitive (memory, expression, comprehension, and problem solving) subsets, consistent with previous research demonstrating these two dimensions among FIM items. A third subscale, termed the ADL subscale, summed values for the six ADL items only (eating, transferring, toileting, bathing, dressing, grooming) to eliminate disparities arising from definitional incongruities in the continence items. The validity of this approach to subscale development was measured by determining Cronbach alpha internal consistency-reliability coefficients for each subscale for the two instruments. In exploratory settings, a Cronbach alpha above 0.7 is acceptable; where measurements of individuals are of interest, alpha reliability should be above 0.9. Scores for items from the subscales were summed to define “subscale scores.” Comparison of FIM and Pseudo-FIM subscale scores among patients was first carried out using t tests. Where the distributions of subscale scores were skewed, findings from t tests were verified (in every case) with Wilcoxon rank sum (nonparametric) tests. The comparability of subscale scores across the entire range of scores was examined using Spearman Brown intraclass correlation coefficients.12,13
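For readers unfamiliar with the internal consistency measure named above, the standard Cronbach alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), can be sketched as follows. The scores are hypothetical and the function name `cronbach_alpha` is illustrative; the study itself reports that analyses were run in SAS/PC.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per patient, same item order in every row."""
    k = len(rows[0])  # number of items in the subscale
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for four patients on three 7-level items.
scores = [
    [7, 6, 7],
    [5, 5, 4],
    [3, 2, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
```

Because the three hypothetical items rise and fall together across patients, alpha for this toy subscale is high, well above the 0.9 threshold mentioned in the text.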


The Spearman Brown intraclass correlation coefficient is generally used to determine the reliability of ordinal measures and provides a more conservative estimate of correlation than either simple correlation or percent agreement. An intraclass correlation of 0.4 or higher is interpreted as adequate reliability; a value of 0.7 or higher is considered excellent reliability. The ability to predict subscale scores among individuals was measured by determining the percent of variance explained by alternative subscale scores, using ordinary least squares (OLS) linear regression. Statistical analyses were carried out using SAS/PC.14

RESULTS

The mean age (SD) of the 173 patients in the total study population was 80.6 (7.2) years. Sixty-eight percent were women. Rehabilitation diagnoses were varied, with 25% of patients having stroke and 16% having hip fracture as the primary rehabilitation diagnosis. The remaining patients had a variety of diagnoses, including cardiac diseases, amputations, and gastrointestinal diseases. The median (10th to 90th percentile) time between the FIM and MDS assessments was 9 (1 to 12) days.

Based on responses from the expert panel and item definitions, we identified the FIM items for which definitionally similar MDS items exist. For 10 FIM items, a single MDS item closely matched the definition of the corresponding FIM item (table 1): eating, transferring bed to chair, toileting, bathing, grooming, bladder incontinence, bowel incontinence, problem solving, expression, and comprehension. For three FIM items (climbing stairs, transfer to and from the tub/shower, and transfer to and from the toilet), no corresponding MDS items were judged to exist.
Five of the seven experts believed that the FIM item walk/wheelchair and the analogous MDS item locomotion were measured along conceptually different dimensions (number of feet travelled at a given level of assistance or percent patient effort for the FIM versus the number of episodes of assistance during the last 7 days for the MDS item) and so could not meaningfully be compared. Therefore, walking/locomotion was excluded from further analyses. A composite FIM-based dressing measure was defined as the more dependent of the two FIM dressing items (upper and lower body dressing) to emulate the single MDS dressing item. In the FIM, the items for bladder and bowel incontinence are defined as the more dependent of two variables: the level of human assistance required for management (including with assistive devices) or the frequency of incontinence. The MDS items, in contrast, refer only to the frequency of incontinence, with the level of human assistance required for management of assistive devices included in toileting. Therefore, patients whose MDS assessment indicated the presence of bowel or bladder devices but who did not require human assistance to maintain continence were reclassified as dependent.

According to the expert panel, the definitions of two FIM items, memory and social interaction, “probably” or “definitely” included information from several dichotomous MDS items (table 1). The FIM variable social interaction is defined as “skills related to getting along and participating with others in therapeutic and social situations, (and) represents how one deals with one’s own needs together with the needs of others.” Patients are rated along a 7-point scale as to the type and percentage of time they are engaged in appropriate interaction. The expert panel rated four MDS items out of a possible 13 as probably or definitely included in the definition of the FIM item social interaction (table 1).
These items were all dichotomous and had extremely low (<5%) prevalences in the study population, whereas 65 patients (38%) had FIM scores indicating the need for some human assistance in social interaction. Therefore, no MDS item or combination of items was found to adequately represent the FIM variable social interaction. This item was therefore dropped from further analyses.

Thus, there were a total of 13 of the 18 FIM items for which an analogous MDS item was defined. The two FIM items for dressing (upper and lower body) were combined to create a single dressing item by taking the more dependent of the two, resulting in 12 FIM-based items with analogous MDS-based counterparts. A single MDS item provided the most analogous item for all FIM items except memory; for this item, information from 4 MDS items was combined to form an analogous item.

Among the 10 FIM items (including dressing as a single item) for which a single comparable MDS item was identified, from 43% to 74% of patients had paired FIM and MDS items within the plausible range as defined by the expert panel (table 2). The lowest levels of agreement were for grooming (43%), bathing (48%), and toileting (55%); the highest levels of agreement were for comprehension (72%), expression (68%), and eating (74%). For the most part, patients who were classified as very independent or very dependent were classified similarly in the two instruments. In general, crosswalk values defined as implausible by the expert panel occurred for middle levels of limitations. For the midrange of dependency levels, patterns of mismatched crosswalk values could be classified into three general types. For eating, grooming, and dressing, mismatched patients were most often classified as more dependent on the MDS than on the FIM. For toileting, bladder incontinence, bowel incontinence, expression, and problem solving, patients with mismatches were most often classified as less dependent on the MDS. For transferring, bathing, dressing, and comprehension, mismatches occurred in both directions, with a substantial proportion of misclassified patients classified as more dependent either on the MDS or the FIM.

Table 2: Agreement Between Individual FIM and MDS-Based Pseudo-FIM(E) Items (n = 173)

Generic Item Description    % Observations in Plausible Range*
Eating                      74.1
Transferring                51.4
Toileting                   54.9
Bathing                     48.1
Dressing‡                   58.4
Grooming                    42.8
Bladder                     51.4
Bowel                       57.8
Memory                      NA†
Expression                  68.4
Comprehension               72.4
Problem solving             53.2

* Plausible range defined by expert panel (see text for details).
† Measure of plausible crosswalk ranges was not possible because the MDS-based item for memory was a composite of three MDS variables.
‡ FIM dressing defined as more dependent of two items, dressing upper body and dressing lower body.
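The "% observations in plausible range" measure reported in table 2 amounts to a simple proportion. The sketch below shows one way it could be computed; the plausible ranges and paired observations are hypothetical (the panel's actual ranges are given in the paper's Appendix, which is not reproduced here), and the function name is illustrative.

```python
def percent_in_plausible_range(pairs, plausible):
    """pairs: (mds_level, fim_level) per patient; plausible: mds_level -> set of FIM levels."""
    hits = sum(1 for mds, fim in pairs if fim in plausible.get(mds, set()))
    return 100.0 * hits / len(pairs)


# Hypothetical plausible ranges for one 5-level MDS item and four paired observations.
plausible = {1: {6, 7}, 2: {5, 6}, 3: {3, 4}, 4: {2, 3}, 5: {1, 2}}
pairs = [(1, 7), (3, 4), (3, 6), (5, 1)]
agreement = percent_in_plausible_range(pairs, plausible)
```

Here three of the four hypothetical pairs fall inside the plausible range; the pair with MDS level 3 and FIM level 6 is the kind of midrange mismatch described in the text.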
These patterns were not different among patients for whom the time between FIM and MDS assessments was below or above the median.

The absolute difference in mean item scores was less than 0.8 for 11 of the 12 FIM-Pseudo-FIM(E) item pairs (table 3). Mean scores for 5 of the 12 FIM items were not statistically different (Wilcoxon rank sum test, p > .30) from analogous MDS items rescaled according to the expert panel recommendations (Pseudo-FIM(E) items): transferring, toileting, expression, comprehension, and problem solving. Mean scores were statistically significantly lower (implying greater dependence) for FIM items as compared to their Pseudo-FIM(E) counterparts for dressing and memory, and statistically significantly higher (more independent) in the FIM than Pseudo-FIM(E) items for 5 items: eating, bathing, grooming, bladder control, and bowel control. As expected, item scores were more similar when comparing FIM with MDS-based items rescaled according to empirically derived formulas (Pseudo-FIM(O) items; data not shown). Specifically, mean item scores were not significantly different (p > .05) between FIM and Pseudo-FIM(O) scores for 8 items: eating, transferring, toileting, dressing, grooming, bladder control, memory, and problem solving.

Table 3: Comparison of FIM and Pseudo-FIM(E) Item Scores

Generic Item Description    FIM Item, Mean (SD)    Pseudo-FIM(E) Item*, Mean (SD)    P Value†
Eating                      5.6 (1.7)              5.0 (1.3)                         <.0001
Transferring                3.2 (1.7)              3.3 (1.3)                         .28
Toileting                   3.3 (1.8)              3.2 (1.3)                         .28
Dressing                    2.9 (1.6)              3.3 (1.2)                         .0001
Bathing                     3.3 (1.7)              2.7 (1.0)                         .0001
Grooming                    4.9 (1.7)              3.7 (1.3)                         .0001
Bladder continence          4.7 (2.6)              3.9 (2.5)                         .0001
Bowel continence            5.3 (2.3)              4.5 (2.5)                         .0006
Memory                      4.7 (2.1)              5.3 (2.2)                         .0001
Expression                  5.5 (1.8)              5.9 (1.1)                         .31
Comprehension               5.4 (1.7)              5.7 (1.1)                         .79
Problem solving             4.6 (2.0)              4.9 (1.5)                         .44

* Analogous items from the MDS rescaled using expert panel recommendations (see Methods for details).
† Wilcoxon rank sum test.

Cronbach alpha internal consistency-reliability coefficients were between .85 and .94 for the subscales derived from FIM and Pseudo-FIM(E) items (table 4). Although there were strong relationships between the FIM and Pseudo-FIM(E) scales and subscales, there was some bias created by the differences. Mean values of subscales were 3.8 points higher (less impaired) for FIM than Pseudo-FIM(E) motor subscales (p = .002), 2.0 points higher (less impaired) for FIM than Pseudo-FIM(E) ADL subscales, and 1.8 points lower (more impaired) for FIM than Pseudo-FIM(E) cognitive subscales (p = .006) (table 4). Mean subscale values were not different (p > .3) between FIM and Pseudo-FIM(O) items (data not shown).

Subscale scores were highly correlated across the range of values (fig 1). There was a “ceiling effect” for the cognitive subscale, likely due to the relatively low prevalence of cognitive impairment. For the motor, ADL, and cognitive subscales, Spearman Brown intraclass correlation coefficients were greater than 0.8 and Pearson’s correlation coefficients were between .72 and .78. Thus, 53% of the variance in the FIM motor subscale was explained by Pseudo-FIM(E) motor and ADL subscale scores, and 58% of the variance in the FIM cognitive subscale was explained by Pseudo-FIM(E) cognitive subscale scores (table 5). These results were essentially identical to those obtained comparing FIM and Pseudo-FIM(O) subscales.
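The link between the Pearson correlation and the explained variance reported above can be made concrete: for an OLS regression with a single predictor, R² equals the square of Pearson's r. The sketch below checks this numerically on hypothetical subscale scores; the function names and data are illustrative and are not taken from the study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical subscale scores for six patients.
pseudo_fim = [12, 18, 25, 30, 34, 41]
fim = [15, 16, 30, 27, 40, 38]
r = pearson_r(pseudo_fim, fim)
r2 = ols_r_squared(pseudo_fim, fim)
```

By the same identity, a Pearson coefficient of .72 corresponds to an r² of about .52, which is close to the 53% explained variance quoted for the motor subscale once rounding of the reported coefficient is allowed for.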

Table 4: Subscales for FIM and Pseudo-FIM(E) Items

Subscale      FIM, Mean (SD) [Cronbach alpha]    Pseudo-FIM(E), Mean (SD) [Cronbach alpha]    P Value*
Motor†        33.3 (11.6) [.89]                  29.6 (9.4) [.86]                             .0002
ADL           23.2 (8.3) [.89]                   21.2 (6.3) [.92]                             .01
Cognitive†    20.1 (7.2) [.94]                   21.9 (5.1) [.85]                             .006

* Student’s t test for equality of means between FIM and Pseudo-FIM subscales.
† Motor subscale summed 6 ADL and 2 continence items; cognitive subscale summed the 4 items for memory, expression, comprehension, and problem solving.

Fig 1. Scatterplots of subscale scores: (A) Motor subscales (R² = .53); (B) Cognitive subscales.

DISCUSSION

The FIM and the MDS are two assessment instruments commonly used among rehabilitation patients, but they are usually used in different types of rehabilitation facilities. The former is used primarily in hospital-based or freestanding rehabilitation facilities (“acute” rehabilitation), while the latter is used in virtually all nursing homes, including for the large and growing number of patients receiving skilled rehabilitation services (“subacute” rehabilitation) in nursing homes. With increasing numbers of rehabilitation patients receiving care in lower-cost settings outside acute hospitals, the ability to compare rehabilitation patients across settings is of interest to rehabilitation hospitals, nursing homes, and payors.

A “crosswalk” between the FIM and the MDS would allow direct comparisons of the functional status of rehabilitation patients in different settings. Such information is useful in several contexts. For example, there is substantial interest among providers and payors in developing prospective payment systems for rehabilitation services.15,16 In doing so, case-mix definitions will undoubtedly rely on a combination of functional status and diagnostic information. To be applicable, such definitions must be uniformly determined across alternative rehabilitation settings. Also, with comparable information on rehabilitation patients in alternative settings, direct comparisons of the effectiveness (degree of improvement among similar patients) and efficiency (cost of care to obtain a given degree of improvement) of care will be possible.


To our knowledge, ours is the first study to compare FIM and MDS assessments on the same population of patients. Several conclusions may be drawn. Strong relationships were demonstrated between FIM- and MDS-based item and subscale scores. For most comparisons, the degree of agreement fell within commonly accepted standards of accuracy. Where differences were statistically significant, the actual degree of difference was often small (less than 10% of the total item or subscale range). This was not surprising, since statistical tests are designed to detect differences rather than similarities. Furthermore, observed differences between FIM and MDS-based (“Pseudo-FIM(E)”) scores are likely to be overestimates, because some of the observed differences are necessarily due to imperfect interrater reliability rather than to differences in the instruments themselves. Only the latter type of difference will affect the ability to predict “true” scores from one instrument to the other.

Imperfect interrater reliability can account for observed differences between FIM and Pseudo-FIM(E) scores in two ways. First, FIM assessments were carried out largely by skilled therapists, whereas MDS assessments were carried out by nurses. Nurses may have assessed patients differently than therapists because of differing task expectations; therapists may rate similar patients’ functional status lower than nurses because of higher expectations of “normal” performance, given the primarily therapeutic focus of physical therapists as compared to the primarily supportive focus of nursing.17 Thus, in contexts where the goal is to compare FIM scores from one setting with MDS-based scores from another setting, it may be desirable to have the assessments performed by the same type of health professional in the two settings.
Second, imperfect within-profession interrater reliability necessarily accounted for some of the observed differences between the two instruments, but could not be directly measured in the study.

Our findings suggest that predicting the performance of groups of patients on individual items or on groups of items (subscales) from FIM- to MDS-based assessments (or vice versa) is feasible. In predicting group performance on individual items, although statistically significant differences were observed for several items, the absolute differences in group means for the two instruments were within 0.5 points (on a 7-point scale) for 6 and within 0.8 points for 11 of the 12 items. This degree of group item differences may be acceptable for comparisons of dependence among relatively large (eg, >100) groups of rehabilitation patients. Group ratings on subscales demonstrated overall remarkable similarity and were within commonly accepted standards of reproducibility. Mean subscale score differences were correspondingly small, at 5% to 8% of the scale range.

Predicting the performance of individual patients on specific items (reflected in the relatively wide range of values from one item for each given value from the comparable item from the other instrument) or groups of items (reflected in modest explained variance in subscale scores) from one instrument to the other should be undertaken with caution. The estimates obtained here might have been improved if additional assessor training had been carried out. These findings imply, however, that the primary instrument (FIM or MDS) should be used in contexts in which knowing the score of an individual patient on a particular instrument is important (eg, measuring the change in functional status of an individual patient from a known baseline).

Table 5: Comparison of Motor and Cognitive Subscale Scores for FIM and Pseudo-FIM(E) Items

Subscale     Spearman-Brown Correlation Coefficient   Pearson's Correlation Coefficient   R2 Explained Variance
Motor        .81                                      .72                                 .63
ADL          .82                                      .72                                 .63
Cognitive    .81                                      .78                                 .60

Some of the observed differences between FIM- and MDS-based scores are likely due to “true” differences in the instruments themselves. Item definitions are not perfectly consistent (eg, inclusion of bowel/bladder assistive devices in bowel/bladder items in the FIM and in toileting in the MDS), item level definitions differ (eg, percent effort contributed by the patient in the FIM vs number of episodes of assistance in the past 7 days in the MDS), and level values must differ for some patients when comparing two items with different numbers of levels (eg, seven vs five). These differences between the instruments undoubtedly account for some of the discordant values observed between the two instruments. It is therefore somewhat remarkable that the net impact of these differences was small enough to allow meaningful comparisons between patients assessed using different instruments.

Further work is necessary to validate the findings reported here. In carrying out such efforts, our findings highlight several necessary steps in defining (coding) values from one instrument to the other. First, 5 of the 18 FIM items should not be included in MDS-based FIM emulations. Second, MDS items for bowel and bladder continence are most analogous to the continence subcomponent of the analogous FIM items, and comparisons with the final (composite) FIM item should be undertaken with caution. Third, the MDS item for dressing is best approximated by the more dependent of the FIM items for dressing upper body and dressing lower body. Fourth, the choice of equivalent scale values from the MDS (four- or five-point) scales to the FIM (seven-point) scales is crucial.
Finally, efforts to measure the comparability of FIM- and MDS-based cognitive subscales should be undertaken among populations with more severe cognitive impairment than that observed in our study.

In conclusion, by applying these results, program directors, rehabilitation organizations, policymakers, and payors will be able to compare the functional status of groups of rehabilitation patients between acute (hospital) and chronic (nursing home) settings, using data from two different measurement instruments, along two separate dimensions: motor and cognitive function. That is, by using the information in table 1 and the Appendix, a common set of 12 items can be developed from either FIM or MDS assessments. Thus, from information from either instrument alone, a group of item and subscale (motor and cognitive) scores can be derived for groups of patients that provide meaningful estimates of the scores that would have resulted for the comparable items and two subscales if the patients had been assessed using the other instrument.

Two specific types of applications of the results will be most useful. First, the functional status of patients currently receiving rehabilitation services in the two types of settings may be compared. For example, by comparing the functional status at admission of acute rehabilitation patients with that of nursing home rehabilitation patients, the financial effects and clinical implications of implementing a single prospective payment system for both types of rehabilitation services can better be estimated. Also, by using information from serial assessments of functional status, the effectiveness (improvement in functional status) and efficiency (functional improvement per unit cost) of rehabilitation services in high-cost hospital and lower-cost nursing home settings can be compared.
This information will prove crucial as rehabilitation organizations, payors, and policymakers attempt to identify the most effective and least costly settings in which to deliver skilled rehabilitation services.

Second, the functional status of groups of nursing home rehabilitation patients who have previously had FIM assessments may be followed over time. For example, rehabilitation organizations and policymakers could compare the outcomes of rehabilitation between patients who are transferred to nursing homes and those whose entire rehabilitation episode occurs in acute rehabilitation settings. This would provide useful information on the effects of transferring patients from acute rehabilitation to nursing home rehabilitation settings and on the ability to identify the most appropriate conditions for patient transfer between types of rehabilitation settings.

Important cautionary notes regarding these types of applications of a FIM-MDS crosswalk are that: (1) results should only be applied to large groups of patients; (2) interpretations of information on changes in functional status with time based on assessments with either the FIM or the MDS (or a combination of the two) should focus on relatively large differences in functional status, because the lower limit of each instrument’s sensitivity to changes in functional status has not been determined; and (3) the accuracy (mainly interrater reliability) of each instrument should be verified for settings where the information will be used to influence patient care, quality assessments, or reimbursement policies.

Acknowledgment: The authors are grateful to Byron Hamilton, MD, for substantive and conceptual advice; to the seven members of the expert panel for their time and expertise (Katherine Berg, PT, PhD; Pam Duncan, PT, PhD; Carol Frattali, PhD; Helen Hoenig, OT, MD, MPH; Audrey Holland, PhD; Hillary Siebens, MD; and Margaret Stineman, MD); to the NovaCare clinicians who participated in data gathering; to Andrzej Galecki, MD, PhD, and Larry Gruppen, PhD, for statistical assistance; and to Lynn Anslow, MHSA, for data management.

References

1. Fiedler RC, Granger CV, Ottenbacher KJ. The uniform data system for medical rehabilitation: report of first admissions for 1994. Am J Phys Med Rehabil 1996;75:125-9.
2. Hamilton BB, Laughlin JA, Fiedler RC, Granger CV. Interrater reliability of the 7-level Functional Independence Measure (FIM). Scand J Rehabil Med 1994;26:115-9.
3. Granger CV, Hamilton BB, Linacre JM, Heinemann AW, Wright BD. Performance profile of the Functional Independence Measure. Am J Phys Med Rehabil 1993;72:84-9.
4. Hawes C, Phillips CD, Mor V, Fries BE, Morris JN. MDS data should be used for research. Gerontologist 1992;32:563-4.
5. Williams BC, Fries BE, Foley WJ, Schneider D, Gavazzi M. Activities of daily living and costs in nursing homes. Health Care Financing Review 1994;15:117-35.
6. Morris JN, Fries BE, Mehr DR, Hawes C, Phillips C, Mor V, et al. MDS Cognitive Performance Scale. J Gerontol 1994;49:M174-82.
7. Wolk S, Blair T. Trends in medical rehabilitation. Reston (VA): American Rehabilitation Association, 1994.
8. Uniform Data System for Medical Rehabilitation FIM Credentialing Examination, Version 4.0. Buffalo (NY): State University of New York at Buffalo, 1994.
9. Hawes C, Morris JN, Phillips CD, Mor V, Fries BE, Nonemaker S. Reliability estimates for the Minimum Data Set for nursing home resident assessment and care screening (MDS). Gerontologist 1995;35:172-8.
10. Linacre JM, Heinemann AW, Wright BD, Granger CV, Hamilton BB. The structure and stability of the functional independence measure. Arch Phys Med Rehabil 1994;75:127-31.
11. Nunnally JC. Psychometric theory. 2nd ed. New York: McGraw-Hill, 1978.
12. Fleiss JL. The design and analysis of clinical experiments. New York: John Wiley and Sons, 1986.
13. Winer BJ. Statistical principles in experimental design. New York: McGraw-Hill, 1962.
14. SAS Release 6.03. Cary (NC): SAS Institute, 1988.
15. Harada N, Kominski, Sofaer S. Development of a resource-based patient classification scheme for rehabilitation. Inquiry 1993;30:54-63.
16. Stineman M, Escarce J, Goin J, Hamilton B, Granger C, Williams S. A case-mix classification system for medical rehabilitation. Med Care 1994;32:366-79.
17. Adamovich BLB. Pitfalls in functional assessment: a comparison of FIM ratings by speech language pathologists and nurses. Neuro Rehab 1992;2(4):42-51.


APPENDIX: CODING SCHEMES FOR MDS-BASED PSEUDO-FIM(E) ITEMS CORRESPONDING TO SELECTED FIM ITEMS, AND PERCENTAGE OF PATIENTS BY LEVEL

FIM Item; MDS Item; MDS Item Levels, Increasing Impairment →: level = rescaled MDS item (Pseudo-FIM[E]*) level (Percentage of Patients)

Eating; Eating-Self Performance; 1 = 6.0 (54), 2 = 4.5 (30), 3 = 3.5 (8), 4 = 2.5 (4), 5 = 1.0 (5)
Transfers: Bed, Chair, Wheelchair; Transferring-Self Performance; 1 = 6.0 (10), 2 = 4.5 (12), 3 = 3.5 (37), 4 = 2.5 (31), 5 = 1.0 (10)
Toileting; Toilet Use-Self Performance; 1 = 6.0 (11), 2 = 4.5 (8), 3 = 3.5 (33), 4 = 2.5 (38), 5 = 1.0 (10)
Bathing; Bathing-Self Performance; 1 = 6.0 (1), 2 = 4.5 (8), 3 = 3.5 (18), 4 = 2.5 (59), 5 = 1.0 (14)
Grooming; Personal Hygiene-Self Performance; 1 = 6.0 (9), 2 = 4.5 (10), 3 = 3.5 (42), 4 = 2.5 (32), 5 = 1.0 (8)
More dependent of Dressing Upper Body, Dressing Lower Body; Dressing-Self Performance; 1 = 6.0 (17), 2 = 4.5 (16), 3 = 3.5 (37), 4 = 2.5 (24), 5 = 1.0 (6)
Bladder Mgmt; Bladder Continence†; 1 = 6.5 (45), 2 = 4.5 (8), 3 = 3.0 (6), 4 = 2.5 (3), 5 = 1.0 (38)
Bowel Mgmt; Bowel Continence†; 1 = 6.5 (58), 2 = 4.5 (3), 3 = 3.0 (5), 4 = 2.5 (3), 5 = 1.0 (30)
Memory; Memory‡; 1 = 7.0 (58), 2 = 5.5 (8), 3 = 4.0 (10), 4 = 2.5 (16), 5 = 1.0 (9)
Expression; Make Self Understood; 1 = 6.5 (86), 2 = 5 (11), 3 = 3 (2), 4 = 1.0 (1)
Comprehension; Ability to Understand Others; 1 = 6.0 (61), 2 = 5.0 (25), 3 = 3.5 (13), 4 = 1.5 (0)
Problem solving; Cognitive Skills for Daily Decision Making; 1 = 6.5 (39), 2 = 5.0 (32), 3 = 3.0 (26), 4 = 1.0 (3)

* Pseudo-FIM(E) values defined as mean value of FIM for given level of MDS item deemed plausible by expert panel (see Methods for details).
† Patients classified as dependent if device or appliance used.
‡ Pseudo-FIM item defined as sum of 4 MDS dichotomous items (see Methods for details).
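The coding steps named in the Discussion (rescaling MDS item levels to the FIM scale, taking the more dependent of the two FIM dressing items, and expressing a group mean difference as a percentage of the scale range) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which used SAS. The function names and the four example patients are hypothetical, and the rescaled values are the self-performance values as they appear to read in the Appendix.

```python
from statistics import mean

# Assumed from the Appendix: MDS self-performance level -> Pseudo-FIM(E) value.
# Levels are numbered 1 (least impaired) to 5 (most impaired) as printed there.
SELF_PERFORMANCE_SCALE = {1: 6.0, 2: 4.5, 3: 3.5, 4: 2.5, 5: 1.0}


def pseudo_fim(mds_level, scale=SELF_PERFORMANCE_SCALE):
    """Rescale one MDS item level to its Pseudo-FIM(E) value."""
    return scale[mds_level]


def fim_dressing(upper_body, lower_body):
    """Single FIM dressing score: the more dependent of the two FIM items.

    Lower FIM scores mean more dependence, so this is the minimum.
    """
    return min(upper_body, lower_body)


def mean_difference_pct_of_range(fim_scores, pseudo_scores, low=1.0, high=7.0):
    """Absolute difference in group means as a percentage of the scale range."""
    diff = mean(fim_scores) - mean(pseudo_scores)
    return 100.0 * abs(diff) / (high - low)


# Hypothetical group of four patients, for illustration only (not study data).
fim_item_scores = [5, 4, 3, 6]
mds_item_levels = [2, 3, 4, 1]
pseudo_item_scores = [pseudo_fim(level) for level in mds_item_levels]

group_difference = mean_difference_pct_of_range(fim_item_scores, pseudo_item_scores)
dressing_score = fim_dressing(upper_body=5, lower_body=3)
```

As the cautionary notes above indicate, a comparison of this kind is meaningful only for large groups of patients; the four-patient group here serves only to show the arithmetic.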