Rasch Analysis of a New Stroke-Specific Outcome Scale: The Stroke Impact Scale


Pamela W. Duncan, PhD, FAPTA, Rita K. Bode, PhD, Sue Min Lai, PhD, MS, MBA, Subashan Perera, PhD, Glycine Antagonist in Neuroprotection Americas Investigators

ABSTRACT. Duncan PW, Bode RK, Min Lai S, Perera S, Glycine Antagonist in Neuroprotection Americas Investigators. Rasch analysis of a new stroke-specific outcome scale: the Stroke Impact Scale. Arch Phys Med Rehabil 2003;84:950-63.

Objectives: To assess multiple psychometric characteristics of a new stroke outcome measure, the Stroke Impact Scale (SIS), using Rasch analysis, and to identify and remove misfitting items from the 8 domains that comprise the SIS.

Design: Secondary analysis of 3-month outcomes for the Glycine Antagonist in Neuroprotection (GAIN) Americas randomized stroke trial.

Setting: A multicenter randomized trial performed in 132 centers in the United States and Canada.

Participants: A total of 696 individuals with stroke who were community-dwelling and independent prior to acute stroke.

Interventions: Not applicable.

Main Outcome Measures: Rasch analysis was performed using WINSTEPS, version 3.31, to evaluate 4 psychometric characteristics of the SIS: (1) unidimensionality or fit (the extent to which items measure a single construct), (2) targeting (the extent to which the items are of appropriate difficulty for the sample), (3) item difficulty (the ordering of items from least to most difficult to perform), and (4) separation (the extent to which the items distinguish distinct levels of functioning within the sample).

Results: (1) Within each domain, most of the items measured a single construct. Only 3 items misfit the constructs and were deleted (“add and subtract numbers,” “get up from a chair,” “feel emotionally connected”) and 2 items (“handle money,” “manage finances”) misfit the combined physical domain. These items were deleted to create SIS, version 3.0. (2) Overall, the items are well targeted to the sample. The physical and participation domains have a wide range of items that capture difficulties that most individuals with stroke experience in physical and role functions, while the memory, emotion, and

From the Brooks Center for Rehabilitation Studies, University of Florida, Gainesville, FL (Duncan); North Florida/South Georgia Veterans Health System, Gainesville, FL (Duncan); Rehabilitation Services Evaluation Unit, Rehabilitation Institute of Chicago, Chicago, IL (Bode); Department of Physical Medicine & Rehabilitation, Northwestern University Medical School, Chicago, IL (Bode); and Department of Preventive Medicine and Center on Aging, University of Kansas Medical Center, Kansas City, KS (Lai, Perera). Supported by the University of Kansas Medical Center Claude D Pepper Older Americans Independence Center (grant no. AG-96-003), and the American Heart Association Pharmaceutical Roundtable for Outcomes Research. Financial and material support for the GAIN Americas trial (protocol GLYA3002) was also provided by GlaxoWellcome Inc. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated. Reprint requests to Pamela W. Duncan, PhD, FAPTA, Brooks Center for Rehabilitation Studies, University of Florida Health Sciences Center, PO Box 100185, Gainesville, FL 32610-0185. 0003-9993/03/8407-7193$30.00/0 doi:10.1016/S0003-9993(03)00035-2

Arch Phys Med Rehabil Vol 84, July 2003

communication domains include items that capture limitations in the most impaired patients. (3) The order of items from less to more difficult was clinically meaningful. (4) The individual physical domains differentiated at least 3 (high, average, low) levels of functioning and the composite physical domain differentiated more than 4 levels of functioning. However, because difficulties with communication, memory, and emotion were not as frequently reported and difficulties with hand function were more frequently reported, these domains only differentiated 2 (high, low) to 3 (high, average, low) strata of patients. Time from stroke onset to administration of the SIS had little effect on item functioning.

Conclusion: Rasch analysis further established the validity of the SIS. The domains are unidimensional, the items have an excellent range of difficulty, and the domain scores differentiated patients into multiple strata. The activities of daily living/instrumental activities of daily living, mobility, strength, composite physical, and participation domains have the most robust psychometric characteristics. The composite physical domain is most able to discriminate difficulty in function in individuals after stroke, while the communication, memory, and emotion domain items only capture limitations in function in the more impaired groups of patients.

Key Words: Rehabilitation; Stroke; Treatment outcome.

© 2003 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation

THE STROKE IMPACT SCALE (SIS) is a new stroke-specific outcome measure that is a comprehensive measure of health outcomes.1 The measure was developed from the

perspective and input of stroke patients, caregivers, and health professionals with stroke expertise. The SIS, version 2.0, includes 64 items and assesses 8 domains (strength, hand function, activities of daily living [ADLs]/instrumental activities of daily living [IADLs], mobility, communication, emotion, memory and thinking, participation). Four of the domains can be combined to produce a composite physical domain score. Rather than rating performance on SIS items as dependent or independent, performance is self-reported according to the difficulty experienced by the subject.

The original psychometric examination of the SIS was performed in 91 individuals with mild and moderately severe strokes. The individuals were enrolled from the Kansas City Stroke Study and assessed at 1 month and 3 months poststroke. The assessment was based on traditional psychometric methods and focused on internal consistency, validity, and reliability.1 The psychometric properties of the SIS, however, needed to be assessed in a larger and more diverse group of stroke survivors. Such a group would provide the opportunity to apply a more complex psychometric technique, such as Rasch analysis.2 Rasch analysis, which is based on a mathematical model developed by Rasch3 and whose applications in the human sciences have recently been described,4 has been used to aid in the construction and validation of health status questionnaires for neurology,5


RASCH ANALYSIS OF A NEW STROKE-SPECIFIC OUTCOME SCALE, Duncan

Table 1: Eligibility Criteria

Inclusion criteria
Aged 18 years or older.
Symptoms consistent with acute stroke and present at the time of study treatment.
Treatment can be initiated within 6 hours of symptom onset.
Limb weakness present. If both arm and leg are affected, there must be drift within 10 seconds for the arm and 5 seconds for the leg; if only 1 limb is affected, the limb must touch the bed within 10 seconds for the arm or 5 seconds for the leg.
Previously independent (modified Rankin score ≤1).
Written informed consent given by the subject or a legally authorized representative.
If female, must be of non-childbearing potential, or if of childbearing potential, with negative pregnancy test at screen and confirmation of adequate contraception use.

Exclusion criteria
Obtunded, or responding only with reflex motor or autonomic effects, or totally unresponsive, flaccid, areflexic.
Symptoms rapidly improving and likely to resolve completely within 24 hours.
Diagnosis or suspicion of subarachnoid hemorrhage.
Known serious life-threatening illness likely to lead to death in the next 3 months.
Symptoms consistent with severe congestive heart failure.
Presence of malignant hypertension.
Known history of significant renal impairment.
Known history of significant hepatic disease.
Participation in a clinical trial with an investigational drug or internal device within the past 3 months.
Previous treatment with gavestinel.
Unlikely to be available for follow-up.

physical medicine,6-12 some specifically for stroke samples,13-15 and for orthopedics,16 rheumatology,17 endocrinology,18 and geriatrics.19

Evaluation of the SIS with Rasch analysis provides psychometric information that is not provided with traditional analyses. Rasch analysis compares the response patterns of individuals to the entire sample to estimate person “ability” and item “difficulty.” It is a probability model that converts the ordinal scores obtained by summing item scores into interval measures.20 While the ordinal raw scores used in traditional analyses are typically used as if they were interval in nature, the measures produced by Rasch analysis are on an equal-interval scale that is common to both persons and items. Rasch analysis uses these equal-interval measures to assess multiple psychometric characteristics: (1) unidimensionality or fit (the extent to which items measure a single construct); (2) targeting (the extent to which items are of appropriate difficulty for the sample); (3) item difficulty (the ordering of items from least to most difficult to perform); and (4) separation (the extent to which the items distinguish distinct levels [strata] of functioning within the domain).

One major benefit of Rasch analysis is its focus on the hierarchy of items (from easy to perform to difficult to perform) in each domain. Having this hierarchy aids in the understanding of progress in the recovery of functional status. A good stroke-specific outcome measure would possess all the features listed above. The objectives of this study were to examine the psychometric characteristics of the SIS with Rasch analysis, and to identify and remove misfitting items from the domains to improve the measurement of stroke impact.
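The idea of placing persons and items on one equal-interval scale can be illustrated with a minimal sketch. It assumes the simplest (dichotomous) form of the Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty in logits. The SIS uses 5-category rating scales, so the analysis reported here would rely on a rating scale extension of this model; the person and item values below are hypothetical and are not taken from the study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    Both arguments are in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p), that is, the logit."""
    return math.log(p / (1.0 - p))


# Hypothetical person abilities and item difficulties (logits).
abilities = {"person_low": -1.0, "person_mid": 0.0, "person_high": 1.5}
difficulties = {"easy_item": -1.0, "hard_item": 1.0}

for person, theta in abilities.items():
    for item, delta in difficulties.items():
        p = rasch_probability(theta, delta)
        # The log-odds recovers the ability-minus-difficulty difference.
        print(f"{person:11s} {item:9s} P={p:.2f} log-odds={log_odds(p):+.2f}")
```

In this sketch a person whose ability equals an item's difficulty has a 50% chance of success, and each additional logit of ability multiplies the odds of success by the same factor wherever the person sits on the scale, which is what makes the scale equal-interval.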
METHODS

Study Data Set

The psychometric characteristics of the SIS were assessed among participants enrolled in the Glycine Antagonist in Neuroprotection (GAIN) Americas Trial,21 a multicenter, randomized, double-blind, placebo-controlled clinical trial. A total of 1605 individuals with ischemic or hemorrhagic stroke from 132 centers in the United States and Canada were randomized and treated. The SIS substudy began after the main trial

was in progress and included 696 patients. Participants had a stroke that caused limb weakness (with or without other deficits) and were functionally independent prior to the stroke. The eligibility criteria for the GAIN Americas study are outlined in table 1. The main trial conclusively demonstrated that the investigational drug did not improve functional outcomes.21 Therefore, for this assessment of the SIS (N=696), we pooled the intervention (n=346) and placebo (n=350) subjects.

The Stroke Impact Scale

The SIS, version 2.0, includes 64 items grouped into 8 domains: strength, memory and thinking, emotion, communication, hand function, ADLs/IADLs, mobility, and participation. Four of the domains (strength, hand function, ADLs/

Table 2: Demographic Baseline Characteristics

| Characteristic | N=696 |
|---|---|
| Mean age ± SD (y) | 68.6±12.5 |
| Median (y) | 70 |
| Mean NIHSS score ± SD | 12.1±5.9 |
| Median | 11 |
| Gender: female, n (%) | 310 (45) |
| Race, n (%) | |
| White | 586 (84) |
| Black | 49 (7) |
| Hispanic | 17 (2) |
| Asian | 38 (5) |
| Other | 6 (1) |
| OCSP subtype of stroke, n (%) | |
| Total anterior circulation stroke syndrome | 222 (32) |
| Partial anterior circulation stroke syndrome | 272 (39) |
| Lacunar stroke syndrome | 92 (13) |
| Posterior cerebral circulation syndrome | 24 (3) |
| Unknown (nonischemic/no stroke) | 86 (12) |

NOTE: Not all column percentages sum to 100 due to rounding. Abbreviations: NIHSS, National Institutes of Health Stroke Scale; OCSP, Oxfordshire Community Stroke Project.


Table 3: Fit Statistics for Deleted Items

| SIS Domain and Item | Infit Mean Square |
|---|---|
| Memory: add and subtract numbers | 1.51 |
| Mobility: get up from chair | 1.50 |
| Participation: feel emotionally connected | 1.56 |

Table 5: Item Difficulty Ranges by SIS Domain

| SIS Domain | Range of Logits at Middle Category | Range of Logits Across All Categories |
|---|---|---|
| Strength | -0.58 to 0.62 | -3.80 to 4.00 |
| Memory | -0.24 to 0.24 | -2.00 to 2.13 |
| Emotion | -0.71 to 0.91 | -2.10 to 2.10 |
| Communication | -0.67 to 0.74 | -2.00 to 2.22 |
| ADL/IADL | -1.63 to 1.45 | -2.75 to 2.63 |
| Mobility | -2.66 to 1.89 | -4.16 to 3.48 |
| Hand function | -0.67 to 0.62 | -2.10 to 2.10 |
| Participation | -0.96 to 0.70 | -2.20 to 1.90 |
| Composite physical | -1.59 to 1.04 | -2.80 to 2.32 |

IADLs, mobility) can be aggregated to produce a composite physical domain.1 Persons responded to items in each domain using 5-point rating scales: for strength, the rating categories range from “no strength at all” to “a lot of strength”; for emotions and participation, they range from “none of the time” to “all of the time”; for the remaining domains, they range from “extremely difficult/cannot do at all” to “not difficult at all.”

Administration of the SIS

The SIS was administered during face-to-face interviews with the patient or a proxy at 1 and 3 months poststroke. The interviews were conducted by study coordinators who received instruction at an investigators’ meeting and reviewed an administration guide. Prior to administration of the SIS, the patient was asked to follow a 3-step command. If the patient was unable to comply for any reason, including cognitive impairment or aphasia, then the investigator asked a family member to complete the interview as a proxy for the patient.

Analysis

Rasch analysis2,22 was performed using WINSTEPS,a version 3.31.23 The items in each domain and in the composite physical domain were analyzed separately by time since onset and for the 2 assessments combined. The following discussion provides an overview of the psychometric characteristics of the SIS that were evaluated with the Rasch analysis.

Unidimensionality

The items in each domain should assess a single dimension or construct. For example, if one wants to measure mobility, the domain should not measure emotion, communication, or cognition. If a domain combines items from several different dimensions, the interpretation of scores would be difficult. In Rasch analysis, lack of unidimensionality is reflected in poor fit statistics.22 The assessment of fit is similar conceptually to a chi-square analysis where the observed response is compared with the response predicted by the model. Items have high infit statistics when they do not measure the same dimension or

construct as the other items in the set. A moderately stringent criterion2 for acceptable fit statistics on this rating scale would be mean square values less than 1.3. The mean square compares the observed variation in responses with the variation predicted by the model, and the ideal value is 1.0. Because Rasch analysis is a probability model, it allows for some variation from expectation. The fit criterion used in this study meant that only items with no more than 30% variation from the ideal were retained in each domain.

Item Difficulty (Hierarchy)

Items that are used to represent a construct should form a hierarchy of difficulty, ranging from the least difficult for the respondent to perform to the most difficult to perform. For example, it is easier to climb 1 flight of stairs than to climb several flights of stairs, or to get dressed than to do heavy household chores. In Rasch analysis, both person ability and item difficulty are expressed in logits, where a logit is the natural logarithm of the odds of a person being able to perform a particular task. The odds of a person being able to perform a particular task is the ratio of the probability of being able to perform the task to the probability of not being able to perform it. Logits of greater magnitude represent increasing item difficulty. When instruments are developed using a conceptual hierarchy of items, the empirical ordering produced by the Rasch analysis can be compared with the theoretical ordering and the result can be treated as evidence of construct validity.

Person and Item Separation

To describe the reliability of the instrument for the sample, Rasch analysis produces an index that indicates the number of distinct strata of persons discerned within each domain. The

Table 4: Person and Item Separation Statistics

| SIS Domain | Average Domain Measure | Persons: SE of Measurement | Persons: Separation Index | Persons: Separation Reliability | Items: SE of Measurement | Items: Separation Index | Items: Separation Reliability |
|---|---|---|---|---|---|---|---|
| Strength | .15 | .90 | 2.08 | .81 | .05 | 10.40 | .99 |
| Memory | .98 | .65 | 2.26 | .84 | .04 | 3.78 | .93 |
| Emotions | .82 | .47 | 1.53 | .70 | .03 | 12.71 | .99 |
| Communication | .87 | .62 | 1.78 | .76 | .04 | 10.94 | .99 |
| ADL/IADL | .24 | .43 | 2.85 | .89 | .04 | 24.00 | 1.00 |
| Mobility | .10 | .57 | 3.47 | .92 | .04 | 29.56 | 1.00 |
| Hand function | .19 | .63 | 1.71 | .74 | .04 | 9.68 | .99 |
| Participation | -.02 | .43 | 2.15 | .82 | .03 | 15.83 | 1.00 |
| Composite physical | .03 | .32 | 4.40 | .95 | .03 | 20.47 | 1.00 |

Abbreviation: SE, standard error.


Fig 1. Hierarchical order of items for the strength measure and item map.

separation index (G) is similar conceptually to a t test between 2 groups. In a separation index, the numerator is the variance in the person measures for the group and the denominator is the average error in estimating these measures. The larger the index, the more distinct levels of functioning can be distinguished in the measure. A person separation index of 1.50 represents an acceptable level of separation, an index of 2.00 represents a good level of separation, and an index of 3.00 represents an excellent level of separation. The number of distinct strata that can be distinguished in the sample is computed using the formula2,24: number of distinct strata = (4G + 1)/3. With an index of 1.50, one can discern 2 strata (high, low); with an

index of 2.00, one can discern 3 strata (high, average, low); and with an index of 3.00, one can discern 4 strata (high, above average, below average, low). The separation reliability is the ratio of the “true” (observed minus error) variance to the obtained variation. The smaller the error, the higher the ratio will be. It ranges from .00 to 1.00 and is interpreted the same as the Cronbach α. An index of 1.50 is comparable to a coefficient of .70 (acceptable), an index of 2.00 is comparable to a coefficient of .80 (good), and an index of 3.00 is comparable to a coefficient of .90 (excellent). Item separation index and reliability use the same criteria and are interpreted the same as person separation index and reliability.
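The strata formula and the correspondence between separation index and reliability can be checked with a short sketch. The strata formula, (4G + 1)/3, is the one given in the text. The conversion from G to reliability, R = G²/(1 + G²), is not stated in the text; it is the standard Rasch relation between the two quantities and is assumed here because it reproduces the quoted benchmarks (1.50, 2.00, and 3.00 against roughly .70, .80, and .90). The person separation indices are those reported in table 4.

```python
def strata(separation_index: float) -> float:
    """Number of statistically distinct strata: (4G + 1) / 3."""
    return (4.0 * separation_index + 1.0) / 3.0


def separation_reliability(separation_index: float) -> float:
    """Reliability implied by a separation index, R = G^2 / (1 + G^2).

    Assumption: standard Rasch relation, not quoted in the article itself.
    """
    g_squared = separation_index ** 2
    return g_squared / (1.0 + g_squared)


# Person separation indices from table 4.
person_separation = {
    "Strength": 2.08,
    "Memory": 2.26,
    "Emotions": 1.53,
    "Communication": 1.78,
    "ADL/IADL": 2.85,
    "Mobility": 3.47,
    "Hand function": 1.71,
    "Participation": 2.15,
    "Composite physical": 4.40,
}

for domain, g in person_separation.items():
    print(f"{domain:18s} G={g:.2f} strata={strata(g):.2f} "
          f"reliability={separation_reliability(g):.2f}")
```

Because the published indices are rounded to 2 decimals, a reliability recomputed this way can differ from the tabled value in the last digit.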


Fig 2. Strength item map across rating categories. NOTE. Each “#” is 12.

In addition to estimates of person and item separation, the average domain measures and item calibrations are compared to determine the extent to which the set of items is at the appropriate level of difficulty (targeted) to capture the limitations in functions of the sample. In the calibration, the average difficulty level across items and rating scale categories is fixed at zero; thus, an average domain measure of zero would represent perfect targeting of the items in terms of their difficulty for the sample. The more the average domain measure differs from zero, the more the set of items is mistargeted. An average domain measure of .50 would indicate slight mistargeting and an average measure of 1.00 would indicate more substantial mistargeting. Positive average measures indicate items that are too easy for the sample and negative average measures indicate items that are too hard for the sample. If the sample is representative of the population, this information is useful for determining whether easier or more difficult items should be included in the SIS to capture the full range of limitations in function.
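The fit criterion described under Unidimensionality above can be made concrete with a small sketch of an infit (information-weighted) mean square. It assumes a single dichotomous item for simplicity, whereas the SIS items are rated on 5 categories and were fitted in WINSTEPS; the abilities and response patterns are hypothetical. The statistic sums squared differences between observed and expected responses and divides by the sum of the variances the model predicts, so a value near 1.0 means the responses vary about as much as the model expects.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def infit_mean_square(responses, abilities, difficulty):
    """Information-weighted mean square for one dichotomous item.

    responses: 0/1 scores, one per person; abilities: matching logits.
    """
    squared_residuals = 0.0
    model_variance = 0.0
    for score, ability in zip(responses, abilities):
        p = rasch_probability(ability, difficulty)
        squared_residuals += (score - p) ** 2
        model_variance += p * (1.0 - p)
    return squared_residuals / model_variance


# Hypothetical sample: five persons ordered from least to most able.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [0, 0, 1, 1, 1]  # more able persons succeed
erratic = [1, 1, 0, 0, 0]     # less able persons succeed: unexpected

CRITERION = 1.3  # misfit threshold used in the article
for label, pattern in (("consistent", consistent), ("erratic", erratic)):
    value = infit_mean_square(pattern, abilities, difficulty=0.0)
    verdict = "misfit" if value > CRITERION else "acceptable"
    print(f"{label:10s} infit mean square = {value:.2f} ({verdict})")
```

The erratic pattern, in which the less able persons succeed and the more able fail, produces a mean square well above 1.3, which is the kind of result that led to items being flagged and deleted.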

RESULTS

A total of 696 subjects completed an SIS questionnaire at 1 or both time points. A total of 1264 SIS questionnaires (640 month 1 assessments, 624 month 3 assessments) were included in the Rasch analysis. Table 2 summarizes the baseline characteristics of the subjects.

Unidimensionality

As might be expected from an existing instrument that has already been evaluated psychometrically, very few items from the 8 SIS domains misfit the constructs (fit statistics >1.3) they were intended to measure. Three items from 3 SIS domains misfit (table 3). Two items (handle money, manage finances) misfit the composite physical domain.

Targeting

The average measures by domain, presented in table 4, indicate whether the items were at the appropriate level of difficulty for the sample. As evidenced by average measures


Fig 3. Hierarchical order of items for the memory and thinking measure and item map.

approaching zero (-.02 to .24), the individual physical domains, the composite physical function domain, and the participation domain capture limitations in functions in this sample. However, the memory, emotion, and communication domains with high average measures (.82–.98) did not capture major limitations in these functions in this sample. In other words, the items are too easy and would only detect difficulty in a group of stroke survivors very impaired in these domains. The standard errors (SEs) of measurement for both the person measures and item difficulty estimates are also presented in table 4; the comparison of average measures and difficulty estimates taking SEs of measurement into account confirms these findings.
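The targeting results can be reproduced from the average domain measures in table 4 with a short sketch. The article gives .50 and 1.00 only as reference points for slight and more substantial mistargeting; treating them as cut points that separate three bands is an assumption made here for illustration, as are the label strings.

```python
def describe_targeting(average_measure: float) -> str:
    """Label item targeting from the average domain measure (logits).

    Bands at 0.50 and 1.00 are an illustrative assumption based on the
    reference points quoted in the text.
    """
    size = abs(average_measure)
    if size < 0.50:
        degree = "well targeted"
    elif size < 1.00:
        degree = "slightly mistargeted"
    else:
        degree = "substantially mistargeted"

    if average_measure > 0:
        direction = "items easy for sample"
    elif average_measure < 0:
        direction = "items hard for sample"
    else:
        direction = "items centered on sample"
    return f"{degree} ({direction})"


# Average domain measures from table 4.
average_measures = {
    "Strength": 0.15,
    "Memory": 0.98,
    "Emotions": 0.82,
    "Communication": 0.87,
    "ADL/IADL": 0.24,
    "Mobility": 0.10,
    "Hand function": 0.19,
    "Participation": -0.02,
    "Composite physical": 0.03,
}

for domain, measure in average_measures.items():
    print(f"{domain:18s} {measure:+.2f}  {describe_targeting(measure)}")
```

Under these bands the physical and participation domains fall in the well-targeted group, while memory, emotion, and communication fall in the band where the items are easy for the sample, matching the pattern described above.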

Item Difficulty (Hierarchy)

The figures illustrate the hierarchical order of items from the WINSTEPS output for the strength, memory, emotion, communication, ADL/IADL, mobility, hand function, participation, and composite physical domains. Rasch analysis of rating scale data produces multiple estimates of item difficulty, 1 for each transition point from 1 rating category to the next. To simplify the illustration of the hierarchies, only the difficulty level at the middle rating category is shown in the figures. The range of item difficulty estimates at the middle rating category and across all categories is presented in table 5. When the difficulty level across categories is used, the actual range of


Fig 4. Hierarchical order of items for the emotion measure and item map.

difficulty covered by the sets of items is apparent. The order of items on the y axis from bottom to top shows their increasing level of difficulty (fig 1). The criterion for significant differences in item difficulty estimates is 2 SEs of measurement. The distribution of patient measures is represented on the x axis (fig 1). Because the hierarchy of item difficulties did not differ significantly across time, the SIS data from administrations at 1 and 3 months poststroke were combined. The map in figure 1 shows that the strength domain spreads the sample well, with some individuals clustering at the top and bottom of the scale, representing those who cannot perform any of the tasks at all (floor effect) or can perform all the tasks with no difficulty at all (ceiling effect).

While the range of item difficulty shown in this figure is only slightly greater than 1 logit, the range across categories is almost 5 logits. This increased range is illustrated in figure 2. In this figure, the item difficulty levels are shown separately for each transition point between rating scale categories from “no strength at all” to “a lot of strength.” In some categories, the 4 strength items cover a good portion of the range of strength in this sample. In the middle category shown in figure 1, all items are at least 2 SEs (.10) apart, indicating significantly different difficulty levels. The order of items on the y axis shows that individuals have a greater probability of reporting difficulty in strength of the upper than the lower extremity.
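The 2-SE criterion for distinguishing item difficulties can be sketched as a pairwise comparison. The item names and calibrations below are hypothetical, since item-level calibrations are shown only in the figures; the SE of .04 is the item SE of measurement reported for several domains in table 4. Items closer together than 2 SEs are treated as having essentially equal difficulty.

```python
from itertools import combinations


def indistinguishable_pairs(difficulties, se_of_measurement):
    """Return item pairs whose difficulties differ by less than 2 SEs."""
    threshold = 2.0 * se_of_measurement
    pairs = []
    for (item_a, diff_a), (item_b, diff_b) in combinations(difficulties.items(), 2):
        if abs(diff_a - diff_b) < threshold:
            pairs.append((item_a, item_b))
    return pairs


# Hypothetical item calibrations at the middle category (logits).
hypothetical_items = {
    "item_a": -0.24,
    "item_b": -0.20,
    "item_c": 0.05,
    "item_d": 0.24,
}

ITEM_SE = 0.04  # item SE of measurement, as in table 4 for several domains
print(indistinguishable_pairs(hypothetical_items, ITEM_SE))
```

With these values only the first two items fall within the .08 threshold of each other, so they would be reported as essentially equal in difficulty while the remaining items are separated.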


Fig 5. Hierarchical order of items for the communication measure and item map.

The map in figure 3 shows that the items in the memory domain were, for the most part, easy to endorse and most individuals did not report difficulty with them. Although in the middle category the items fell within a narrow range of difficulty (-.24 to .24), the range across categories was more than 4 logits. However, not all of the items differed significantly in difficulty level (2 SEs of measurement = .08). “Remembering the day of the week” and “remembering things that people just told you” are essentially of equal difficulty, as are “solve problems,” “remember to do things,” “concentration,” and “add and subtract numbers.” However, the order of item difficulty is as expected: “thinking quickly” is the most difficult item and “remembering things that people just told you” is the easiest item. The order is clinically logical in that one would expect patients to have less difficulty with short-term memory than executive skills.

The map in figure 4 shows that a large proportion of individuals did not report much difficulty endorsing the emotion items. However, across categories the emotion items have an acceptable range of difficulty (>2 logits). Two pairs of items have essentially equal difficulty levels (2 SEs of measurement = .06): “feel life is worth living,” “don’t feel like a burden to others,” “smile and laugh daily,” and “not nervous.” The order of the items was logical; individuals in this sample were more likely to deny “having nobody to be close to” and less likely to report “enjoying things as much as they ever had.”

The map in figure 5 shows that most individuals in this sample reported little difficulty with the communication domain. There was a good spread in item difficulty (approximately 4.25 logits across categories) and no overlap in item difficulty (2 SEs of measurement = .08). The hierarchy of items is


Fig 6. Hierarchical order of items for the ADL/IADL measure and item map.

clinically logical; “calling someone (initiating the call) on the phone” is more difficult than simply “talking on the phone” and “participating in a conversation” is more difficult than “understanding what is said.”

The map in figure 6 shows a good range of item difficulty in the ADL/IADL domain (4 logits across categories). All the items’ difficulties were at least 2 SEs (.08) apart. The order of item difficulty matched clinical expectation, with self-care tasks being easier to perform than household activities and other IADLs.

The items in the mobility domain (fig 7) had the widest range of difficulty (almost 7 logits across categories) of all the domains and all the items’ difficulties were at least 2 SEs (.08) apart. As would be expected clinically, balance and transfer items were the easiest to perform, and walking fast and stair climbing were the hardest to perform.

The map in figure 8 shows that, across items, 38% of the individuals reported major difficulty in hand function, while only 12% reported no difficulty. The items in the hand function domain did not overlap in difficulty (2 SEs of measurement = .08) and have a good range of difficulty (4 logits across categories).

The map in figure 9 shows that the participation domain had a good range in item difficulty (>4 logits across categories). Approximately 8% of individuals reported difficulty in participation and approximately 7% reported major difficulty. There was little overlap in item difficulty; “participating in active recreation” and “participating in work activities” were within 2 SEs (.06). The items had a logical sequence of difficulty, with “participation in social activities” being easier to endorse than “active recreation” or “work activities.”


Fig 7. Hierarchical order of items for the mobility measure and item map. Abbreviation: w/o, without.

The map in figure 10 shows that the composite physical domain was normally distributed in this sample. The range of item difficulty was wide, 5 logits across categories. However, there were many items that were of similar levels of difficulty. Because multiple domains were included in this hierarchy, it was more difficult to describe the ordering of items from easier to harder. In general, because the ADL/IADL and mobility items had the greatest spread in item difficulty, they represent the extremes in this item distribution, with balance and continence the easiest to perform and stair climbing and household activities the hardest to perform.

Person and Item Separation

The person and item separation indices for each domain are also reported in table 4. As evidenced by separation indices

greater than 2, the results demonstrated a good spread relative to error and reliability for the person measures in 5 of the 8 domains and for the composite physical domain. The values for memory, emotion, and communication domains were only in the acceptable range because of the ceiling effect in those domains and those for the hand function domain were only acceptable because of the floor effect in that domain. That is, few individuals reported difficulty in memory and communication or frequent feelings of poor psychosocial well-being, and many reported difficulties with hand function activities. The communication, emotion, and hand function domains differentiated 2 (high, low) strata of patient functioning. Strength, memory, and participation differentiated 3 (high, average, low) strata of patient functioning. However, the ADL/IADL and


Fig 8. Hierarchical order of items for the hand function measure and item map.

mobility domains differentiated 3 to 4 (high, above average, below average, low) strata and the composite physical domain discriminated more than 4 strata of patients. The items within each domain had excellent separation indices and had excellent internal consistency (range, .93–1.00) (table 4). The values indicate that each domain covers a useful range of item difficulty that can be appropriate for measuring persons with a wide range of functional ability.

DISCUSSION

Rasch analysis has been widely used for the validation of self-report outcome measures. This study used Rasch analysis to assess further the validity of a new stroke-specific outcome measure, the SIS. This is the first study that has validated a

stroke outcome measure on a large group of stroke survivors with a broad range of stroke severity (National Institutes of Health Stroke Scale score range, 2–20). It included individuals with both ischemic and hemorrhagic stroke from a variety of clinical sites. The results of this data analysis indicate that most of the items in the 8 domains of SIS, version 2.0, measured the intended constructs and were unidimensional. Only 3 items misfit the rest of the items in their domain. Examination of the items that misfit the construct measured by the remaining items suggests some possible reasons for lack of coherence. In the memory domain, for example, perhaps “add and subtract numbers” measures higher executive function of computational skills more than it does memory. In the combined composite

RASCH ANALYSIS OF A NEW STROKE-SPECIFIC OUTCOME SCALE, Duncan

961

Fig 9. Hierarchical order of items for the participation measure and item map.

physical domain, “handle money” and “manage finances” appear to measure cognitive skills while the rest of the items measure motor skills. In mobility, “get up from chair without support” may be a measure of lower-extremity strength more than mobility. In participation domain, “feel emotionally connected” appears to measure emotion more than social participation. In each domain, the empirical ordering of items by difficulty was consistent with expectation regarding the theoretical ordering of task difficulty and therefore is supportive of the construct validity of the SIS. An examination of the use of the rating scales across domains using criteria suggested by Linacre25: (1) at least 10 cases per category, (2) regular distribution of category use, (3) monotonically increasing average measures across category, (4) category outfit mean square values less than 2, (5) mono-

tonically increasing step calibrations, (6) step calibration differences greater than 1.4, (7) step calibration differences less than 5, and (8) category coherence levels greater than 1.4. Across SIS rating scales, there were a sufficient number of cases in each category, the average measures increase monotonically across categories, and there was no misfit within any categories. Many of the scales exhibited step-calibration inversions in the middle categories, suggesting the possibility of collapsing the categories into 3 levels, “no difficulty,” “some difficulty,” and “a great deal of difficulty.” However, because we wanted to continue to develop this instrument to assess change over time, we opted to keep 5 levels of scoring. The original objectives for development of a new stroke outcome measure were to select items that represented funcArch Phys Med Rehabil Vol 84, July 2003

962

RASCH ANALYSIS OF A NEW STROKE-SPECIFIC OUTCOME SCALE, Duncan

Fig 10. Hierarchical order of items for the physical function measure and item map.

tions beyond basic ADLs. As such, the functions should cover a sufficient range of difficulty and the items should be able to discriminate between groups of patients with different abilities. The Rasch analysis demonstrated that the items from the individual physical domains as well as the items from the composite physical domain targeted the abilities of the subjects extremely well (average measures, .03–.24). The ADL/IADL and the mobility domains and the combined physical domain had the best (widest) range of item difficulty (separation indices of 24.00, 29.56, and 20.47, respectively; table 4). Because difficulties with communication, memory, and emotion were not as frequently reported by the sample as were problems with physical function, and because difficulties with hand function were more frequently reported overall, the items for theses domains did not adequately target the abilities of this sample, the items had narrower ranges of item difficulty, and they discriminated only among 2 to 3 strata of patients. In contrast, the other physical domains discriminated among 3 to 4 strata of patients and the composite physical domain discriminated more than 4 strata of patients. Arch Phys Med Rehabil Vol 84, July 2003
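The relation between a separation index, the corresponding reliability, and the number of statistically distinct strata can be illustrated with a short sketch. It uses the conventional Rasch formulas, reliability = G²/(1 + G²) and strata = (4G + 1)/3, where G is the separation index. The function names are ours, and it is an assumption that the values in table 4 were derived in exactly this way; only the three item separation indices are taken from the text.

```python
import math


def reliability_from_separation(g: float) -> float:
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return g ** 2 / (1.0 + g ** 2)


def separation_from_reliability(r: float) -> float:
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))


def strata(g: float) -> float:
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0


# Item separation indices quoted in the text (table 4).
item_separation = {
    "ADL/IADL": 24.00,
    "mobility": 29.56,
    "composite physical": 20.47,
}

for domain, g in item_separation.items():
    print(f"{domain}: separation={g:.2f}, "
          f"reliability={reliability_from_separation(g):.3f}")

# A person separation of 2 (the threshold mentioned in the Results)
# corresponds to a reliability of 0.80 and 3 distinct strata.
print(reliability_from_separation(2.0), strata(2.0))
```

Read this way, the very large item separation indices correspond to reliabilities that round to 1.00, while a person separation near 2 marks the boundary between distinguishing 2 and 3 strata of patients.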

The results of this analysis can be used to describe the typical performance of persons scoring at various points along the continuum. For example, someone scoring 1 standard deviation (SD) below the mean would typically not be able at all to cut his/her food with a knife and fork, do light housekeeping, go shopping, clip his/her own toenails, or do heavy household chores. Such persons would have a lot of difficulty bathing themselves and dressing their upper body, some difficulty controlling their bladder and getting to the toilet in time, and a little difficulty controlling their bowels. On the other hand, someone scoring 1 SD above the mean would typically find doing heavy household chores somewhat difficult, would find clipping their own toenails, going shopping, and doing light housekeeping a little difficult, and would have no difficulty at all with the remaining activities.

As a result of this analysis, we deleted 5 items from 4 of the domains in SIS, version 2.0. These deletions improved the unidimensionality of these scales and, although the scales now contain fewer items, did not sacrifice person separation.
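Descriptions of typical performance at a given point on the continuum, such as those given above for persons 1 SD below or above the mean, follow from the rating scale model: once a person measure, an item difficulty, and the step calibrations are known, the probability of each response category can be computed. The sketch below implements the Andrich rating scale model for a 5-category item. All numeric values (person measures, item difficulties, step calibrations) are hypothetical and are not taken from the SIS calibration, and the coding of ratings from 1 (could not do at all) to 5 (not difficult at all) is our reading of the response levels described in the text.

```python
import math


def category_probabilities(theta, delta, taus):
    """Andrich rating scale model.

    theta: person measure (logits); delta: item difficulty (logits);
    taus: step calibrations, one per step between adjacent categories.
    Returns the probability of each category, lowest category first.
    """
    # The log-numerator for category k is the sum over the first k steps
    # of (theta - delta - tau_j); the lowest category has numerator 0.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(theta, delta, taus, lowest=1):
    """Model-expected rating, with categories coded lowest, lowest + 1, ..."""
    probs = category_probabilities(theta, delta, taus)
    return sum((lowest + k) * p for k, p in enumerate(probs))


# Hypothetical step calibrations for a 5-category scale (4 steps).
HYPOTHETICAL_TAUS = [-1.5, -0.5, 0.5, 1.5]

# Hypothetical item difficulties in logits.
hypothetical_items = {"easier item": -1.0, "harder item": 1.5}

for theta in (-1.0, 1.0):  # hypothetical person measures in logits
    for name, delta in hypothetical_items.items():
        rating = expected_rating(theta, delta, HYPOTHETICAL_TAUS)
        print(f"person {theta:+.1f}, {name}: expected rating {rating:.2f}")
```

With ratings coded this way, a more able person or an easier item yields a higher expected rating, which is the pattern the verbal descriptions summarize.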

The analysis of the combined physical domain (fig 10) revealed considerable overlap in items, suggesting that a much shorter combined physical scale could be developed. A subsequent article will report on the development of such a scale, the SIS-16.

Implications for Selection of Outcome Measures

The results of this analysis of the SIS for a sample of patients enrolled in a clinical trial of stroke patients have several implications for selection of outcome measures. First, the most commonly reported deficits are in the physical domains. Hand function is the most disabling motor deficit. Limitations in role functions (participation) are also common. Therefore the primary outcomes for clinical trials should be the physical domains and participation. Unless clinical trials specifically recruit patients with communication, memory, or emotional deficits, these domains should not be primary outcome measures. The majority of patients will not report difficulty with these items and the questions on these SIS domains are not appropriate for the minimally impaired.

CONCLUSIONS

Based on this analysis, we deleted 5 items in the SIS, version 2.0, and have created SIS, version 3.0. The ADL/IADL, mobility, strength, composite physical, and participation domains have the most robust psychometric characteristics. The domains are unidimensional, have good reliability, and have a wide range of items that capture the difficulties that most individuals with stroke experience in physical and role functions. Clearly, the composite physical domain is the most robust. In other words, there is an excellent range of item difficulty and a bell-shaped distribution of reported difficulties in physical function with little evidence (3% or less) of floor or ceiling effects. The communication, memory, and emotion domains are also unidimensional and have good reliability. However, the items in these domains are easy and only capture limitations in the most impaired individuals.
Although the SIS, version 3.0, has limitations in the communication, memory, and emotion domains, it taps into domains of function not fully evaluated by current stroke outcome measures. These domains may provide additional information for research and clinical purposes that capture the full impact of stroke. In conclusion, the SIS, version 3.0, has good psychometric characteristics and will be useful in future clinical practice and research. The composite physical domain items may be most able to discriminate difficulty in function. This domain will be abbreviated in a future analysis to reduce redundant items, and to develop a shorter version of the SIS to assess physical function.

References

1. Duncan PW, Wallace D, Lai SM, Johnson D, Embretson S, Laster LJ. The stroke impact scale version 2.0. Evaluation of reliability, validity, and sensitivity to change. Stroke 1999;30:2131-40.
2. Wright BD, Masters GN. Rating scale analysis: Rasch measurement. Chicago: MESA Pr; 1982.
3. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: MESA Pr; 1980.
4. Bond T, Fox C. Applying the Rasch model: fundamental measurement in the human sciences. Mahwah: Lawrence Erlbaum Associates; 2001.
5. Tsuji T, Liu M, Sonoda S, Domen K, Chino N. The stroke impairment assessment set: its internal consistency and predictive validity. Arch Phys Med Rehabil 2000;81:863-8.
6. Silverstein BJ, Fisher WP Jr, Kilgore KM, Harvey RF, Harley JP. Applying psychometric criteria to functional assessment in medical rehabilitation. I. Defining interval measures. Arch Phys Med Rehabil 1992;73:507-18.
7. Grimby G, Andren E, Holmgren E, Wright B, Linacre JM, Sundh V. Structure of a combination of functional independence measure and instrumental activity measure items in community-living persons: a study of individuals with cerebral palsy and spina bifida. Arch Phys Med Rehabil 1996;77:1109-14.
8. Linacre JM, Heinemann AW, Wright BD, Granger CV, Hamilton BB. The structure and stability of the functional independence measure. Arch Phys Med Rehabil 1994;75:127-32.
9. Granger C, Linn R. Biologic patterns of disability. J Outcome Meas 2000;4:595-615.
10. Heinemann AW, Linacre JM, Wright BD, Hamilton BB, Granger CV. Relationships between impairment and physical disability as measured by the functional independence measure. Arch Phys Med Rehabil 1993;74:566-73.
11. Heinemann AW, Linacre JM, Wright BD, Hamilton BB, Granger CV. Measurement characteristics of the Functional Independence Measure. Top Stroke Rehabil 1994;1:1-15.
12. Küçükdeveci A, Yavuzer G, Tennant A, Suldur N, Sonel B, Arasil T. Adaptation of the modified Barthel Index for use in physical medicine and rehabilitation in Turkey. Scand J Rehabil Med 2000;32:87-92.
13. Bernspång B, Fisher AG. Differences between persons with right or left cerebral vascular accident on the Assessment of Motor and Process Skills. Arch Phys Med Rehabil 1995;76:1144-51.
14. Roth E, Heinemann AW, Lovell L, Harvey R, McGuire J, Diaz S. Impairment and disability: their relation during stroke rehabilitation [published erratum appears in Arch Phys Med Rehabil 1998;79:471]. Arch Phys Med Rehabil 1998;79:329-35.
15. Wright BD, Segal ME, Heinemann AW, Schall RR. Rasch analysis of a brief physical ability scale for long-term outcomes of stroke. State Art Rev Phys Med Rehabil 1997;11:385-96.
16. Tesio L, Granger CV, Fiedler RC. A unidimensional pain/disability measure for low-back pain syndromes. Pain 1997;69:269-78.
17. Wolfe F, Kong S. Rasch analysis of the Western Ontario McMaster Questionnaire (WOMAC) in 2205 patients with osteoarthritis, rheumatoid arthritis, and fibromyalgia. Ann Rheum Dis 1999;58:563-8.
18. Wiren L, Whalley D, McKenna S, Wilhelmsen L. Application of a disease-specific, quality-of-life measure (QOL-AGHDA) in growth hormone-deficient adults and a random population sample in Sweden: validation of the measure by Rasch analysis. Clin Endocrinol 2000;52:143-52.
19. Doble S, Fisher AG. The dimensionality and validity of the Older Americans Resources and Services (OARS) Activities of Daily Living (ADL) Scale. J Outcome Meas 1998;2:4-24.
20. Wright BD, Linacre JM. Observations are always ordinal: measurement, however, must be interval. Arch Phys Med Rehabil 1989;70:857-60.
21. Sacco RL, DeRosa JT, Haley EC Jr, et al. Glycine antagonist in neuroprotection for patients with acute stroke: GAIN Americas: a randomized controlled trial. JAMA 2001;285:1719-28.
22. Wright BD, Stone MH. Best test design: Rasch measurement. Chicago: MESA Pr; 1979.
23. Wright BD, Linacre JM. Winsteps: a Rasch model computer program, version 3.31. Chicago: MESA Pr; 2001.
24. Fisher WP Jr. Reliability statistics. Rasch Meas Trans 1992;6:238.
25. Linacre JM. Investigating rating scale category utility. J Outcome Meas 1999;3:103-22.

Supplier

a. Winsteps, PO Box 811322, Chicago, IL 60681-1322.
