Reliability and Responsiveness of the Activities of Daily Living Computerized Adaptive Testing System in Patients With Stroke


Accepted Manuscript

Ya-Chen Lee, MS; Wan-Hui Yu, MS; Yu-Fen Lin, MS; I-Ping Hsueh, MA; Hung-Chia Wu, MS; Ching-Lin Hsieh, PhD

PII: S0003-9993(14)00347-5

DOI: 10.1016/j.apmr.2014.04.025

Reference: YAPMR 55832

To appear in: ARCHIVES OF PHYSICAL MEDICINE AND REHABILITATION

Received Date: 19 December 2013

Revised Date: 11 March 2014

Accepted Date: 24 April 2014

Please cite this article as: Lee Y-C, Yu W-H, Lin Y-F, Hsueh I-P, Wu H-C, Hsieh C-L, Reliability and responsiveness of the Activities of Daily Living Computerized Adaptive Testing system in patients with stroke, ARCHIVES OF PHYSICAL MEDICINE AND REHABILITATION (2014), doi: 10.1016/j.apmr.2014.04.025.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Title Page

Running head: Reliability and responsiveness of the Activities of Daily Living Computerized Adaptive Testing system

Title: Reliability and responsiveness of the Activities of Daily Living Computerized Adaptive Testing system in patients with stroke

Authors: Ya-Chen Lee, MS; Wan-Hui Yu, MS; Yu-Fen Lin, MS; I-Ping Hsueh, MA; Hung-Chia Wu, MS; Ching-Lin Hsieh, PhD

Authors’ affiliations: Ya-Chen Lee, Wan-Hui Yu, Yu-Fen Lin, I-Ping Hsueh & Ching-Lin Hsieh: School of Occupational Therapy, College of Medicine, National Taiwan University and Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei, Taiwan. Hung-Chia Wu: Department of Physical Medicine and Rehabilitation, E-Da Hospital, Kaohsiung, Taiwan.

Acknowledgment of financial support:

This study was supported by a research grant from the National Health Research Institutes (NHRI-EX102-1007PI) and the E-Da Hospital (EDAHT 102008 & EDAHT 103024).

Financial Disclosure Statement:

No party having a direct interest in the results of the research supporting this article has or will confer a benefit on us or on any organization with which we are associated. We certify that all financial and material support for this research (eg, NHRI or EDAHT grants in Taiwan) is clearly identified in the title page of the manuscript.

Address for correspondence and reprints: I-Ping Hsueh, School of Occupational Therapy, College of Medicine, National Taiwan University, 4F, No 17, Xuzhou Rd, Taipei 100, Taiwan.

Fax: +886-2-23511331

Tel: +886-2-33668174

E-mail: [email protected]

Abstract

Objective: To examine the intra-rater reliability, inter-rater reliability, and responsiveness of the Activities of Daily Living Computerized Adaptive Testing system (ADL CAT) in patients with stroke.

Design: A repeated-measures design (with an interval of 7 days) was used to examine the intra-rater reliability and inter-rater reliability of the ADL CAT. For the responsiveness study, participants were assessed with the ADL CAT at admission to the rehabilitation ward and at discharge from the hospital.

Setting: Eight rehabilitation units.

Participants: Three different (non-overlapping) patient groups were recruited. Fifty-five and 42 outpatients with chronic stroke participated in the intra-rater and inter-rater reliability studies, respectively; 60 inpatients who had recently had a stroke participated in the responsiveness study.

Interventions: Not applicable.

Main Outcome Measure: The ADL CAT.

Results: The intraclass correlation coefficient (ICC) values were 0.94 and 0.80 for the ADL CAT in the intra-rater reliability and inter-rater reliability studies, respectively. The Classical Test Theory-based minimal detectable change (MDCCTT) values were 6.5 and 9.5 for the ADL CAT in the intra-rater reliability and inter-rater reliability studies, respectively. The Kazis’ effect size and standardized response mean of the ADL CAT were moderate (0.62-0.73).

Conclusions: The results of this study showed that the ADL CAT has good intra-rater reliability and inter-rater reliability in outpatients with chronic stroke and sufficient responsiveness in inpatients with stroke undergoing inpatient rehabilitation. Further investigation of the responsiveness of the ADL CAT in outpatients is needed to obtain more evidence on the utility of the ADL CAT.

Key Words: reliability; responsiveness; activities of daily living; computerized adaptive testing; stroke.

List of abbreviations:

ADL activities of daily living

ADL CAT ADL Computerized Adaptive Testing system

BADL basic ADL

BI Barthel Index

CI confidence interval

CTT classical test theory

FAI Frenchay Activities Index

IADL instrumental ADL

ICC intraclass correlation coefficient

IRT item response theory

LOA limits of agreement

MDC minimal detectable change

SEE standard error of estimate

SEM standard error of measurement

SRM standardized response mean

Stroke is a major cause of disability in activities of daily living (ADL) among the elderly.1-3 Assessing ADL is important for clinicians in planning ADL intervention, estimating care requirements, and monitoring outcomes.4 To be clinically useful, a short and precise ADL measure is preferred to improve the administrative efficiency and reduce assessment burden.1

The ADL Computerized Adaptive Testing system (ADL CAT) was developed to achieve both efficiency and precision of ADL assessments.1 The ADL CAT has three advantages. First, the ADL CAT is quick to complete. The ADL CAT chooses only items tailored to a patient and skips items that are either too easy or too difficult for patients; it requires an average of only 88 seconds to complete.1 Such efficiency is unlikely to be achieved with traditional measures, such as the Functional Independence Measure and Frenchay Activities Index (FAI). Thus, the ADL CAT can enhance the efficiency of administration and reduce the assessment burden on patients and raters.5 Second, the ADL CAT assesses a broad spectrum of ADL function. Commonly, ADL refers to basic ADL (BADL).6, 7 However, assessing BADL does not capture the information on higher levels of ADL functions that are necessary for independence in the home and community (i.e., instrumental ADL, IADL).3 The ADL CAT combines both the BADL and IADL items into one item bank to comprehensively assess patients’ ADL functions.1 Third, the ADL CAT takes into account gender differences in performing some IADL items (i.e., domestic chores) and thus assigns different weights to these IADL items to prevent underestimation of male patients’ ADL function in performing domestic chore items.1 Due to the aforementioned advantages, the ADL CAT demonstrates great potential for use in clinical and research settings.

Validity of the ADL CAT has been well examined.1 The ADL CAT has high concurrent validity (Pearson’s r=0.82) with the combined Barthel Index (BI, assessing BADL) and FAI (assessing IADL) in patients with stroke.1 In addition, the 34 BADL and IADL items of the ADL CAT item bank are one-dimensional.1 Thus, the construct validity of the item bank of the ADL CAT is supported.

However, some other important psychometric properties of the ADL CAT, such as intra-rater reliability, inter-rater reliability, and responsiveness, are still unknown, which limits the utility of the measure. Intra-rater reliability reflects the extent of consistency between repeated assessments administered by the same rater.8 Inter-rater reliability indicates whether different raters give consistent scores when administering a measure to the same group of patients.8 Responsiveness refers to a measure’s ability to detect change that occurs as a result of therapy or disease progression.9 It is critical for the ADL CAT to have sufficient intra-rater reliability, inter-rater reliability, and responsiveness to ensure its utility in clinical and research settings. Thus, the aims of this study were (1) to examine the intra-rater reliability and inter-rater reliability of the ADL CAT and (2) to investigate the responsiveness of the ADL CAT in patients with stroke.

METHODS

Participants

Intra-rater and inter-rater reliability

We recruited two convenience samples of outpatients with chronic stroke from the Department of Physical Medicine and Rehabilitation at seven hospitals between March 2011 and May 2012. One convenience sample was for examining intra-rater reliability; the other was for examining inter-rater reliability. All participants met the following criteria: (1) diagnosis of cerebral hemorrhage or cerebral infarction; (2) stroke onset ≥6 months previously. In addition, all participants received traditional rehabilitation (e.g., occupational therapy, physical therapy, or speech and language pathology where needed, 1 to 3 times per week for each therapy). The traditional rehabilitation provided training in ADL, mobility, endurance and strength, balance, communication, or language-based skills. We excluded patients with major comorbidities (e.g., dementia or rheumatoid arthritis) or recurrent stroke during the study period that might influence ADL functioning.

Responsiveness

We recruited a consecutive sample of patients undergoing inpatient rehabilitation at one hospital from May 2011 to January 2013. The inclusion criterion for selecting participants was diagnosis of cerebral hemorrhage or cerebral infarction. All participants received inpatient rehabilitation services (e.g., occupational therapy, physical therapy, or speech and language pathology, 3-5 times a week for each therapy). The inpatient rehabilitation services focused on training in ADL, mobility, motor recovery, endurance and strength, balance, chewing, or swallowing, where appropriate. Patients with major comorbidities were excluded. Moreover, we excluded patients who stayed in the ward for less than 7 days because their ADL functions tended to be stable, as indicated by the short hospital stay. The whole study was approved by the local institutional review boards.

Procedure

Prior to the study, the raters (raters A and B, both occupational therapists) received at least 2 hours of training from the first author (a very experienced ADL CAT user) on the administration of the ADL CAT. During the training session, the raters had to familiarize themselves with the items, response categories, interview procedures, and scoring. At the end of the training session, both raters individually interviewed 4 to 6 patients while the first author observed and scored at the same time. The raters’ interview procedures and scoring results were then checked by the first author to ensure that the procedures and results were satisfactory.

During the study, the raters interviewed the patient and his/her primary caregiver, if available, to assess the patient’s level of independence in daily life. The raters asked the patient whether he or she had done a specific ADL task in the pre-specified time frame (e.g., “whether or not the patient actually put on pants or shorts him/herself in the previous 1-2 days before assessment”). If the patient had done the task, the rater asked whether the patient had done it by himself/herself or with assistance. If it was the latter, the rater further asked about the level of assistance during the task. If responses were obtained from both the patient and his/her primary caregiver and there was a discrepancy, the raters clarified the discrepancy with the patient and his/her primary caregiver. If the discrepancy persisted after clarification, the rater checked with the patient and his/her caregiver simultaneously to determine how the patient actually performed the ADL task within the time frame. If a patient had difficulty responding to the interview (e.g., a patient with aphasia or cognitive-perceptual deficits), the patient’s primary caregiver was interviewed instead.

Intra-rater reliability

The ADL CAT was administered to the participants twice by rater A, 7 days apart.

Inter-rater reliability

Raters A and B independently interviewed and scored the participants, 7 days apart, and did not communicate with each other or the first author throughout the study. Such an independent interview and scoring design was intended to simulate the actual context of ADL assessments in both clinical and research settings.

Responsiveness

The ADL CAT was administered to the patients by rater B at admission to the rehabilitation ward and at discharge from the hospital.

Measures

ADL CAT. The ADL CAT, containing an item bank with 11 BADL tasks and 23 IADL tasks, can be administered using a personal digital device (e.g., smart phone, tablet) via the Internet.1 The ADL CAT assesses a patient’s actual performance (in terms of level of assistance) in his/her daily tasks, which provides information on patients’ ADL functions regarding the integration of the physical environment, personal assistance, and the use (or nonuse) of assistive technology. Thus, the results of the ADL CAT indicate the level of dependence/disability of a patient in real life and can be viewed as an outcome indicator and as an indicator of the level of burden of long-term care.3, 10

In the initial development of the ADL CAT, a total of 4 response categories (“totally dependent,” “partially dependent,” “sometimes independent, but not every time,” and “totally independent, every time”) were proposed. However, after estimation of the item parameters, a few of the items showed response category reversal and their response categories were collapsed.1 The renewed version of the ADL CAT has items with 3 kinds of response categories (2, 3, and 4 categories). However, such variation in the response categories can be confusing for prospective users. To achieve ease of scoring, the developers of the ADL CAT kept the 4-category design of the response categories of the 34 items.1

The ADL CAT presents subsequent items on the basis of the responses of the patients. For example, if a female patient was totally dependent on someone to complete the task of putting on pants or shorts (lowest level of independence), the patient and/or her primary caregiver was asked whether or not the patient went to the toilet to eliminate urine herself and, after elimination, arranged her clothes and cleaned herself (an easier task than the “putting on pants or shorts” task). On the other hand, if the patient independently completed the “putting on pants or shorts” task herself every time (highest level of independence), the patient and/or her primary caregiver was asked whether the patient had washed dishes during the past week (a harder task than the “putting on pants or shorts” task).

The stopping rule of the ADL CAT for each patient is either reliability (estimated by item response theory (IRT)) > 0.90 or a maximum test length of 7 items.1 The original ADL CAT scores are standardized scores ranging from -2.65 to 2.56. For easier interpretation, the standardized scores are transformed to T scores (mean=50, SD=10). The T score of the ADL CAT ranges from 22.0 to 77.2.1 When a patient independently performs all BADL tasks but none of the IADL tasks presented by the CAT, the highest possible T score is 55. Because IADL tasks are usually more difficult than BADL tasks for patients, a T score of 55 can be generally used to indicate whether a patient can independently perform all BADL tasks.
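The stopping rule can be sketched in a few lines. This is a minimal illustration, not the ADL CAT's actual implementation: the function names are hypothetical, and the sketch assumes that the SEE is expressed on the standardized (not T-score) scale so that IRT reliability can be taken as 1 − SEE², the definition given in the Analysis section.

```python
def irt_reliability(see: float) -> float:
    """IRT reliability of a score, assuming SEE is on the standardized scale."""
    return 1.0 - see ** 2


def should_stop(see: float, items_administered: int,
                min_reliability: float = 0.90, max_items: int = 7) -> bool:
    """Stop when reliability exceeds 0.90 or 7 items have been administered."""
    return (irt_reliability(see) > min_reliability
            or items_administered >= max_items)


# Hypothetical SEE values after successive items (illustrative only).
print(should_stop(see=0.40, items_administered=3))  # False: reliability 0.84
print(should_stop(see=0.30, items_administered=4))  # True: reliability 0.91
print(should_stop(see=0.40, items_administered=7))  # True: test length limit
```

Under this rule a test ends early for patients whose responses are informative, which is consistent with the short average test length reported for the ADL CAT.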

Analysis

Intra-rater and inter-rater reliability. Intraclass correlation coefficients (ICC2,1) were employed to examine the extent of agreement between repeated assessments of the ADL CAT administered twice by the same rater (intra-rater reliability) or by two raters individually (inter-rater reliability). ICC values from 0.90-0.99 indicate high reliability; those from 0.80-0.89, good reliability; those from 0.70-0.79, fair reliability; those below 0.69, poor reliability.11
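As a hedged illustration of how such a coefficient can be computed, the sketch below implements the two-way random-effects, absolute-agreement, single-measurement ICC in the Shrout and Fleiss formulation, which is what the notation ICC2,1 is assumed to denote here. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is a list of rows, one per patient, each holding k ratings.
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of ratings per patient
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for patients (rows), raters/occasions (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data: identical ratings, then a constant 1-point offset between raters.
print(round(icc_2_1([[1, 1], [2, 2], [3, 3]]), 2))  # 1.0
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 2))  # 0.67
```

The second toy data set shows that a constant offset between raters lowers this ICC even though the rank order of patients is identical, which is why an absolute-agreement form suits a reliability study of this kind.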

We estimated the minimal detectable change (MDC) of the ADL CAT administered by a single rater (intra-rater) and by different raters (inter-rater). The MDC is the smallest change score that is detectable beyond random error at a certain level of confidence (usually 95%). The value of the MDC can be used as a threshold to determine whether the change score of an individual patient on a measure reflects a real improvement (or deterioration) or is due to measurement error.12, 13 The MDC values were estimated using Classical Test Theory (CTT) and IRT. The MDCCTT uses a single estimate of random measurement error for the measure across the whole score continuum.14 That is, a single value of MDCCTT can be used for all levels of ADL function (all scores of the ADL CAT in this study). On the other hand, the MDCIRT can be estimated for each assessment and takes into account that a measure does not assess a characteristic equally well and precisely over the whole score continuum.15, 16 The value of MDCIRT of each score of a measure depends on the items tested.17

In CTT, both MDC values were calculated on the basis of the standard error of measurement (SEM) using the following formulas:12

$$\mathrm{MDC}_{\mathrm{CTT}} = z_{\text{level of confidence}} \times \sqrt{2} \times \mathrm{SEM} \qquad (1)$$

$$\mathrm{SEM} = \mathrm{SD}_{\text{all testing scores}} \times \sqrt{1 - r} \qquad (2)$$

The z-score represents the confidence interval (CI) from a standard normal distribution (i.e., 1.96 for 95% CI in this study). $\mathrm{SD}_{\text{all testing scores}}$ is the SD of all scores of the two assessments, and r is the coefficient of the intra-rater or inter-rater reliability (ICC value). The multiplier of √2 refers to the additional uncertainty introduced by inclusion of scores from two separate assessments.12 Using CTT, the MDC values are invariant for every ability estimate along the scale, because the SEM depends on the population distribution and is assumed to be the same for each person.18
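Equations 1 and 2 can be combined into a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the rounded SD and ICC values reported in this manuscript (SD=9.4 and ICC=0.94 for the intra-rater study; SD=7.8 and ICC=0.80 for the inter-rater study), so the output is expected to approximate, not exactly reproduce, the reported MDCCTT values of 6.5 and 9.5.

```python
from math import sqrt


def sem(sd_all_scores: float, reliability: float) -> float:
    """Standard error of measurement (Equation 2)."""
    return sd_all_scores * sqrt(1.0 - reliability)


def mdc_ctt(sd_all_scores: float, reliability: float, z: float = 1.96) -> float:
    """CTT-based minimal detectable change (Equation 1)."""
    return z * sqrt(2.0) * sem(sd_all_scores, reliability)


def mdc_percent(mdc: float, mean_all_scores: float) -> float:
    """MDC expressed as a percentage of the mean of all scores."""
    return 100.0 * mdc / mean_all_scores


intra = mdc_ctt(sd_all_scores=9.4, reliability=0.94)
inter = mdc_ctt(sd_all_scores=7.8, reliability=0.80)
print(round(intra, 1), round(inter, 1))  # 6.4 9.7
```

The small gaps between these values and the reported ones are what one would expect from working with rounded inputs. The `mdc_percent` helper corresponds to the acceptability criterion used in this study (MDCCTT below 20% of the mean of all scores).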

The MDCCTT value was considered acceptable when this value was <20% of the mean of all scores in a measure.19

The IRT-based MDC (MDCIRT) was calculated on the basis of the standard error of estimate (SEE) using the following formula:

$$\mathrm{MDC}_{\mathrm{IRT}} = 1.96 \times \sqrt{\mathrm{SEE}_{1}^{2} + \mathrm{SEE}_{2}^{2}} \qquad (3)$$

In IRT, one can generate an SEE for each individual score. The SEE allows users to estimate the precision (or reliability) of a particular measurement.10 We calculated the MDCIRT for each patient using the SEEs of that patient’s first and second test scores ($\mathrm{SEE}_{1}$ and $\mathrm{SEE}_{2}$) in the intra-rater and inter-rater reliability studies. The MDCIRT was also plotted for each patient.

We also used Bland-Altman plots with 95% limits of agreement (LOA) to visually examine the agreement between two repeated assessments.20 In these plots, the differences (d) between each pair of observations were presented against the average value for each pair of observations. Assuming that the differences follow a normal distribution, 95% of the differences will lie between the mean difference ± 1.96 SDdiff (i.e., the LOA), where SDdiff represents the standard deviation of the differences. Moreover, these plots were used to check for heteroscedasticity, that is, a tendency for the differences between repeated assessments to increase as the average score of the assessments increases. The possibility of heteroscedasticity was evaluated according to the association (i.e., Pearson’s r) between the average and the absolute change in each pair of assessments. The data were considered heteroscedastic when r > 0.3.21
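The limits of agreement and the heteroscedasticity check can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy scores are hypothetical, the sample SD (n − 1 denominator) is assumed for SDdiff, and the LOA are centered on the mean difference.

```python
from math import sqrt
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (mean difference, lower LOA, upper LOA) for paired scores."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the differences (assumption)
    return d_bar, d_bar - z * sd_diff, d_bar + z * sd_diff


def pearson_r(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_heteroscedastic(first, second, threshold=0.3):
    """Correlate pair averages with absolute changes; flag if r > 0.3."""
    averages = [(a + b) / 2 for a, b in zip(first, second)]
    abs_changes = [abs(b - a) for a, b in zip(first, second)]
    return pearson_r(averages, abs_changes) > threshold


# Toy T scores for four patients at two assessments (illustrative only).
first = [40, 50, 60, 55]
second = [42, 49, 63, 55]
d_bar, lower, upper = limits_of_agreement(first, second)
print(round(d_bar, 2), round(lower, 2), round(upper, 2))
print(is_heteroscedastic(first, second))
```

A paired difference falling outside the two limits would appear as a point outside the LOA band in the plot.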

A paired t test was also performed to determine whether there were significant differences between repeated assessments in the intra-rater or inter-rater condition.

IRT reliability and test length in both reliability studies. We calculated IRT reliability for each patient in both assessments in the intra-rater and inter-rater reliability studies on the basis of the SEE using the following formula:

$$\text{IRT reliability} = 1 - \mathrm{SEE}^{2} \qquad (4)$$

In addition, we calculated the test length by counting the number of items presented in each interview.

A paired t test was performed to determine whether there were significant differences in IRT reliability or test length between repeated assessments in the intra-rater and inter-rater conditions.
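A minimal sketch of the SEE-based quantities follows. It assumes that Equation 3 combines the two SEEs as the square root of the sum of their squares, that the SEE used for IRT reliability is on the standardized scale, and that the SEEs used for MDCIRT are on the same scale as the reported score. The function names and example SEE values are hypothetical.

```python
from math import hypot


def irt_reliability(see_standardized: float) -> float:
    """IRT reliability of one score (Equation 4), SEE on the standardized scale."""
    return 1.0 - see_standardized ** 2


def mdc_irt(see_first: float, see_second: float, z: float = 1.96) -> float:
    """IRT-based MDC from the SEEs of the first and second test scores."""
    # hypot(a, b) computes sqrt(a**2 + b**2).
    return z * hypot(see_first, see_second)


# Hypothetical SEE values, not taken from the study data.
print(round(irt_reliability(0.26), 2))  # 0.93
print(round(mdc_irt(2.0, 2.0), 1))      # 5.5
```

Because each patient has his or her own pair of SEEs, this calculation yields a patient-specific threshold, in contrast to the single MDCCTT value.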

Responsiveness. The responsiveness of the ADL CAT was examined using two types of effect size. First, Kazis’ effect size22 was calculated by dividing the mean change in score between admission and discharge measurements by the SD of the admission score. Second, the standardized response mean (SRM)23 was obtained by dividing the mean change by the SD of the change in admission and discharge scores. An effect size greater than 0.80 was considered large, one of 0.50 to 0.80, moderate, and one of 0.20-0.49, small.24 We expected that the responsiveness of the ADL CAT would be moderate to high. In addition, we used a paired t test to determine the statistical significance of the changes in scores on the ADL CAT.
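The two responsiveness indices can be sketched as below. The function names and toy scores are hypothetical, the sample SD (n − 1 denominator) is an assumption, and the label returned for effects under 0.20 is a placeholder because the text does not name that range.

```python
from statistics import mean, stdev


def kazis_effect_size(admission, discharge):
    """Mean change divided by the SD of the admission scores."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return mean(changes) / stdev(admission)


def standardized_response_mean(admission, discharge):
    """Mean change divided by the SD of the change scores."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return mean(changes) / stdev(changes)


def interpret(effect: float) -> str:
    """Label an effect size using the thresholds stated in the text."""
    if effect > 0.80:
        return "large"
    if effect >= 0.50:
        return "moderate"
    if effect >= 0.20:
        return "small"
    return "below small"  # placeholder label, not from the source


# Toy admission and discharge T scores (illustrative only).
admission = [30, 40, 50]
discharge = [38, 44, 54]
print(round(kazis_effect_size(admission, discharge), 2))
print(round(standardized_response_mean(admission, discharge), 2))
print(interpret(0.62), interpret(0.73))
```

The two indices differ only in the denominator, so they diverge when the spread of change scores differs from the spread of baseline scores.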

RESULTS

Intra-rater reliability

A total of 55 patients participated in the study. Of these, 2 patients had severe aphasia and 1 had severe cognitive-perceptual deficits. Their mean age was 74.3 years, and 73% of the patients were male (Table 1). Table 2 shows that the mean scores of the ADL CAT at the first and second assessments were 50.9 and 51.3, respectively, indicating that, on average, our participants independently performed most BADL tasks but none of the IADL tasks.

The ICC and MDCCTT values for the ADL CAT were 0.94 and 6.5, respectively, in the intra-rater reliability study (Table 2). The MDCCTT was 12.8% of the mean of all scores of the intra-rater assessments.

Figure 1 shows that the LOA of the ADL CAT ranged from -6.3 to 6.9. The association between the average and the absolute change scores in each pair of assessments was less than 0.3 (Pearson’s r=0.12). In addition, there was no significant difference between intra-rater assessments (p>0.05, Table 2).

Inter-rater reliability

A total of 42 patients participated in the study. Three of the 42 patients had severe cognitive-perceptual deficits. Their mean age was 63.6 years, and 57.0% of the patients were male (Table 1). Table 2 shows that the mean scores at the first and second assessments were 49.4 and 50.6, respectively.

The ICC and MDCCTT values for the ADL CAT were 0.80 and 9.5, respectively, in the inter-rater reliability study (Table 2). The MDCCTT was 18.9% of the mean of all scores of the inter-rater assessments.

Figure 2 shows that the LOA of the ADL CAT ranged from -8.2 to 10.5. The association between the average and the absolute change scores in each pair of assessments was less than 0.3 (Pearson’s r=0.13). Moreover, no significant differences were found between inter-rater assessments (p>0.05, Table 2).

Figure 3 shows the MDCIRT plots. The MDCIRT, as expected, was much smaller in the middle of the ADL continuum in our intra-rater and inter-rater reliability samples (e.g., a patient with a T score of 50.3 had the lowest MDCIRT of 5.4). The MDCIRT was much higher at the very low or very high end of the ADL continuum (e.g., a patient with the lowest T score (22.0) had the highest MDCIRT of 16.2).

IRT reliability and test length in both reliability studies

Table 3 shows that the mean IRT reliability was very high for the patients’ ADL CAT scores (about 0.93 for each of the two studies). The mean test length was short (about 4 items for the two studies). There were no significant differences in IRT reliability or test length between repeated assessments in the two studies (p≥0.362).

Responsiveness

A total of 71 patients were originally recruited. Eleven patients withdrew from the study because they were discharged early without notice. Sixty patients completed both baseline and follow-up assessments. These patients were not significantly different from those who withdrew from the study in terms of demographic characteristics (i.e., age and gender) or ADL function (i.e., the ADL CAT baseline scores) (p>0.11). The median number of days from onset to initial evaluation for these patients was 20 (range, 9-50), indicating that most of the patients were in the subacute stage (Table 4). Table 2 shows that the admission and discharge scores of the ADL CAT were 39.8 and 45.9, respectively, demonstrating a tendency toward higher scores at discharge than at admission.

The Kazis’ effect size and SRM of the ADL CAT were moderate (0.62-0.73) (Table 2). The change in scores on the ADL CAT was significant (p<0.001).

DISCUSSION

Establishing intra-rater reliability, inter-rater reliability, and responsiveness is important for ensuring the utility of the ADL CAT. Our findings provide empirical evidence on these important properties of the ADL CAT in patients with stroke for clinicians and researchers.

Our results showed that the ICC value (0.94) for the intra-rater agreement of the ADL CAT was high. The ICC value (0.80) for the inter-rater agreement of the ADL CAT was good. The inter-rater agreement appeared slightly lower than the intra-rater agreement. Two possible reasons are proposed. First, the variance of the ADL CAT scores between participants in the inter-rater reliability study (SD=7.8) was smaller than that between participants in the intra-rater reliability study (SD=9.4).25 Because the ICC reflects the ratio of the variance between participants to the total variance (between- and within-participant variances), this value becomes smaller when the between-participant variance decreases.25 Second, training of only two hours on the administration of the ADL CAT might not be sufficient to ensure that the raters attain equally high proficiency in interview and scoring skills over time. Although the inter-rater agreement of the ADL CAT was considered good (ICC=0.80), rigorous rater training is recommended to ensure stable assessments within or between raters.

The MDCCTT values were 6.5 points and 9.5 points for the ADL CAT in the intra-rater reliability and inter-rater reliability studies, respectively. As expected, the MDCCTT obtained from different raters was higher than that obtained from an individual rater. Both MDCCTT values were <20% of the mean of all scores and <1 SD (10 T-score points) of the ADL CAT, indicating an acceptable level of random measurement error. More importantly, these MDCCTT values are useful for users when judging whether the change scores in consecutive ADL CAT assessments signify real changes or random variations.26 For example, a change score greater than 9.5 points between consecutive assessments administered by different raters can be interpreted as a real change (i.e., beyond random measurement error) with 95% confidence.

We calculated both the MDCCTT and the MDCIRT because each has its own merits. The MDCCTT takes into account the effects of raters’ inconsistency in testing (either intra-rater or inter-rater interviewing skills and judgment). Furthermore, MDCCTT, a single value, is applied for all patients no matter their level of functions. Because there was apparent variability between inter-rater assessments of the ADL CAT (ICC=0.80), we recommend that prospective users employ inter-rater MDCCTT (9.5) when the ADL CAT is administered by different raters.

In contrast, the MDCIRT, generated using the SEE for each individual score, does not take into account the rater’s effect. The value of MDCIRT depends on the quality of the item bank of a CAT.27 That is, a better item bank will generate a lower MDCIRT. The ADL CAT can calculate the MDCIRT automatically with each patient’s ADL score. Particularly, the ICC value (0.94) for the intra-rater reliability was so high that the effect of raters’ inconsistency may be negligible for the ADL CAT. Thus, such MDCIRT information, which is automatically reported by the ADL CAT, can be very useful for users to determine whether a patient’s change score is beyond random measurement error.

However, the MDCIRT is usually large for patients with an extremely low or high level of functioning. For example, the patient who had the lowest ADL CAT score (22.0) had a very high MDCIRT, 16.2. The results indicate that the ADL CAT did not provide reliable information to describe the ADL function of patients with an extremely low level of ADL functioning. Therefore, the item bank for the ADL CAT should in the future be extended at both ends of ADL function by the addition of some very easy and very hard items, so that better reliability (low random measurement error) can be achieved for patients with extreme ADL functions.

The Bland-Altman plots show that the mean scores of repeated assessments were scattered throughout almost the entire range of possible scores (i.e., from 22.0 to 66.6 for the intra-rater reliability study) of the ADL CAT, implying that our participants had a wide range of ADL function. The associations between the average and the absolute change in each pair of assessments in both the intra-rater reliability and the inter-rater reliability studies were less than 0.3, indicating that heteroscedasticity did not exist. That is, the differences in repeated assessments did not increase as the average score of the assessments increased. Moreover, the results of the paired t tests showed no systematic bias in either the intra-rater or the inter-rater assessments. Our findings demonstrate that the ADL CAT is reliable in assessing a wide range of ADL function in patients with stroke over time.
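The Bland-Altman quantities discussed above can be sketched in Python as follows. The paired scores are invented purely for illustration (they are not study data), and the use of a Pearson correlation between pair averages and absolute differences as the heteroscedasticity check is an assumption about how the association was computed.

```python
import math
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Return (lower, upper) limits: mean difference +/- z * SD of differences."""
    return (mean_diff - z * sd_diff, mean_diff + z * sd_diff)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(first, second):
    """Mean difference, SD of differences, limits of agreement, and the
    correlation between pair averages and absolute differences."""
    diffs = [s - f for f, s in zip(first, second)]
    averages = [(f + s) / 2 for f, s in zip(first, second)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    r = pearson_r(averages, [abs(d) for d in diffs])
    return mean_diff, sd_diff, limits_of_agreement(mean_diff, sd_diff), r


# Limits implied by the summary values in the figure legends.
for label, mean_diff, sd_diff in [("intra-rater", 0.3, 3.4), ("inter-rater", 1.2, 4.8)]:
    low, high = limits_of_agreement(mean_diff, sd_diff)
    print(f"{label}: {low:.1f} to {high:.1f}")

# Hypothetical paired scores, for illustration only.
first = [30.0, 42.5, 50.1, 55.3, 61.0, 47.2]
second = [31.2, 41.0, 52.0, 54.8, 63.5, 46.0]
mean_diff, sd_diff, loa, r = bland_altman(first, second)
print(f"mean diff = {mean_diff:.2f}, r(average, |diff|) = {r:.2f}")
```

An absolute correlation below 0.3 between the averages and the absolute differences would be read, as in the text, as the absence of heteroscedasticity.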


Moreover, the ADL CAT had high IRT reliability (≥.93) and a short test length (4 items, on average) in both reliability studies. In addition, the differences in IRT reliability and test length between repeated assessments were not significant in either the intra-rater or the inter-rater study. These observations further support that the ADL CAT is reliable and efficient when used repeatedly to assess patients’ ADL function.


We found that the responsiveness of the ADL CAT was moderate (Kazis’ effect size=0.62 and SRM=0.73) in detecting change in patients with stroke undergoing inpatient rehabilitation. Such a result partially supports our original hypothesis that the responsiveness of the ADL CAT was moderate to high. A possible reason for the moderate responsiveness of the ADL CAT is that our participants in the responsiveness study were all staying in hospitals. IADL are not commonly performed by inpatients. Therefore, their scores on the ADL CAT were generally limited to the score range of the BADL function, which might have compromised the responsiveness. Nevertheless, the results support the value of the ADL CAT in detecting change in ADL function in inpatients with stroke.
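The responsiveness indices can be checked against the summary statistics in Table 2 under the usual definitions, which are assumed here: Kazis’ effect size is the mean change divided by the SD of the baseline scores, the SRM is the mean change divided by the SD of the change scores, and the paired t statistic is the mean change divided by its standard error. The function names below are illustrative.

```python
import math


def kazis_effect_size(mean_change: float, sd_baseline: float) -> float:
    """Mean change divided by the SD of the baseline (first test) scores."""
    return mean_change / sd_baseline


def standardized_response_mean(mean_change: float, sd_change: float) -> float:
    """Mean change divided by the SD of the change scores."""
    return mean_change / sd_change


def paired_t(mean_change: float, sd_change: float, n: int) -> float:
    """Paired t statistic: mean change over its standard error."""
    return mean_change / (sd_change / math.sqrt(n))


# Summary statistics of the responsiveness study (Table 2).
mean_change, sd_baseline, sd_change, n = 6.1, 9.8, 8.3, 60

print(f"Kazis' effect size = {kazis_effect_size(mean_change, sd_baseline):.2f}")
print(f"SRM = {standardized_response_mean(mean_change, sd_change):.2f}")
print(f"paired t = {paired_t(mean_change, sd_change, n):.1f}")
```

With these inputs the sketch gives approximately 0.62, 0.73, and 5.7, in line with the values reported in Table 2.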

The ADL CAT not only eases the administration burden on both clinicians and patients but also improves the efficiency of patient management. The ADL CAT provides instant outcome reports and automatic storage of results, which further increase the efficiency of data collection and management.28 Furthermore, the data of the ADL CAT are collected directly from patients themselves and/or their caregivers, which is in line with the patient-reported outcomes approach. Since the information on ADL function is gathered from the patients’ perspectives, the ADL CAT is useful in assisting clinicians to develop treatment plans and monitor outcomes toward patient-centered care. In the future, the ADL CAT can be combined with other CATs (e.g., the Balance CAT, the CAT-Fugl-Meyer motor scale)28-30 to further extend the utility of the CAT in validation of treatment effectiveness, decision-making, and data management.


Study Limitations


This study has five limitations. First, our samples for the intra-rater reliability and inter-rater reliability were convenience samples (with a few patients having severe aphasia and cognitive-perceptual deficits). Thus, our results cannot be generalized to all stroke populations. Second, we did not record whether the responses were from patients, primary caregivers, or both. Some primary caregivers were not present, in which case only the patients were interviewed. The inconsistency of informants between the first and second interviews might have caused underestimations of the intra-rater reliability and inter-rater reliability in this study. Third, we examined the responsiveness of the ADL CAT in inpatients with stroke receiving rehabilitation. Because inpatients are not likely to perform IADL, the generalizability of our results is limited. Fourth, the item bank of the ADL CAT has not been validated exclusively in inpatients. Thus, an investigation of validity in inpatients is needed to further validate the ADL CAT. Fifth, whether the item parameters of the ADL item bank are stable over time (i.e., differential item functioning [DIF] due to time or other factors, except for gender1) has not been examined. It is important that ADL items for use as an outcome indicator over time not have a time-DIF. Future studies are needed to examine the time-DIF of the ADL item bank to ensure that the item parameters are stable over time and that our findings of responsiveness are robust.

CONCLUSIONS


The results of this study showed that the ADL CAT has good intra-rater reliability and inter-rater reliability in outpatients with chronic stroke and moderate responsiveness in inpatients with stroke undergoing rehabilitation. The ADL CAT is very efficient, needing about 4 items, on average, to complete the assessments, and this efficiency is unlikely to be achieved with traditional measures (e.g., the 10-item BI or 15-item FAI). Further investigations of the responsiveness of the ADL CAT in outpatients are needed before it is used as an outcome measure in outpatients with stroke.


References

1. Hsueh IP, Chen JH, Wang CH, Hou WH, Hsieh CL. Development of a computerized adaptive test for assessing activities of daily living in outpatients with stroke. Phys Ther 2013;93:681-93.

2. Stineman MG, Maislin G, Fiedler RC, Granger CV. A prediction model for functional recovery in stroke. Stroke 1997;28:550-6.

3. Hsieh CL, Hoffmann T, Gustafsson L, Lee YC. The diverse constructs use of activities of daily living measures in stroke randomized controlled trials in the years 2005-2009. J Rehabil Med 2012;44:720-6.

4. Hsieh CL, Sheu CF, Hsueh IP, Wang CH. Trunk control as an early predictor of comprehensive activities of daily living function in stroke patients. Stroke 2002;33:2626-30.

5. Howard W. Computerized adaptive testing: a primer. Mahwah: Lawrence Erlbaum Associates; 2000.

6. Hsueh IP, Wang WC, Sheu CF, Hsieh CL. Rasch analysis of combining two indices to assess comprehensive ADL function in stroke patients. Stroke 2004;35:721-6.

7. Kelly-Hayes M, Robertson JT, Broderick JP, Duncan PW, Hershey LA, Roth EJ, et al. The American Heart Association Stroke Outcome Classification. Stroke 1998;29:1274-80.

8. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River: Pearson Prentice Hall; 2009.

9. Tamanini JT, Dambros M, D'Ancona CA, Palma PC, Rodrigues-Netto N, Jr. Responsiveness to the Portuguese version of the International Consultation on Incontinence Questionnaire-Short Form (ICIQ-SF) after stress urinary incontinence surgery. Int Braz J Urol 2005;31:482-9; discussion 90.

10. Jette AM, Tao W, Norweg A, Haley S. Interpreting rehabilitation outcome measurements. J Rehabil Med 2007;39:585-90.

11. Arnall FA, Koumantakis GA, Oldham JA, Cooper RG. Between-days reliability of electromyographic measures of paraspinal muscle fatigue at 40, 50 and 60% levels of maximal voluntary contractile force. Clin Rehabil 2002;16:761-71.

12. Haley SM, Fragala-Pinkham MA. Interpreting change scores of tests and measures used in physical therapy. Phys Ther 2006;86:735-43.

13. Lu WS, Wang CH, Lin JH, Sheu CF, Hsieh CL. The minimal detectable change of the simplified stroke rehabilitation assessment of movement measure. J Rehabil Med 2008;40:615-9.

14. Chakravarty EF, Bjorner JB, Fries JF. Improving patient reported outcomes using item response theory and computerized adaptive testing. J Rheumatol 2007;34:1426-31.

15. Wang YC, Hart DL, Werneke M, Stratford PW, Mioduski JE. Clinical interpretation of outcome measures generated from a lumbar computerized adaptive test. Phys Ther 2010;90:1323-35.

16. Hart DL, Wang YC, Cook KF, Mioduski JE. A computerized adaptive test for patients with shoulder impairments produced responsive measures of function. Phys Ther 2010;90:928-38.

17. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care 2000;38:II28-42.

18. Embretson S, Reise S. Item response theory for psychologists. Mahwah, New Jersey: Lawrence Erlbaum Associates; 2000.

19. Flansbjer UB, Holmback AM, Downham D, Patten C, Lexell J. Reliability of gait performance tests in men and women with hemiparesis after stroke. J Rehabil Med 2005;37:75-82.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Cohen J. Statistical power analysis for the behavioral sciences. Lawrence Erlbaum; 1988.

22. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med 1998;26:217-38.

23. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27:S178-89.

24. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990;28:632-42.

25. Bruton A, Conway JH, Holgate ST. Reliability: what is it, and how is it measured? Physiotherapy 2000;86:94-9.

26. Hsueh IP, Wang CH, Liou TH, Lin CH, Hsieh CL. Test-retest reliability and validity of the comprehensive activities of daily living measure in patients with stroke. J Rehabil Med 2012;44:637-41.

27. Cook KF, O'Malley KJ, Roddey TS. Dynamic assessment of health outcomes: time to let the CAT out of the bag? Health Serv Res 2005;40:1694-711.

28. Hou WH, Shih CL, Chou YT, Sheu CF, Lin JH, Wu HC, et al. Development of a computerized adaptive testing system of the Fugl-Meyer motor scale in stroke patients. Arch Phys Med Rehabil 2012;93:1014-20.

29. Hsueh IP, Chen JH, Wang CH, Chen CT, Sheu CF, Wang WC, et al. Development of a computerized adaptive test for assessing balance function in patients with stroke. Phys Ther 2010;90:1336-44.

30. Yu WH, Hsueh IP, Hou WH, Wang YH, Hsieh CL. A comparison of responsiveness and predictive validity of two balance measures in patients with stroke. J Rehabil Med 2012;44:176-80.


Figure legends:

Fig 1. The Bland-Altman plots show the agreement of the intra-rater assessments of the ADL CAT. The solid line represents the mean of the difference. The limits of agreement (mean of the difference ± 1.96×SD_diff = 0.3 ± 1.96×3.4) are presented as dotted lines.

Fig 2. The Bland-Altman plots show the agreement of the inter-rater assessments of the ADL CAT. The solid line represents the mean of the difference. The limits of agreement (mean of the difference ± 1.96×SD_diff = 1.2 ± 1.96×4.8) are presented as dotted lines.

Fig 3. The plots show each value of MDC_IRT for each patient in the ADL CAT intra-rater and inter-rater reliability studies.


Table 1: Demographic and clinical characteristics of the participants from the reliability study

Characteristic                               Intra-rater study (n=55)   Inter-rater study (n=42)
Sex, n (%)
  Male                                       40 (72.7%)                 24 (57.1%)
  Female                                     15 (27.3%)                 18 (42.9%)
Age, years, mean (SD)                        74.3 (16.9)                63.6 (12.4)
Stroke type, n (%)
  Cerebral hemorrhage                        21 (38.2%)                 17 (40.5%)
  Cerebral infarction                        34 (61.8%)                 25 (59.5%)
Side of hemiplegia, n (%)
  Right                                      32 (58.2%)                 18 (42.9%)
  Left                                       22 (40.0%)                 24 (57.1%)
  Bilateral                                  1 (1.8%)                   0 (0%)
Time since onset to initial evaluation,
  months, median (minimum~maximum)           21.7 (6.1~182.8)           23.4 (6.3~107.1)

Table 2: Reliability and responsiveness indices of the activities of daily living computerized adaptive test (ADL CAT)

Study                    First test     Second test    Difference       ICC (95% CI)       SEM   MDC (MDC%)       Paired t test   Kazis’        SRM
                         Mean (SD)      Mean (SD)      (Second-First)                                             t value (p)     effect size
                                                       Mean (SD)
Reliability
  Intra-rater (n=55)     50.9 (9.6)*    51.3 (9.4)†    0.3 (3.4)        0.94 (0.90-0.96)   2.4   6.5 (12.8%||)    0.7 (0.494)     -             -
  Inter-rater (n=42)     49.4 (7.3)‡    50.6 (7.8)§    1.2 (4.8)        0.80 (0.65-0.89)   3.4   9.5 (18.9%¶)     1.6 (0.114)     -             -
Responsiveness (n=60)    39.8 (9.8)     45.9 (7.4)     6.1 (8.3)        -                  -     -                5.7 (<0.001)    0.62          0.73

* The first assessment scores in the intra-rater reliability study ranged from 22.0 to 66.6.
† The second assessment scores in the intra-rater reliability study ranged from 22.0 to 66.6.
‡ The first assessment scores in the inter-rater reliability study ranged from 36.4 to 62.1.
§ The second assessment scores in the inter-rater reliability study ranged from 36.5 to 66.6.
|| MDC% = MDC/mean of all scores of the intra-rater assessments of the ADL CAT (51.1)
¶ MDC% = MDC/mean of all scores of the inter-rater assessments of the ADL CAT (50.0)
SD: standard deviation; ICC: intraclass correlation coefficient; CI: confidence interval; SEM: standard error of measurement; MDC: minimal detectable change; SRM: standardized response mean.

Table 3: The mean IRT reliability and test length of the ADL CAT in the intra-rater and inter-rater studies

                            Mean IRT reliability*                       Test length (number of items)
Study                       First test    Second test   Paired t test   First test    Second test   Paired t test
                            Mean (SD)     Mean (SD)     P value         Mean (SD)     Mean (SD)     P value
Intra-rater study (n=55)    0.93 (0.05)   0.93 (0.05)   0.728           4.5 (1.8)     4.4 (1.7)     0.362
Inter-rater study (n=42)    0.94 (0.02)   0.94 (0.02)   0.855           3.9 (1.6)     4.1 (1.7)     0.483

*IRT reliability = 1 - SEE^2

Table 4: Demographic and clinical characteristics of the participants from the responsiveness study

Characteristic                               Participants who completed   Participants who withdrew
                                             the study (n=60)             from the study (n=11)
Sex, n (%)
  Male                                       39 (65.0%)                   6 (54.5%)
  Female                                     21 (35.0%)                   5 (45.5%)
Age, years, mean (SD)                        68.5 (11.5)                  74.6 (11.0)
Stroke type, n (%)
  Cerebral hemorrhage                        16 (26.7%)                   2 (18.2%)
  Cerebral infarction                        44 (73.3%)                   9 (81.8%)
Side of hemiplegia, n (%)
  Right                                      25 (41.7%)                   4 (36.4%)
  Left                                       33 (55.0%)                   7 (63.6%)
  Bilateral                                  2 (3.3%)                     0 (0%)
Time since onset to initial evaluation,
  days, median (minimum~maximum)             20 (9~50)                    14 (10~37)
ADL CAT baseline scores,
  median (minimum~maximum)                   40.2 (22.0~69.4)             36.5 (22.0~51.6)

[Fig 1. Bland-Altman plot. x-axis: Mean scores of the intra-rater assessments of the ADL CAT; y-axis: Difference between the intra-rater assessments of the ADL CAT.]

[Fig 2. Bland-Altman plot. x-axis: Mean scores of the inter-rater assessments of the ADL CAT; y-axis: Difference between the inter-rater assessments of the ADL CAT.]

[Fig 3. Scatter plot. x-axis: Baseline scores for each patient in two reliability studies; y-axis: MDC values obtained using IRT.]