Reliability of the Parallel Walk Test for the Elderly


ORIGINAL ARTICLE

Sally D. Lark, PhD, Peter W. McCarthy, PhD, David A. Rowe, PhD

ABSTRACT. Lark SD, McCarthy PW, Rowe DA. Reliability of the parallel walk test for the elderly. Arch Phys Med Rehabil 2011;92:812-7.

Objective: To determine interrater agreement and test-retest reliability of the parallel walk test (PWT), a simple method of measuring dynamic balance in the elderly during gait.

Design: Cohort study.

Setting: Outpatient clinic.

Participants: Elderly fallers (N=34; mean ± SD age, 81.3±5.4y) registered at a falls clinic participated in this study based on Mini-Mental State Examination and Barthel Index scores.

Interventions: Subjects were timed as they walked 6m between 2 parallel lines on the floor at 3 different widths (20, 30.5, 38cm) wearing their own footwear. They were scored for foot placement on (1 point) or outside the lines (2 points) by 2 separate raters. Fifteen subjects were retested 1 week later.

Main Outcome Measures: Footfall score and time to complete the PWT. Intraclass correlation coefficients (ICCs) and 95% limits of agreement were calculated for interrater and test-retest reliability.

Results: For widths of 20, 30.5, and 38cm, interrater reliability ICC range was .93 to .99 and test-retest ICC range was .63 to .90.

Conclusions: The PWT was implemented easily by 2 raters with a high degree of interrater reliability. Test-retest reliability was not as high, possibly because of the high susceptibility to variation from 1 week to the next for frail elderly subjects. The 20- and 30.5-cm widths are recommended for future use of the PWT.

Key Words: Aged; Balance; Gait; Rehabilitation; Reliability.

© 2011 by the American Congress of Rehabilitation Medicine

THE PARALLEL WALK TEST was validated previously as a quick and simple quantitative measure of balance during gait, which would allow direct comparisons after an intervention.1 It is based on the premise of increased lateral movement during gait that corresponds to decreased dynamic stability.2 The patient walks between 2 parallel lines of a designated width and is scored if he or she steps on or outside the lines. A higher score denotes a lack of stability. The PWT was reported to have been optimal in correctly classifying fallers and nonfallers at line widths of 20 to 30.5cm. Validity coefficients were

From the Faculty of Health, Sport, and Science, Glamorgan University, Pontypridd, Wales (Lark, McCarthy); Institute of Food, Nutrition, and Human Health, Massey University, Wellington, New Zealand (Lark); and Department of Sport, Culture, and the Arts, University of Strathclyde, Jordanhill Campus, Glasgow, Scotland (Rowe). No commercial party having a direct interest in the results of the research supporting this article has or will confer a benefit on the authors or on any organization with which the authors are associated. Correspondence to Sally D. Lark, PhD, Institute of Food, Nutrition, and Human Health, Massey University, Private Bag 756, Wellington 6140, New Zealand, e-mail: [email protected]. Reprints are not available from the author. 0003-9993/11/9205-00649$36.00/0 doi:10.1016/j.apmr.2010.11.028

Arch Phys Med Rehabil Vol 92, May 2011

.70 to .84 (.75 at 20cm with a score cutoff of 12; .70 at 30.5cm) and higher for time, at .82 to .87.1

The PWT was developed because existing tests are temporal in assessment, such as the Timed Up and Go (TUG) test,3 or are qualitative, extensive, and time consuming for a public clinic setting, such as Tinetti balance performance,4,5 dynamic gait index,6 or functional gait assessment.7 The tandem walk test (TWT) commonly is used as a measure of dynamic balance during gait. However, conclusions are made about a person’s balance during gait and risk for falling even when the test is not attempted.8 It has been reported previously that more than 40% would not attempt the TWT,8,9 and similarly, all elderly fallers and 44% of nonfaller subjects would not attempt it, in contrast to the PWT.1

In considering the PWT as a tool for assessing dynamic balance during gait, interrater reliability remains to be determined, particularly to show whether a rater who is not familiar with testing elderly balance parameters, or who has had limited or no formal training, can achieve the same scores during the test as a more experienced trained examiner. Furthermore, test-retest reliability of the PWT needs to be determined. Therefore, the purpose of this study was to examine the comparability of scoring between raters (interrater) and test-retest reliability of the PWT.

METHODS

Participants

Elderly fallers (N=36; mean ± SD age, 81.3±5.4y) who had had a recent fall (within the previous 6 months) and were referred to an outpatient falls clinic of a city hospital initially accepted an invitation to participate voluntarily. The study was approved by the local National Health Service Ethics Committee. All participants signed a consent form after the tests had been verbally explained and shown to them. All subjects were living independently; they were not in a nursing home, but may have been in sheltered accommodation, and they were mobile with or without the aid of a walking stick.

Before attending the falls clinic, they had recovered from any injuries sustained in their fall (which may or may not have resulted in hospitalization). Recruitment criteria included a score of 23 or higher of a possible 30 points on the Folstein MMSE10 for cognitive ability and higher than 10 (of 20) on the Barthel Index11,12 for independent activities of daily living. Subjects were excluded if they had serious pathologic states that might have been exacerbated on exertion or that were deemed to make participation unsafe. These included unstable cardiovascular disease

List of Abbreviations

ANOVA   analysis of variance
ICC     intraclass correlation coefficient
LOA     limits of agreement
MMSE    Mini-Mental State Examination
PWT     parallel walk test
TUG     Timed Up and Go
TWT     tandem walk test


PARALLEL WALK TEST RELIABILITY, Lark

(severe hypertension, unstable angina), stroke, severe breathing problems, Parkinson’s disease, peripheral neuropathy (eg, diabetic), or rheumatism/arthritis of the lower limbs that was painful on the day of examination. In the final interrater analysis, 34 patients were included, and 15 of these were retested 1 week later.

Parallel Walk Test

For the PWT, participants walked at their normal gait (step and stride length) and speed for 6m between 2 parallel lines placed at a width of either 20, 30.5, or 38cm (8, 12, and 15 inches). Each participant achieved a total footfall score (SC) based on +1 when any part of the foot was placed on the line and +2 when the footfall was outside the line or the participant reached for something to maintain balance (eg, a wall or railing that was ~1m away and required the person to step outside the lines to reach it). Higher scores denoted worse performance and therefore less stable gait; lower scores denoted better performance and more stable gait. Time taken to complete the test also was recorded for comparison and to calculate velocity. Each subject carried out the tests in random order. Participants wore their own footwear, which generally was low-heel rubber-soled shoes, and were allowed to use their walking stick if required; 13 of the 34 initial subjects and 6 of the 15 retested participants used a cane. All participants performed a short walk as a warm up (~20m) and 1 familiarization session for the 6-m length and starting instructions. They started walking on a verbal cue, and in each case, subjects were asked to look directly ahead and not at their foot placement. Raters 1 and 2 stood at opposite ends of the 6m. Both raters independently recorded the time and footfall scores for each subject during every test. This was repeated a week later for 15 subjects.

Raters

Raters were the first and second authors of this study. Rater 1 (S.D.L.) had more than 10 years’ experience working with elderly fallers and assessing balance and gait and had administered the PWT on more than 70 occasions. Rater 2 (P.W.M.) had not administered the PWT to an elderly population before this study and was given a short training session with instructions and practice on 4 patients.

Data Processing and Analysis

Footfall scores from each rater and test administration were transferred from a data collection sheet into an SPSSa data file. For each participant, test administration resulted in 12 scores: SC and time scores for each of the 3 conditions (8, 12, and 15 inches) from each of the 2 raters. This was repeated for the second test administration, adding a second set of 12 scores for the 15 participants whose data were analyzed for test-retest reliability. Interrater agreement was estimated using scores from the 34 patients who participated in test administration 1. To estimate test-retest reliability for the 15 patients who participated in both test administrations, individual rater scores were averaged to minimize any test-retest variability caused by rater variability from test administration 1 to test administration 2 and to focus on test-retest variability due to patient performance variability.

Before the reliability analyses, descriptive statistics were calculated, including skewness and kurtosis, to evaluate the data for normality of distribution and the presence of outliers. After data checking by using descriptive statistics and inspection of individual data points when indicated, a similar analysis plan was used to investigate interrater agreement and test-retest reliability. Disagreements between raters (or differences between test administrations) were evaluated by using methods described by Bland and Altman.13 Systematic bias was assessed by using t tests for the significance of mean differences and calculations of Cohen’s d, a standardized effect size indicating meaningfulness of any differences. Cohen suggested standards for d of .20 as small, .50 as medium, and .80 or higher as large.14 Proportional bias was evaluated by using the correlation between the difference between raters (or between test administrations) for each participant and the mean score of both raters (or both test administrations) for each participant. Additionally, the size of individual difference scores (ie, difference between raters for each participant or difference between test administrations for each participant) was evaluated by calculating 95% LOAs. Association between observations was evaluated by using ICCs from the 1-way ANOVA model, which were adjusted for a single rater or test administration by using the Spearman-Brown formula. A standard of .70 was used to indicate a minimally acceptable level of reliability.15 All significance tests were conducted using α of .05.

RESULTS

Patient data, including Barthel Index (range, 16–20) and MMSE scores, are listed in table 1. There were twice as many women as men, and the subgroup included in the test-retest analysis (n=15) was representative of the patient group as a whole (N=36), as indicated by similar demographic data (see table 1).

Data Checking

From descriptive statistics, scores for most PWT subtests appeared to be relatively normally distributed, denoted by skewness and kurtosis values less than 2.0. The main exception was the footfall score of the 15-in width PWT (PWT15SC) subtest, which had especially high kurtosis values. Normality of distribution is an important assumption underlying the use of such parametric analyses as t tests and ICCs, although such analyses can be robust to minor violations of the underlying assumptions.16 One participant was missing data from rater 2 at test administration 1 for the footfall score at the 8-in width (PWT8SC) and the recorded time at the same width (PWT8t1) because of external factors interfering with test conditions. One further participant was deemed to be an outlier for PWT8t1 at test administration 1 (based on a scatterplot of data), and that participant's data were removed from subsequent analyses. This resulted in a smaller sample size for the PWT 8-in width subtest, as noted in tables 2 through 5.
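The data-checking step described above (skewness and kurtosis compared against a limit of 2.0) can be illustrated with a minimal sketch. The paper does not state which skewness and kurtosis formulas were used; the code below assumes the bias-adjusted sample definitions commonly reported by statistical packages, and the function names are hypothetical.

```python
import math

def _mean_and_sd(values):
    """Return the mean and the sample SD (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sd == 0:
        raise ValueError("undefined when all values are equal")
    return mean, sd

def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed definition, not from the paper)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean, sd = _mean_and_sd(values)
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    mean, sd = _mean_and_sd(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

def looks_roughly_normal(values, limit=2.0):
    """Apply the rule of thumb from the text: both statistics below the limit."""
    return (abs(sample_skewness(values)) < limit
            and abs(sample_excess_kurtosis(values)) < limit)
```

A score set in which most participants score zero, as reported for PWT15SC, fails this check because a few nonzero scores produce a long right tail.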

Table 1: Patient Demographic Data

Sample               Men/Women   Age (y)     Mass (kg)    Barthel Index Score   MMSE Score
Interrater (n=34)    12/24       81.3±5.4    72.8±11.7    18.3±1.3              26.1±2.4
Test-retest (n=15)   5/10        80.3±5.3    73.0±12.0    18.2±1.4              26.1±2.2

NOTE. Values are mean ± 1 SD for patients included in the interrater and test-retest analyses.
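The footfall scoring rule and the velocity calculation described under Parallel Walk Test can be sketched as follows. This is an illustration only: the event labels, constant names, and function names are hypothetical, and the paper does not describe any software used for scoring.

```python
# Points per observed event, following the scoring rule in the text:
# +1 for a foot placed on a line, +2 for a foot outside the lines or for
# reaching for support. The labels themselves are hypothetical.
POINTS = {
    "inside": 0,
    "on_line": 1,
    "outside": 2,
    "reached_for_support": 2,
}

# Line separations used in the study, keyed by the nominal width in inches.
WIDTHS_CM = {8: 20.0, 12: 30.5, 15: 38.0}

WALKWAY_M = 6.0  # length of the walkway in metres

def footfall_score(events):
    """Total footfall score (SC) for one walk; higher means less stable gait."""
    return sum(POINTS[event] for event in events)

def walking_velocity(time_s, distance_m=WALKWAY_M):
    """Mean walking velocity in metres per second for one timed walk."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return distance_m / time_s
```

For example, a walk recorded as inside, on_line, inside, outside, inside gives a footfall score of 3, and a walk completed in 15 seconds corresponds to a velocity of 0.4 m/s.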


Table 2: Descriptive Statistics for Raters 1 and 2 for Test Administration 1

Rater 1
Variable   Mean    SD     Minimum   Maximum   Skewness   Kurtosis
PWT8SC*    8.31    7.45   0         33        1.06       1.19
PWT8t1†    16.62   6.04   8         32        0.88       0.66
PWT12SC    2.42    3.95   0         17        2.06       4.45
PWT12t1    15.39   5.71   7         34        1.25       2.52
PWT15SC    0.69    2.32   0         11        3.69       13.55
PWT15t1    14.33   4.80   8         28        0.72       0.57

Rater 2
Variable   Mean    SD     Minimum   Maximum   Skewness   Kurtosis
PWT8SC*    8.14    7.26   0         30        0.92       0.66
PWT8t1†    16.56   6.13   8         30        0.65       −0.30
PWT12SC    2.72    4.19   0         17        1.76       2.81
PWT12t1    15.31   5.65   7         35        1.42       3.37
PWT15SC    0.44    1.65   0         9         4.56       22.22
PWT15t1    14.39   4.90   7         29        0.82       0.91

NOTE. N=36. Abbreviations: PWT8, 12, or 15, distance between parallel lines in inches; SC, footfall scores recorded for the PWT; t1, time recorded for the PWT. *n=35. †n=34.
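The agreement statistics described in Data Processing and Analysis (mean difference, paired t statistic, Cohen's d, 95% limits of agreement, and the correlation between pair means and pair differences) can be sketched as below. Two points are assumptions rather than statements from the paper: Cohen's d is computed with the pooled SD of the two score sets, which is approximately consistent with the reported d values but is not stated by the authors, and the limits of agreement use the conventional mean difference ± 1.96 SD of the differences. The function names are hypothetical, and the P value is left to a statistics package.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def agreement_summary(first, second):
    """Bland-Altman style summary for paired scores (rater 1 vs rater 2,
    or test administration 1 vs test administration 2)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    pair_means = [(a + b) / 2 for a, b in zip(first, second)]

    mean_diff = statistics.mean(diffs)   # systematic bias
    sd_diff = statistics.stdev(diffs)    # spread of individual differences
    std_err = sd_diff / math.sqrt(n)
    t_stat = mean_diff / std_err if std_err > 0 else float("nan")

    # Assumed definition: mean difference divided by the pooled SD.
    pooled_sd = math.sqrt(
        (statistics.variance(first) + statistics.variance(second)) / 2
    )
    cohen_d = mean_diff / pooled_sd if pooled_sd > 0 else float("nan")

    return {
        "mean_diff": mean_diff,
        "t": t_stat,
        "df": n - 1,
        "cohen_d": cohen_d,
        "loa": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
        # Proportional bias: do differences grow or shrink with score size?
        "r_mean_diff": pearson_r(pair_means, diffs),
    }
```

With made-up scores such as rater 1 = [8, 3, 0, 12, 5] and rater 2 = [7, 3, 1, 12, 4], the mean difference is 0.2 and the 95% limits of agreement span roughly −1.44 to 1.84.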

Interrater Agreement

Descriptive statistics for raters 1 and 2 are listed in table 2, and results of interrater reliability analyses are listed in table 3. There was no significant (P>.05) mean difference between raters for any of the 6 sets of PWT scores, and mean differences were trivial, as indicated by small effect sizes.14 ICC values were extremely high (ICC≥.93) for all sets of PWT scores except PWT15SC, which had an ICC of .71. The 95% LOA values between raters generally were narrow, and there were no significant (P>.05) correlations between rater difference and mean rater score, except for PWT15SC.

Test-Retest Reliability

Descriptive statistics for test administrations 1 and 2 are listed in table 4, and results for the test-retest reliability analysis are listed in table 5. Mean differences between test administrations were nonsignificant (P>.05) for all 6 scores, except for time recorded at the 12-in width (PWT12t1). These mean differences also generally were trivial, as indicated by small effect sizes, and the mean difference for PWT12t1 was small to moderate.14 Intraclass correlations were all greater than .70, except for PWT12SC (ICC=.63). The 95% LOA values between test administrations were wider than for interrater agreement for all sets of scores except for PWT15SC, and correlations between the difference score and mean values for both test administrations were higher than for interrater agreement, with 1 significant (P<.05) correlation for PWT12SC.

DISCUSSION

The primary purpose of the present study was to investigate interrater agreement for various subscores of the PWT in a sample of older fallers. A secondary purpose was to obtain preliminary data for test-retest reliability in a subset of the same sample. Overall, interrater reliability of the PWT was excellent, as indicated by the nonsignificant mean differences between raters, very high reliability coefficients (except for PWT15SC), narrow 95% LOA values, and no proportional bias (except for PWT15SC). The preliminary evidence for test-retest reliability generally was less favorable, with higher mean differences, lower reliability coefficients (although all except 1 were above commonly cited standards of acceptable reliability), broader 95% LOA values, and several indications of proportional bias (although only 1 was significant, for PWT12SC).

From the excellent interrater results (eg, ICC=.93–.99 vs ICC=.71 and .97) and observation of raw scores, the 2 narrower widths (PWT8 and PWT12) appear to discriminate better between participants with differing levels of dynamic balance and have less variability between raters than the 15-in width (PWT15). In the PWT15, most participants (89%) had a footfall score of zero (ie, did not step on or outside the lines), which was evident from the high skewness and kurtosis results (see tables 2 and 4), and therefore the data did not strictly meet the assumptions underlying use of the ANOVA or ICC. This also indicates that the PWT15SC does not discriminate between elderly participants.

In our study, the level of rater experience did not appear to make a difference to recorded results for the PWT. Rater 1 (S.D.L.) is registered with the British Chartered Society of Physiotherapists, has more than 10 years of experience working in the area of balance testing and rehabilitation in the elderly, and had implemented the PWT extensively in research and clinical conditions. Although rater 2 (P.W.M.) has

Table 3: Interrater Reliability for the PWT

Variable   Mean Difference   Cohen d   P     ICC   rAVGE, DIFF   95% LOA
PWT8SC*    .17               .02       .56   .97   .11           −3.24 to 3.58
PWT8t1†    .06               .01       .76   .98   −.08          −2.09 to 2.21
PWT12SC    −.31              −.07      .24   .93   −.16          −3.34 to 2.72
PWT12t1    .08               .01       .52   .99   .08           −1.43 to 1.59
PWT15SC    .25               .13       .33   .71   .47‡          −2.73 to 3.23
PWT15t1    −.06              −.01      .78   .97   −.08          −2.35 to 2.24

NOTE. N=36. Mean difference = Mean Rater 1 − Mean Rater 2. Abbreviations: PWT8, 12, or 15, distance between parallel lines in inches; SC, footfall scores recorded for the PWT; rAVGE, DIFF, correlation between average of 2 raters and difference between raters; t1, time recorded for the PWT. *n=35. †n=34. ‡P<.05.


Table 4: Descriptive Statistics for Test Administrations 1 and 2

Test Administration 1
Variable   Mean    SD     Minimum   Maximum   Skewness   Kurtosis
PWT8SC*    8.57    9.60   0.0       31.5      1.20       −0.83
PWT8t1†    14.81   5.27   8.5       24.5      0.68       −0.85
PWT12SC    3.23    5.05   0.0       17.0      1.77       2.91
PWT12t1    16.33   6.42   8.0       34.5      1.72       4.06
PWT15SC    0.83    2.36   0.0       8.5       2.99       8.89
PWT15t1    14.97   5.56   7.5       28.5      0.88       1.27

Test Administration 2
Variable   Mean    SD     Minimum   Maximum   Skewness   Kurtosis
PWT8SC*    7.25    7.57   0.0       26.5      1.67       2.30
PWT8t1†    13.89   5.75   7.0       29.0      1.68       3.38
PWT12SC    4.27    8.20   0.0       29.5      2.44       6.35
PWT12t1    14.23   5.73   7.0       28.0      1.29       1.49
PWT15SC    1.27    2.76   0.0       8.0       2.30       4.02
PWT15t1    13.83   5.80   8.0       29.0      1.69       2.69

NOTE. N=15. Abbreviations: PWT8, 12, or 15, distance between parallel lines in inches; SC, footfall scores recorded for the PWT; t1, time recorded for the PWT. *n=14. †n=13.

extensive knowledge of human movement, anatomy, and kinesiology, he was inexperienced in working with the elderly and had not previously performed the PWT. He had a short time to familiarize himself with the protocol of footfall scoring, time keeping, and verbal instructions to participants. Interrater scores in this study were better than the interrater reliability for novice and experienced physical therapists who rated Tinetti balance scores.17 The investigators for that study found that the proportion of observed agreement ranged from .52 to .96, depending on the balance test, with more variation for the κ coefficients (level of agreement between raters with a correction factor for chance agreement), which ranged from −.03 to .90. The lower κ coefficients probably were a result of subjective terms for scoring, such as “safe,” “unsafe,” “steady,” and “unsteady.” Such subjective terms were avoided in the PWT, with 2 possible balance scoring scenarios of either stepping on the line or outside the line, which afforded clear quantitative scoring by both raters.

We expected that test-retest results overall would be lower than interrater results because test-retest coefficients typically are lower than other types of reliability, such as intertrial or interrater (depending on the subjectivity of the test protocol).18 However, there were still reasonable ICCs for the 20-cm width (ICC=.77 for footfall scores and ICC=.75 for time), which was the most sensitive in the validity tests for discriminating between fallers and nonfallers.1 In our test-retest reliability analysis, rater scores were combined (ie, we used the mean of 2 raters for each test administration), which helped remove error caused by rater variability, giving a more precise estimate of true test-retest reliability, or error variance due to test administration rather than rater error/variability. However, additional analyses indicated similar results when test-retest reliability was estimated from a single rater. A reasonable conclusion might be that the lack of consistency from test administration 1 to test administration 2 (see table 5) probably was caused by factors other than intrarater variability.

Variability between repeated assessments, even 1 week apart, was thought to be large because the sample population was frail elderly persons who had experienced at least 1 fall. The frail status of these elderly subjects means they are at high risk for refalling and deteriorate in motor function over time. Had the retest been repeated minutes later, as it was for the TUG test by Botolfsen et al19 (ie, it becomes an intertrial), the outcome might have been better. In comparison, another study of the TUG test, conducted in mainly independent community-dwelling elderly persons without cognitive impairments, showed only moderate test-retest results (ICC=.50), although the retest was carried out approximately 112 days later.20 The investigators recorded that 6.5% of subjects tested were physically unable to perform the TUG test, concluding that the TUG had questionable test-retest reliability. In contrast, other studies of the TUG test in elderly populations have reported good interrater reliability for 2 experienced raters (Spearman r=.96) and intrarater reliability (Spearman r=.93) in elderly amputees (mean age, 73.5y),21 and, for an expanded TUG test in elderly persons with impaired mobility, test-retest reliability (ICC=.68) when the retest was done minutes later and interrater reliability (>4 raters; ICC range, .87–.96).19 These latter results should not be directly compared with the present test-retest results because they constitute an intertrial, for which consistency typically is higher than test-retest reliability, which is concerned with stability over extended periods of several days.18 Similarly, inter- and intrarater reliability for the Functional Gait Assessment was acceptable (ICC=.86 and .74, respectively), but the retest was repeated after 1 hour.7

Table 5: Test-Retest Reliability for the PWT

Variable   Mean Difference   Cohen d   P     ICC   rAVGE, DIFF   95% LOA
PWT8SC*    1.32              .15       .42   .77   .36           −10.38 to 13.02
PWT8t1†    0.92              .17       .41   .75   −.13          −6.73 to 8.59
PWT12SC    −1.04             −.16      .51   .63   −.58‡         −12.62 to 10.54
PWT12t1    2.10              .34       .01   .85   .26           −3.31 to 7.51
PWT15SC    −0.44             −.11      .14   .90   −.38          −2.56 to 1.68
PWT15t1    1.14              .20       .16   .86   −.09          −4.64 to 6.92

NOTE. N=15. Mean difference = Mean Test Administration 1 − Mean Test Administration 2. Abbreviations: PWT8, 12, or 15, distance between parallel lines in inches; SC, footfall scores recorded for the PWT; rAVGE, DIFF, correlation between average of 2 test administrations and difference between test administrations; t1, time recorded for the PWT. *n=14. †n=13. ‡P<.05.

The ability of persons to safely complete the test is paramount to the purpose of the PWT. In contrast, in studies involving the TWT, subjects were graded even if they could not participate. For example, Dargent-Molina et al22 assigned scores to those who were unwilling, unable, or did not participate in the TWT on a sliding scale of 1 (can do 4 consecutive tandem steps) to 4 (did not take part). On a previous occasion, all participants from the falls clinic and half of nonfallers would not perform the TWT and consequently would have been rated as 4 under the Dargent-Molina22 scheme. However, as with our scores for the 38-cm width in the PWT, in which nearly all subjects scored a perfect zero, the statistical outcome becomes essentially meaningless. The TWT appears to identify persons as at most risk for falling without quantifiable evidence. Furthermore, in the study by Nelson et al,23 in which the TWT was used as part of physical performance tests after home-based exercises in the elderly, 1 of their patients fell while doing the TWT at home. In all data collected using the PWT, in comparison, participants (fallers and nonfallers) have all been able to complete the test and there have been no adverse events. The safety of the PWT is partly because of the use of mobility aids; canes were used if required because the primary function of the PWT was to assess stability during everyday independent mobility (gait) and show decline or improvement over time or with an intervention. Therefore, use of walking aids was allowed, but should be recorded for each participant so that this can be taken into account when retesting on repeated occasions, for example, if a person has worsened and is now using a cane or needs an enhanced mobility aid.
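The ICC values in tables 3 and 5 were described in the methods as coming from a 1-way ANOVA model, adjusted to a single rater or test administration with the Spearman-Brown formula. A minimal sketch of that calculation is given below; it assumes every participant has the same number of scores and is not the authors' SPSS procedure. The function name is hypothetical.

```python
def icc_oneway(scores):
    """One-way ANOVA intraclass correlations.

    scores: one list per participant, each holding k scores (eg, 2 raters
    or 2 test administrations). Returns (single, average): the reliability
    of a single score and of the mean of k scores.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 participants with the same k >= 2 scores")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-participants and within-participants mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    if ms_between == 0:
        raise ValueError("no variation between participants")

    # Reliability of the mean of k scores.
    icc_average = (ms_between - ms_within) / ms_between
    # Spearman-Brown step-down to the reliability of a single score.
    icc_single = icc_average / (k - (k - 1) * icc_average)
    return icc_single, icc_average
```

The stepped-down value is algebraically equal to (MSB − MSW) / (MSB + (k − 1) MSW), the usual single-score form of the 1-way ICC, so the two routes can be used to check each other.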
The 38-cm (15 inch) width separation was not sensitive to differences in dynamic balance, and all except 2 subjects scored zero. This also was reported in the initial validity report,1 but the width was used in the present study for comparison. Although the ICC was very high for 38cm (ICC=.90 and .86 for footfall score and time, respectively), test-retest reliability was misleading for 38cm because almost everyone scored zero. This also supports the conclusion that a narrower width of 20 or possibly 30.5cm would be better suited for the PWT for the elderly population. The 38-cm width might better suit persons with more problematic gait, such as patients with cerebral palsy, for which there are multiple levels of severity against which a person is rated. Some interest has been expressed in the PWT from groups who work with people with cerebral palsy to quantifiably rate gait. However, the reliability and validity of the PWT for this population have yet to be determined.

Study Limitations

A limitation of the present study was the relatively small nonrandomized sample of patients. However, this is typical of most clinical studies (ie, most descriptive studies involve convenience samples). Sample size appeared to be sufficient for calculated interrater coefficients; however, the smaller sample for test-retest was not ideal, and those results should be considered preliminary or pilot data. Caution should be applied in the interpretation of these data.

CONCLUSIONS

The PWT is a reliable test that can be used by therapists and examiners because there was excellent interrater reliability between raters with substantially different levels of experience and training. Although test-retest reliability was not as high as the interrater result, it represents moderate to good repeated reliability for the narrowest width tested (20cm). The lower test-retest reliability may have been due to the level of frailty of the patients tested, which may lead to variable function over relatively short periods. These issues may be resolved in future studies of predictive validity, such as in predicting elderly fallers, and of the sensitivity of the PWT to identify improvements in dynamic balance after an intervention.

Acknowledgments: We acknowledge the cooperation and support of Firdaus Adenwalla, MD, and his staff at the Neath and Port Talbot Hospital Falls Clinic.

References

1. Lark SD, Pasupuleti S. Validity of a functional dynamic walking test for the elderly. Arch Phys Med Rehabil 2009;90:470-4.
2. Maki BE, McIlroy WE. Postural control in the older adult. Clin Geriatr Med 1996;12:635-58.
3. Mathias S, Nayak US, Isaacs B. Balance in elderly patients: the “get-up and go” test. Arch Phys Med Rehabil 1986;67:387-9.
4. Tinetti ME. Performance-oriented assessment of mobility problems in elderly patients. J Am Geriatr Soc 1986;34:119-26.
5. Tinetti ME, Williams TF, Mayewski R. Fall risk index for elderly patients based on number of chronic disabilities. Am J Med 1986;80:429-34.
6. Shumway-Cook A, Woollacott M. Motor control: theory and practical applications. Baltimore: Williams & Wilkins; 1995.
7. Wrisley DM, Marchetti GF, Kuharsky DK, Whitney SL. Reliability, internal consistency, and validity of data obtained with the functional gait assessment. Phys Ther 2004;84:906-18.
8. Cho B-L, Scarpace D, Alexander NB. Tests of stepping as indicators of mobility, balance, and fall risk in balance-impaired older adults. J Am Geriatr Soc 2004;52:1168-73.
9. Morris R, Harwood RH, Baker R, Sahota O, Armstrong S, Masud T. A comparison of different balance tests in the prediction of falls in older women with vertebral fractures: a cohort study. Age Ageing 2007;36:78-83.
10. Folstein MF, Folstein SE, McHugh PR. ‘Mini-Mental State’. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975;12:189-98.
11. Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Maryland State Med J 1965;14:61-5.
12. Collin C, Wade DT, Davis S, Horne V. The Barthel ADL Index: a reliability study. Int Disabil Stud 1988;10:61-3.
13. Bland JM, Altman DG. Applying the right statistics: analysis of measurement studies. Ultrasound Obstet Gynecol 2003;22:85-93.
14. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum; 1988.
15. Nunnally J, Bernstein IH. Psychometric theory. 3rd ed. Boston: McGraw-Hill; 1994.
16. Haynes RB, Sackett DL, Guyatt GH, Tugwell P. Clinical epidemiology: how to do clinical practice research. 3rd ed. Philadelphia: Lippincott, Williams & Wilkins; 2006.
17. Cipriany-Dack LM, Innerst D, Johannsen J, Rude V. Interrater reliability of the Tinetti balance scores in novice and experienced physical therapy clinicians. Arch Phys Med Rehabil 1997;78:1160-4.
18. Baumgartner T, Jackson A, Mahar M, Rowe D. Measurement for evaluation in physical education and exercise science. 8th ed. Boston: McGraw-Hill; 2006.
19. Botolfsen P, Helbostad JL, Moe-Nilssen R, Wall JC. Reliability and concurrent validity of the expanded timed up-and-go test in older people with impaired mobility. Physiother Res Int 2008;13:94-106.
20. Rockwood K, Awalt E, Carver D, MacKnight C. Feasibility and measurement properties of the Functional Reach and the Timed Up and Go tests in the Canadian Study of Health and Aging. J Gerontol A Biol Sci Med Sci 2000;55:M70-73.
21. Schoppen T, Boonstra A, Groothoff JW, deVries J, Göeken LNH, Eisma WH. The Timed “Up And Go” test: reliability and validity in persons with unilateral lower limb amputation. Arch Phys Med Rehabil 1999;80:825-8.
22. Dargent-Molina P, Favier F, Grandjean H, et al. Fall-related factors and risk of hip fracture: the EPIDOS prospective study. Lancet 1996;348:145-9.
23. Nelson ME, Layne JE, Bernstein MJ, et al. The effects of multidimensional home-based exercise on functional performance in elderly people. J Gerontol Med Sci 2004;59A,2:154-60.

Supplier

a. SPSS, version 16.1.1; SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.
