The Harris hip score


The Journal of Arthroplasty Vol. 16 No. 5 2001

The Harris Hip Score: Comparison of Patient Self-Report With Surgeon Assessment

Nizar N. Mahomed, MD, ScD, FRCSC,* David C. Arndt, MD,† Brian J. McGrory, MD,‡ and William H. Harris, MD§

Abstract: Outcome evaluations are of primary concern in contemporary medical practice. Questionnaires are being used increasingly to provide input data for such outcomes evaluation. This study comprised 50 primary total hip arthroplasties in 36 patients who had undergone the procedure at least 12 months before enrollment. Each patient completed a self-report Harris Hip Score (HHS) a mean of 30 days before a formal evaluation by an independent orthopaedic surgeon that included an HHS. Comparison was made between the completed responses to the individual items on the self-report HHS and the surgeon-assessed HHS. Concordance of item response and the κ statistic were calculated. Overall, the self-report and surgeon-assessed HHS showed excellent concordance. The results of this study support the use of the HHS as a self-report instrument.

Key words: Harris hip score, hip arthroplasty, self report, outcomes, arthritis.

From the *Division of Orthopaedic Surgery, University of Toronto, Toronto Western Hospital, Toronto, Ontario, Canada; †Harvard Vanguard Medical Associates, Somerville, Massachusetts; ‡Department of Orthopaedic Surgery and Rehabilitation, University of Vermont School of Medicine, Burlington, Vermont; ‡Orthopaedic Associates, Portland PA, Portland, Maine; §Department of Orthopaedic Surgery, Harvard Medical School; §Hip and Implant Unit, Massachusetts General Hospital; and §Orthopaedic Biomechanics Laboratory, Massachusetts General Hospital, Boston, Massachusetts. Study performed at Orthopaedic Biomechanics Laboratory, Massachusetts General Hospital, Boston, Massachusetts. Submitted December 30, 1999; accepted January 4, 2001. Funds were received in partial or total support of the research material described in this article from the William H. Harris Foundation, Health Services Research Fellowship from the Orthopaedic Research and Education Foundation and American Academy of Orthopaedic Surgeons, and Research Fellowship from the Canadian Arthritis Society. Reprint requests: Nizar N. Mahomed, MD, Toronto Western Hospital, 399 Bathurst Street, ECW 1-002, Toronto, Ontario, M5T 2S8. Copyright © 2001 by Churchill Livingstone® 0883-5403/01/1605-0005$35.00/0 doi:10.1054/arth.2001.23716

Since Codman [1] first drew attention to the importance of evaluating outcomes, orthopaedists have worked to quantify clinical outcomes. This interest has been spurred on by changes in the health care industry that have led to increased emphasis on evaluating quality of care and containing costs. Outcome assessment allows purchasers and providers to evaluate the quality of services delivered. The use of standardized outcome instruments allows comparisons between different patient cohorts to evaluate the effectiveness of different procedures or prostheses.

Many authors have stressed the importance of outcome evaluation in total hip arthroplasty (THA) [2–6]. Patient-based self-report questionnaires are an accepted method of evaluating patient outcomes and quality of life. This methodology has been shown to be reliable, valid, and sensitive in evaluating outcomes of THA [7–15]. Generic health status instruments, such as the Medical Outcomes Study Short Form 36 (SF36), and disease-specific instruments, such as the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), are required to evaluate THA outcomes adequately [13,14].

Evaluating outcomes of THA using contemporary patient-based measures introduces certain problems. Long-term outcomes in THA are important because the average service life of a primary THA is >10 years [16,17]. The THA literature is based primarily on case series reports of long-standing THA cohorts, which form the basis of much of the current understanding of this procedure. These cohorts were established before the development of contemporary patient-based outcome instruments. Preoperative and midterm outcome data for these existing cohorts are based on empirically derived physician-assessed instruments. Future evaluations of these cohorts will continue to rely on these traditional measures.

Lieberman et al [18] recommended using the Harris Hip Score (HHS) in addition to contemporary quality-of-life measures for evaluating THA outcomes. The HHS is the most widely used physician-assessed measure of hip function after THA. Although empirically derived, the HHS evaluates domains similar to those of the WOMAC, including patient hip pain and function [11,19–21]. Malchau et al [22] showed the HHS to be a reliable and valid measure of hip function. Several studies have used the HHS as a patient self-report questionnaire in combination with other quality-of-life instruments to evaluate THA outcomes [23–25]. A self-report questionnaire format provides the advantage of obtaining follow-up data by mailed survey without recalling patients for formal evaluation. To date, there has been no evaluation of the HHS as a self-report questionnaire. The aim of this study was to compare the HHS as a patient self-report and a physician-assessed instrument in evaluating the outcomes of THA patients.

Materials and Methods

This study is based on a cohort of 36 patients with 50 primary THAs who were a minimum of 1 year postsurgery. These were consecutive patients returning for routine annual follow-up who had undergone THA by a single senior surgeon. No cases were excluded. Patients were seen between November 1994 and June 1995. All patients were surveyed by mailed questionnaire before the clinical follow-up appointment. The mean time between completion of questionnaires and clinical follow-up was 30 days. During this relatively short interval, the patient's health state with respect to hip function was unlikely to have changed because all patients were at least 1 year postsurgery.

The questionnaires included the SF36, WOMAC, and a self-report HHS. The self-report HHS consisted of the HHS questions on hip pain, limp, use of walking supports, distance walked, difficulty with sitting in a chair, difficulty putting on shoes and socks, and difficulty with climbing stairs (Table 1). The possible score range was 0 to 90. For ease of presentation, this score range was rescaled to 0 to 100. The question on use of public transport was excluded because it was not applicable in the same manner as it had been when the HHS was developed. Hip range of motion and deformity could not be evaluated in a patient self-report format and were excluded.

Table 1. Self-Report Harris Hip Score Questionnaire

Question (score in parentheses)

1. Please describe any pain in your hip:
   A. No pain (44)
   B. Slight pain or occasional pain (40)
   C. Mild, no effect on ordinary activity, pain after unusual activity, uses aspirin or similar medication (30)
   D. Moderate pain that requires pain medicine stronger than aspirin/similar medications. I'm active but have had to make modifications and/or give up some activities because of pain (20)
   E. Marked or severe pain that limits activity and requires pain medicine frequently (10)
   F. Totally disabled: wheelchair or bedridden (0)
2. Amount and type of support used:
   A. None (11)
   B. Cane for long walks (7)
   C. Cane all the time (5)
   D. 2 canes (2)
   E. 1 crutch (3)
   F. 2 crutches or walker (0)
   G. Unable to walk (0)
3. Limp. This should be judged at the end of a long walk using the type of support chosen in question 2.
   A. None (11)
   B. Slight (8)
   C. Moderate (5)
   D. Severe (0)
4. Distance that you can walk. This should be judged with the aid of a support if you use one.
   A. Unlimited (11)
   B. 5–6 blocks (8)
   C. 1–4 blocks (5)
   D. In the house only (2)
   E. Unable to walk (0)
5. Climbing stairs:
   A. Normally (4)
   B. Need a banister or cane or crutch (2)
   C. Must put both feet on each step/severe trouble climbing stairs (1)
   D. Unable to climb stairs (0)
6. Shoes and socks:
   A. Can put on socks and tie a shoe easily (4)
   B. Can put on socks and tie a shoe with difficulty (2)
   C. Cannot put on socks and shoes (0)
7. Sitting:
   A. Comfortable in any chair (5)
   B. Comfortable only in high chair, or can sit comfortably for only 0.5 hour (3)
   C. Cannot sit for 0.5 hour because of pain (0)

Table 2. Cohort Demographics (n = 36)

Age (y)                          69 ± 12
Gender
  Men                            11 (31%)
  Women                          25 (69%)
Education
  Completed high school          12 (33%)
  Completed college              9 (25%)
  Postgraduate schooling         15 (42%)
Side
  Left                           22 (44%)
  Right                          28 (56%)
Total hip arthroplasty
  Unilateral                     22 (61%)
  Bilateral                      14 (39%)
Primary diagnosis (no. hips)
  Congenital hip dysplasia       24 (48%)
  Osteoarthritis                 8 (16%)
  Avascular necrosis             3 (6%)
  Rheumatoid arthritis           3 (6%)
  Other                          12 (24%)

An independent orthopaedic surgeon who was unaware of the responses on the mailed questionnaire evaluated patients at the clinical follow-up visit. This evaluation included an HHS, physical examination, and routine radiographic examination. In addition, patients completed a WOMAC and SF36.
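The scoring of the self-report HHS described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the item keys (`pain`, `support`, and so on) and the function name `self_report_hhs` are hypothetical, the point values are transcribed from Table 1, and the rescaling from 0–90 to 0–100 is assumed to be linear because the text does not state how it was done.

```python
# Point values per response option, transcribed from Table 1.
# "Unable to climb stairs" is scored 0, consistent with the stated 0-90 range.
HHS_POINTS = {
    "pain": {"A": 44, "B": 40, "C": 30, "D": 20, "E": 10, "F": 0},
    "support": {"A": 11, "B": 7, "C": 5, "D": 2, "E": 3, "F": 0, "G": 0},
    "limp": {"A": 11, "B": 8, "C": 5, "D": 0},
    "distance": {"A": 11, "B": 8, "C": 5, "D": 2, "E": 0},
    "stairs": {"A": 4, "B": 2, "C": 1, "D": 0},
    "shoes_socks": {"A": 4, "B": 2, "C": 0},
    "sitting": {"A": 5, "B": 3, "C": 0},
}

# Highest attainable raw score across the seven self-report items.
MAX_RAW = sum(max(options.values()) for options in HHS_POINTS.values())


def self_report_hhs(responses):
    """Return (raw score on 0-90, score rescaled to 0-100).

    `responses` maps each item name to one option letter. A missing or
    unrecognized answer raises ValueError instead of being imputed, because
    the text does not say how missing items entered the total score.
    The linear rescaling is an assumption.
    """
    raw = 0
    for item, options in HHS_POINTS.items():
        answer = responses.get(item)
        if answer not in options:
            raise ValueError(f"missing or invalid response for {item!r}")
        raw += options[answer]
    return raw, raw * 100 / MAX_RAW
```

A respondent who circles option A on every item receives the maximum raw score, which maps to 100 on the rescaled range under the linear assumption.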

Statistical Analysis

The primary analysis focused on the comparison of responses to the individual items in the self-report HHS as completed by patients and the physician-assessed HHS. These items included pain, limp, support, distance walked, sitting, putting on shoes and socks, and climbing stairs. For each item, we calculated the unweighted distribution of responses, degree of concordance, and κ statistic [26]. The κ statistic is a measure of reproducibility between repeated assessments of the same categorical variable [27]. It measures the degree of agreement that exists between iterations beyond the amount expected by chance alone. Values of κ < 0 indicate agreement less than what would be expected by chance alone, κ of 0 indicates agreement by chance alone, and κ of 1 indicates perfect agreement. Values of κ between 0 and 0.4 indicate marginal reproducibility; values between 0.4 and 0.75 indicate good reproducibility; and values >0.75 indicate excellent reproducibility [28]. We also compared the agreement between scale scores for the HHS, WOMAC, and SF36 from the mailed survey and clinical evaluation visit using the Pearson correlation coefficient.

Test-retest reliability is a measure of an instrument's stability in response patterns during a short period in which the individual's actual health status has not changed and their scores should not change. The WOMAC and SF36 have proven test-retest reliability. We compared the mean scale scores on these instruments at the time of mailed survey and clinical follow-up. These scores were compared with self-report and physician-based responses on the HHS to assess the stability of response patterns.

Statistical analysis was performed using PC-SAS version 6.12 (SAS Institute, Cary, NC) and BMDP procedure 4F (BMDP Statistical Software Inc, Los Angeles, CA). The critical level for statistical significance was P < .05.
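The three agreement measures named above (item concordance, the unweighted κ statistic, and the Pearson correlation coefficient) can be sketched in Python as follows. The study itself used PC-SAS and BMDP; this standard-library version only illustrates the definitions. The function names and the ten paired responses are hypothetical and are not study data.

```python
import math
from collections import Counter


def concordance(ratings_a, ratings_b):
    """Percentage of paired ratings that match exactly."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(ratings_a)
    if n != len(ratings_b) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical responses to one HHS item for ten hips (not study data).
patient = ["A", "A", "B", "C", "A", "B", "A", "D", "B", "A"]
surgeon = ["A", "A", "B", "C", "A", "A", "A", "D", "B", "A"]

print("concordance (%):", concordance(patient, surgeon))
print("kappa:", round(cohen_kappa(patient, surgeon), 3))
```

In the hypothetical data, nine of the ten pairs match, so κ is lower than the raw concordance because part of that agreement would be expected by chance.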

Results

There were 11 men and 25 women, with a mean age of 69 years (Table 2). The most frequent primary diagnosis was congenital dysplasia in 48% of the hips, followed by osteoarthritis in 16%. Fourteen patients had undergone bilateral THA. Of the 50 hips, 56% were right-sided and 44% were left-sided.

The mean HHS by self-report questionnaire was 76.0 ± 19.0; the surgeon-assessed HHS was 78.7 ± 18.7 (Table 3). The WOMAC scores at clinical follow-up were pain, 2.3 ± 3.1; stiffness, 1.8 ± 1.7; and physical function, 15.3 ± 12.3. The SF36 scores at clinical follow-up were general health, 74.9 ± 17.6; physical function, 47.8 ± 28.1; role physical, 55.5 ± 41.4; and bodily pain, 64.6 ± 26.4. The Pearson correlation for the 3 WOMAC scales ranged from 0.90 to 0.96 (P < .0001); for the 8 SF36 scales, the range was 0.78 to 0.97 (P < .0001). The highest Pearson correlation coefficient was noted for the self-report HHS and physician-assessed HHS at 0.99 (P < .0001).

Table 3. Comparison of WOMAC and SF-36 Self-Report Scores With Self-Report and Surgeon-Administered Harris Hip Score

                        Scores From      Scores at Time of      Pearson Correlation
                        Mailed Survey    Clinical Assessment    Coefficient
HHS                     76.0 ± 19.0      78.7 ± 18.7            0.99*
WOMAC
  Pain                  2.4 ± 3.1        2.3 ± 3.1              0.90*
  Stiffness             1.9 ± 1.7        1.8 ± 1.7              0.96*
  Physical function     15.6 ± 11.6      15.3 ± 12.3            0.92*
SF-36
  General health        69.5 ± 23.1      74.9 ± 17.6            0.78*
  Physical function     45.1 ± 29.0      47.8 ± 28.1            0.96*
  Role physical         53.5 ± 42.3      55.5 ± 41.4            0.97*
  Bodily pain           63.4 ± 28.2      64.6 ± 26.4            0.95*
  Vitality              57.7 ± 19.1      60.6 ± 20.7            0.93*
  Social function       47.8 ± 12.8      51.0 ± 10.1            0.71*
  Role emotional        75.3 ± 41.4      77.3 ± 37.2            0.97*
  Mental health         77.4 ± 17.6      80.0 ± 18.3            0.91*

*P < .0001.

The mean time interval between the self-report questionnaire and clinical evaluation was 30 days. It is unlikely that the patients' hip-related health status would change in this short period given that they were all at least 1 year post-THA. This supposition was verified by the WOMAC and SF36 scores, which did not show any significant differences at the 2 administrations. Similarly, the HHS at the 2 administrations did not show a significant difference, indicating that response patterns remain stable when there is no true change in health state.

The patient-based self-report administration of the HHS revealed that patients had little difficulty in understanding or completing the questions. The question on the use of walking aids was the only item with a response rate of <85% (Table 4). This response rate may reflect the multiple combinations of devices that are used by patients at different times and for different activities (ie, indoor activities vs outdoor walks). The remaining 6 items had excellent response rates ranging from 86% to 100%. These response rates are a conservative estimate because we used strict criteria for defining nonresponse, including instances in which patients had circled 2 responses to the same question. For the question on walking aids, if a patient circled cane and crutches, this was coded as a missing response. Overall, the response rate for the self-administered format was comparable to that in most other widely used instruments.

Concordance measures the frequency of exact matches for responses to each item in the self-report and physician-assessed HHS. The higher the concordance rates, the greater the amount of agreement between the 2 formats, with 100% representing perfect agreement. The concordance rates between patient self-report and physician-assessed items ranged from 85% to 100% (Table 4). The lowest level of agreement occurred for the question regarding distance walked; this is understandable because it requires some level of arbitrary judgment on the part of the respondent (physician or patient) regarding the definition of a city block. The remaining 6 items had concordance rates >95%, indicating excellent agreement between the 2 formats.

The κ statistics evaluate the level of agreement between the 2 methods of administration that exists beyond chance. Values >0.75 indicate excellent agreement and are considered sufficient for most instruments in which group level comparisons are being considered. The values for the κ statistic for each item of the HHS ranged between 0.79 and 1.00 (P < .0001) (Table 4). The lowest κ value of 0.79 still denotes excellent agreement between the self-report and physician-assessed formats. As mentioned earlier, the question on distance walked requires some degree of interpretation of the meaning of a city block, which accounts for its lower level of agreement. The κ values for the remaining 6 items were >0.90, indicating that patient self-report responses are essentially equivalent to physician-assessed scores.
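As a quick check of the item-level results, the reproducibility bands quoted in the Statistical Analysis section can be applied to the concordance and κ values reported in Table 4. The sketch below is illustrative only: the name `reproducibility_band` is hypothetical, and the handling of the exact boundary values (0, 0.4, and 0.75) is an assumption because the text gives the bands only as ranges.

```python
# Concordance (%) and kappa for each HHS item, transcribed from Table 4.
TABLE_4 = {
    "Pain": (96, 0.94),
    "Distance walked": (85, 0.79),
    "Support": (97, 0.96),
    "Limp": (100, 1.00),
    "Stair climbing": (98, 0.96),
    "Sitting": (100, 1.00),
    "Shoes and socks": (96, 0.92),
}


def reproducibility_band(kappa):
    """Map a kappa value to the qualitative bands quoted in the text.

    Assumed boundary handling: 0 counts as "marginal", 0.4 and 0.75
    count as "good", and only values above 0.75 count as "excellent".
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "less than chance"
    if kappa < 0.4:
        return "marginal"
    if kappa <= 0.75:
        return "good"
    return "excellent"


bands = {item: reproducibility_band(k) for item, (_, k) in TABLE_4.items()}
lowest_item = min(TABLE_4, key=lambda item: TABLE_4[item][1])

for item, (agree_pct, k) in TABLE_4.items():
    print(f"{item:16s} concordance={agree_pct:3d}%  kappa={k:.2f}  {bands[item]}")
print("lowest kappa:", lowest_item)
```

Under these assumed boundaries every item falls in the "excellent" band, and the item with the lowest κ is distance walked, in line with the text.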

Discussion

This study compared the performance of a self-report HHS with that of the traditional physician-administered HHS. Our results show that the self-report format was easy for patients to use, with 6 of the 7 items having a response rate of >85%. The correlation between the overall scores from the 2 formats was 0.99 and exceeded the correlations between the 2 self-administrations of the WOMAC and SF36. The agreement between the 2 formats was excellent, with concordance rates ranging from 85% to 100%. The κ statistic for the individual items ranged from 0.79 to 1.00, indicating excellent agreement.

Table 4. Individual Harris Hip Score Item Response Comparison Between Self-Report and Surgeon Assessment

                   Score From      Score From           Self-Report     Concordance Between Self-Report    κ
HHS Items          Mailed Survey   Surgeon Assessment   % Completion    and Surgeon Assessment (%)         Statistic
Pain               37.1 ± 9.5      37.5 ± 9.4           94              96                                 0.94*
Distance walked    6.4 ± 4.1       7.4 ± 4.1            92              85                                 0.79*
Support            7.4 ± 4.2       8.2 ± 4.0            76              97                                 0.96*
Limp               7.4 ± 3.7       7.6 ± 3.6            90              100                                1.00*
Stair climbing     2.4 ± 1.1       2.3 ± 1.1            86              98                                 0.96*
Sitting            4.1 ± 1.1       4.1 ± 1.1            100             100                                1.00*
Shoes and socks    3.0 ± 1.1       3.0 ± 1.2            92              96                                 0.92*

*P < .0001.

The HHS is the most widely used hip scoring system in the literature. Although empirically derived, the HHS covers domains (hip pain and function) similar to those of contemporary patient-based quality-of-life measures such as the WOMAC. The HHS is one of the few traditional hip scoring systems that has had its performance characteristics evaluated [5,29–31]. Use of the HHS in evaluating existing THA cohorts is crucial because the preoperative and short-term outcomes usually are based on the HHS. Comparison of long-term outcomes with earlier reports necessitates the continued use of the HHS in addition to more contemporary instruments.

Contemporary outcome instruments are patient based and use a self-report methodology. The use of the HHS as a self-report instrument offers several advantages: easier administration by mailed survey, an evaluation format consistent with contemporary instruments, lower cost than formal physician assessment, and less burden on patients than formal clinical evaluation.

We found excellent agreement between patient self-report and physician assessment of pain and function based on items in the HHS. In contrast, Lieberman et al [32] found significant differences between patient self-report and physician evaluation of outcomes after THA. Lieberman et al [32] used visual analog scales for evaluating outcomes. We believe this methodology introduces significant problems of interpretation by respondents because there are no interval scale markers to help define what a given response means for patients and physicians. In our study, we used the HHS, which contains multiple choice questions with fixed response categories. This format minimizes problems of subjective interpretation of the response scales by respondents in relation to the severity of the problem being evaluated.

Limitations of this study include the high proportion of patients with developmental hip dysplasia, which may not be representative of the general population. We believe, however, that this should not have a significant effect on response patterns on the self-report and clinician-administered versions of the HHS. Patient age may affect the level of correlation between self-report and clinician-administered formats. In this study, the mean age was 69 years, which is typical of most THA cohorts. Finally, the level of education in this cohort was relatively high, and this may have led to improved correlation between the 2 formats. We are not aware, however, of any published report in the literature that level of education influences patient response patterns on a self-report versus clinician-administered questionnaire.

The performance of a patient self-report HHS is comparable to that of a physician-administered HHS. Because a self-report format offers several advantages over a physician-administered format, greater consideration should be given to its use in evaluating the outcomes of THA.

Acknowledgment

We thank Bob Lew and Liz Wright for their help with the statistical analysis.

References

1. Codman EA: The shoulder. Thomas Todd, Boston, 1934
2. Bellamy N, Campbell J: Hip and knee rating scales for total joint arthroplasty: a critical but constructive review. Part I. J Orthop Rheum 3:3, 1989
3. Johnston RC, Fitzgerald RH, Harris WH, et al: Clinical and radiographic evaluation of total hip replacement. J Bone Joint Surg 72A:161, 1990
4. Liang MH, Katz JN, Phillips C, et al: The total hip arthroplasty outcome evaluation form of the American Academy of Orthopaedic Surgeons. J Bone Joint Surg 73A:639, 1991
5. McGrory BJ, Morrey BF, Rand JA, et al: Correlation of patient questionnaire responses and physician history in grading clinical outcome following hip and knee arthroplasty. J Arthroplasty 11:47, 1996
6. NIH consensus conference: Total hip replacement. NIH Consensus Development Panel on Total Hip Replacement. JAMA 273:1950, 1995
7. Ware J, Sherbourne C: The MOS 36-item short-form health survey (SF): I. conceptual framework and item selection. Med Care 30:473, 1992
8. Ware JH, Snow KK, Kosinski M, et al: SF-36 health survey: manual and interpretation guide. The Health Institute, Boston, 1993
9. Bellamy N, Buchanan W: A preliminary evaluation of the dimensionality and clinical importance of pain and disability in osteoarthritis of the hip and knee. Clin Rheumatol 5:231, 1986
10. Bellamy N, Buchanan WW, Goldsmith CH, et al: Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 15:1833, 1988
11. Bellamy N: WOMAC Osteoarthritis Index: a user's guide. University of Western Ontario, London, Ontario, 1995
12. Bellamy N: Osteoarthritis clinical trials: candidate variables and clinimetric properties. J Rheumatol 24:768, 1997
13. Bombardier C, Melfi CA, Paul J, et al: Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 33:AS131, 1995
14. Hawker G, Melfi C, Paul J, et al: Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol 22:1193, 1995
15. Hawker G, Wright J, Coyte P, et al: Health-related quality of life after knee replacement. J Bone Joint Surg Am 80A:163, 1998
16. Harris WH, Sledge CB: Total hip and total knee replacement (2). N Engl J Med 323:801, 1990
17. Harris WH, Sledge CB: Total hip and total knee replacement (1). N Engl J Med 323:725, 1990
18. Lieberman JR, Dorey F, Shekelle P, et al: Outcome after total hip arthroplasty, comparison of a traditional disease specific and a quality of life measurement outcome. J Arthroplasty 12:639, 1997
19. Bellamy N, Buchanan W, Goldsmith CH, et al: Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip and knee. J Rheumatol 15:1833, 1988
20. Bellamy N, Buchanan W, Goldsmith C, et al: Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes following total hip or knee arthroplasty in osteoarthritis. J Orthop Rheumatol 1:95, 1988
21. Harris WH: Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty; an end-result study using a new method of result evaluation. J Bone Joint Surg 51A:737, 1969
22. Malchau H, Soderman P, Herberts P: The validity and reliability of Harris Hip Score. In SICOT. SICOT, Sydney, 1999
23. Fortin PR, Clarke AE, Joseph L, et al: Outcomes of total hip and knee replacement performed in a US and a Canadian referral center: preoperative functional status predicts outcomes at six months after surgery. Arthritis Rheum 42:1722, 1999
24. Mahomed NN, Katz JN, Liang MH, et al: The role of patient expectations in predicting functional outcomes and satisfaction with total hip and knee arthroplasty. Canadian Orthopaedic Association Annual Meeting, Ottawa, Canada, 1998
25. Mahomed NN, Phillips CB, Fossel AH, et al: Functional health status and satisfaction with outcome in revision arthroplasty. J Bone Joint Surg Br 80B:11, 1998
26. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 33:159, 1977
27. Cohen J: A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37, 1960
28. Rosner B: Fundamentals of biostatistics. Duxbury Press, Boston, 1995
29. McGrory BJ, Shinar AA, Freiberg AA, et al: Enhancement of the value of hip questionnaires by telephone follow-up evaluation. J Arthroplasty 12:340, 1997
30. Wright J, Young N: The patient-specific index: asking patients what they want. J Bone Joint Surg Am 79A:974, 1997
31. Wright J, Young NL: A comparison of different indices of responsiveness. J Clin Epidemiol 50:239, 1997
32. Lieberman J, Dorey F, Shekelle P, et al: Differences between patients' and physicians' evaluations of outcome after total hip arthroplasty. J Bone Joint Surg 78A:835, 1996