Reliability of the Ahlbäck classification of knee osteoarthritis

Reliability of the Ahlbäck classification of knee osteoarthritis

OsteoArthritis and Cartilage (2003) 11, 580–584 © 2003 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved. ...

88KB Sizes 4 Downloads 49 Views

OsteoArthritis and Cartilage (2003) 11, 580–584 © 2003 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved. doi:10.1016/S1063-4584(03)00095-5

International Cartilage Repair Society

Reliability of the Ahlba¨ck classification of knee osteoarthritis M. Galli M.D.*, V. De Santis M.D. and L. Tafuro M.D. Orthopaedic Department, Catholic University, Rome, Italy Summary Objective: The aim of the study was to assess the inter-observer reliability and intra-observer reproducibility of the Ahlba¨ck radiographic classification for knee osteoarthritis (OA). Design: Ninety-six knee radiographs of patients with clinical signs of arthritis were evaluated and classified into five grades of Ahlba¨ck classification by three observers with different experience. The evaluation was repeated after 1 month. The inter- and intra-rater agreements were assessed by statistical analysis (k and kw coefficients). Results: The inter-observer agreement was statistically significant, but with low or medium values of the coefficients. The intra-rater agreement was less significant than the inter-rater. Inter- and intra-rater agreement coefficients decreased when the cases of stage V arthritis were excluded from the analysis. The evaluations were influenced by the age and working experiences of each observer. Conclusions: The authors stress that the Ahlba¨ck classification, as routinely applied, should not be used in orthopaedic surgery without the support of clinical and/or arthroscopic examinations, because of its poor reliability. © 2003 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved. Key words: Osteoarthritis, Knee, Ahlba¨ck classification, K statistics.

Moreover, for a practical and extensive use of a particular classification, it is important to verify that its reliability is preserved when applied by observers of different experience using radiographs routinely performed in a major radiological centre. The aim of the present study was, therefore, to assess, by appropriate statistical techniques, the degree of inter- and intra-observer reproducibilities of the Ahlba¨ck classification system for knee OA, in a series of radiographs, in compliance with the conditions mentioned previously.

Introduction The Ahlba¨ck classification of knee osteoarthritis (OA), published in 19681, is probably the most quoted classification in the literature, and is still widely used in clinical practice. It is applied in orthopaedics for the follow-up of disease progression2, as a classification criterion in clinical trials3 or epidemiological studies4,5. It is also used (together with other clinical and radiological parameters) for surgical indications: total knee arthroplasty6 or unicompartmental7, arthroscopic debridement with or without tibial osteotomy8, YAG laser for medial femoral condyle chondroplasty9 or for assessment of orthopaedic surgery outcomes10,11. Quite recently, the ‘consensus meeting’ (Firenze, 2001)12 of the Knee Committee of the International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine indicated Ahlba¨ck classification as a valid aid in selecting the surgical procedure12. Determination of the Ahlba¨ck stage, therefore, has important consequences, as it has a direct effect on the patient’s clinical course. Despite its common use and acceptance, its reliability and reproducibility are not reported. In a recent review by Ulm University13, concerning the reliability of several classifications of arthritis (16 for the knee and three for the knee and hip), Ahlba¨ck classification is not mentioned. The authors stress that their choice is ‘restricted to those radiographic systems for which studies on inter- and intra-rater reliabilities have been published’. Obviously, radiographic classifications should not be used, in particular, for surgical indications, without an assessment of intra- and inter-observer reliabilities.

Materials and methods To ensure uniformity, only the radiological material of one centre (Department of Radiology of the Catholic University, Policlinico A. Gemelli) was considered. The Department agreed to supply the radiographs of 100 knees of patients referred to X-ray examinations because of OA clinical signs; our request was for radiographs consecutively performed since the beginning of 2001, excluding those where pathologies other than OA were found. Radiographs were performed in AP and lateral views lying down; standing, leg-extended, teleradiographs (at 2 m) with central ray aimed to joint space were also performed. Only the standing view was used for the study. All identifying data on the radiograph were obscured. Mean age of the patients was 70 years (range 55–81 years). Each radiograph was reviewed and classified by three observers: a senior orthopaedist (A), a medical student attending the department of orthopaedics (B) and a resident in the same department with special interest in knee pathologies, in his fourth year of postgraduate training (C). An explanation of the classification and a copy of Ahlba¨ck’s original paper were given as reference to each

*Address correspondence and reprint requests to: Marco Galli, Via Misurina 33, Rome 00135, Italy. Tel: 39-633653616; Fax: 39-63051161; E-mail: [email protected] Received 4 December 2002; revision accepted 16 April 2003.

580

Osteoarthritis and Cartilage Vol. 11, No. 8

581 Table I Ahlba¨ck classification criteria

Stage 0: No radiographic sign of arthritis Stage I: Narrowing of the joint space (JSN) (with or without subchondral sclerosis). JSN is defined by a space inferior to 3 mm, or inferior to the half of the space in the other compartment (or in the homologous compartment of the other knee) Stage II: Obliteration of the joint space Stage III: Bone defect/loss <5 mm Stage IV: Bone defect/loss between 5 and 10 mm Stage V: Bone defect/loss >10 mm, often with subluxation and arthritis of the other compartment

observer. After a discussion on six radiographs showing the different stages of the classification, the observers accepted to participate using the consensus criteria of Table I. No questions or discussion between observers were allowed during or after the test. For the inter-rater evaluation, each observer reviewed the radiograms randomly, and independently of the others. During this first phase of the study, four radiographs were discarded by the senior participant because of poor radiological quality; therefore, 96 radiographs were actually studied (see Table II). For intra-rater evaluation, the same radiographs placed in a new order and under the same conditions were classified by each observer after 1 month. We determined the proportion of agreement of the three observers using the Cohen’s Kappa statistical analysis14, where k is a coefficient of agreement corrected for chance (k can vary from 0, complete disagreement, to 1, complete agreement). For paired inter-rater evaluation of each observer with each one of the others and for the determination of the intra-rater agreement of the same observer after 1 month, we used kw instead of k. When two observers are compared, the ‘weighted’ coefficient kw has the advantage of taking into account also partial agreement; thus, kw is greater than k when calculated for the same cases15,16. In order to reject the H0 hypothesis (the ‘no agreement’ hypothesis), the test, Z=k/SE(k)17, was used (SE, standard error). The strength of agreement was defined by Fleiss’ score18, as follows: 0—insufficient agreement (coefficient k not significant); 1—low agreement* (k<0.40); 2—medium agreement** (0.40
Results CASES

In the classification of the senior participant, the 96 radiographs were subdivided as follows: grade 0, number 10; grade 1, number 9; grade 2, number 15; grade 3, number 20; grade 4, number 25; grade 5, number 17. The more advanced stages of OA prevailed on the less advanced ones. Table II shows in detail the scores given by each observer to each radiograph. AGREEMENT BETWEEN THE THREE OBSERVERS (K)

The proportion of agreement of the three observers (corrected for chance) was k=0.23*(23%) with SE=0.03. Beeing Z=7.67 the possibility of no agreement can be rejected (P<0.001), but the value of k shows that the agreement was low (Fleiss’ score, 1).

INTER-RATER AGREEMENT BETWEEN PAIRED OBSERVERS (KW TEST)

Table III shows the kw coefficients evaluated between observers A vs B, A vs C, B vs C; the results when grade V radiographs are excluded from the analysis are in italics. The best agreement (medium agreement in Fleiss’ score) is between A and C; there is agreement (borderline medium) also between A and B, while observer B does not significantly agree with observer C. The coefficients of agreement decrease for all the observers when grade V radiographs (with well-evident OA) are excluded from the analysis. INTRA-RATER AGREEMENT FOR EACH OBSERVER (KW TEST)

Table IV shows the kw coefficients for intra-rater agreement (italics represent the same meaning as in Table III). Intra-rater agreement coefficients are lower than inter-rater ones. The difficulty to reproduce the same evaluations after 1 month (more pronounced for observer B) becomes greater for all observers when grade V radiographs are excluded. Observer C shows the best performance, with a significant intra-rater value also for low grades of arthritis.

Discussion Although Ahlba¨ck classification is not mentioned, the extensive review by Sun et al.13 clearly demonstrated that radiographic classifications of arthritis, and mainly the Kellgren and Lawrence19 classification, show higher reliability when applied to the knee than when applied to the hip. Ahlba¨ck and Kellgren and Lawrence classifications show a good correlation, k⫽0.76, according to Petersson et al.5. Moreover, Ahlba¨ck classification seems easy to apply and suitable for the assessment of medial compartment arthritis of the knee20; thus, it seems particularly useful for the orthopaedic treatment of knee disorders. It is based on joint space narrowing (JSN), which is the feature that seems to be the most reliable parameter to reveal the degenerative disorder and to follow its evolution21,22. On the contrary, the Kellgren and Lawrence classification is based on the presence of osteophytes and sclerosis, focusing on osteophytes. However, it has been stated that osteophytes can also be present, especially in the elderly, without signs of arthritis and with a preserved cartilage23,24. Some doubts and criticism about Ahlba¨ck classification are reported in the following discussion. For a correct appreciation of JSN, the radiographic technique is very important. In our study, the radiographs were performed in full extension of the knee, as is usually done. These are not very suitable to show signs of initial arthritis; a 30° of flexion of the knee can shift the evaluation towards a higher

582

M. Galli et al.: Ahlba¨ck classification of knee osteoarthritis Table II Details of the scores given by each observer to each radiograph

Case number

Grades 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

1

2 °



3

5 ` `

` ° °` *° *° ° *`

*

*

* *°

°`

° * `

* °

°` `

*

*°` °`





*° *°`

* `

`

*

* *° °

°` ` *`

* *

°`

° ° *° °

` *°

` *

°` *

*` °` ` *°`

` *° °`

° °

*`

*° *`

*` *°

*°` *°` `

`

`

*

` ` *°`

* `



*`

° °`

*°` °`

`

4 *°

*` *°

* `

*`

Case number

°

° `

Grades 0

49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96

* *° *°

1

2

*`

°

` °` ` `

*

°

`



`

`

*° °`

3

4

* °

°`



`

°

` *` *°`

*

°`

* *°

* * °

`

*

*

° *° *

`

`

5

* *°

`

*°` °` ` *°`

°` ° *°` °`

* `

*° *°` *°` *°`

*

°` *` *

`

` ` `

*

*

° ° * ° *

° *° *` ° * °`

°

°` ` ` *°`

°` *`

°

`

* * *

°` °`

*

°

*° *

°`

*Observer A; °observer B; `observer C.

Ahlba¨ck stage25. Parallax is also important for a correct analysis of JSN, that should be ascertained under fluoroscopy control13. Accuracy and precision of joint space width measurements can be improved with a view of semiflexion, correction for magnification and automated computerised measurement26. Moreover, the relationship between radiographic JSN and cartilage degeneration as demonstrated during arthroscopy is not clearly defined and constant27. Our study supports further elements of doubt. In fact, for a widespread use, an ideal classification with clear and well-defined semiologic criteria should provide the same, or at least similar, results independently of the individual skills, specific experience and age of the observer. We chose the

Table III The kw coefficients evaluated between observers A vs B, A vs C, B vs C A B

0.40** 0.28*

C

0.45** 0.26*

B

0.18 (n.s.) 0.15 (n.s.)

*Low agreement; **medium agreement; n.s., not significant. The coefficients of agreement decrease for all the observers when grade V radiographs are excluded from the analysis (coefficients in italics).

Osteoarthritis and Cartilage Vol. 11, No. 8

583

Table IV The kw coefficients for intra-rater agreement

Whole series Grade V excluded

References

A

B

C

0.32* 0.15 (n.s)

0.17 (n.s) 0.01 (n.s.)

0.35* 0.33*

*Low agreement; n.s., not significant. Italics have the same meaning as in Table III. Observer C shows the best performance, with a significant intra-rater value also for low grades of arthritis.

Ahlba¨ck score because, being rather simple and rough, a good reproducibility could be expected even when applied by observers not too skilled in its usage. On the contrary, in our study we observed that • evaluations performed by observers with different experience and age showed low agreement (even though statistically significant); • personal attitudes played an important role. Evaluations of observer B, the youngest and less experienced, showed a significant correlation with those of the senior orthopaedist A (Table III). Observer C had a better capacity to reproduce his evaluations after 1 month (Table IV); • age and experience were important. The study shows fairly high values of kw for observer A, the oldest and most experienced of them; • the classification was not suitable for a good differentiation between low and medium grades; the coefficients significantly decreased when grade V radiographs were excluded. It should be stressed that orthopaedic treatment protocols are mainly based on medium and low Ahlba¨ck stages; • the reproducibility (precision in the observation) was poor; the observers found difficulty in exactly reproducing their evaluations after 1 month. The kw intra-rater values were constantly lower than those of inter-rater; • the observed values of the coefficients were lower than those reported in some reports for other classifications, as Kellgren and Lawrence19, Summers et al.28 and Kannus et al.29 classifications. Of course, we are aware that the reliability could undoubtedly be improved if the mentioned radiographic particular devices were used, and the radiographs were evaluated only by very expert and skilled observers. However, these conditions are seldom met in practice. Additional studies aimed to assess the reliability of the classification in ‘ideal’ conditions could probably be interesting.

Conclusions Our study shows that the Ahlba¨ck classification has some major limitations. It could probably achieve good results when using a particular and optimal radiographic technique and when the radiographs are evaluated by a very expert orthopaedist or rheumatologist. But caution should be recommended for a routine use in general practice. In particular, in our opinion it should not be used for surgical treatment protocols without previous clinical and arthroscopical examinations.

1. Ahlba¨ck S. Osteoarthrosis of the knee. A radiographic investigation. Acta Radiol Diagn (Stockh) 1968;277(Suppl):7–72. 2. Sahlstrom A, Johnell O, Redlund-Johnell I. The natural course of arthrosis of the knee. Clin Orthop 1997; 340:152–7. 3. Nelissen RG, Valstar ER, Rozing PM. The effect of hydroxyapatite on the micromotion of total knee prostheses. A prospective, randomized, double-blind study. J Bone Joint Surg Am 1998;80:1665–72. 4. Larsson AC, Petersson I, Eckdahl C. Functional capacity and early radiographic osteoarthritis in middle-aged people with knee pain. Physiother Res Int 1998;3:153–63. 5. Petersson IF, Boegard T, Saxne T, Silman AJ, Svensson B. Radiographic osteoarthritis of the knee classified by the Ahlback and Kellgren/Lawrence systems for the tibiofemoral joint in people aged 35–54 years with chronic knee pain. Ann Rheum Dis 1997; 56:493–6. 6. Mont MA, Mayerson JA, Krackow KA, Hungerford DS. Total knee arthroplasty in patients receiving workers’ compensation. J Bone Joint Surg Am 1998;801: 285–90. 7. Romanowsky MR, Rapicci JA. Eight year follow-up on minimally invasive unicondilar arthroplasty. American Academy of Orthopaedic Surgeons, Annual Meeting Posters Presentations, Dallas 2002, February 13–17. 8. Branca A, De Pamfilis R, Cevenini A. Indicazioni e limiti delle tecniche tradizionali nel trattamento della patologia degenerativa del ginocchio. 17° SIGC Congress, Bari 1998, November 26–28. Available from: http://www.sicg.org//Abstract/Bari98/branca. htm. 9. Janecki CJ, Perry MW, Bonati AO, Bendel M. Parameters for safe application of the Holmium YAG Laser for chondroplasty of the medial femoral condyle. J Arthroplasty Arthroscopic Surg 1998;9(1): 1–6. 10. Aydogdu S, Sur H. High tibial osteotomy for varus deformity of more than 20 degrees. Rev Chir Orthop Reparatrice Appar Mot 1997;84(5):439–46. 11. Rockborn P, Gillquist J. Results of open meniscus repair. Long term follow-up study with a matched unjured control group. J Bone Joint Surg Br 2000; 82(4):494–8. 12. International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine (ISAKOS). Knee Committee Hosts Meeting in Florence: Total knee replacement in relatively young patients with osteoarthritis generates discussion and consensus. Summer 2001 Newsletter. Available from: http://www. isakos.com/newsletter/. 13. Sun Y, Guenther KP, Brenner H. Reliability of radiographic grading of osteoarthritis of the hip and knee. Scand J Rheumatol 1997;26:155–65. 14. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–47. 15. Norman GR, Steiner DL. Biostatistica. Milano: Casa Editrice Ambrosiana, 2000, p.166–7. 16. Armitage P, Berry G. Statistical Methods in Medical Research. Oxford: Blackwell, 1994 (Chap 12). 17. Siegel S, Castellan NJ. Statistica non parametrica. Milan: McGraw-Hill, 1992 (Chap 9), p. 361.

584 18. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd edn. New York: Wiley, 1981 (Chap 13). 19. Kellgren JH, Lawrence JS. Radiological assessment of osteoarthrosis. Ann Rheum Dis 1957;16:494–502. 20. Keyes GW, Carr AJ, Miller RK, Goodfellow JW. The radiographic classification of medial gonoarthrosis. Correlation with operation methods in 200 knees. Acta Orthop Scand 1992;63(5):497–501. 21. Hart DJ, Spector TD. The classification and assessment of osteoarthritis. In: Silman AJ, Symmons DPM, Eds. Classification and Assessment of Rheumatic Diseases: Part I. London: Baillie`re Tindall 1995; 407–32. 22. Altman RD, Fries JF, Bloch DA, Carstens J, Cooke TD, Genant H, et al. Radiographic assessment of progression in osteoarthritis. Arthritis Rheum 1987; 30(11):1214–25. 23. Danielsson LG, Hernborg J. Clinical and roentgenological study of knee joints with osteophytes. Clin Orthop 1970;69:302–12. 24. Hernborg J, Nilsson BE. The relationship between osteophytes in the knee joint, osteoarthritis and aging. Acta Orthop Scand 1973;44:69–74.

M. Galli et al.: Ahlba¨ck classification of knee osteoarthritis 25. Davies AP, Calder DA, Marshall T, Glasgow MM. Plain radiography in the degenerate knee. A case for change. J Bone Joint Surg Br 1999;81(4):632–5. 26. Buckland-Wright JC, Macfarlane DG, Williams S, Ward RJ. Accuracy and precision of joint space width measurements in standard and macroradiography of osteoarthritic knees. Ann Rheum Dis 1995;54: 872–80. 27. Brandt KD, Fife RS, Braunstein EM, Katz B. Radiographic grading of the severity of knee osteoarthritis: relation of the Kellgren and Lawrence grade to a grade based on joint space narrowing, and correlation with arthroscopic evidence of articular cartilage degeneration. Arthritis Rheum 1991;34(11):1381–6. 28. Summers MN, Haley WE, Reveille JD, Alarcon GS. Radiographic assessment and psychologic variables as predictors of pain and functional impairment in osteoarthritis of the knee or hip. Arthritis Rheum 1988;31(2):204–9. 29. Kannus P, Jarvinen M, Paakkala T. A radiological scoring scale for evaluation of post-traumatic osteoarthritis after knee ligament injuries. Int Orthop 1988; 12:291–7.