J Shoulder Elbow Surg (2015) 24, 353-357
www.elsevier.com/locate/ymse
Reliability testing of two classification systems for osteoarthritis and post-traumatic arthritis of the elbow
Michael H. Amini, MD, Joshua B. Sykes, MD, Stephen T. Olson, MD, Richard A. Smith, PhD, Benjamin M. Mauck, MD, Frederick M. Azar, MD, Thomas W. Throckmorton, MD*
Department of Orthopaedic Surgery and Biomedical Engineering, University of Tennessee–Campbell Clinic, Memphis, TN, USA
Hypothesis and background: The severity of elbow arthritis is one of many factors that surgeons must evaluate when considering treatment options for a given patient. Elbow surgeons have historically used the Broberg and Morrey (BM) and Hastings and Rettig (HR) classification systems to radiographically stage the severity of post-traumatic arthritis (PTA) and primary osteoarthritis (OA). We proposed to compare the intraobserver and interobserver reliability between systems for patients with either PTA or OA.
Methods: The radiographs of 45 patients were evaluated at least 2 weeks apart by 6 evaluators of different levels of training. Intraobserver and interobserver reliability were calculated by Spearman correlation coefficients with 95% confidence intervals. Agreement was considered almost perfect for coefficients >0.80 and substantial for coefficients of 0.61 to 0.80.
Results: In patients with both PTA and OA, intraobserver reliability and interobserver reliability were substantial, with no difference between classification systems. There were no significant differences in intraobserver or interobserver reliability between attending physicians and trainees for either classification system (all P > .10). The presence of fracture implants did not affect reliability in the BM system but did substantially worsen reliability in the HR system (intraobserver P = .04 and interobserver P = .001).
Conclusions: The BM and HR classifications both showed substantial intraobserver and interobserver reliability for PTA and OA. Training level differences did not affect reliability for either system. Both trainees and fellowship-trained surgeons may easily and reliably apply each classification system to the evaluation of primary elbow OA and PTA, although the HR system was less reliable in the presence of fracture implants.
Level of evidence: Level III, Diagnostic Study.
© 2015 Journal of Shoulder and Elbow Surgery Board of Trustees.
Keywords: Elbow; osteoarthritis; post-traumatic arthritis; classification; reliability
Approval granted by University of Tennessee Center for the Health Sciences Institutional Review Board, Memphis, Tennessee: No. 12-0200-XM. *Reprint requests: Thomas W. Throckmorton, MD, 1211 Union Avenue, Suite 510, Memphis TN 38104, USA. E-mail address:
[email protected] (T.W. Throckmorton).
1058-2746/$ - see front matter © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees. http://dx.doi.org/10.1016/j.jse.2014.10.015
Although fractures of the elbow joint represent only 6% of fractures in adults, they occur in patients of all ages,12 and radiographic evidence of post-traumatic arthritis (PTA) is common.11 Primary osteoarthritis (OA) of the elbow accounts for 1% to 2% of all patients presenting with degenerative arthritis16 and is most common in middle-aged, male laborers.13,15,17 Many treatment options exist for elbow arthritis,17 and appropriate treatment for individual patients is based on a multitude of factors, including the severity of disease.
PTA is commonly graded according to the system proposed by Broberg and Morrey (BM), based on osteophyte formation and joint space narrowing.3 Primary OA usually is graded according to the system proposed by Hastings and Rettig (HR), based on the presence of subluxation and involvement of the radiocapitellar joint.15 Previous work has demonstrated fair reliability of the BM system in grading of patients with PTA,10 but the reliability of the HR system has not been examined. Further, these 2 systems have not been compared in the same cohort of patients.
The purpose of this investigation was to evaluate the intraobserver and interobserver reliability of the BM and HR systems in patients with PTA and OA. We also sought to determine the effect that level of training of the observers had on the reliability of both systems. We hypothesized that both systems would prove reliable in grading of both types of elbow arthritis and that more senior observers would be more reliable than trainees.
Figure 1 Broberg and Morrey classification of elbow arthritis. (A) Grade 1: slight joint space narrowing with minimal osteophyte formation. (B) Grade 2: moderate joint space narrowing with moderate osteophyte formation. (C) Grade 3: severe degenerative change with gross destruction of the joint.
Figure 2 Hastings and Rettig classification of elbow arthritis. (A) Class I: degeneration in the margins of the ulnotrochlear joint with the presence of coronoid and olecranon spurring; absence of degenerative changes within the radiocapitellar joint. (B) Class II: class I with mild joint space narrowing within the radiocapitellar joint, without subluxation of the radial head. (C) Class III: class II with radiocapitellar subluxation.
Materials and methods
This is an agreement study of 2 classification systems for elbow OA and PTA in a group of nonconsecutive patients. After receiving approval from our Institutional Review Board, we used Current Procedural Terminology codes to identify 45 patients who were seen for elbow arthritis, excluding inflammatory conditions, at our institution. Best-quality anteroposterior and lateral radiographs from each patient were de-identified, and all radiographs were reoriented to represent right elbows to improve consistency. The original characterizations for both the BM3 (Fig. 1) and HR15 (Fig. 2) classifications were included in the beginning of the file for reference by the evaluators. The films were then electronically distributed to the 6 evaluators (3 attending orthopedic surgeons with fellowship training in upper extremity surgery [B.M.M., F.M.A., and T.W.T.] and 3 orthopedic trainees [M.H.A., J.B.S., and S.T.O.]). Each evaluator classified the radiographs according to both systems twice, with an interval of at least 2 weeks between readings, a period used in other reliability studies in upper extremity surgery.2,7
Of the 45 patients, 19 had PTA and 26 had primary OA. In the patients with PTA, the initial injuries included 7 radial head or neck fractures, 4 intercondylar distal humeral fractures, 2 elbow dislocations, 2 olecranon fracture-dislocations, 1 Monteggia fracture-dislocation, and 1 capitellar fracture. We were unable to definitively ascertain the initial injury in 2 patients because of the time that had passed between the injury and presentation to our institution. Four patients who had a previous radial head excision were not classified according to the HR system because the classification relies on involvement and subluxation of the radiocapitellar joint.
Intraobserver and interobserver reliability were calculated by Spearman rank correlation coefficients with 95% confidence intervals, and agreement was stratified as follows: 0.01 to 0.20, indicating slight agreement; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; 0.81 to 0.99, almost perfect; and 1.00, perfect.9 Previous work has demonstrated that a binary simplification of the BM system (i.e., stages 0/1 and stages 2/3) showed better interobserver reliability.10 As a result, each observer’s ratings for both systems also were reclassified into a binary system of 0/1 or 2/3 for BM and 0/I or II/III for HR. We also compared mean intraobserver reliability between the 3 trainees and 3 attending surgeons with paired t tests. Differences with P < .05 were considered statistically significant.

Table I Intraobserver and interobserver reliability

| Diagnosis | Measure | Broberg-Morrey classification3 | Hastings-Rettig classification15 |
|---|---|---|---|
| Post-traumatic arthritis | Intraobserver | .74 (.67-.82) | .68 (.58-.78) |
| Post-traumatic arthritis | Interobserver | .65 (.62-.69) | .66 (.60-.72) |
| Osteoarthritis | Intraobserver | .77 (.73-.82) | .63 (.57-.70) |
| Osteoarthritis | Interobserver | .66 (.63-.69) | .63 (.58-.69) |

Bold = statistically significant.
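The reliability calculation described in the methods can be illustrated with a minimal Python sketch. It assumes that intraobserver reliability is the Spearman rank correlation between one rater's first and second readings, computed as the Pearson correlation of tie-averaged ranks; the function names and the sample grades are hypothetical, and the article does not state which software or tie-handling the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ratings are identical")
    return cov / sqrt(var_x * var_y)


def agreement_label(coefficient):
    """Map a coefficient to the agreement strata quoted in the methods.

    Assumption: values that fall between two quoted bands (e.g. 0.605) go to
    the higher band, and values at or below 0 are reported as off the scale.
    """
    if coefficient >= 1.0:
        return "perfect"
    if coefficient > 0.80:
        return "almost perfect"
    if coefficient > 0.60:
        return "substantial"
    if coefficient > 0.40:
        return "moderate"
    if coefficient > 0.20:
        return "fair"
    if coefficient > 0.0:
        return "slight"
    return "below the quoted scale"


# Hypothetical grades (0-3) from one rater on two occasions; not study data.
first_reading = [0, 1, 1, 2, 3, 2, 1, 0]
second_reading = [0, 1, 2, 2, 3, 1, 1, 0]
rho = spearman(first_reading, second_reading)
label = agreement_label(rho)
```

Interobserver reliability could be approached the same way by correlating two different raters' readings of the same films, although how the authors pooled the rater pairs is not described in the text.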
Results
In patients with PTA, both the BM and HR systems demonstrated substantial intraobserver and interobserver reliability (Table I). Ratings in the BM system were unanimous in 2 patients, spanned 2 categories in 13 patients (all raters chose none and mild or all chose mild or moderate), and spanned 3 categories in 4 patients (all raters chose none, mild, or moderate). In the HR system, ratings were unanimous in 2 patients, spanned 2 categories in 9 patients, and spanned 3 categories in 4 patients. Four patients with previous radial head resections were not classified according to the HR system, as mentioned previously.
In patients with OA, both the BM and HR systems again demonstrated substantial intraobserver and interobserver reliability (Table I). Ratings in the BM system were unanimous in 4 patients, spanned 2 categories in 17 patients, and spanned 3 categories in 5 patients. In the HR system, ratings were unanimous in 4 patients, spanned 2 categories in 18 patients, and spanned 3 categories in 4 patients.
When ratings were reclassified in a binary fashion (0/1 or 2/3 for BM and 0/I or II/III for HR), intraobserver reliability and interobserver reliability were moderate (0.41-0.60) in all but 1 circumstance. In all instances except 1, both systems were more reliable with use of standard grading as originally described rather than with binary classification (Table II).
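The counts of unanimous ratings and of ratings spanning 2 or 3 categories can be tallied with a short Python sketch. It assumes that "spanned" means the number of distinct grades assigned to a patient by the raters; the patient labels and grades below are hypothetical and are not taken from the study.

```python
from collections import Counter


def spread_summary(ratings_by_patient):
    """Count patients by how many distinct grades their raters assigned."""
    return Counter(len(set(grades)) for grades in ratings_by_patient.values())


# Hypothetical grades from 6 raters for 3 patients (not study data).
ratings = {
    "patient_a": [1, 1, 1, 1, 1, 1],  # unanimous
    "patient_b": [1, 2, 1, 2, 2, 1],  # two categories
    "patient_c": [0, 1, 2, 1, 1, 2],  # three categories
}
summary = spread_summary(ratings)
unanimous = summary[1]
two_categories = summary[2]
three_categories = summary[3]
```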
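Table II Comparison of standard and binary ratings

| Diagnosis | Classification | Measure | Standard | Binary |
|---|---|---|---|---|
| Post-traumatic arthritis | Broberg-Morrey3 | Intraobserver | .74 (.67-.82) | .54 (.37-.70) |
| Post-traumatic arthritis | Broberg-Morrey3 | Interobserver | .65 (.62-.69) | .38 (.24-.56) |
| Post-traumatic arthritis | Hastings-Rettig15 | Intraobserver | .68 (.58-.78) | .45 (.12-.78) |
| Post-traumatic arthritis | Hastings-Rettig15 | Interobserver | .66 (.60-.72) | .44 (.23-.70) |
| Osteoarthritis | Broberg-Morrey3 | Intraobserver | .77 (.73-.82) | .55 (.45-.63) |
| Osteoarthritis | Broberg-Morrey3 | Interobserver | .66 (.63-.69) | .48 (.31-.66) |
| Osteoarthritis | Hastings-Rettig15 | Intraobserver | .63 (.57-.70) | .54 (.25-.83) |
| Osteoarthritis | Hastings-Rettig15 | Interobserver | .64 (.58-.69) | .66 (.51-.80) |

Bold = statistically significant.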
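The binary reclassification described in the methods (0/1 versus 2/3 for BM, 0/I versus II/III for HR) can be sketched in Python as follows. The helper names, the Roman-numeral mapping, and the sample ratings are illustrative assumptions rather than the authors' code.

```python
# Hastings-Rettig classes are written as 0 and Roman numerals I to III.
HR_TO_NUMBER = {"0": 0, "I": 1, "II": 2, "III": 3}


def collapse_grade(grade):
    """Collapse a 0-3 grade into 0 (grades 0/1) or 1 (grades 2/3)."""
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"unexpected grade: {grade!r}")
    return 0 if grade <= 1 else 1


def collapse_bm(grades):
    """Binary version of a list of Broberg-Morrey grades (integers 0-3)."""
    return [collapse_grade(g) for g in grades]


def collapse_hr(classes):
    """Binary version of a list of Hastings-Rettig classes ("0", "I", "II", "III")."""
    return [collapse_grade(HR_TO_NUMBER[c]) for c in classes]


# Hypothetical ratings, not study data.
bm_binary = collapse_bm([0, 1, 2, 3, 1, 2])               # [0, 0, 1, 1, 0, 1]
hr_binary = collapse_hr(["0", "I", "II", "III", "II", "I"])  # [0, 0, 1, 1, 1, 0]
```

The collapsed lists can then be compared between readings or raters in the same way as the standard grades.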
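Table III Reliability between staff surgeons and trainees

| Diagnosis | Classification | Staff | Trainee | P value |
|---|---|---|---|---|
| Post-traumatic arthritis | Broberg-Morrey3 | .74 | .75 | .91 |
| Post-traumatic arthritis | Hastings-Rettig15 | .61 | .76 | .20 |
| Osteoarthritis | Broberg-Morrey3 | .81 | .74 | .11 |
| Osteoarthritis | Hastings-Rettig15 | .58 | .68 | .18 |

Table IV Reliability with presence or absence of the radial head

| Classification | Measure | Present | Absent |
|---|---|---|---|
| Broberg-Morrey3 | Intraobserver | .74 (.68-.80) | .88 (.72-1.0) |
| Broberg-Morrey3 | Interobserver | .66 (.62-.70) | .73 (.66-.80) |
| Hastings-Rettig15 | Intraobserver | .68 (.58-.78) | NA |
| Hastings-Rettig15 | Interobserver | .61 (.56-.66) | NA |

NA, not applicable.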
There were no statistically significant differences in reliability between the 3 fellowship-trained staff surgeons and the 3 trainees for either classification system or diagnosis. Both staff and trainees demonstrated mostly substantial agreement (Table III), and the presence or absence of a radial head did not influence reliability (Table IV). In patients with PTA, the presence or absence of implants did not influence reliability with the BM system; however, with the HR system, reliability was moderate to substantial in the absence of implants but worsened to slight reliability when implants were present (Table V).
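The staff-versus-trainee comparison relied on paired t tests. The sketch below shows only the mechanics of the test statistic; how the authors paired the observations is not stated in the text, so the pairing, the function name, and the coefficients used here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a paired t test on two equal-length lists."""
    if len(group_a) != len(group_b) or len(group_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    differences = [a - b for a, b in zip(group_a, group_b)]
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    # This is zero, and the statistic undefined, if all differences are equal.
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error, len(differences) - 1


# Hypothetical intraobserver coefficients, one per rater (not study data).
staff = [0.70, 0.78, 0.74]
trainees = [0.72, 0.75, 0.78]

t_value, dof = paired_t_statistic(staff, trainees)
# With 2 degrees of freedom, the two-sided critical value at P = .05 is about 4.303.
significant = abs(t_value) > 4.303
```

With only 3 pairs the critical value is large, so modest differences between groups would not reach P < .05 under this setup.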
Table V Reliability with or without fracture implants (post-traumatic arthritis)

| Classification | Measure | Present | Absent |
|---|---|---|---|
| Broberg-Morrey3 | Intraobserver | .78 (.68-.88) | .73 (.61-.84) |
| Broberg-Morrey3 | Interobserver | .62 (.52-.72) | .69 (.65-.73) |
| Hastings-Rettig15 | Intraobserver | .14 (0-1.0) | .69 (.51-.86) |
| Hastings-Rettig15 | Interobserver | .16 (0-.43) | .54 (.47-.60) |

Bold = statistically significant.
Discussion
Elbow arthritis continues to be a challenging problem. One factor that weighs heavily in determining the appropriate choice of the many available treatment options is the severity of disease. The purpose of this study was to assess the reliability of the BM and HR classification systems in evaluating the radiographic severity of degenerative changes in patients with PTA and OA. The results revealed substantial intraobserver and interobserver reliability of both the BM and HR classifications in grading of patients with PTA and OA. Furthermore, our findings did not suggest that training level affected reliability, which was largely substantial for both staff surgeons and trainees. In contrast to previous studies,10 comparing severity in a binary fashion for either system did not improve reliability. In fact, we found it to be less reliable than standard evaluation according to the originally described criteria.
In patients with PTA, a previous radial head excision did not affect reliability of the BM system (these patients were not rated in the HR system, as mentioned previously). The presence of implants did not adversely affect the reliability with the BM system but did result in only slight reliability with the HR system. This may reflect the difficulty in assessing 2 key distinguishing features of the HR system, degenerative changes and subluxation of the radiocapitellar joint, in the presence of fracture implants. Specifically, radial head implants and other hardware tend to obscure the radiocapitellar articulation, making classification more difficult.
To our knowledge, this is the first study examining the reliability of the HR classification, and although it is often used in studies of PTA, little has been published examining the reliability of the BM classification. In a previous study demonstrating fair agreement,10 the authors included a large number of raters from different institutions and even different countries. Their results demonstrated worse reliability in younger surgeons who treated less elbow trauma annually and improved reliability with binary ratings. Our study demonstrated substantial reliability that may partially be due to the smaller number of raters and the fact that all the raters were from a single institution. Also, the strong reliability between staff surgeons and trainees may be because the trainee-raters were under the tutelage of the staff-raters. In the current study, it is unclear why ratings did not improve in a binary system. This may reflect the difficulty in improving on the initially strong reliability numbers in this study compared with the initial fair reliability in the previous study.
One of the limitations of the current study is the smaller number of raters, all from a single institution; however, 6 raters is comparable to similar upper extremity reliability studies in which 6 or fewer raters were used.1,2,4-8,14 Another limitation is the lack of clinical correlation in the use of the BM system in patients with OA and the HR system in patients with PTA, as neither classification was originally described for these respective diagnoses. Further correlation of function and outcomes is necessary before routine use of these pairings; however, reliability was similarly substantial in the classification of patients in either system with the originally intended diagnosis (BM for PTA and HR for OA). We believe this reflects the common characteristics seen in degenerative elbow disease, such as joint space narrowing, osteophyte formation, and eventual ulnohumeral erosion leading to radiocapitellar subluxation. Because both classification schemes evaluate these common characteristics to some extent, these results suggest that both systems are more versatile for radiographic classification of elbow osteoarthritic conditions than originally described.
Finally, the radiographic systems analyzed here do not account for a patient’s clinical presentation, such as pain and stiffness, and therefore may not be precise for guiding treatment. However, the reliability data presented here may help provide a common language not just for classification purposes in research but also for communication among clinicians in discussing the radiographic severity of degenerative elbow disease.
In summary, the findings of this study support the use of the BM and HR classifications in patients with PTA and primary OA. This may prove helpful in future studies by allowing comparison of treatment choice, function, and outcomes according to severity of radiographic changes.
Conclusion
Identifying the severity of radiographic degenerative changes in elbow arthritis is critical in formulating an appropriate treatment plan for each patient. Both the BM and HR classification systems demonstrated substantial intraobserver and interobserver reliability in evaluating the radiographic severity of PTA and OA. In this study, both systems were reliably applied by trainees and staff surgeons.
Disclaimer
The authors, their immediate families, and any research foundation with which they are affiliated have not received any financial payments or other benefits from any commercial entity related to the subject of this article.
References
1. Austin LS, O’Brien MJ, Zmistowski B, Ricchetti ET, Kraeutler MJ, Joshi A, et al. Additional x-ray views increase decision to treat clavicular fractures surgically. J Shoulder Elbow Surg 2012;21:1263-8. http://dx.doi.org/10.1016/j.jse.2011.08.050
2. Blonna D, Zarkadas PC, Fitzsimmons JS, O’Driscoll SW. Validation of a photography-based goniometry method for measuring joint range of motion. J Shoulder Elbow Surg 2012;21:29-35. http://dx.doi.org/10.1016/j.jse.2011.06.018
3. Broberg MA, Morrey BF. Results of delayed excision of the radial head after fracture. J Bone Joint Surg Am 1986;68:669-74.
4. Cools AM, De Wilde L, Van Tongel A, Ceyssens C, Ryckewaert R, Cambier DC. Measuring shoulder external and internal rotation strength and range of motion: comprehensive intra-rater and interrater reliability study of several testing protocols. J Shoulder Elbow Surg 2014;23:1454-61. http://dx.doi.org/10.1016/j.jse.2014.01.006
5. Elsharkawi M, Cakir B, Reichel H, Kappe T. Reliability of radiologic glenohumeral osteoarthritis classifications. J Shoulder Elbow Surg 2013;22:1063-7. http://dx.doi.org/10.1016/j.jse.2012.11.007
6. Hall JM, Azar FM, Miller RH 3rd, Smith R, Throckmorton TW. Accuracy and reliability testing of two methods to measure internal rotation of the glenohumeral joint. J Shoulder Elbow Surg 2014;23:1296-300. http://dx.doi.org/10.1016/j.jse.2013.12.015
7. Iannotti JP, McCarron J, Raymond CJ, Ricchetti ET, Abboud JA, Brems JJ, et al. Agreement study of radiographic classification of rotator cuff tear arthropathy. J Shoulder Elbow Surg 2010;19:1243-9. http://dx.doi.org/10.1016/j.jse.2010.02.010
8. Kappe T, Cakir B, Reichel H, Elsharkawi M. Reliability of radiologic classification for cuff tear arthropathy. J Shoulder Elbow Surg 2011;20:543-7. http://dx.doi.org/10.1016/j.jse.2011.01.012
9. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
10. Lindenhovius A, Karanicolas PJ, Bhandari M, Ring D, COAST Collaborative. Radiographic arthrosis after elbow trauma: interobserver reliability. J Hand Surg Am 2012;37:755-9. http://dx.doi.org/10.1016/j.jhsa.2011.12.043
11. Lindenhovius AL, Buijze GA, Kloen P, Ring DC. Correspondence between perceived disability and objective physical impairment after elbow trauma. J Bone Joint Surg Am 2008;90:2090-7. http://dx.doi.org/10.2106/JBJS.G.00793
12. McKee MD, Jupiter JB. Trauma to the adult elbow and fractures of the distal humerus. In: Browner BD, Jupiter JB, Levine AM, Trafton PG, Krettek C, editors. Skeletal trauma: basic science, management, and reconstruction. 4th ed. Philadelphia: Saunders/Elsevier; 2009. p. 1503-92.
13. Morrey BF. Primary degenerative arthritis of the elbow. Treatment by ulnohumeral arthroplasty. J Bone Joint Surg Br 1992;74:409-13.
14. Nowak DD, Gardner TR, Bigliani LU, Levine WN, Ahmad CS. Interobserver and intraobserver reliability of the Walch classification in primary glenohumeral arthritis. J Shoulder Elbow Surg 2010;19:180-3. http://dx.doi.org/10.1016/j.jse.2009.08.003
15. Rettig LA, Hastings H 2nd, Feinberg JR. Primary osteoarthritis of the elbow: lack of radiographic evidence for morphologic predisposition, results of operative debridement at intermediate follow-up, and basis for a new radiographic classification system. J Shoulder Elbow Surg 2008;17:97-105. http://dx.doi.org/10.1016/j.jse.2007.03.014
16. Stanley D. Prevalence and etiology of symptomatic elbow osteoarthritis. J Shoulder Elbow Surg 1994;3:386-9.
17. Wysocki RW, Cohen MS. Primary osteoarthritis and posttraumatic arthritis of the elbow. Hand Clin 2011;27:131-7. http://dx.doi.org/10.1016/j.hcl.2011.02.001