J Shoulder Elbow Surg (2015) 24, 358-363
Agreement of olecranon fractures before and after the exposure to four classification systems

Cesar A. Benetton, MD(a), Guilherme Cesa, MD(a), Gabriel El-Kouba Junior, MD(a), Ana Paula B. Ferreira, DDS, PhD(a,*), João Ricardo N. Vissoci, MSc(b), Ricardo Pietrobon, MD, PhD, MBA(c)

(a) Orthopedics and Traumatology Institute (IOT), Joinville, SC, Brazil
(b) Medicine Department, Faculdade Ingá, Maringá, PR, Brazil
(c) Department of Surgery, Duke University Medical Center, Durham, NC, USA
Background: Although classification systems for olecranon fractures are important for choosing the best treatment and predicting prognosis, their degree of observer agreement is poorly investigated. The objective of this study was to investigate the intraobserver and interobserver reliability of currently used classification systems for olecranon fractures. Our hypothesis was that the Colton classification would show acceptable agreement because it is simpler to use; conversely, given the AO classification's complexity, we expected it to reach a lower level of agreement.

Methods: Radiographic images of elbow joint fractures were classified according to the Colton, AO, Mayo, and Schatzker classification systems. The raters were 8 orthopedic surgeons split into 2 groups of 4 participants each, one with specialists in upper extremity surgery and the other with orthopedic surgeons without a specific focus on upper extremity surgery. A pretest training session was conducted first, aimed at calibrating participants' judgment. Image classification was conducted after all training was completed. Thirty days after the initial rating session, the test was conducted again following the exact same procedures.

Results: The Colton classification showed substantial intraobserver and interobserver agreement for specialists and nonspecialists. The Schatzker classification showed fair agreement for both specialists and nonspecialists. Fair concordance was also found for the Mayo classification. The AO classification demonstrated moderate agreement for specialists, whereas nonspecialists presented slight intraobserver agreement.

Conclusion: No classification system is universally accepted, because each is affected by interobserver variability, which raises questions about its use in research as well as in clinical contexts.

Level of evidence: Level III, Diagnosis Study.

© 2015 Journal of Shoulder and Elbow Surgery Board of Trustees.

Keywords: Inter/intraobserver; agreement; AO classification; Mayo classification; Colton classification; Schatzker classification; olecranon fracture
We obtained approval from the Research Ethics Committee of São José Hospital before the initiation of this project (Protocol No. 12031). All study participants provided informed consent before enrollment in the study.
*Reprint requests: Ana Paula B. Ferreira, DDS, PhD, Rua Blumenau, 1316, Joinville, SC, 89204-322, Brazil. E-mail address: [email protected] (A.P.B. Ferreira).
1058-2746/$ - see front matter © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees.
http://dx.doi.org/10.1016/j.jse.2014.10.025
Olecranon fractures represent about 10% of all upper limb fractures.19 Most are intra-articular, with treatment outcomes depending on anatomic reduction, degree of stability, and early active mobilization of the joint.13 If not properly treated, these fractures can limit flexion, extension, and pronation/supination, severely affecting activities of daily living.16 Given that treatment is primarily determined by a proper anatomic classification, one would expect multiple studies to have evaluated the degree of observer agreement for imaging classifications. Unfortunately, to our knowledge, this literature is nonexistent.

Fracture classifications are useful to characterize a problem, to suggest a prognosis, and to assist in choosing the most appropriate treatment.7 The accuracy and reliability of a classification can be judged by its reproducibility, that is, whether it yields the same result when applied by different observers at the same time or by the same observer at different times.4,7,15,18 The Colton,6 Mayo,12 Schatzker,17 and AO13 classification systems are widely used in orthopedic practice and extensively cited in the literature.2,3,5,8 Nevertheless, there is no universal consensus regarding the reproducibility of these classification systems for olecranon fractures.10 Given this gap in the literature, the objective of this study was to investigate the intraobserver and interobserver reliability of currently used classification systems for olecranon fractures.
Materials and methods

Subjects

Our study included 8 orthopedic surgeons who work in the same department. They were split into 2 groups of 4 participants each, one with specialists in upper extremity surgery and the other with orthopedic surgeons without a specific focus on upper extremity surgery. To the best of our knowledge, there is no consensus regarding power analysis for observer agreement studies; therefore, we did not conduct a sample size calculation.
Images

Eighteen cases of elbow joint fractures with anteroposterior and lateral (profile) radiographic images were selected from the records of the Hospital Municipal São José (Joinville, SC, Brazil) from July to December 2012. Images were retrieved by 2 third-year orthopedic residents and by 1 orthopedic surgeon specializing in upper extremity surgery, all of whom were familiar with the classification systems. Images were chosen to be representative of a wide range of fracture patterns according to the Colton, Mayo, Schatzker, and AO classification systems (see the Classification systems section). Any signs that could lead to patient identification were removed. The severity of open fractures and the final outcome of each patient were not disclosed to evaluators. Radiographic images with incorrect olecranon positioning that could cause misunderstanding during classification were excluded, as were images of low quality or with artifacts or other technical defects from image acquisition. Although dynamic imaging can be used with some classifications, we restricted the evaluation to static radiographs.
Classification systems

Colton classification

This system is based on degree of displacement and fracture pattern, ultimately aiming to provide treatment decision support.6 It divides olecranon fractures into 2 major groups: undisplaced (type I) and displaced (type II). A type I undisplaced fracture is defined as having <2 mm of separation and no increase in displacement with flexion to 90°, and the patient can actively extend the elbow against gravity. Colton further subdivided displaced fractures into type IIA, avulsion; IIB, transverse and oblique; IIC, comminuted; and IID, fracture-dislocation.
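Because the type I criteria form a simple conjunction of radiographic and clinical checks, they can be expressed as a short predicate. The R sketch below is our own illustration, not part of the original study; the function name and inputs are hypothetical stand-ins for judgments made from the radiograph and examination:

# Illustrative predicate for a Colton type I (undisplaced) fracture.
# Inputs are hypothetical judgments from imaging and examination.
is_colton_type_I <- function(separation_mm,
                             displacement_increases_at_90_flexion,
                             can_extend_against_gravity) {
  separation_mm < 2 &&
    !displacement_increases_at_90_flexion &&
    can_extend_against_gravity
}

is_colton_type_I(1.5, FALSE, TRUE)   # TRUE: meets all type I criteria
is_colton_type_I(4.0, FALSE, TRUE)   # FALSE: displaced, hence type II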
Mayo classification

This system classifies fractures on the basis of 3 factors: stability, displacement, and comminution.12 Type I fractures are undisplaced, type II are displaced and stable, and type III are displaced and unstable. Each is divided into subtype A (noncomminuted) or B (comminuted).

Type I: Undisplaced fractures. In an undisplaced fracture, it matters little whether a single fragment or several fragments are present; thus, noncomminuted (type IA) and comminuted (type IB) fractures may be considered essentially the same lesion.

Type II: Displaced, stable fractures. In this pattern, the fracture fragments are displaced more than 3 mm, but the collateral ligaments are intact and the forearm is stable in relation to the humerus. The fracture may be either noncomminuted (type IIA) or comminuted (type IIB).

Type III: Displaced, unstable fractures. The type III fracture is one in which the fracture fragments are displaced and the forearm is unstable in relation to the humerus; this injury is really a fracture-dislocation. It also may be either noncomminuted (type IIIA) or comminuted (type IIIB).
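The Mayo system likewise reduces to a few binary judgments; the sketch below (again our illustration, with hypothetical function and argument names) makes the decision structure explicit:

# Illustrative encoding of the Mayo decision rules. Inputs are
# hypothetical boolean judgments made from the radiograph.
classify_mayo <- function(displaced_over_3mm, forearm_stable, comminuted) {
  if (!displaced_over_3mm) {
    type <- "I"     # undisplaced
  } else if (forearm_stable) {
    type <- "II"    # displaced, stable
  } else {
    type <- "III"   # displaced, unstable (a fracture-dislocation)
  }
  subtype <- if (comminuted) "B" else "A"
  paste0(type, subtype)
}

classify_mayo(displaced_over_3mm = TRUE,
              forearm_stable = FALSE,
              comminuted = TRUE)   # "IIIB"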
Schatzker classification

This system focuses specifically on fracture morphology and the biomechanical considerations related to each type of internal fixation.17 It is composed of 6 fracture patterns, types A to F. Type A fractures describe a transverse fracture extending through the articular surface of the ulnohumeral joint. Type B fractures describe a type A fracture with some associated comminution or impaction at the articular surface. Type C and type D fractures are oblique fractures (proximal to the midpoint of the trochlear notch) and comminuted fractures, respectively. Type E fractures comprise an oblique fracture that begins and ends distal to the midpoint of the trochlear notch of the ulna. Schatzker type F injuries include olecranon fracture-dislocations that involve a fracture of the radial head and likely soft tissue disruption of the medial collateral ligament.
AO classification

This system classifies olecranon fractures as a subdivision of elbow fractures.13 Type A fractures are extra-articular fractures of the metaphysis of either the radius or ulna. Type B fractures are intra-articular fractures of the radius or ulna; olecranon fractures, specifically, are subclassified in this group as type B1 fractures. Type C fractures are intra-articular fractures of both the olecranon and radial head.

Table I  Interobserver agreement measurements of each classification system among specialists and nonspecialists

Comparison                                            Agree (%)   κ (P value)     95% confidence interval
Specialists Colton vs. nonspecialists Colton          75          0.674 (<.01)    0.53-0.79
Specialists Schatzker vs. nonspecialists Schatzker    46.2        0.332 (<.01)    0.20-0.48
Specialists Mayo vs. nonspecialists Mayo              36.2        0.192 (<.01)    0.07-0.34
Specialists AO vs. nonspecialists AO                  37.5        0.211 (<.01)    0.08-0.35
Procedures

The first procedure was the pretest training, aimed at calibrating participants' judgment so that all could start at a similar level. A collection of 4 images was presented to participants through multimedia projections, one image at a time. Each image remained on the screen until the last participant had classified it. Each participant was asked to independently classify each image according to the classification systems described previously. Throughout the process, participants were able to consult printed schemes of all classifications. After the last participant had finished, all participants discussed their answers for each classification system; the discussion was moderated by the orthopedic resident responsible for this study to establish consensus through a detailed coverage of each classification. Subsequently, a new set of 4 images was presented, and all steps were repeated.

Image classification was conducted after all training was completed and consisted of 10 new images (i.e., different from those presented in pretest training). They were also projected one at a time. Once the last participant had classified an image according to all classification systems, the next image was projected. As in the pretest training, participants had access to the written classification schemes at the moment of image classification. Participants were not allowed to review other participants' ratings or to review their own ratings after they had completed the image classification.

Thirty days after the initial rating session, the same 10 images were presented in a different sequence, without pretest training, and the test was conducted again following the exact same procedures.
Statistical analysis

Agreement for categorical items was calculated with the Fleiss κ coefficient. The following scale was used to interpret scores: 0.81 to 1.00, almost perfect; 0.61 to 0.80, substantial; 0.41 to 0.60, moderate; 0.21 to 0.40, fair; 0.00 to 0.20, slight; and <0.00, poor.10 To calculate the confidence intervals associated with κ values, we used a bootstrapping method based on 1000 randomized samples. All analyses were performed with the R language (R Core Team, 2013),27 specifically the "irr" package26 for the agreement analysis and the "boot" package25 for the bootstrapping.
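As a minimal sketch of this pipeline, assuming a ratings matrix with one row per image and one column per rater (the authors' actual code is in the GitHub repository cited below; the simulated data, object names, and percentile-type bootstrap interval are illustrative assumptions, not the study's exact implementation):

# Minimal sketch of the agreement analysis. 'ratings' is assumed to be
# a matrix with one row per image and one column per rater; the data
# below are simulated purely for illustration.
library(irr)   # provides kappam.fleiss()
library(boot)  # provides boot() and boot.ci()

set.seed(42)
ratings <- matrix(sample(c("I", "IIA", "IIB", "IIC", "IID"),
                         10 * 4, replace = TRUE),
                  nrow = 10, ncol = 4)

# Fleiss kappa across all raters
k <- kappam.fleiss(ratings)
k$value

# Landis and Koch labels used in this study
kappa_label <- function(kv) {
  if (kv < 0) "poor"
  else if (kv <= 0.20) "slight"
  else if (kv <= 0.40) "fair"
  else if (kv <= 0.60) "moderate"
  else if (kv <= 0.80) "substantial"
  else "almost perfect"
}
kappa_label(0.674)  # "substantial", matching Table I for Colton

# Bootstrap 95% confidence interval: resample images (rows) 1000 times
kappa_stat <- function(data, idx) kappam.fleiss(data[idx, ])$value
boot_out <- boot(ratings, kappa_stat, R = 1000)
boot.ci(boot_out, conf = 0.95, type = "perc")

Resampling images (rows) rather than raters is one reasonable way to obtain the intervals while preserving the rater panel; the paper does not specify the resampling unit.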
Reproducible research

This paper followed the framework for reproducible research reports.22 The data set (in .csv format) and figures are available in our open repository.23 Data analysis code is shared through our GitHub repository (https://github.com/joaovissoci/OlecranonFracture).24 The code is linked to the data set and functional. All documents are licensed under the Creative Commons Attribution-Noncommercial 3.0 License.
Results

Overall, the Colton classification was associated with substantial interobserver agreement (κ = 0.67), the Schatzker and AO classifications presented fair interobserver concordance (κ = 0.33 and κ = 0.21, respectively), and the Mayo classification showed slight interobserver agreement (κ = 0.19) (Table I).

These associations can be observed in Figure 1, which displays agreement at 3 levels (black, gray, and white). The squares represent the overlap between observed and expected frequencies. The Colton classification shows larger areas of agreement (black areas) and few white spaces (disagreement). The Schatzker, Mayo, and AO classifications show smaller black areas in the plot; among them, the Mayo classification represents the least agreement.

Intraobserver agreement for the Colton classification was substantial for both specialists and nonspecialists (κ = 0.60 and κ = 0.67, respectively). The Schatzker classification was associated with fair intraobserver agreement for both specialists and nonspecialists (κ = 0.40). The Mayo classification was associated with slight intraobserver agreement for specialists (κ = 0.18), whereas nonspecialists showed moderate intraobserver concordance (κ = 0.51). Regarding the AO classification, specialists demonstrated moderate intraobserver agreement (κ = 0.45), whereas nonspecialists presented slight intraobserver agreement (Fig. 2).
Figure 1  Interobserver agreement among specialists and nonspecialists for the Colton, Schatzker, Mayo, and AO classifications. N, frequencies of observations displayed at the opposite side of the axis.

Discussion

To our knowledge, this is the first study assessing the intraobserver and interobserver reliability of classification systems for olecranon fractures. Although numerous classification systems have been described for olecranon fractures, none has been universally accepted.8 Our study demonstrated that the Colton classification has substantial intraobserver and interobserver agreement for specialists and nonspecialists. The Schatzker classification revealed fair agreement for both specialists and nonspecialists. Fair concordance was also found for the Mayo classification. Finally, the AO classification demonstrated moderate agreement for specialists, whereas nonspecialists presented slight intraobserver agreement.

Although there are no previously published studies on intraobserver or interobserver agreement in classifications of olecranon fractures, our findings are consistent with studies that evaluated agreement on classifications of other bone fractures. Specifically, our findings corroborate studies evaluating the classification of distal radius fractures by the AO and Mayo systems, in which fair concordance was obtained.1,9,11 Walton et al21 tested the reliability of the Schatzker and AO classifications for tibial plateau fractures and found fair interobserver agreement for the Schatzker classification and moderate reliability for the AO classification.

The poor agreement for the AO classification could be explained by its complexity. The AO/ASIF is a more complex system that involves the radius as well as associated ulnar and ligamentous injuries, and these added variables presumably have a negative impact on its reliability.11 In our study, the surgeon's expertise was a determinant of intraobserver agreement for the AO classification, as indicated by the higher κ coefficient among specialists.

The Mayo classification system showed low reliability, which could be explained by our surgeons' lack of familiarity with the Schatzker and Mayo systems. Nevertheless, the Mayo classification is easy to memorize, is easily reproducible, and infers prognosis by indirectly describing the condition of the elbow's ligaments. The Schatzker classification, on the other hand, is more difficult to memorize and is purely descriptive, based on the fracture pattern; because it considers the type of internal fixation required according to the fracture lines, it also offers the possibility of inferring prognosis. The Colton classification is simple and descriptive, and prognosis is indicated according to injury pattern.6,14 It is commonly used in our institution, which likely contributed to the substantial agreement we observed; familiarity with a specific classification system can positively affect the results.20
Figure 2  Intraobserver agreement measurements (percentage of agreement and κ values with 95% confidence intervals) for the Colton, Schatzker, Mayo, and AO classifications. Sp, specialists; NonSp, nonspecialists.

Although our study is unique, image standardization limits the generalizability of our results. Our effort to exclude radiographs with poor contrast or positioning does not reflect the way films are used in actual practice and may have artificially improved the overall observer agreement.
Conclusions

Among the classification systems compared in this study, Colton showed the best agreement. The AO classification presented better agreement among specialists, probably because its complexity demands greater familiarity with the system. No classification system is universally accepted, and every classification system is subject to interobserver variability, ultimately raising questions about its suitability in research as well as in clinical contexts. Using scales that are more reliable and reproducible can lead to more predictable treatment results. Evaluating the reproducibility of classifications can help surgeons choose one that better fits their needs and follow a standardized model of diagnosis and treatment, aiding treatment selection, improving outcomes, and establishing better prognoses. The development of a more thorough method for evaluating all components of a classification system is therefore one of the suggested next steps in the field. The evaluation of changes in planned treatment secondary to reclassification is also a suggestion for future research.
Disclaimer The authors, their immediate families, and any research foundation with which they are affiliated have not received any financial payments or other benefits from any commercial entity related to the subject of this article.
References

1. Andersen DJ, Blair WF, Steyers CM Jr, Adams BD, el-Khoury GY, Brandser EA. Classification of distal radius fractures: an analysis of interobserver reliability and intraobserver reproducibility. J Hand Surg Am 1996;4:574-82.
2. Anderson ML, Larson AN, Merten SM, Steinmann SP. Congruent elbow plate fixation of olecranon fractures. J Orthop Trauma 2007;6:386-93. http://dx.doi.org/10.1097/BOT.0b013e3180ce831e
3. Bailey CS, MacDermid J, Patterson SD, King GJ. Outcome of plate fixation of olecranon fractures. J Orthop Trauma 2001;8:542-8.
4. Belloti JC, Tamaoki MJ, Franciozi CE, Santos JB, Balbachevsky D, Chap Chap E, et al. Are distal radius fracture classifications reproducible? Intra and interobserver agreement. Sao Paulo Med J 2008;3:180-5. http://dx.doi.org/10.1590/S1516-31802008000300008
5. Chehab EL, Toro JB, Helfet DL. The management of fractures of the elbow joint in athletes: review article. Int Sport Med J 2005;2:84-98.
6. Colton CL. Fractures of the olecranon in adults: classification and management. Injury 1973;52:121-9.
7. Garbuz DS, Masri BA, Esdaile J, Duncan CP. Classification systems in orthopaedics. J Am Acad Orthop Surg 2002;4:290-7.
8. Hak DJ, Golladay GJ. Olecranon fractures: treatment options. J Am Acad Orthop Surg 2000;4:266-75.
9. Kural C, Sungur I, Kaya I, Ugras A, Ertürk A, Cetinus E. Evaluation of the reliability of classification systems used for distal radius fractures. Orthopedics 2010;11:801. http://dx.doi.org/10.3928/01477447-20100924-14
10. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;1:159-74.
11. Matsunaga FT, Tamaoki MJ, Cordeiro EF, Uehara A, Ikawa MH, Matsumoto MH, et al. Are classifications of proximal radius fractures reproducible? BMC Musculoskelet Disord 2009;10:120. http://dx.doi.org/10.1186/1471-2474-10-120
12. Morrey BF. Current concepts in the treatment of fractures of the radial head, the olecranon, and the coronoid. Instr Course Lect 1995;44:175-85.
13. Müller ME, Allgöwer M, Schneider R, Willenegger H. Manual of internal fixation: techniques recommended by the AO-ASIF group. Berlin: Springer; 1991. p. 142-3 (ISBN 978-3-662-02695-3).
14. Newman SDS, Mauffrey C, Krikler S. Olecranon fractures. Injury 2009;6:575-81. http://dx.doi.org/10.1016/j.injury.2008.12.013
15. Pervez H, Parker MJ, Pryor GA, Lutchman L, Chirodian N. Classification of trochanteric fracture of the proximal femur: a study of the reliability of current systems. Injury 2002;33:713-5. http://dx.doi.org/10.1016/S0020-1383(02)00089-X
16. Rommens PM, Schneider RU, Reuter M. Functional results after operative treatment of olecranon fractures. Acta Chir Belg 2004;2:191-7.
17. Schatzker J. Fractures of the olecranon. In: Schatzker J, Tile M, editors. The rationale of operative fracture care. Berlin: Springer; 2005. p. 123-30 (ISBN 978-3-662-02485-0).
18. Shepherd LE, Zalavras CG, Jaki K, Shean C, Patzakis MJ. Gunshot femoral shaft fractures: is the current classification system reliable? Clin Orthop Relat Res 2003;408:101-9.
19. Veillette CJ, Steinmann SP. Olecranon fractures. Orthop Clin North Am 2008;2:229-36. http://dx.doi.org/10.1016/j.ocl.2008.01.002
20. Wainwright AM, Williams JR, Carr AJ. Interobserver and intraobserver variation in classification systems for fractures of the distal humerus. J Bone Joint Surg Br 2000;5:636-42.
21. Walton NP, Harish S, Roberts C, Blundell C. AO or Schatzker? How reliable is classification of tibial plateau fractures? Arch Orthop Trauma Surg 2003;8:396-8. http://dx.doi.org/10.1007/s00402-003-0573-1
22. Website: A framework for reproducible, interactive research: application to health and social science. Available at: http://arxiv.org/abs/1304.5688. Accessed April 02, 2013.
23. Website: Figshare - credit for all your research. Available at: http://dx.doi.org/10.6084/m9.figshare.101990. Accessed January 13, 2013.
24. Website: GitHub - build software better, together. Available at: https://github.com/joaovissoci/OlecranonFracture. Accessed January 13, 2013.
25. Website: Package "boot". Available at: http://cran.r-project.org/web/packages/boot/boot.pdf. Accessed December 04, 2012.
26. Website: Package "irr". Available at: http://cran.r-project.org/web/packages/irr/irr.pdf. Accessed December 04, 2012.
27. Website: The R project for statistical computing. Available at: http://www.R-project.org/. Accessed December 04, 2012.