Journal Pre-proof Radiographic assessment of the subtalar joint: an evaluation of the Kellgren-Lawrence scale and proposal of a novel scale
David Vier, Thomas Louis, Daniel Fuchs, Christian T. Royer, Jacob R. Zide, David E. Jaffe
PII: S0899-7071(19)30248-7
DOI: https://doi.org/10.1016/j.clinimag.2019.08.009
Reference: JCT 8784
To appear in: Clinical Imaging
Received date: 17 May 2019
Revised date: 25 July 2019
Accepted date: 15 August 2019
Please cite this article as: D. Vier, T. Louis, D. Fuchs, et al., Radiographic assessment of the subtalar joint: an evaluation of the Kellgren-Lawrence scale and proposal of a novel scale, Clinical Imaging (2019), https://doi.org/10.1016/j.clinimag.2019.08.009
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier.
Radiographic assessment of the subtalar joint: an evaluation of the Kellgren-Lawrence scale and proposal of a novel scale
David Vier, MD (a); Thomas Louis, MD (b); Daniel Fuchs, MD (a, c); Christian T. Royer, MD (a); Jacob R. Zide, MD (a); David E. Jaffe, MD (a, d)
a Baylor University Medical Center at Dallas Department of Orthopedics 3500 Gaston Ave 6 Hoblitzelle Dallas, TX 75246 USA
b Baylor University Medical Center at Dallas Department of Radiology 3500 Gaston Avenue Dallas, TX 75246 USA
c Rothman Institute Orthopaedics (Present address) 925 Chestnut St Philadelphia, PA 19107 USA
d Arizona Bone and Joint Specialists (Present address) OrthoArizona 5620 E Bell Road Scottsdale, AZ 85254 USA
Corresponding Author: David Vier
[email protected] 281-636-1892
Declarations of interest: none
Radiographic assessment of the subtalar joint: an evaluation of the Kellgren-Lawrence scale and proposal of a novel scale
Abstract
Objective
To evaluate the reliability of grading subtalar (ST) arthrosis on lateral weightbearing radiographs in a heterogeneous patient population using the Kellgren-Lawrence (KL) scale, to correlate these findings with advanced imaging (CT and/or MRI), and to validate a novel scale.
Materials and Methods
Forty lateral weightbearing radiographs of patients presenting to a foot and ankle clinic were selected at random and reviewed by nine independent reviewers from multiple disciplines. Interobserver reliability was assessed for KL scores. A musculoskeletal radiologist graded the available advanced imaging for all 40 patients, and the advanced imaging scores were correlated with the radiographic scores. A novel scoring system was created and tested for interobserver reliability.
Results
Overall reliability amongst reviewers using the traditional KL score was fair (kappa = 0.26). The best agreement was seen for radiographs graded 0, and even this was only moderate (kappa = 0.50). Interobserver reliability was only fair for severe, Grade 4 scores (kappa = 0.28). Radiographic scores correlated moderately with advanced imaging (r = 0.56). A new, simple grading system was proposed, and its interobserver reliability was substantial (kappa = 0.68).
Conclusions
The KL scoring system is not reliable for the subtalar joint. The new None, Some, Severe (NSS) grading system has improved reliability. Radiographs had only moderate correlation with advanced imaging. Further studies are warranted to establish clinical correlation.
Keywords
Subtalar Joint; Arthrosis; Radiographic Classification; Arthritis
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
1. Introduction
Subtalar (ST) arthritis is commonly diagnosed in the clinical setting following a history of trauma, tibiotalar arthrodesis, or rheumatologic conditions. Posttraumatic subtalar arthritis has been shown to be present in as many as 38% of patients after calcaneal fractures, and to require arthrodesis in as many as 28% of those treated surgically.[1,2] Up to 81% of patients develop subtalar arthritis following talar neck fractures.[3] After ankle arthrodesis, reported rates of secondary ST arthritis range from 44 to 93%.[4,5] Subtalar arthritis rates are as high as 32% in patients with rheumatoid arthritis.[6,7] With this high frequency of disease, a reliable radiographic scoring system would be helpful for tracking disease progression.
The diagnosis is routinely made by correlating patients' presenting symptoms and physical examination signs with weightbearing radiographs. Radiographic arthrosis of the subtalar joint is generally evaluated on lateral foot and/or ankle radiographs by assessing joint space narrowing, bony sclerosis, subchondral cysts, and osteophytes. However, the undulating surface of the subtalar joint is often difficult to visualize clearly on plain radiographs because the articular surfaces are often not oriented perpendicular to the x-ray beam. The differential diagnosis of lateral hindfoot pain is wide and includes etiologies beyond subtalar arthritis. Advanced imaging such as CT or MRI can help isolate the true etiology of patients' presenting symptoms when the severity of radiographic arthrosis is questionable.
The Kellgren-Lawrence (KL) grading scale was developed for grading the severity of osteoarthritis and showed its best reliability when applied to the knee and hand, with lower intraobserver reliability for many other joints.[8–10] Despite its limitations, it has historically been generalized to multiple joints, including the subtalar joint. Previous studies have shown slight to moderate interobserver reliability, with marginally better intraobserver reliability, for this grading system in patients with posttraumatic subtalar arthritis and following total ankle replacement (TAR). The Paley Grading System (PGS) has also been used, with even worse results.[1,11–14] These studies represent highly specific patient populations, often with concomitant deformity and distracting findings.
Without a reliable classification of subtalar arthrosis, grading the severity of subtalar arthritis remains a clinical challenge. We sought to evaluate the reliability of the KL score applied to the subtalar joint in a heterogeneous population of patients presenting to a foot and ankle clinic, and to determine how strongly these radiographic grades correlated with the interpretation of subtalar arthrosis on advanced imaging. In addition, we aimed to propose and validate an alternative scoring system with increased reliability and clinical relevance.
2. Materials and Methods
2.1 Study Population
Forty lateral weightbearing ankle radiographs from a foot and ankle orthopedic clinic were selected at random for review and stripped of identifying patient information. Patients were excluded if they did not have advanced imaging correlates (CT or MRI), had open physes, had a history of hindfoot fusion, or had a diagnosis of subtalar coalition.
2.2 Radiograph evaluation
Six Foot and Ankle fellowship trained orthopedic surgeons and three MSK fellowship trained radiologists reviewed the radiographs for subtalar arthrosis. The reviewers were asked to evaluate the joint with the KL scale (graded 0 to 4, see Table 1). They were provided the description of the scale and allowed to use it as a reference while they evaluated the radiographs. Every grader was blinded to the identity and history of the patient, and the compiler of the radiographs was not included as a reviewer.
In a separate sitting three months later, three Foot and Ankle orthopedic surgeons and an MSK radiologist repeated the grading of the radiographs using a modified, novel scale (grade 0 to 2, see Table 2). Examples of each modified score are displayed in Figure 1. This scale was developed with the intention of providing a simple, reliable system that allows high reproducibility and potentially carries clinical significance. It was created by aggregating the higher and lower KL scores and by simplifying the criteria for each score, which are easily questioned when using the traditional KL score. This new grading system is referred to as the None, Some, Severe (NSS) scale. The radiographs were randomized, and the three-month delay was used to limit familiarity with the previous scoring.
2.3 Advanced imaging evaluation
In addition, a single MSK fellowship trained radiologist graded the severity of subtalar arthrosis on available advanced imaging (CT, MRI, or both) using the original Kellgren-Lawrence scale. Again, the reviewer was blinded to the identity of the patient and to the patient's history, radiograph, and chief complaint. The grade of subtalar arthrosis on advanced imaging was correlated with the independent reviewers' radiographic grades to assess the accuracy and validity of radiographic evaluation.
2.4 Statistical Analysis
Interobserver reliability was assessed amongst all the reviewers using a marginal kappa statistic. After the original grading with the classic KL scale, interobserver reliability was determined across all radiographs. Reliability was then recalculated after grouping higher and lower scores together. Using this information, the NSS scale was proposed and tested, and a new interobserver reliability was calculated.
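The reliability calculation described above can be illustrated with a minimal Python sketch. It assumes that the marginal kappa corresponds to Fleiss' fixed-marginal multirater kappa; the study does not specify the exact formula or software used. The function name and the small ratings matrix are hypothetical and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' fixed-marginal kappa for several raters (illustrative sketch).

    ratings: list of per-radiograph lists, one grade per reviewer; every
             radiograph must be graded by the same number of reviewers.
    categories: iterable of all possible grades.
    Note: the statistic is undefined if every rating falls in one category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("each radiograph needs the same number of ratings")

    totals = Counter()
    observed = 0.0
    for subject in ratings:
        counts = Counter(subject)
        totals.update(counts)
        # Proportion of reviewer pairs that agree on this radiograph.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = observed / n_subjects

    # Chance agreement from the pooled (marginal) category proportions.
    n_ratings = n_subjects * n_raters
    p_expected = sum((totals[c] / n_ratings) ** 2 for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical KL grades (0 to 4): four radiographs, three reviewers each.
example = [
    [0, 0, 0],
    [1, 2, 1],
    [3, 4, 4],
    [2, 2, 3],
]
kappa = fleiss_kappa(example, categories=range(5))  # 7/19, about 0.37
```

Regrouping grades before recalculating reliability amounts to relabelling the entries of the ratings matrix (for example, mapping Grades 3 and 4 to a single label) and calling the same function again.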
Correlation of radiographic severity with advanced imaging was assessed using Spearman's correlation test. For each patient, the average radiographic score across all reviewers was determined. This average score was then correlated with the advanced imaging score to determine the relationship between the advanced imaging score and its radiographic counterpart. In addition, we assessed how often the radiographic reviewers chose the same grade as the advanced imaging score.
3. Results
The demographics of all 40 patients are shown in Table 3. The mean patient age was 51 years (range, 16 to 75). All reviewers completed the evaluation of all 40 patients. Overall, interobserver reliability across all graders was fair (k = 0.26). Agreement amongst reviewers was moderate for patients deemed to have no subtalar arthrosis (Grade 0, k = 0.50) and fair for Grade 4 scores (k = 0.28). Grouping Grades 3 and 4 together provided a slight increase in scoring reliability (k = 0.46). Agreement remained only fair when Grades 1 and 2 (k = 0.31) or Grades 1 through 3 (k = 0.37) were combined.
All 40 patients had advanced imaging correlates, comprising 30 CT and 10 MRI studies. The advanced imaging scores (0 to 4) correlated moderately with radiographic scores (r = 0.56, p = 0.00015); that is, higher radiographic scores were associated with higher scores on advanced imaging. However, on average, the nine reviewers recorded the same score as the advanced imaging score on only 15.1 of 40 radiographs (37.8%; range, 11 to 18). Reviewers were within one grade of the advanced imaging score 82.0% of the time (range, 29 to 37 radiographs) and were off by at least two grades 18.0% of the time.
After being instructed on the NSS scale, three orthopedic attending surgeons and an MSK radiologist graded the same 40 lateral weightbearing radiographs. Interobserver reliability amongst these reviewers improved to a substantial level (kappa = 0.68).
4. Discussion
Our findings demonstrate that there is minimal agreement on disease severity when lateral weightbearing radiographs are graded with the KL scoring system; therefore, it should not be applied to the subtalar joint clinically or for research purposes.
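The exact-match and within-one-grade rates reported in the Results can be computed along the following lines. This is a sketch only: the function name and the five pairs of grades are hypothetical, chosen to show the arithmetic rather than to reproduce study data.

```python
def agreement_rates(reviewer_grades, reference_grades, tolerance=1):
    """Return (exact, within_tolerance) agreement fractions for one reviewer.

    reviewer_grades: radiographic grades from a single reviewer.
    reference_grades: advanced imaging grades for the same patients.
    """
    if len(reviewer_grades) != len(reference_grades):
        raise ValueError("grade lists must have the same length")
    pairs = list(zip(reviewer_grades, reference_grades))
    exact = sum(r == ref for r, ref in pairs)
    close = sum(abs(r - ref) <= tolerance for r, ref in pairs)
    return exact / len(pairs), close / len(pairs)


# Hypothetical KL grades for five patients.
radiograph = [0, 2, 3, 1, 4]
advanced = [0, 3, 3, 3, 2]
exact, within_one = agreement_rates(radiograph, advanced)
# exact is 2/5 and within_one is 3/5 for these toy grades.
```

Applied to the reported averages, 15.1 matching grades out of 40 radiographs gives 15.1 / 40, or approximately 37.8%, and the 82.0% within one grade is the complement of the 18.0% that differed by two or more grades.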
Agreement between observers' measurements, and thus reliability, is most widely judged by the kappa statistic. To describe how reliable a measurement is, arbitrary cutoffs are used to divide kappa values into ranges. The most widely used scheme is that of Landis and Koch, with six categories ranging from poor to almost perfect, but some argue that these are not as clinically relevant as other criteria.[15] Some consider three categories (poor, fair to good, and excellent) to be more clinically applicable.[16] Others recommend analyzing the actual kappa value rather than relying solely on descriptions of reliability based on arbitrary cutoffs.[17] Regardless of how the kappa statistic is categorized, our work supports the conclusion that the Kellgren-Lawrence scale is not a reliable means of reporting the severity of subtalar arthrosis. The newly proposed NSS scale had much better interobserver reliability and can be easily administered and applied in a clinical or research setting.
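For readers who want to map the kappa values reported here onto the Landis and Koch descriptors, a minimal Python sketch follows. The cutoffs are those published by Landis and Koch [15]; the function name is a hypothetical choice for illustration.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptor."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values reported in this study.
labels = {k: landis_koch_label(k) for k in (0.26, 0.28, 0.50, 0.68)}
```

Under these cutoffs, 0.26 and 0.28 fall in the fair band, 0.50 in the moderate band, and 0.68 in the substantial band, which matches the descriptors used in the Results.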
Importantly, a poorly reliable grading system cannot be used to attempt to correlate clinical disease. It would be impossible to claim that a patient with a KL score of 3 is likely to have more subtalar symptoms than a patient with a KL score of 2, given that reviewers cannot reliably distinguish between the two. Recent studies have used KL to suggest positive outcomes and treatments; however, basing such results on a classification without adequate reliability is not ideal and can lead to inaccurate conclusions.[14] These classifications must be modified and improved until their intraobserver and interobserver reliability is validated.[17,18]
We have demonstrated that our modified scale for scoring subtalar arthrosis severity on lateral weightbearing radiographs can be administered easily and has acceptable interobserver reliability. It was designed to be simple and reproducible, with three grades, so that observers can easily and reliably categorize the severity of arthrosis as none, some, or severe. This scoring system certainly has its limitations. With fewer possible scores, the accuracy for assessing true disease is more limited. This is an unfortunate consequence of using a two-dimensional radiograph to try to define the anatomy of this unique joint. In addition, having fewer possible scores may limit the ability to detect statistical differences in smaller study designs. Nevertheless, our scale is reliable, and the authors believe there is likely a clinically significant difference between each score. If a patient had a modified score of zero at an index time and progressed to a modified score of one (or two) after a certain period of time or intervention, this would likely represent a clinically significant change. Further work will be needed to validate this conclusion.
Because plain radiographs of the subtalar joint often lack bony detail and accuracy, advanced imaging plays an important role in clinical management. Both CT and MRI allow improved assessment of the classic features of arthrosis in multiple planes and slices of the subtalar joint. Previous studies used coronal slices at the posterior facet of the subtalar joint to analyze posttraumatic arthritis.[19–22] CT has also been used to correlate subtalar alignment with the type of osteoarthritis in the ankle joint, and simulated weightbearing CT scans have been used to correlate deformity with Kellgren-Lawrence arthrosis on radiographs.[23,24] To our knowledge, the severity of subtalar arthrosis on lateral weightbearing radiographs has not previously been correlated with that observed on advanced imaging. This study did demonstrate a correlation between radiographic review and advanced imaging, though radiographic scoring was not necessarily accurate. Figure 2 shows an example of a patient who was thought to have minimal arthritic change on radiographs but whose CT scan demonstrated more arthritic disease than originally appreciated. It is important to note that radiographs are obtained while weightbearing, perhaps accentuating arthritic change, whereas CT and MRI scans are typically not. The advanced imaging in this study was not performed during stance, perhaps undervaluing the true extent of joint degeneration. Weightbearing CT may become more widely available and may have a role in assessing arthritic hindfoot deformity, but the cost and limited availability of this technology may be prohibitive.
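The correlation between radiographic review and advanced imaging discussed above was obtained by averaging each patient's radiographic grades across reviewers and applying Spearman's test. The following Python sketch shows one way to compute the coefficient, using average ranks for ties and the Pearson formula on the ranks. The helper names and the five-patient data set are hypothetical, not study data, and the sketch does not compute a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: three reviewers' KL grades per patient, plus one
# advanced imaging grade per patient.
reviewer_grades = [[0, 1, 0], [2, 2, 3], [1, 2, 1], [4, 3, 4], [3, 3, 2]]
advanced_imaging = [0, 3, 2, 4, 3]
mean_radiographic = [mean(grades) for grades in reviewer_grades]
rho = spearman_rho(mean_radiographic, advanced_imaging)
```

Because the coefficient depends only on rank order, a high value indicates that patients with higher radiographic grades tend to have higher advanced imaging grades; it does not show that the two grades are equal, which is why exact agreement can remain low despite a moderate correlation.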
Because of the potential difficulty in assessing the extent of radiographic arthrosis in the subtalar joint, the clinical history and physical examination are critical to diagnosis. Accurately palpating the subtalar joint and isolating hindfoot motion are vital. However, in some circumstances it can be difficult to distinguish between subtalar, ankle, peroneal, or other pathology. If there is suspicion for subtalar pain and any radiographic evidence consistent with subtalar arthritis, correlation with advanced imaging may be necessary to outline the severity of the disease process or to assess for other potential pain generators. The authors stress the importance of not reflexively ordering advanced imaging, and we strongly recommend empiric treatment based on clinical judgment prior to ordering CT or MRI. There may also be a role for diagnostic and/or therapeutic injection prior to ordering any expensive imaging. Advanced imaging should be ordered to confirm a diagnosis if no progress is made with initial empiric treatment, or for potential operative planning. We have developed an algorithm regarding the use of advanced imaging following radiographic screening (Figure 3).
Our study has its limitations, but it is the first to thoroughly explore the assessment of the subtalar joint on lateral weightbearing radiographs in a diverse patient population. The patients chosen at random for this study had presented to an orthopedic foot and ankle clinic and therefore do not represent a truly random patient population. Another potential weakness is that we only included patients with advanced imaging correlates, which possibly led to selection of patients with more severe disease. In addition, although the authors believe the NSS scale would have clinical relevance, this study does not provide any data to validate this claim. Another potential weakness is the lack of adjunctive specialty views of the hindfoot. A Broden's view, for example, can be used to visualize the subtalar joint more accurately and may provide better radiographic reliability. This specialty view is not routinely obtained in our clinic, but it can certainly be employed to gather more information in the setting of clinical suspicion for subtalar pathology.
5. Conclusion
Overall, the Kellgren-Lawrence scale is unreliable for assessment of the severity of subtalar arthrosis on lateral weightbearing radiographs and carries no clinical correlation. There was moderate correlation between radiographic severity of disease and advanced imaging. Our proposed NSS scoring system is reproducible and carries potential clinical implications and relevance.
References
[1] Paley D, Hall H. Intra-Articular Fractures of the Calcaneus. J Bone Jt Surg 1993.
[2] Sanders R, Vaupel ZM, Erdogan M, Downes K. Operative Treatment of Displaced Intraarticular Calcaneal Fractures. J Orthop Trauma 2014;28:551–63. doi:10.1097/BOT.0000000000000169.
[3] Dodd A, Lefaivre KA. Outcomes of Talar Neck Fractures: A Systematic Review and Meta-analysis. J Orthop Trauma 2015;29:210–5. doi:10.1097/bot.0000000000000297.
[4] Ahlberg A, Henricson AS. Late results of ankle fusion. Acta Orthop Scand 1981;52:103–5.
[5] Coester LM, Saltzman CL, Leupold J, Pontarelli W. Long-Term Results Following Ankle Arthrodesis for Post-Traumatic Arthritis 2001.
[6] Chan P-SJ, Kong KO. Natural history and imaging of subtalar and midfoot joint disease in rheumatoid arthritis. Int J Rheum Dis 2013;16:14–8. doi:10.1111/1756-185X.12035.
[7] Vainio K. The rheumatoid foot; a clinical study with pathological and roentgenological comments. Ann Chir Gynaecol Fenn Suppl 1956;45:1–107.
[8] Kellgren JH, Lawrence JS. Radiological Assessment of Osteo-arthrosis. Ann Rheum Dis 1957:494–502.
[9] Kellgren JH, Moore R. Generalized osteoarthritis and Heberden's nodes. Br Med J 1952;1:181–7. doi:10.1136/BMJ.1.4751.181.
[10] Brandt KD, Fife RS, Braunstein EM, Katz B. Radiographic grading of the severity of knee osteoarthritis: relation of the Kellgren and Lawrence grade to a grade based on joint space narrowing, and correlation with arthroscopic evidence of articular cartilage degeneration. Arthritis Rheum 1991;34:1381–6.
[11] Mayich DJ, Pinsker E, Mayich MS, Mak W, Daniels TR. An Analysis of the Use of the Kellgren and Lawrence Grading System to Evaluate Peritalar Arthritis Following Total Ankle Arthroplasty. Foot Ankle Int 2013;34:1508–15. doi:10.1177/1071100713495379.
[12] de Muinck Keizer R-JO, Backes M, Dingemans SA, Goslings JC, Schepers T. Post-traumatic subtalar osteoarthritis: which grading system should we use? Int Orthop 2016;40:1981–5. doi:10.1007/s00264-016-3236-x.
[13] Kraus VB, Kilfoil TM, Hash TW, McDaniel G, Renner JB, Carrino JA, et al. Atlas of radiographic features of osteoarthritis of the ankle and hindfoot. Osteoarthr Cartil 2015. doi:10.1016/j.joca.2015.08.008.
[14] Dekker TJ, Walton D, Vinson EN, Hamid KS, Federer AE, Easley ME, et al. Hindfoot Arthritis Progression and Arthrodesis Risk After Total Ankle Replacement. Foot Ankle Int 2017;38:1183–7. doi:10.1177/1071100717723130.
[15] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[16] Svanholm H, Starklint H, Gundersen HJ, Fabricius J, Barlebo H, Olsen S. Reproducibility of histomorphologic diagnoses with special reference to the kappa statistic. APMIS 1989;97:689–98.
[17] Garbuz DS, Masri BA, Esdaile J, Duncan CP. Classification systems in orthopaedics. J Am Acad Orthop Surg n.d.;10:290–7.
[18] Wright JG, Feinstein AR. Improving the reliability of orthopaedic measurements. J Bone Joint Surg Br 1992;74:287–91.
[19] Lee TH, Wapner KL, Mayer DP, Hecht PJ. Computed Tomographic Demonstration of the Vacuum Phenomenon in the Subtalar and Tibiotalar Joints. Foot Ankle Int 1994;15:382–5. doi:10.1177/107110079401500707.
[20] Bradley SA, Davies AM. Computed tomographic assessment of old calcaneal fractures. Br J Radiol 1990;63:926–33. doi:10.1259/0007-1285-63-756-926.
[21] Bower BL, Keyser CK, Gilula LA. Rigid subtalar joint—a radiographic spectrum. Skeletal Radiol 1989;17:583–8. doi:10.1007/BF02569405.
[22] Stephens HM, Sanders R. Calcaneal Malunions: Results of a Prognostic Computed Tomography Classification System. Foot Ankle Int 1996;17:395–401. doi:10.1177/107110079601700707.
[23] Krähenbühl N, Tschuck M, Bolliger L, Hintermann B, Knupp M. Orientation of the Subtalar Joint. Foot Ankle Int 2016;37:109–14. doi:10.1177/1071100715600823.
[24] Greisberg J, Hansen ST, Sangeorzan B. Deformity and Degeneration in the Hindfoot and Midfoot Joints of the Adult Acquired Flatfoot. Foot Ankle Int 2003;24:530–4. doi:10.1177/107110070302400704.
Table 1. Kellgren-Lawrence Grading Scale for Osteoarthrosis
Grade    Characteristics
0        No radiographic features of osteoarthrosis are present
1        Doubtful joint space narrowing and possible osteophytic lipping
2        Definite osteophytes and possible joint space narrowing
3        Multiple osteophytes, definite joint space narrowing, some sclerosis, and possible bony deformity
4        Large osteophytes, marked joint space narrowing, severe sclerosis, and definite bony deformity
Table 2. NSS Grading Scale for Subtalar Osteoarthrosis
Grade         Characteristics
0 (None)      Definitively no evidence of subtalar joint arthrosis. There is a clearly wide joint space with no asymmetry, sclerosis, or osteophytes.
1 (Some)      There is abnormality of the subtalar joint implying possible arthrosis of the subtalar joint. There is suggestion of joint space narrowing or irregularity, sclerosis, or marginal osteophyte formation at any location along the subtalar joint.
2 (Severe)    Severe and obvious arthrosis of the subtalar joint with marked joint space narrowing, severe bony sclerosis, subchondral cysts, and large osteophyte formation.
Table 3. Patient Demographics
Demographics                  Value
Sex
  Male                        19
  Female                      21
Age                           51 (range, 16 to 75)
Total Ankle Replacements      6
Other Ankle Hardware          7
Chief Complaint
  Midfoot pain                2
  Hindfoot pain               5
  Ankle pain                  33
Figure 1. Proposed Grading System Examples. Examples from the database used for subtalar arthrosis grading in the study. Grade 0 represents no arthrosis, Grade 1 represents mild to moderate arthrosis, and Grade 2 represents severe arthrosis.
Figure 2: CT imaging correlation with radiographic imaging. Figure 2A shows an example of the lateral weightbearing radiograph of a patient included in the study. Figures 2B, 2C, and 2D are sagittal, coronal, and axial CT slices, respectively, of the same patient revealing more detailed areas of arthrosis (arrows represent most affected areas) not as clearly visualized on the lateral radiograph.
Figure 3. Algorithm for Advanced Imaging of Subtalar Joint Osteoarthrosis Using Proposed Scale.
Highlights
The Kellgren-Lawrence scale is not reliable for the subtalar joint
Radiographic subtalar arthrosis had moderate correlation with advanced imaging
The new NSS scale has improved reliability for the severity of subtalar arthrosis