Reproducibility of volumetric measurements on maxillary sinuses

Int. J. Oral Maxillofac. Surg. 2011; 40: 195–199 doi:10.1016/j.ijom.2010.10.008, available online at http://www.sciencedirect.com

Research Paper Imaging

R. Kirmeier¹, C. Arnetzl¹, T. Robl², M. Payer¹, M. Lorenzoni³, N. Jakse¹

¹ Department of Oral Surgery and Radiology, School of Dentistry, Medical University of Graz, Austria
² Department of Radiology, Medical University of Graz, Austria
³ Department of Prosthodontics, School of Dentistry, Medical University of Graz, Austria

R. Kirmeier, C. Arnetzl, T. Robl, M. Payer, M. Lorenzoni, N. Jakse: Reproducibility of volumetric measurements on maxillary sinuses. Int. J. Oral Maxillofac. Surg. 2011; 40: 195–199. © 2010 International Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

Abstract. Although computer-assisted volumetric quantification of human maxillary sinuses is commonly used to measure volumetric changes during life, reliability data for this procedure are lacking. The objective of this retrospective study was to test a semi-automatic virtual volumetric analysis technique on 36 CT scans of human maxillary sinuses. Three examiners with different clinical experience performed all measurements in three replicates. As proof of principle, the technique was examined on 12 phantoms with known volumes. The validation of the method revealed that the mean relative error was 0.364%. For the retrospective volumetric measurements of maxillary sinuses, the intra- and inter-examiner agreement was quantified using appropriate intraclass correlation coefficients (ICC 1,k and ICC 2,k) and the Bland–Altman analysis. ICC values ranging from 0.997 to 0.999 indicate almost perfect agreement for intra- and inter-examiner data. The Bland–Altman analysis demonstrated good intra- as well as inter-examiner agreement for the two proficient examiners and a lack of agreement for the untrained examiner. It can be concluded that this measurement procedure using CT scans can be strongly recommended for clinical application to determine the volume of human maxillary sinuses reliably.

Keywords: maxillary sinus; volumetric analysis; computed tomography; reliability; inter- and intra-examiner agreement.

Accepted for publication 8 October 2010. Available online 11 November 2010.

The bone-encompassed luminal anatomy of the paired maxillary sinuses has proven to be a complex quadrangular pyramid, with the nasal wall as its base and the apex pointing toward the zygomatic process [10]. For morphometric studies of maxillary sinuses, invasive methods such as injection of various materials [15,20,23] or serial sections of cadavers [22] have been superseded by virtual reality techniques. Plain two-dimensional (2D) radiography is unsuitable for volume analysis of maxillary sinuses [1]. With the introduction of computer-assisted tomographic (CT) imaging in 1973 [8], three-dimensional (3D) anatomical information has become available [1,2,10,11]. As real values cannot be obtained directly from living patients, CT datasets are the only clinically available information on maxillary sinus size.

Until now, one of the main obstacles has been a lack of information on the influence of instrumental, physical and human limitations, which could cause measurements to deviate from the 'true' value [19]. Validation studies to determine the experimental uncertainty are crucial in the assessment of volumetric analysis techniques [19]. An advantage of CT datasets over invasive methods is that repeated measurements can be performed in a 'virtual patient' without destruction of the object that is to be quantified [19]. Only a few groups have tested volumetric analysis techniques with a phantom model [6,16,18,23]. Most CT measurement studies only determined the metric variables of the maxillary sinus and correlated these data to age, race, body height, cranial morphology and to the volume of other paranasal sinuses [1,2,10,11,23]. None of the studies give data on intra- and inter-examiner agreement.

With increasing requirements for objective measurement procedures, it should be of particular interest for clinicians and researchers to ensure that the experimental uncertainty of CT-based volume determination is known. Potential applications of this method lie in studies including surveys of maxillary sinus volumes in larger populations and monitoring the volume change after intentional manipulation in the maxillary sinus, for example internal sinus augmentation or extraction of teeth associated with the antrum.

The objective of this retrospective study was to evaluate the applicability of a virtual analysis technique to quantify human maxillary sinus volumes. Another aim was to test the intra- and inter-examiner reproducibility of this technique for the first time.

Materials and methods

A validation of the method with well-characterized phantoms was conducted to test the experimental procedure in advance. The geometrically complex anatomy of the maxillary sinuses was mimicked by moulding (Impregum™, 3M Espe, Seefeld, Germany) two different rubber ducks' heads (Fig. 1). Target volumes of 13, 17 and 21 cm³ each were prepared using 'class AS' one-mark pipettes (Brand GmbH, Wertheim, Germany) and poured into each of the two moulds. The solution contained 100 ml iopromide 300 mg/ml (Ultravist®, Schering AG, Berlin, Germany) diluted in 100 ml tap water. The phantom set comprises 12 different simulated maxillary sinuses.

Fig. 1. CT image in the sagittal projection of the two geometrically different maxillary sinus phantoms used for the calibration studies. Both samples comprise 21 cm³ of radiopaque liquid. The selected density threshold coefficients were −1000 to +300 Hounsfield units (HU).

The clinical part of this investigation was conducted as a retrospective study. CT data acquisition and volume determination were performed as described below. The protocol was authorized by the Ethics Committee of the Medical University of Graz, Austria (Nr. 19-018ex07/08).

Sample size calculation was based on a preliminary series of 10 maxillary sinuses (60 readings). The standard deviation between three observers was determined using a paired t-test of equivalence of means, taking half the standard deviation of 5.000 as the limit for the equivalence-difference, with alpha set at 0.05 and power at 80%. When the expected mean difference is 0.200, the minimal number of samples should comprise 31 maxillary sinuses.

CT datasets of patients who had CT scans for diagnostic reasons in the ear, nose and throat (ENT) region between 2003 and 2007 were retrieved from the database of the Department of Radiology, Medical University of Graz. Selection criteria were that CT data acquisition was performed exclusively by spiral CT technology (Somatom® Plus 4 CT Scanner, Siemens AG, Medical Solutions, Erlangen, Germany) using 120 kVp and 150 mA. Further inclusion criteria were that all patients were male, aged 20–30 years at the time of the CT scan, with complete dentition (without consideration of wisdom teeth). All subjects with evidence of sinus pathology such as inflammation, cysts or tumour mass within or adjacent to the maxillary antrum, as well as a history of paranasal sinus surgery or maxillofacial trauma, were excluded from the study. If one of the two maxillary sinuses was scanned incompletely, that CT dataset was discarded. All included CT scans were anonymized.

The volume of the maxillary sinus was defined as the bone-encompassed space comparable to the cavity in a dry skull. All volume determinations are based on semi-automated computer calculations using commercial programs and equipment. The original CT datasets were transferred to an independent workstation (Ultra 10 workstation, Sun Microsystems, Santa Clara, CA, USA) in the DICOM format and evaluated in random order by one undergraduate dental student (E1), one well-experienced radiologist (E2) and one well-experienced oral surgeon (E3). Specific tools of the commercial graphics program (Somaris® Sienet Magic View 1000®, VB32B, Siemens AG, Medical Solutions, Erlangen, Germany) can generate 2D and 3D reconstructions for image processing and volume calculation. The 3D region of interest of each sample was defined by using the 'draw' function to trace the perimeter of the radiopaque liquid or the bony boundary of the maxillary sinuses on each consecutive axial CT slice displayed on the computer monitor. Appropriate density threshold coefficients, expressed in Hounsfield units (HU), were selected to exclude impression material or bony structures as well as dental roots (−1000 to +300 HU). Thus, in the 3D template only radiopaque liquid or air and soft tissue (mucous membrane) were extracted by the volume rendering function of the program. The volume of the region of interest was calculated automatically.

All examiners were trained to use this standardized study procedure and the software tools. They performed measurements in a darkened room independently of each other and blinded to previous readings. Each measurement cycle was performed three times with a minimum interval of 2 weeks between sessions to minimize personal memory effects.

Data were analysed on a personal computer using Excel 2000 (Microsoft Corporation, Redmond, WA, USA), nQuery 5.0 (Statsol, Cork, Ireland) and SPSS 14.0 (SPSS Inc., Chicago, IL, USA) on a Microsoft® Windows® XP platform. The volumes are expressed as cm³ (median and range). The experimental uncertainty between the observers' measurements and their target values was represented as absolute error in cm³ and relative error in percentage. The intra- and inter-examiner data were tested by an appropriate intraclass correlation coefficient (ICC) to obtain the degree of agreement. Following Shrout & Fleiss [21], the ICC 1,k (one-way random model) was used for the intra-examiner agreement. For the inter-examiner agreement in a two-way random effects layout, the ICC 2,k equation was selected. The dimensionless values for the ICCs range from 0 to 1, where 0 means no agreement and 1 total agreement. In accordance with Landis & Koch, ICC values were classified on a six-point scale as poor (0.00), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) and almost perfect (0.81–1.00) agreement [12]. Additionally, the repeatability and agreement of measurements were assessed by the method described by Bland & Altman [4], consisting of d̄ as the mean difference; SE of d̄ as the standard error of the mean differences; 95% CI for d̄ as the 95% confidence interval for the mean of differences; SDDiff as the standard deviation of the differences; and the 95% limits of agreement as d̄ ± 2SDDiff.

Results

The experimental uncertainties between the target phantom volumes of 13, 17 and 21 cm³ and the calculated volumes were less than 0.5% for all examiners. Table 1 presents the number of measurements, the calculated volumes, and the absolute and relative error. The agreement plot of observations versus the three target values for the phantom models is shown in Fig. 2.

To assess the intra- and inter-examiner agreement, 324 readings were recorded. The time required to segment the volume of one maxillary sinus varied from 17 to 22 min (19.34 ± 3.21 min) depending on the number of slices. The current study with a sample size of 36 maxillary sinuses revealed an average maxillary sinus volume of 21.99 cm³ (SD 4.34 cm³, min 13.22, max 29.24) in 20- to 30-year-old male subjects. The ICC coefficients for intra- and inter-examiner reliability indicate almost perfect agreement (Table 2). For the intra-examiner repeatability, the Bland–Altman analysis estimates a one-sided confidence limit of 1.069 cm³ or less (Table 2). The assessment of the inter-examiner agreement indicated a systematic bias between examiner 1 (E1) and both examiner 2 (E2) and examiner 3 (E3), because zero was not included between the lower and the upper limit of the 95% CI for d̄. There was no bias between examiners E2 and E3.
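The agreement statistics reported in this section can be illustrated with a short sketch. The Python code below is not the authors' SPSS workflow; it is a minimal illustration, assuming the ICC(1,k) and ICC(2,k) mean-square formulas of Shrout & Fleiss and the Bland–Altman quantities named in the methods (d̄, SE of d̄, 95% CI for d̄, SDDiff and limits of agreement d̄ ± 2SDDiff). The function names, the input layout (one row per sinus, one column per reading or examiner) and the multiplier t_crit = 2.03 (approximately the two-sided 95% t quantile for 35 degrees of freedom, i.e. 36 sinuses) are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def icc_1k(x):
    """ICC(1,k): one-way random model, average of k readings.

    x is a list of rows, one row per target (sinus), k readings per row.
    """
    n, k = len(x), len(x[0])
    grand = mean([v for row in x for v in row])
    row_means = [mean(row) for row in x]
    # Between-targets mean square and within-targets mean square
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    wms = sum((v - m) ** 2 for row, m in zip(x, row_means) for v in row)
    wms /= n * (k - 1)
    return (bms - wms) / bms


def icc_2k(x):
    """ICC(2,k): two-way random effects model, average of k examiners.

    x is a list of rows, one row per target, one column per examiner.
    """
    n, k = len(x), len(x[0])
    grand = mean([v for row in x for v in row])
    row_means = [mean(row) for row in x]
    col_means = [mean([row[j] for row in x]) for j in range(k)]
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    # Residual mean square after removing target and examiner effects
    sse = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = sse / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (jms - ems) / n)


def bland_altman(a, b, t_crit=2.03):
    """Bland-Altman summary for paired readings a and b (same sinuses).

    t_crit is an assumed multiplier for the 95% CI of the mean difference;
    2.03 is roughly the t quantile for 35 degrees of freedom.
    """
    d = [u - v for u, v in zip(a, b)]
    mean_d = mean(d)
    sd_diff = stdev(d)                   # SD of the differences
    se = sd_diff / math.sqrt(len(d))     # SE of the mean difference
    ci = (mean_d - t_crit * se, mean_d + t_crit * se)
    loa = (mean_d - 2 * sd_diff, mean_d + 2 * sd_diff)
    return {
        "mean_diff": mean_d,
        "se": se,
        "ci95": ci,
        "sd_diff": sd_diff,
        "loa": loa,
        # Systematic bias is flagged when zero lies outside the 95% CI
        "biased": not (ci[0] <= 0.0 <= ci[1]),
    }
```

Called with the three replicate readings of one examiner, `icc_1k` gives an intra-examiner coefficient; called with the per-examiner volumes of two examiners, `icc_2k` gives an inter-examiner coefficient. A 95% CI for d̄ that excludes zero, as flagged by `biased`, corresponds to the criterion used above to identify the systematic bias of E1.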

Fig. 2. The calculated volumes are plotted against the three target volumes of 13, 17 and 21 cm³; each of the three clusters comprises 18 circles representing the measured value against the target values.
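The comparison of calculated and target phantom volumes can be expressed, for each set of readings, as an absolute and a relative error. The sketch below is a hedged illustration in Python. The text does not spell out the exact formula behind the error columns of Table 1, so the convention used here (mean of |reading − target|, and that mean divided by the target volume) is an assumption, as are the function name and the example readings.

```python
from statistics import mean


def volume_errors(readings_cm3, target_cm3):
    """Summarize repeated volume readings against a known target volume.

    Assumed convention: the absolute error is the mean of |reading - target|
    and the relative error is that mean as a percentage of the target.
    """
    abs_errors = [abs(v - target_cm3) for v in readings_cm3]
    mean_abs = mean(abs_errors)
    return {
        "mean_volume_cm3": mean(readings_cm3),
        "mean_abs_error_cm3": mean_abs,
        "mean_rel_error_pct": 100.0 * mean_abs / target_cm3,
    }


# Hypothetical readings for a 13 cm3 phantom (not data from the study)
example = volume_errors([12.9, 13.1, 13.0], 13.0)
print(example)
```

Under this convention a reading that overestimates and one that underestimates the target by the same amount do not cancel out, which is why the mean calculated volume alone is not enough to judge accuracy.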

The Bland–Altman plot of the differences between examiners 2 and 3 against the average of these two examiners is depicted in Fig. 3. This plot shows that a few differences are outside of or close to the 95% limits of agreement. The slight bias of about 0.1 cm³ is not significant for the purposes of this study.

Discussion

In this study, the validation of the method reveals very small uncertainties of semi-automatic virtual volumetric analysis on CT scans of maxillary sinus-like phantoms. These excellent results are supplemented by the agreement plot, where readings for the three target volumes scatter evenly around the line of identity. Comparisons of previous validation studies show considerable variability in the experimental uncertainty; this may be due to different CT data acquisition techniques, segmentation methods and choice of objects of investigation. Assessing irregular phantoms, both Uchida et al. [23] and Roth et al. [18] calculated that the mean difference was 5% or less by means of standard reconstruction based on axial slices. Posnick et al. [16] compared intracranial volume measurements on dry skulls with CT-derived measures, finding an average difference of 3.7%, while Dastidar et al. reported that the relative error was within 1%, obtaining the geometrically simple volume of fluid-filled syringes [6]. One reason for the more favourable results in this study may be the semi-automatic segmentation method, which selects appropriate Hounsfield unit thresholds to automatically exclude unwanted structures (i.e. impression material or bone and teeth) from the volume rendering. Another reason could be that spiral CT avoids gaps between successive slices and hence provides a real 3D dataset [24].

Table 1. The values for calculated volume, absolute error and relative error are the means of three determinations by three examiners each.

Target volume   k    Calculated volume/cm³ (SD)   Absolute error/cm³ (SD)   Relative error/%
13 ml           18   12.938 (0.050)               0.067 (0.041)             0.386
17 ml           18   16.926 (0.039)               0.074 (0.039)             0.230
21 ml           18   20.932 (0.100)               0.092 (0.077)             0.477
Average         54   N/A                          0.078 (0.055)             0.364

k is the number of judgments; SD, standard deviation; N/A, not applicable.

In the clinical part of the study, an average maxillary sinus volume of 21.99 cm³ was obtained from 20- to 30-year-old male subjects. In this decade of life, the maxillary sinuses are typically entirely expanded [9,10]. This result is in the range of published mean volumetric measurements for adult male patients, from 15.40 cm³ (SD ± 7.04 cm³) [1] to 24.04 cm³ [10]. Investigations on the size of human maxillary sinuses revealed that volumetric values can differ depending on age, gender, state of dentition and even bilateral comparison in the same patient [9,10]. None of these studies investigated the influence of replicate readings and different examiners on the agreement.

The intraclass correlation coefficient (ICC 1,3) is a classical index based on analysis of variance (ANOVA) to quantify intra-examiner agreement [5,14,21], while multilevel linear models with variation components such as the ICC 2,2 [21] allow estimation of correlation coefficients with relaxed two-way ANOVA assumptions [3,7,17]. For the intra- and inter-examiner reliability coefficients in this study, all ICC values indicate almost perfect agreement. The ICC values are estimations dependent on the variability of the quantitative readings compared with the total variation across all readings. Accordingly, a large between-subject variation of 13.22–29.24 cm³ for the maxillary sinus volumes influences the reliability coefficient and is likely to produce an overestimate [17]. This classical method of error measurement could not determine whether the results of the three examiners can be considered equivalent. In part, the high ICCs for these data conceal discord between ratings and examiners, as these values do not provide sufficient information on the magnitude of disagreement [4,13]. The presence of constant bias does not change the value of the correlation coefficient. Such differences may be caused by difficulties in tracing the bony boundaries of the maxillary sinuses for less experienced examiners, especially towards the ethmoidal sinuses or in patients with a pronounced ethmoidal bulla.

Table 2. Correlation coefficients ICC (1,3) for the intra-examiner agreement and the ICC (2,2) for the agreement between examiner 1 and examiner 2 and the Bland–Altman calculation.

ICC
Agreement    ICC type   ICC coefficient   95% CI (lower → upper limit)
Intra-E1     (1,3)      0.9970            0.9948 → 0.9983
Intra-E2     (1,3)      0.9985            0.9974 → 0.9992
Intra-E3     (1,3)      0.9994            0.9989 → 0.9997
Inter-E2–3   (2,2)      0.9958            0.9929 → 0.9979

Bland–Altman
Agreement    Reading    d̄ (a)    SE of d̄ (a)   95% CI for d̄ (a)   SDDiff (a)   95% limits of agreement (a)
Intra-E1     R1 vs R2    0.087   0.081         −0.048 → 0.071     0.485        −0.883 → 1.057
             R1 vs R3    0.103   0.081         −0.079 → 0.028     0.483        −0.863 → 1.069
             R2 vs R3    0.016   0.019         −0.089 → 0.062     0.117        −0.218 → 0.250
Intra-E2     R1 vs R2    0.057   0.053         −0.051 → 0.165     0.319        −0.581 → 0.695
             R1 vs R2    0.057   0.053         −0.051 → 0.165     0.319        −0.581 → 0.695
             R2 vs R3   −0.015   0.046         −0.111 → 0.081     0.279        −0.573 → 0.543
Intra-E3     R1 vs R2    0.012   0.029         −0.077 → 0.251     0.177        −0.342 → 0.366
             R1 vs R3   −0.025   0.026         −0.060 → 0.267     0.157        −0.339 → 0.289
             R2 vs R3   −0.014   0.019         −0.024 → 0.056     0.117        −0.248 → 0.220
Inter-E2–3   E1 vs E2   −0.217   0.083         −0.385 → −0.049    0.497        −1.211 → 0.777
             E1 vs E3    0.114   0.041          0.031 → 0.198     0.248        −0.382 → 0.611
             E2 vs E3   −0.102   0.079         −0.263 → 0.058     0.473        −1.049 → 0.845

E1–E3, examiners 1–3; R1–R3, readings 1–3. (a) cm³.

The magnitude of the acceptable confidence interval can only be judged when the purpose of application is considered [4]. In this study, the confidence limits for the intra-examiner variation of up to 0.366 cm³ (E3) and 0.695 cm³ (E2) are clinically acceptable for quantitative evaluation of the maxillary sinus with an average volume of 21.99 cm³, while a one-sided limit of up to 1.069 cm³ (E1) indicates insufficient intra-examiner agreement. As recommended by others [4,13], the calculation of the mean differences and the 95% limits of agreement should complement the appraisal of agreement. The data for the intra-examiner agreement indicate that this technique is reliable for each examiner over time.

Fig. 3. Bland–Altman plot of the difference between examiner 2 and examiner 3 against their mean for 36 maxillary sinuses; the dotted line represents the mean difference; the solid lines display the lower and upper limit of agreement (d̄ ± 2SDDiff).

The resulting Bland–Altman values for the inter-examiner agreement unmask a systematic bias (0 not included in the 95% CI) for the less experienced examiner (E1), who probably needs more training. This bias suggests a consistent change in the interpretation of the maxillary sinus border by E1. Measurements of the maxillary sinuses appear reliable in the hands of the two experienced examiners (E2 and E3), and both are considered to be interchangeable since a bias of about 0.1 cm³ is not significant for this clinical question. The good results achieved with this time-consuming measurement procedure support its applicability for clinical evaluation of even small changes of the antral volume in the follow-up of sinus augmentation or tooth extraction. It would be a credible goal to fully automate sinus volume calculation and provide high accuracy at the same time.

In conclusion, the method described here has the advantage of small experimental uncertainties if it is carried out by experienced examiners. It can strongly be recommended as a clinical diagnostic tool to gain reliable data on maxillary sinus volumes in situations where the bony boundaries of the maxillary sinuses are intact.

Acknowledgements. The authors thank Ms Irene Mischak, Dental Clinic, Graz, Austria, for her valuable statistical advice and Ms Eugenia Lamont for editorial assistance.

Conflict of interest: None.

Funding: The study was completely financed by department funding.

Ethical approval: Not required.

References

1. Ariji Y, Kuroki T, Moriguchi S, Ariji E, Kanda S. Age changes in the volume of the human maxillary sinus: a study using computed tomography. Dentomaxillofac Radiol 1994: 23: 163–168.
2. Barghouth G, Prior JO, Lepori D, Duvoisin B, Schnyder P, Gudinchet F. Paranasal sinuses in children: size evaluation of maxillary, sphenoid and frontal sinuses by magnetic resonance imaging and proposal of volume index percentile curves. Eur Radiol 2002: 12: 1451–1458. doi:10.1007/s00330-001-1218-9.
3. Barnhart HX, Song J, Haber M. Assessing agreement in studies designed with replicated measurements. Stat Med 2005: 24: 1371–1384. doi:10.1002/sim.2006.
4. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990: 20: 337–340.
5. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986: i: 307–310.
6. Dastidar P, Heinonen T, Numminen J, Rautiainen M, Laasonen E. Semiautomatic segmentation of computed tomographic images in volumetric estimation of nasal airway. Eur Arch Otorhinolaryngol 1999: 256: 192–198.
7. Haber M, Barnhart HX. Coefficients of agreement for fixed observers. Stat Methods Med Res 2006: 15: 255–271. doi:10.1191/0962280206sm441oa.
8. Hounsfield GN. Computerized transverse axial scanning (tomography). 1. Description of system. Br J Radiol 1973: 46: 1016–1022.
9. Ikeda A, Ikeda M, Komatsuzaki A. A CT study of the course of growth of the maxillary sinus: normal subjects and subjects with chronic sinusitis. ORL J Otorhinolaryngol Relat Spec 1998: 60: 147–152.
10. Jun BC, Song SW, Park CS, Lee DH, Cho KJ, Cho JH. The analysis of maxillary sinus aeration according to aging process; volume assessment by 3-dimensional reconstruction by high-resolutional CT scanning. Otolaryngol Head Neck Surg 2005: 132: 429–434. doi:10.1016/j.otohns.2004.11.012.
11. Kawarai Y, Fukushima K, Ogawa T, Nishizaki K, Gündüz M, Fujimoto M, Masuda Y. Volume quantification of healthy paranasal cavity by three-dimensional CT imaging. Acta Otolaryngol 1999: 540: 45–49.
12. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977: 33: 159–174.
13. Lee J, Koh D, Ong CN. Statistical evaluation of agreement between two methods for measuring a quantitative variable. Comput Biol Med 1989: 19: 61–70.
14. Lin L. Overview of agreement statistics for medical devices. J Biopharm Stat 2008: 18: 126–144. doi:10.1080/10543400701668290.
15. Penev P, Sotirov S, Dimitrov D, Tschitelowa N, Gegusskova S, Todorov G, Nakowa V, Koruewa V, Welitschkov S, Batinkov B, Sotirov V. Anthropometrische Untersuchungen über die Volumina der Sinus maxillares. Stomatol DDR 1981: 31: 20–23.
16. Posnick JC, Bite U, Nakano P, Davis J, Armstrong D. Indirect intracranial volume measurements using CT scans: clinical applications for craniosynostosis. Plast Reconstr Surg 1992: 89: 34–45.
17. Rankin G, Stokes M. Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clin Rehabil 1998: 12: 187–199.
18. Roth DA, Gosain AK, McCarthy JG, Stracher MA, Lefton DR, Grayson BH. A CT scan technique for quantitative volumetric assessment of the mandible after distraction osteogenesis. Plast Reconstr Surg 1997: 99: 1237–1247.
19. Rubin GD, Napel S, Leung AN. Volumetric analysis of volumetric data: achieving a paradigm shift. Radiology 1996: 200: 312–317.
20. Schaeffer JP. The sinus maxillaris and its relation in the embryo, child and adult man. Am J Anat 1910: 10: 313–368.
21. Shrout P, Fleiss J. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979: 86: 420–428.
22. Smith TD, Siegel MI, Mooney MP, Burrows AM, Todhunter JS. Formation and enlargement of the paranasal sinuses in normal and cleft lip and palate human fetuses. Cleft Palate Craniofac J 1997: 34: 483–489.
23. Uchida Y, Goto M, Katsuki T, Soejima Y. Measurement of maxillary sinus volume using computerized tomographic images. Int J Oral Maxillofac Implants 1998: 13: 811–818.
24. Van Hoe L, Haven F, Bellon E, Baert AL, Bosmans H, Feron M, Suetens P, Marchal G. Factors influencing the accuracy of volume measurements in spiral CT. A phantom study. J Comput Assist Tomogr 1997: 21: 332–338.

Address:
R. Kirmeier
Department of Oral Surgery and Radiology
School of Dental Medicine
Medical University of Graz
Auenbruggerplatz 12
A-8036 Graz
Austria
Tel: +43 316 385 13281
Fax: +43 316 385 6858
E-mail: [email protected]