Int. J. Oral Maxillofac. Surg. 2005; 34: 619–626 doi:10.1016/j.ijom.2005.04.003, available online at http://www.sciencedirect.com
Clinical Paper: Orthognathic Surgery
Dolphin Imaging Software: An analysis of the accuracy of cephalometric digitization and orthognathic prediction
G. Power, J. Breckon, M. Sherriff, F. McDonald* Department of Orthodontics, Floor 22, Guys Tower, GKT Dental Institute, Kings College, St. Thomas Street, London SE1 9RT, UK
G. Power, J. Breckon, M. Sherriff, F. McDonald: Dolphin Imaging Software: An analysis of the accuracy of cephalometric digitization and orthognathic prediction. Int. J. Oral Maxillofac. Surg. 2005; 34: 619–626. © 2005 International Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

Abstract. The purpose of this study was to examine and compare the reproducibility and reliability of digitization using Dolphin Imaging Software (Version 8.0) with traditional manual techniques. In addition, orthognathic prediction was compared with actual outcomes. Sixty lateral cephalograms were evaluated by two methods: manual tracing and indirect digitization using Dolphin Imaging Software (Version 8.0). Method error (reliability), using duplicate measurements for each method, and comparison of the two techniques (reproducibility) were investigated using alternative statistical methods, those of Bland and Altman (1986) and Lin's Concordance Correlation (1989). Each technique was significantly reliable at the 95% level (method error). Comparing the standard deviations of the differences, manual tracing proved more reliable for SNA (1.36° manually, 2.07° digitally), SNB (1.19° and 1.69°), SNMx (1.39° and 2.66°) and MxMd (1.77° and 2.26°), and Dolphin digital tracing proved more reliable for UIMx (3.49° digitally and 3.97° manually) and LIMd (2.90° and 3.04°). However, systematic error in the software's calculation of LAFH% resulted in measurements 4% larger than manual techniques, a difference which is clinically significant. Comparison of actual outcome and software-generated prediction for 26 orthognathic cases demonstrated clinically significant differences for all measurements (rc = 0.32 for ANB to 0.91 for LIMd; P < 0.05). The investigation revealed the impact of radiographic magnification when used in an uncalibrated system. These findings indicate that Version 8.0 of Dolphin Imaging Software needs to be re-assessed for software errors that may result in clinically significant miscalculations, and to facilitate compensation for radiographic magnification when using linear measurements.
Accepted for publication 7 April 2005. Available online 23 May 2005.
Cephalometric radiographs have become an indispensable tool in the orthognathic setting. Traditionally, cephalometric images have been analysed by manually tracing the radiograph, which, in addition to being time-consuming, has the disadvantage of being open to random and systematic error when locating landmarks8. The development of computer technology has made digital tracing possible, either by direct digitization of the radiograph or of a previously traced image, or by indirect digitization of the image displayed on the monitor. In both methods the points are located manually, so human errors in landmark location remain, and digitization of a traced image actually increases the risk of error4. The advantages of digitization include:

- manipulation of the image (enlargement and enhancement), allowing more accurate assessment of poorly defined areas (indirect digitization only);
- speed and choice of analysis;
- rapid superimposition of serial radiographs;
- storage and retrieval of multiple records;
- easy comparison of data in studies.

Dolphin Imaging Version 8.0 software involves the indirect digitization of multiple dental, skeletal and soft-tissue landmarks on the scanned cephalogram, using a mouse-controlled cursor. The image can be enhanced and enlarged to aid landmark location, with the program clearly defining landmarks and demonstrating their expected position, thereby minimising errors in landmark definition5. Once digitization is complete, the software links the points to give a recognisable traced image, which can be manually adjusted for improved fit if necessary. The analysis of choice is then selected.

Orthognathic prediction tracing is of utmost importance for several reasons:

- the stability and limits of orthodontics and surgery can be assessed;
- the actual procedure and the anteroposterior and vertical movements required can be decided upon, along with model surgery;
- the effects of surgical movements upon the soft tissues can be ascertained;
- by superimposition of photographs, patients can be given an idea of surgical outcome, although they must be warned that this is an estimate and not a guarantee; this can aid in gaining informed consent.

Prior to computerized cephalometrics, prediction involved either alteration of the manual tracings or sectioning of lateral photographs, both methods being
time-consuming, unrealistic and inaccurate. Computer-aided diagnosis and treatment planning has become far more common in recent years, and while it has been shown that predictive software can work well in average cases, prediction quality varies amongst the packages available on the market. Dolphin Imaging has become increasingly popular amongst surgeons and orthodontists and is a market leader, but a search of the literature found no research into the accuracy of the dentoskeletal cephalometric analysis produced by any version of the Dolphin software, or into its accuracy in predicting hard tissue changes from orthognathic surgery. The purpose of this study was firstly to assess the precision of Dolphin Imaging Version 8.0 compared with manual tracing of the same radiographs, and secondly to investigate the accuracy of the software in predicting the post-operative skeletal and dental relationships in a variety of orthognathic cases.

Materials and methods

Stage 1—comparison of manual and digital tracing
Data collection—Stage 1
Sixty cephalometric radiographs of patients discharged from the Orthodontic Departments at Guy's Hospital, London, and the Queen Victoria Hospital, East Grinstead, were randomly selected, ensuring that they fulfilled several criteria for inclusion:

- superimposition of the earposts;
- patients biting in occlusion;
- no unerupted or partially erupted teeth that could hinder incisor apex identification.

The radiographs selected were of varying quality, and thus the ease of landmark identification differed substantially amongst the group.
Methods—Stage 1

To allow for optimal landmark identification both manually and digitally, all tracings and digitizations were performed in a darkened room, with all landmarks located by the same operator (GP).

Manual tracing

Each radiograph was taped to a lightbox, using the Frankfort plane as the horizontal reference plane. A sheet of semi-matt, fine-grade acetate paper was then taped over the radiograph, and the landmarks of the Eastman Analysis4, commonly used by surgeons and orthodontists in the UK, were located using a sharp 5H pencil. Eight measurements from the Eastman Analysis were used: SNA, SNB, ANB, SNMx, MxMd, UIMx, LIMd and LAFH%. The angular measurements were rounded to the nearest 0.5° using a Perspex protractor, and linear measurements were measured and rounded to the nearest 0.5 mm using a Perspex ruler (3M Unitek Cephalometric Protractor).
Digital tracing

The cephalograms were scanned using a flatbed scanner (Epson Expression 1680 Pro) at 300 dpi, with a 100 mm calibration ruler, linked to a Dell computer (Dell Workstation PWS530 running Microsoft Windows 2000). Once captured using Dolphin Imaging Version 8.0 (Dolphin Imaging, Chatsworth, CA), the image was repositioned parallel to the Frankfort horizontal and stored in the Dolphin Imaging archive. Radiographic images were subsequently opened in the Dolphin Imaging program and digitized on a 17-in. colour monitor at a screen resolution of 1024 × 768 pixels. After locating two fiducial points 100 mm apart on the calibration ruler, the landmarks were digitized directly on-screen, as prompted by the Dolphin system, using a cross-hair locator controlled by the mouse. Manipulation and enhancement were used to assist point identification when difficulty was encountered. Once digitization was complete, the Eastman Analysis was selected from the analysis toolbar, and those measurements that had previously been recorded manually were used in the subsequent comparative statistics.
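The two fiducial points establish the pixel-to-millimetre scale on which any linear measurement on the digitized image depends. A minimal sketch of how such a calibration might be computed is given below; the function names, point coordinates and the Python implementation are illustrative assumptions and are not taken from the Dolphin software.

```python
import math

def mm_per_pixel(p1, p2, ruler_length_mm=100.0):
    """Millimetres represented by one screen pixel, derived from two
    fiducial points digitized on the calibration ruler (100 mm apart)."""
    pixel_distance = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return ruler_length_mm / pixel_distance

def distance_mm(a, b, scale):
    """Convert an on-screen distance between two digitized landmarks
    (pixel coordinates) into millimetres using the calibration scale."""
    return math.hypot(b[0] - a[0], b[1] - a[1]) * scale

# At 300 dpi the 100 mm ruler spans roughly 1181 pixels (300 / 25.4 pixels
# per mm), so the scale recovered here is close to 0.0847 mm per pixel.
scale = mm_per_pixel((150, 900), (1331, 900))
print(round(scale, 4))
```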
Repeat tracings
To determine operator reliability and to establish the reproducibility of both methods, all 60 radiographs were retraced by both methods, with a 1-month interval between recordings. To avoid any errors that could be introduced in the digital method when capturing the image and orientating it to the Frankfort horizontal, the same saved image was digitized on both the first and second occasions.
Table 1. Surgical procedures

Procedure                                              Number
Mandibular advancement (BSSO)                          3
Le Fort I
  Advancement                                          2
  Impaction                                            2
Bimaxillary osteotomy
  Maxillary advancement and mandibular setback         5
  Maxillary impaction and mandibular advancement       11
  Maxillary impaction and mandibular setback           3
Stage 2—comparison of actual and software-predicted outcome
Data collection—Stage 2
All of the cases used in the second stage of the investigation had undergone orthodontics prior to orthognathic surgery, the surgical correction being performed by one of two consultant maxillofacial surgeons. Two lateral cephalometric radiographs were required for each case:

1. post-decompensatory orthodontics, prior to surgery;
2. post-surgical, taken within 3 months of surgery.

Twenty-six subjects fulfilled these criteria. The cut-off point of 3 months was applied so as to prevent post-surgical orthodontics influencing the cephalometric measurements. The surgical procedures used to correct the malocclusions were variable (Table 1).

Methods—Stage 2
The pre- and post-surgery lateral cephalograms were digitized using the Dolphin software, following the method described in Stage 1 ('Digital tracing'). Four weeks later, all post-operative lateral radiographs were retraced to check operator reliability. The Dolphin software was then used to predict the surgical outcome on the digitized pre-surgery radiograph, using the surgical procedures and measurements taken from the laboratory data used to make the surgical wafers. Surgical movements were performed by entering the movement, in millimetres, on a display. As the software had difficulty applying the command "autorotate mandible", such movements were applied manually using the "drag and drop" manoeuvre. The predicted result was saved, and the process repeated 4 weeks later. The differences between the predicted and the actual results of surgery were then noted and compared.

Statistical analysis
The data were analysed using Stata 8 (StataCorp 2003, Stata Statistical Software: Release 8.0, Stata Corporation, College Station, TX, USA). The error of the method (reliability) was assessed by comparing initial and repeated measurements using the methods proposed by Bland & Altman3 and Lin6, for each variable with each method. Manual and digital tracings were then compared using the same methods (reproducibility). The accuracy of the software-generated prediction was analysed using the Bland–Altman method and Lin's concordance coefficient (reproducibility).

The Bland and Altman statistical method3 allows simple estimation of agreement between two measurements of the same object (reliability), and between two methods (reproducibility). The difference between the paired estimates is plotted against their mean for each reading. If the two observations are in perfect agreement, all points will lie along a horizontal line (the mean difference) at zero, with method bias resulting in an offset of the points from this line. A confidence interval reference range, which would be expected to contain 95% of normally distributed data, can be set at approximately two standard deviations either side of the mean difference. Measurements lying outside this limit (random errors) are reported as 'outliers' (n) in the tables in the results section. The 95% confidence interval range can be compared with the standard deviations of the Eastman Analysis values to establish whether the differences would be significant in the clinical situation. If the differences between the two measurements within this range are not clinically significant, then the techniques being tested, such as manual and digital tracing, can be used interchangeably.

Lin's statistical method6 combines measures of precision and accuracy. The concordance correlation coefficient rc measures how far the observations deviate from the concordance line, on a scale from 1 (perfect agreement) through 0 (no agreement) to −1 (perfect but reversed agreement). The bias correction factor (Cb) measures how far the best-fit line deviates from the 45° line of perfect concordance, on a scale from 1 (no deviation) down to (but not including) 0 (very far away).
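For clarity, a minimal sketch of how the Bland–Altman summary statistics and Lin's concordance correlation coefficient can be computed for a single variable is shown below. It follows the standard formulae3,6 applied to illustrative data only; it is not the Stata code used in this study.

```python
import numpy as np

def bland_altman(x, y):
    """Mean difference (bias), S.D. of the differences and the
    approximate 95% limits of agreement (mean difference +/- 2 S.D.)."""
    d = np.asarray(x, float) - np.asarray(y, float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias, sd, (bias - 2 * sd, bias + 2 * sd)

def lin_concordance(x, y):
    """Lin's concordance correlation coefficient rc and the bias
    correction factor Cb (rc = r x Cb, r being Pearson's correlation)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                  # population variances, as in Lin (1989)
    sxy = ((x - mx) * (y - my)).mean()         # covariance
    rc = 2 * sxy / (vx + vy + (mx - my) ** 2)  # concordance: precision and accuracy
    cb = rc / (sxy / np.sqrt(vx * vy))         # accuracy component alone
    return rc, cb

# Illustrative duplicate SNA readings in degrees (not study data).
first  = [81.0, 79.5, 83.0, 77.5, 80.0, 84.5]
second = [81.5, 79.0, 83.5, 78.0, 80.5, 84.0]
print(bland_altman(first, second))
print(lin_concordance(first, second))
```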
Results

Reliability of manual tracing—method error

The confidence interval range for UIMx (15.88°) (Table 2) demonstrates errors that would have clinical significance, and the larger mean differences for MxMd (1.03) and LIMd (1.03) indicate systematic error (bias). LAFH% (rc = 0.88) and UIMx (rc = 0.90) have the lowest correlation (Table 3), although all measurements correlate significantly at the 95% level (P < 0.05).
Reliability of digital tracing—method error
The confidence interval ranges for SNA (8.30°), SNB (6.74°), SNMx (10.63°), MxMd (9.04°), UIMx (13.97°) and LAFH% (4.52%) demonstrate errors that would have clinical significance, and the mean difference for UIMx (2.01) indicates systematic error (bias) (Table 4).
Table 2. Summary of statistical analysis of Bland–Altman plots for method error (reliability) of manual tracings

            No.   Mean difference   S.D. of difference   CI range   No. of outliers (n/60)
SNA (°)     60    0.17              1.36                 5.46       5
SNB (°)     60    0.48              1.19                 4.77       3
ANB (°)     60    0.29              1.06                 4.25       2
SNMx (°)    60    0.15              1.39                 5.56       4
MxMd (°)    60    1.03              1.77                 7.07       4
UIMx (°)    60    0.64              3.97                 15.88      5
LIMd (°)    60    1.03              3.04                 12.15      2
LAFH%       60    0.48              1.18                 4.71       4
Table 3. Summary of statistical analysis of Lin's Concordance Correlation Graphs for method error (reliability) of manual tracings

                              SNA (°)   SNB (°)   ANB (°)   SNMx (°)   MxMd (°)   UIMx (°)   LIMd (°)   LAFH%
rc                            0.93      0.95      0.94      0.92       0.96       0.90       0.95       0.88
P                             0.001     0.001     0.001     0.001      0.001      0.001      0.001      0.001
Bias correction factor, Cb    1.00      0.99      1.00      1.00       0.99       0.99       0.99       0.98
Slope                         1.03      0.97      0.98      0.96       1.02       0.91       1.04       1.11
Intercept                     1.90      3.10      0.20      0.41       1.54       11.03      3.08       5.62
Table 4. Summary of statistical analysis of Bland–Altman plots for method error (reliability) of digital tracings

            No.   Mean difference   S.D. of difference   CI range   No. of outliers (n/60)
SNA (°)     60    0.61              2.07                 8.30       5
SNB (°)     60    0.37              1.69                 6.74       4
ANB (°)     60    0.25              0.87                 3.50       3
SNMx (°)    60    0.73              2.66                 10.63      2
MxMd (°)    60    0.32              2.26                 9.04       3
UIMx (°)    60    2.01              3.49                 13.97      2
LIMd (°)    60    0.96              2.90                 11.59      4
LAFH%       60    0.22              1.13                 4.52       1
Table 5. Summary of statistical analysis of Lin's Concordance Correlation Graphs for method error (reliability) of digital tracings

                              SNA (°)   SNB (°)   ANB (°)   SNMx (°)   MxMd (°)   UIMx (°)   LIMd (°)   LAFH%
rc                            0.85      0.92      0.96      0.74       0.95       0.89       0.96       0.90
P                             0.001     0.001     0.001     0.001      0.001      0.001      0.001      0.001
Bias correction factor, Cb    0.99      1.00      1.00      0.98       1.00       0.97       1.00       1.00
Slope                         0.98      0.96      1.03      1.05       0.97       0.94       1.02       0.97
Intercept                     0.82      3.04      0.37      0.46       0.55       9.09       0.41       1.89
Table 6. Summary of statistical analysis of Bland–Altman plots comparing manual and digital methods (reproducibility)

            No.   Mean difference   S.D. of difference   CI range   No. of outliers (n/60)
SNA (°)     60    1.93              1.68                 6.71       5
SNB (°)     60    1.58              1.39                 5.54       2
ANB (°)     60    0.35              0.90                 3.59       4
SNMx (°)    60    1.27              2.15                 8.59       4
MxMd (°)    60    0.34              1.90                 7.58       4
UIMx (°)    60    0.11              3.65                 14.61      1
LIMd (°)    60    1.13              2.35                 9.39       3
LAFH%       60    4.03              2.36                 9.43       2
Table 7. Summary of statistical analysis of Lin's Concordance Correlation Graphs comparing manual and digital methods (reproducibility)

                              SNA (°)   SNB (°)   ANB (°)   SNMx (°)   MxMd (°)   UIMx (°)   LIMd (°)   LAFH%
rc                            0.79      0.87      0.95      0.76       0.97       0.91       0.97       0.24
P                             0.001     0.001     0.001     0.001      0.001      0.001      0.001      0.001
Bias correction factor, Cb    0.88      0.93      0.99      0.94       1.00       1.00       0.99       0.43
Slope                         1.02      1.03      1.00      1.00       0.92       0.95       1.00       1.01
Intercept                     0.77      0.72      0.36      1.29       1.87       5.79       1.01       3.53
SNMx has the lowest correlation (rc = 0.74) (Table 5).

Comparison of manual and digital tracing methods (reproducibility)

The confidence interval ranges for SNA (6.71°), SNMx (8.59°), UIMx (14.61°) and especially LAFH% (9.43%) demonstrate errors that would have clinical significance, and the increased mean difference for LAFH% (4.03%) indicates systematic error (bias) (Table 6). The rc values indicate low correlation between manual and computer tracing for LAFH% (0.24), SNMx (0.76) and SNA (0.79), with the largest systematic errors occurring in the comparison of LAFH%. The effect of random error when measuring UIMx, demonstrated by the slope of 0.95, has resulted in an intercept of 5.79 (Table 7).

Comparison of software-generated prediction and actual outcomes of surgery (reproducibility)
The measurements from the software-generated predictions and the actual outcomes were used to compare the accuracy of software prediction.
Table 8. Summary of statistical analysis of Bland–Altman plots comparing software-generated predictions and actual outcome (reproducibility)

            No.   Mean difference   S.D. of difference   CI range   No. of outliers (n/26)
SNA (°)     26    2.53              4.17                 16.69      2
SNB (°)     26    0.05              2.70                 10.81      1
ANB (°)     26    2.48              2.63                 10.53      1
SNMx (°)    26    1.62              3.65                 14.58      1
MxMd (°)    26    0.83              3.43                 13.73      2
UIMx (°)    26    0.16              3.76                 15.05      2
LIMd (°)    26    1.75              3.49                 13.94      1
LAFH%       26    0.75              1.97                 7.87       2
Table 9. Summary of statistical analysis of Lin's Concordance Correlation Graphs comparing software-generated predictions and actual outcome (reproducibility)

                              SNA (°)   SNB (°)   ANB (°)   SNMx (°)   MxMd (°)   UIMx (°)   LIMd (°)   LAFH%
rc                            0.55      0.83      0.32      0.52       0.86       0.68       0.91       0.66
P                             0.001     0.001     0.013     0.001      0.001      0.001      0.001      0.001
Bias correction factor, Cb    0.87      0.98      0.67      0.92       0.99       1.00       0.97       0.95
Slope                         0.87      1.22      0.95      0.96       0.94       0.94       0.88       0.92
Intercept                     13.25     18.1      2.67      1.83       0.90       6.61       9.50       5.98
To minimise the error effect of outliers, the mean of the repeated measurements for each method was used in the Bland–Altman and Lin's concordance calculations. The mean differences for SNA (2.53°) and ANB (2.48°) are increased, demonstrating error, and all confidence interval ranges were clinically significant (Table 8). There is low concordance (rc) for ANB (0.32), SNMx (0.52), SNA (0.55), LAFH% (0.66) and UIMx (0.68). The bias correction factor (Cb) for ANB (0.67) indicates increased deviation from the line of concordance, although its P value (0.013) fails to demonstrate systematic deviation at the 95% level. All differences between software-generated and actual outcome measurements are clinically significant (Table 9).

Discussion
The simple descriptive analyses described by Bland & Altman3 and Lin6 permit the assessment of agreement between two imperfect clinical measurements, or of the repeatability of duplicate observations. The ease of interpretation and the complementary nature of these methods is an obvious advantage. It must be noted, however, that the ability to demonstrate statistical significance, as many studies tend to do, does not necessarily have any clinical meaning1.

The reduced Lin's concordance correlations for SNMx (0.74) and SNA (0.85) demonstrate lower intra-examiner reliability for these measurements. This could be due to the unreliability of locating both N and A points in the vertical plane, as has been previously documented2,9,11. The Lin's concordance for SNMx results from a combination of systematic and random errors, caused by outliers on the Bland–Altman plot; the random errors are reflected by the rotated slope of 1.05. Both SNA and SNMx have confidence interval ranges (8.30° and 10.63°, respectively) that would be clinically significant. The confidence interval range for LIMd (11.59°) lies just within the standard deviation described in the Eastman Analysis (93° ± 6°). This large range is a
combination of the effects of locating the lower incisal apex and the errors inherent in estimating the position of gonion, both of which are involved in the measurement of LIMd. However, the mean difference (0.96), the Lin's concordance (0.96) and the bias correction factor (1.00) demonstrate acceptable reliability when digitizing this point. The ability to locate gonion by construction rather than by estimation would further increase the reliability of this landmark when traced digitally, and so increase the reliability of LIMd and MxMd.

The high mean difference for UIMx (2.01) and standard deviation of the difference (3.49), as well as the decreased Lin's concordance correlation (0.89) and bias correction factor (0.97), indicate lower reliability for this measurement. The upper incisor apex is usually reliably located2, although it has been suggested that the cursor design may have obscured the peripheral structures that aid in landmark identification, making visualisation more difficult11. Whilst a confidence interval range of 13.97° would be significant in the clinical setting, the P value of 0.001 demonstrates significant correlation between repeat digital tracings, and its reliability is acceptable.
Table 10. Standard deviation of the differences for manual and digital tracing

Variable    Manual   Digital
SNA (°)     1.36     2.07
SNB (°)     1.19     1.69
ANB (°)     1.06     0.87
SNMx (°)    1.39     2.66
MxMd (°)    1.77     2.26
UIMx (°)    3.97     3.49
LIMd (°)    3.04     2.90
LAFH%       1.18     1.13
Small but clinically significant differences exist when repeating measurements for SNB (CI range of 6.74°), MxMd (9.04°) and LAFH% (4.52%). Richardson10 and Sandler11 compared traditional manual tracing with computerized cephalometric analysis using the standard deviations of the differences, which was felt to give a more reliable comparison than the mean (Table 10). Comparing this study's results in the same way, manual tracing is more reliable when measuring SNA, SNB, SNMx and MxMd, whilst Dolphin digital tracing is more reliable than traditional manual tracing when measuring ANB, UIMx and LIMd. The values for LAFH% were approximately equal.
Fig. 1. Graph of Lin's Concordance Correlation for SNA (°) between manual and digital techniques.
Fig. 2. Graph of Lin's Concordance Correlation for SNB (°) between manual and digital techniques.
It must be noted that these differences are very small, the largest difference occurring for SNMx. These observations were not tested statistically, and the small differences may therefore be accounted for by the errors incurred when making the measurements. The increased reliability of MxMd when traced manually is not altogether surprising, as it has been well established that gonion is located more accurately by construction, as was performed manually, than by estimation, as when digitizing11. Enhancement when digitizing appears to have facilitated the location of the incisal apices, as it has been established that the lower incisal apex is an unreliable point to locate2. This is reflected by the increased reliability of UIMx and LIMd using digital methods, and is similar to the findings of Sandler11, who added that manual tracing of these points could be obscured by the sheet of tracing paper.

When examining Lin's concordance for SNA (Fig. 1), SNB (Fig. 2) and ANB (Fig. 3), there is a constant deviation between the two methods, with the digital measurements being consistently and proportionally larger, resulting from systematic error (Table 6). The systematic errors that occur when measuring SNA
Fig. 3. Graph of Lin's Concordance Correlation for ANB (°) between manual and digital techniques.
(digital mean 1.93° larger than manual) and SNB (digital mean 1.58° larger than manual) largely cancel each other out when calculating ANB, such that the digital ANB is less than 0.5° larger than the manual value (mean difference = 0.35°). The standard deviation of the difference for ANB (0.90) is small, leading to the conclusion that this systematic error, which results in larger digital measurements, is clinically insignificant; digitization therefore yields results for SNA, SNB and ANB that are comparable to manual methods (Table 6). Likewise, constant deviation was found when comparing manual and digital measurements of SNMx, but in this case the manual measurements were larger by a little over 1° (mean difference = 1.27°). The only points common to SNA, SNB and SNMx are nasion and sella, which are known to be fairly accurate points to locate7,9,10. The deviation suggests that nasion is systematically located more posteriorly when digitizing, tending to make the digital SNA and SNB larger than the manual measurements. It is possible that this occurs because the cursor design obscures landmark identification11. It must be pointed out, however, that these differences are not clinically significant, and the P values show that digital methods compare favourably with manual methods for the measurement of SNA, SNB, ANB, SNMx, MxMd, UIMx and LIMd at the 0.05 level.

Comparison of the standard deviations of the differences for manual and digital tracings (Table 10) implies that both methods were equally reliable when measuring LAFH% (manual 1.18, digital 1.13). However, when the Bland–Altman and Lin's data for LAFH% are examined (Tables 6 and 7), a very different result is observed, and it becomes obvious that there is less agreement between the two methods. The large mean difference (4.03%), the decreased Lin's correlation (0.24) and bias correction factor (0.43) all indicate error, with the Lin's correlation demonstrating obvious systematic bias rather than random error (Fig. 4). Initially it was thought that different landmarks were being used to calculate LAFH%, although this proved not to be the case. On examination of the software's calculation of LAFH%, it became apparent that the Dolphin software was consistently erroneous, although the method by which it calculates LAFH% remains unknown. A real example of such an error in calculating LAFH% is shown below:
Fig. 4. Graph of Lin's Concordance Correlation for LAFH% between manual and digital techniques.

Dolphin data: LAFH (Me-Mx) = 76.6 mm; UAFH (N-Mx) = 57.6 mm; LAFH/TAFH = 61.4%.
The correctly calculated LAFH%, using the LAFH (mm) and UAFH (mm) values, is actually 57.1%. A 4% error in calculating LAFH% is clinically significant, with the software giving the impression of an increased LAFH% when it is in fact within the normal range. Errors of this type have been identified in previous papers and show the importance of rigour of definition in preventing systematic errors2,5.
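Using the lower and upper anterior face heights reported above, the intended calculation is simply

\[
\mathrm{LAFH\%} = \frac{\mathrm{LAFH}}{\mathrm{LAFH} + \mathrm{UAFH}} \times 100
= \frac{76.6}{76.6 + 57.6} \times 100 \approx 57.1\%,
\]

some 4% below the 61.4% reported by the software.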
SNA had the largest mean difference (2.53°), standard deviation of the difference (4.17°) and 95% confidence interval range (16.69°) of all the measurements between predicted and actual outcome, and is thus the least reproducible measurement. Such a large confidence interval would be clinically significant. The lowest Lin's concordance between the actual and predicted outcomes was for ANB (0.32) (Table 9), which also had an increased mean difference (2.48°) and range (10.53°) (Table 8); this confidence interval range would be clinically significant.
Fig. 5. Graph of Lin's Concordance Correlation for ANB (°) between software-predicted and actual outcome.
The concordance plot for ANB (Fig. 5) is suggestive of systematic error, with few random errors influencing the slope, since it lies nearly parallel to the line of concordance (slope = 0.95). The graph shows that, in the horizontal plane, the actual outcome exceeds the movement planned, either because the surgeons consistently moved distances in excess of those planned, or because the software did not move the skeletal structures the distance entered into the surgical options display. The surgical movements were presumed correct, as precise surgical wafers controlled them. The software's contribution arises because at no stage was calibration required to compensate for radiographic magnification, which was not discussed in either the user manual or the Help toolbar. When Dolphin were contacted about the discrepancy, the response was that "Dolphin does not explicitly mention the magnification factor is because traditionally the measurements and norms you'd find in publications are based on magnified cephs, but they do not account for the factor either" and "we have in the works to address this issue". Whilst magnification factors will not affect angular measurements, inputting uncalibrated linear data such as surgical movements will have an effect. The same effects of magnification were evident in the vertical plane when examining the Lin's concordance for SNMx (Table 9), which also demonstrated systematic error, with the actual outcome consistently larger than the software-generated prediction (slope = 0.95, intercept = 1.83°). LAFH% and MxMd have similar plots but are also affected by random error. Such systematic errors, resulting from comparisons made without compensation for radiographic enlargement, have been previously highlighted5.
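As a minimal sketch of the correction implied here, a planned movement measured on the patient (true millimetres) would need to be scaled by the radiographic magnification factor before being applied to an uncalibrated image, and image distances divided by the same factor to recover true distances. The 10% magnification, the function names and the figures below are illustrative assumptions only; they are not taken from the Dolphin software or from this study's data.

```python
def true_to_image_mm(true_mm, magnification=1.10):
    """Scale a planned surgical movement (true mm, e.g. from model surgery
    and wafer construction) to the equivalent distance on a magnified
    cephalogram. Cephalostat magnification is typically of the order of
    7-10%; the 1.10 used here is purely illustrative."""
    return true_mm * magnification

def image_to_true_mm(image_mm, magnification=1.10):
    """Recover a true distance from one measured on the radiograph."""
    return image_mm / magnification

# A 6 mm advancement planned on the models corresponds to about 6.6 mm on a
# 10%-magnified film; entering 6 mm on an uncalibrated image therefore moves
# the digitized structures less, in true terms, than the surgery itself.
print(round(true_to_image_mm(6.0), 1))   # 6.6 for the assumed factor
```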
As soft tissue changes are reported to be more variable and difficult to predict post-surgically, the program's ability to simulate such changes should also be assessed. A recent article12 compared the soft-tissue simulations of the orthognathic programs currently dominating the U.S. market: Dentofacial Planner Plus Version 2.5b (DFP) (Dentofacial Software), Dolphin Imaging Version 8.0 (DI) (Dolphin Imaging, Chatsworth, CA), Vistadent AT (GAC) (GAC International) (successor to Prescription Portrait/Planner), OrthoPlan Version 3.0.4 (OP) (Practice Works, Atlanta, GA) (successor to Orthognathic Treatment Planner) and Quick Ceph 2000 (QC) (QuickCeph Systems). That study found the Dolphin simulation to be the second most accurate, although the default (non-enhanced) simulations resembled the actual result only 10% of the time. All programs simulating soft tissue response to surgery are based upon algorithms that relate the soft tissues to the skeletal repositioning; hence it is essential that a program performs skeletal movements accurately before actual and simulated soft tissue outcomes are compared.

In summary, digital tracing using Dolphin 8.0 is significantly correlated with traditional manual tracing for all measurements, but clinically significant differences (according to the Eastman Analysis values) are noted for SNA, SNMx, UIMx and LAFH%. The ability to locate gonion by construction resulted in manual methods being more accurate for MxMd. The use of enhancement tools assisted in the location of the incisal apices, making the digital measurements of UIMx and LIMd more reliable than the manual ones. Software errors in the calculation of LAFH% resulted in clinically significant systematic errors in this measurement. The program's lack of information regarding compensation for radiographic magnification does not affect angular measurements when Dolphin is used for diagnostic purposes. However, when considering orthognathic movements, the lack
of compensation for variations in magnification has significant effects. This will result in clinically significant differences for all predicted measurements in both the vertical (as reflected by LAFH%, SNMx and MxMd) and horizontal (reflected by SNA, SNB and ANB) planes. As a result, the current program is not yet as reliable as traditional techniques for planning orthognathic movements.
References

1. Battagel JM. A comparative assessment of cephalometric errors. Eur J Orthod 1993: 15: 305–314.
2. Baumrind S, Frantz RC. The reliability of head film measurements: I. Landmark identification. Am J Orthod 1971: 60: 111–127.
3. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986: 1: 307–310.
4. Cohen AM. Uncertainty in cephalometrics. Br J Orthod 1984: 11: 44–48.
5. Houston WJB. The analysis of errors in orthodontic measurements. Am J Orthod 1983: 83: 382–390.
6. Lin LI-K. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989: 45: 255–268.
7. Midtgård J, Björk G, Linder-Aronson S. Reproducibility of cephalometric landmarks and errors of measurements of cephalometric cranial distances. Angle Orthod 1974: 44: 56–61.
8. Mills JRE. The application and importance of cephalometry in orthodontic treatment. The Orthodontist 1970: 2: 32–42.
9. Richardson A. An investigation into the reproducibility of some points, planes and lines used in cephalometric analysis. Am J Orthod 1966: 52: 637–651.
10. Richardson A. A comparison of traditional and computerized methods of cephalometric analysis. Eur J Orthod 1981: 3: 15–20.
11. Sandler PJ. Reproducibility of cephalometric measurements. Br J Orthod 1988: 15: 105–110.
12. Smith JD, Thomas PM, Proffit WR. A comparison of current prediction imaging programs. Am J Orthod Dentofacial Orthop 2004: 125: 527–536.
Address: Fraser McDonald Department of Orthodontics Floor 22, Guys Tower GKT Dental Institute Kings College St. Thomas Street London SE1 9RT UK Tel: +44 20 7188 4415 E-mail:
[email protected]