Readers' forum
not have been likely to exert a similar influence on each rater's score, since the order effect varies between people because of their different personalities. We therefore suggest presenting the images several times in different orders (but with the same sequence of orders for every rater) and then taking the mean rating. The second concern relates to the trend in the dentists' VAS scores. As the results show, the number of years in practice significantly affected the dentists' ratings. However, we wondered how the authors reached this conclusion, since we found only 1 dentist with 21 to 30 years in practice in this study, which is too small a sample to support it. Moreover, the dentists were unevenly distributed across the years-in-practice groups: 12, 7, and 1 dentists, respectively.
In addition to the concerns above, we have a question. In the article, IC (intercanine distance) is defined as the distance between the most distal surfaces of the canines, whereas IL (interlast visible maxillary tooth distance) is defined as the distance between the most distal surfaces of the last visible maxillary teeth. However, as Table III illustrates, the mean IL:SW is smaller than the mean IC:SW; this puzzled us, because IL should be larger than IC.
Yinqiu Yan
Zeyun Tian
Chengdu, Sichuan, China
Am J Orthod Dentofacial Orthop 2014;146:135-6
0889-5406/$36.00
Copyright © 2014 by the American Association of Orthodontists.
http://dx.doi.org/10.1016/j.ajodo.2014.05.012
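The correspondents' suggestion (show the images in several different orders, give every rater the same sequence of orders, then average each image's ratings) can be sketched as follows. This is a minimal Python illustration under our own assumptions: the function names, the fixed seed, and the layout of ratings as a list of scores per image are hypothetical and are not taken from either study.

```python
import random
from statistics import mean


def presentation_orders(image_ids, repeats, seed=2014):
    """Build `repeats` shuffled viewing orders of the same images.

    Hypothetical helper: the fixed seed makes the sequence of orders
    reproducible, so every rater can be shown exactly the same orders,
    while the order itself differs from one repetition to the next.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(repeats):
        order = list(image_ids)
        rng.shuffle(order)  # in-place random permutation of the images
        orders.append(order)
    return orders


def mean_rating_per_image(ratings):
    """Average one rater's VAS scores for each image across repetitions.

    `ratings` maps an image id to the list of scores that rater gave it,
    one score per repetition (hypothetical data layout).
    """
    return {image: mean(scores) for image, scores in ratings.items()}


orders = presentation_orders(["smile_A", "smile_B", "smile_C", "smile_D"], repeats=3)
for order in orders:
    print(order)
print(mean_rating_per_image({"smile_A": [62, 55, 58], "smile_B": [40, 47, 45]}))
```

Seeding the generator is only one way to give every rater identical orders; a precomputed, printed schedule of orders would serve equally well.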
Author's response
August 2014 Vol 146 Issue 2
Thank you to Drs Yan and Tian for their comments and question regarding Part 2 of our work on arch-width and buccal corridor changes. Drs Yan and Tian suggested that there will always be bias with the use of VAS ratings and that perhaps the eyes of the subjects could have been covered in the assessment photographs so that the assessors could focus directly on the smile. VAS ratings were indeed used in this work; they have been described and used by many authors in the peer-reviewed literature. When planning this work, we considered presenting everything from the full face, with or without partial blocking out, to an edited smile, with or without the surrounding perioral tissues. We first decided that natural smiles from orthodontic patients would be used rather than computer-generated artificial smiles. Ultimately, we decided to present the whole face, because the smile is naturally assessed in public in view of everything else, including facial shape, eye position and color, and skin and hair quality, color, and style. On reflection, we remain satisfied that this method provides at least 1 valid way to view the smile in relation to its natural surroundings.
The correspondents also questioned the order in which the images were viewed by the assessors. On reflection, we are happy with the set order that was given to each reviewer and do not believe that it introduced any more bias than other methods would have.
They also pointed out that the “number of years in practice” seemed to affect the ratings of orthodontists and dentists, and they were concerned that there was only 1 dentist in the 21- to 30-year group and that the distribution of raters into the year groups was not even. That would of course be a real problem if one were statistically testing differences in means between classification groups. However, the raters were placed into these year groups simply for descriptive purposes, and differences in group means were not used for this part of the assessment. Instead, continuous correlations were run against “years in practice,” with all raters included, so the number of raters in each group was not so important.
Finally, Drs Yan and Tian asked why the ratio (percentage) of the last molar buccal corridor to the total smile width is on average smaller than the ratio of the canine buccal corridor to the total smile width (Table III). Perhaps they misunderstood the material presented in that table. These are buccal corridor percentages of the total smile width, and in most cases one would expect the distance from the last molar to the commissure to be less than the distance from the canine to the commissure. Hence, the last molar buccal corridor makes up a smaller percentage of the total smile width.
Michael Woods
Melbourne, Australia
Am J Orthod Dentofacial Orthop 2014;146:136
0889-5406/$36.00
Copyright © 2014 by the American Association of Orthodontists.
http://dx.doi.org/10.1016/j.ajodo.2014.05.013
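The response's point, that continuous correlations with all raters included do not depend on the size of any one year group, can be illustrated with a plain Pearson correlation. In this sketch each rater contributes one pair (years in practice, mean VAS score); the numbers are invented placeholders rather than data from the study, and the exact correlation method used by the authors is not stated in this exchange.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical raters: every rater enters the calculation individually,
# so a year band containing a single rater is never analyzed as its own group.
years_in_practice = [2, 4, 6, 9, 12, 15, 18, 27]
mean_vas_score = [61, 58, 60, 55, 52, 54, 49, 47]
print(round(pearson_r(years_in_practice, mean_vas_score), 3))
```

One rater far from the others can still exert high leverage on r, so a scatter plot of the raters remains a worthwhile companion to the coefficient.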
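The disagreement over Table III comes down to which quantity is tabulated. Assuming, for illustration only, that each buccal corridor percentage is the combined left and right distance from the named tooth to the commissure, expressed as a percentage of the smile width SW (the article's exact formula is not reproduced in this exchange), the ordering follows directly from IL ≥ IC:

```latex
\begin{aligned}
\mathrm{BC}_{\mathrm{canine}} &= \frac{SW - IC}{SW} \times 100\%, \qquad
\mathrm{BC}_{\mathrm{last}} = \frac{SW - IL}{SW} \times 100\%, \\
IL \ge IC \;&\Longrightarrow\; SW - IL \le SW - IC \;\Longrightarrow\; \mathrm{BC}_{\mathrm{last}} \le \mathrm{BC}_{\mathrm{canine}}.
\end{aligned}
```

Read instead as plain width ratios, IL/SW would be at least IC/SW, which is the expectation behind the correspondents' question.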
Statistical errors in a recent article: Sella turcica in patients with type 1 diabetes
American Journal of Orthodontics and Dentofacial Orthopedics
I read with interest the article by Bavbek and Dincer (Dimensions and morphologic variations of sella turcica in type 1 diabetic patients. Am J Orthod Dentofacial Orthop 2014;145:179-87). Although it is very good research, I believe some major points need correction or clarification.
1. It was stated that the control subjects were matched with the diabetic patients (p. 180). However, it was not clarified which factors they had been matched on (eg, age, sex, height). Besides, which statistical tests did the authors use to verify that their groups were matched (none were described)?
2. Data normality was rejected by using 2 normality tests (Kolmogorov-Smirnov and Shapiro-Wilk). This is not the best approach, because when the sample becomes sufficiently large (as in this study), normality tests become overpowered and tend to give significant results for the slightest deviations from a normal distribution. Therefore, the apparent lack of normality might be a false-positive result, which should be assessed by better approaches (eg, Q-Q plots). Furthermore, as the sample grows larger, the shape of the sample distribution becomes less and less important, because the central limit theorem (CLT) makes the sampling distribution of the mean approximately normal. Thus, even if the significant results of the normality tests were not false positives, the CLT might have already dealt with the normality issue. The absence of P values for the normality tests, histograms, or similar displays precludes further evaluation. It is also not clear why 2 different normality tests were used.
3. Even if we disregard the CLT, another issue exists in the normality assessment. The article implies (although it never clarifies) that the “whole sample,” or at least “each group (n = 76),” was assessed for normality, as suggested by the report of “n > 50” for the normality tests (p. 181). However, data normality should not be assessed for the whole sample at once. It is actually the normality of the residuals that matters, not even the normality of the subgroups, let alone that of the whole sample. Therefore, assessing the normality of the sample might be incorrect.
4. The study design (2 groups, each divided into 4 subgroups according to bone age and sex) clearly indicates the need for an ANOVA design (or its nonparametric alternative or another multivariate design). Instead, the authors used only pair-wise comparisons, without any multivariate framework.
5. The pair-wise comparisons performed were the Mann-Whitney U and Fisher exact tests. However, the authors stated that the control and diabetic groups were matched. Therefore, they needed to use paired tests instead (ie, a paired t test or a Wilcoxon signed rank test instead of the Mann-Whitney test, and a McNemar test instead of the Fisher test). The tests they used assume that the groups are independent, which does not hold for these matched groups and might render the results incorrect.
6. The authors stated that “the Bonferroni correction was made.” However, nowhere in the text is there any indication that the Bonferroni correction was applied (despite the need for it). On the contrary, every part of the report indicates that it was not used (ie, alpha was set at 0.05 and not adjusted to a corrected value).
7. In the last paragraph of page 184, it was stated that “the effect of the interaction... was evaluated.” It is simply impossible to assess an interaction only by pair-wise comparisons. This task needs ANOVA-like tests, which were missing.
Vahid Rakhshan
Tehran, Iran
Am J Orthod Dentofacial Orthop 2014;146:136-7
0889-5406/$36.00
Copyright © 2014 by the American Association of Orthodontists.
http://dx.doi.org/10.1016/j.ajodo.2014.05.009
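Point 2 of the letter appeals to the central limit theorem. A small simulation makes the idea concrete: draw samples from a strongly right-skewed (exponential) distribution and watch the skewness of the sample means shrink as the sample size grows. The distribution, sample sizes, and number of replicates are our own illustrative assumptions; the sketch says nothing about the sella turcica data themselves.

```python
import random
from statistics import fmean


def skewness(values):
    """Sample skewness g1: third central moment over the 1.5 power of the second."""
    n = len(values)
    m = fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(sample_size, replicates=5000, seed=1):
    """Skewness of the means of many exponential samples of a given size.

    Individual exponential values have skewness 2; the mean of n
    independent values has skewness 2 / sqrt(n), so the distribution of
    means approaches a normal shape as n grows (the CLT at work).
    """
    rng = random.Random(seed)
    means = [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replicates)
    ]
    return skewness(means)


for n in (1, 5, 76):  # 76 mirrors the group size quoted in the letter
    print(n, round(skewness_of_sample_means(n), 2))
```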
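Point 3 distinguishes the normality of the whole sample from the normality of the residuals. The sketch below uses hypothetical simulated data: it pools two groups that are each exactly normal but have different means. The pooled values are clearly non-normal (bimodal, with strongly negative excess kurtosis), whereas the residuals (each value minus its own group mean) behave like a single normal sample. Excess kurtosis is used here only as a convenient summary, not as a formal normality test.

```python
import random


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (0 for a normal)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def pooled_vs_residuals(shift=6.0, n_per_group=2000, seed=7):
    """Compare the pooled sample with within-group residuals.

    Two hypothetical groups are drawn from normal distributions with
    SD 1 and means 0 and `shift`. Returns the excess kurtosis of the
    pooled values and of the residuals (value minus own group mean).
    """
    rng = random.Random(seed)
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
    mean_a = sum(group_a) / n_per_group
    mean_b = sum(group_b) / n_per_group
    pooled = group_a + group_b
    residuals = [v - mean_a for v in group_a] + [v - mean_b for v in group_b]
    return excess_kurtosis(pooled), excess_kurtosis(residuals)


pooled_k, residual_k = pooled_vs_residuals()
print(f"pooled excess kurtosis:   {pooled_k:.2f}")  # far below 0 (bimodal)
print(f"residual excess kurtosis: {residual_k:.2f}")  # close to 0
```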
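Point 5 turns on the difference between independent and matched groups. For a yes/no trait recorded in each patient and in that patient's matched control, the McNemar test uses only the discordant pairs. A minimal exact (binomial) version follows; the pair counts in the example are invented, and b and c are our own labels for the two kinds of discordant pair. For continuous measurements, the Wilcoxon signed rank test named in the letter plays the corresponding role and is not implemented here.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test for matched binary data.

    b: pairs in which only the patient shows the trait.
    c: pairs in which only the matched control shows the trait.
    Concordant pairs (both or neither) carry no information about a
    difference and are ignored. Under the null hypothesis each discordant
    pair is equally likely to fall either way, so min(b, c) is compared
    with the lower tail of a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


# Hypothetical counts of discordant matched pairs.
print(mcnemar_exact(b=9, c=1))  # prints 0.021484375
```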
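Point 6 concerns the Bonferroni correction. With m comparisons, each P value is compared with alpha/m (equivalently, each P value is multiplied by m and capped at 1); with 8 comparisons at alpha = 0.05, for example, the per-test threshold would be 0.00625. A minimal sketch with made-up P values:

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for a family of m tests.

    Returns the adjusted P values (p * m, capped at 1) and, for each
    test, whether it remains significant at the family-wise level alpha.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one P value")
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical P values from 3 pair-wise comparisons.
adjusted, significant = bonferroni([0.01, 0.02, 0.04])
print(significant)  # [True, False, False]
```

Without the correction, all 3 hypothetical comparisons would have been declared significant at 0.05.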
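Point 7 states that an interaction cannot be assessed by pair-wise comparisons alone. In the simplest 2 × 2 case (taking, for illustration, diabetic [D] vs control [C] crossed with female [F] vs male [M], with cell means μ), the interaction is a difference of differences:

```latex
\text{Interaction} = \left(\mu_{D,F} - \mu_{C,F}\right) - \left(\mu_{D,M} - \mu_{C,M}\right)
```

Separate pair-wise tests ask whether each bracketed difference is zero; finding one significant and the other not does not show that their difference is nonzero. A factorial ANOVA (or a regression with a product term) tests this contrast directly.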
Author's response
Although the writer has the right to choose the title and content of his letter, it seems as if the purpose of the letter is not to allow the authors of this article to answer his questions. Rather, it exhibits a sharp judgment implying that our work is invalid, which does not seem fair to us.
1. The study groups were matched according to bone age, which was determined by the method of Greulich and Pyle.1 This was clearly stated in the fourth paragraph of the “Material and methods” section. In addition, the need to match the groups by bone age rather than chronologic age was briefly discussed in the “Discussion” (paragraph 3, p. 183), with supporting references. The means and standard deviations of bone age and chronologic age for each subgroup were given in Table II, and the finding of no statistically significant difference was stated in the first paragraph of the “Results” section (p. 182). Pair-wise comparisons were used to verify this. Because the P values of these pair-wise comparisons were >0.05 and the differences between groups were not statistically significant, they were not given in the table.
2. We partially agree with this criticism. The assumption of normality is especially critical when constructing reference intervals for variables2 and