American Journal of ORTHODONTICS
Volume 60, Number 2, August, 1971

ORIGINAL ARTICLES

The reliability of head film measurements
I. Landmark identification

Sheldon Baumrind, D.D.S., M.S., and Robert C. Frantz, D.D.S.
San Francisco, Calif.
Ever since the work of Broadbent1 first captured the imagination of the orthodontic specialty, measurements of head films have played a major role in orthodontic research and treatment planning. Measurements of head films are used for two general purposes: description and prediction.

Head film measurements may be used descriptively in several ways:
1. To categorize cases according to type. (For example, one may speak of “a high-angle case” or “a case with an ANB angle of 6 degrees.”)
2. To define the degree to which an observed case departs from some accepted norm. (For example, we may compare values in the observed case with the norms of Downs, Steiner, or Tweed.)
3. To indicate the extent of changes occurring during treatment. (For example, we may say that “the mandibular plane angle opened 3 degrees” or “the ANB angle was reduced 4 degrees.”)

Head film measurements have also been used by some practitioners in an attempt to predict growth trends. These workers believe, for example, that “vertical growers will continue to grow vertically” or that “steep mandibular planes tend to grow steeper.” Unfortunately, it must be said that no rigorous test has yet been successful in demonstrating the propriety of the use of head films in this fashion and that there are both theoretical and practical considerations which imply strongly that head films can never be of more than adjunctive use in growth prediction.

From the Division of Orthodontics, School of Dentistry, University of California, San Francisco.
This study was supported in part by United States Public Health Service Grant 5 SO1 FR-05305 (Pavone) and in part by funds from the Instruction and Research Grant of the Computer Center, University of California, San Francisco.
(It may not be inappropriate to consider briefly why this is so. It is a common observation that conventionally used angular and linear measures are highly intercorrelated. The consequence of this fact upon the predictive power of head films is not so generally understood. A high intercorrelation among head film measures means that many apparently discrete measures are, in fact, markedly overlapping so that two or more measures reflect the same underlying anatomic condition in slightly different terms. For this reason, even sophisticated statistical techniques which interplay large numbers of apparently different head film “variables” do not encompass (or, in statistical terms, “explain”) a sufficient portion of the total variation in the growth system to be clinically predictive. The conclusion appears inescapable that the total amount of information contained in head films is not sufficiently large to make clinically meaningful predictions possible from this source alone, even if we were able to assess all the contained parameters perfectly. No doubt information from head films would be predictively useful if we could combine it with information from other diagnostic tools which assess entirely different portions of the variance in the system (for example, model analyses, muscle studies, and case histories), but, unfortunately, at the present time we have no satisfactory mechanism for quantitatively interrelating these different kinds of information.)
Regardless of the use to which head films are to be put, it is important to know how accurately measurements on the films are being made and what the sources of measurement error are. Head film measurements, like all other measurements, involve error, as even the earliest investigators were aware. Two general classes of error occur in the estimation of cranial dimensions from head films.

The first may be termed “errors of projection.” These result from the fact that the head film is a two-dimensional shadow of a three-dimensional object. Since the rays which produce the shadow are nonparallel and originate from a very small source, head films are always distorted enlargements, the enlargement factor varying with the plane in which the estimated point lies. Head films are further distorted by foreshortening of distances between points lying in different planes and by radial displacement of all points and structures not on the principal axis (or, as orthodontists call it, the central ray). Björk,2 Hixon,3 Brodie,4 and Salzmann5 have each commented on these projectional errors. Adams,6 Wylie and Elsasser,7 Vogel,8 and others have attempted to introduce correction factors for some of these sources of error, but the cumbersomeness of the necessary computations has militated against the general use of these adjustments. Meaningful systematic corrections for projectional errors could be obtained either by the use of stereo-head films or by the integration of information from lateral and frontal films (as in Broadbent’s method). At the moment, however, neither of these methods is considered practical for routine clinical use.

The second general class of errors in head film measurement may be termed “errors of identification.” These are the errors involved in the apparently straightforward process of identifying specific anatomic landmarks on head films.
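The dependence of the enlargement factor on the plane in which a point lies, described above under errors of projection, follows from similar triangles for a point source. The following is a minimal Python sketch of that relation only. The function name and all distances are hypothetical values chosen for illustration; they are not the geometry of the x-ray machines used in the study.

```python
def magnification(source_to_film_mm, plane_to_film_mm):
    """Enlargement factor for structures in a plane parallel to the film.

    With a point source, similar triangles give
    image size / object size = source-to-film / source-to-plane.
    """
    source_to_plane_mm = source_to_film_mm - plane_to_film_mm
    if source_to_plane_mm <= 0:
        raise ValueError("the plane must lie between the source and the film")
    return source_to_film_mm / source_to_plane_mm


# Hypothetical geometry, for illustration only (not taken from the study).
SOURCE_TO_FILM_MM = 1650.0
PLANES_MM = {
    "left side (nearer the film)": 75.0,
    "midsagittal plane": 150.0,
    "right side (nearer the tube)": 225.0,
}

for name, distance in PLANES_MM.items():
    factor = magnification(SOURCE_TO_FILM_MM, distance)
    print(f"{name}: enlarged {100 * (factor - 1):.1f} per cent")
```

Under these assumed distances, structures farther from the film are enlarged more than those nearer to it, which is why bilateral structures such as the two gonial angles cast separate images.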
It has long been intuitively recognized that the precision with which we identify the landmarks used in the standard clinical analyses of Downs,9 Steiner,10 Tweed,11 Riedel,12 Holdaway,13 and others varies from point to point. But while it seems obvious that, for example, “nasion is easier to identify than is point B,” the nature and magnitude of the differences in the precision with which we identify the different landmarks used in standard cephalometric analyses have never, to our knowledge, been quantitatively measured. The present article reports on the first of two studies directed toward quantitation of errors in landmark identification and toward assessing the effect of these errors upon subsequent angular and linear measurements.

Materials and methods
As part of a general project designed to develop rapid and precise new techniques for evaluating the effects of orthodontic treatment, a study was designed to check operator reliability in the identification of standard cephalometric landmarks. A sample of twenty lateral skull head films was selected at random from among those available in the records of 122 cases whose orthodontic treatment was started and completed at the University of California School of Dentistry between 1954 and 1964. All films were taken with the right side of the patient’s face oriented toward the x-ray tube and the left side nearer the film, with the putative central ray passing through the porion-porion axis.

A transparent plastic template having four needle-point perforations in a rectangular configuration at known distances from each other was constructed from a 60 mil. cellulose acetate sheet. The template was overlaid on each head film individually, and each of the four points was registered on each head film by perforating the head film with a fine sewing needle held in a dental broach holder. The template was placed on the head films in such a way that the rectangle defined by the four perforations contained all or almost all of the landmarks to be measured.

Each of the twenty films was then “traced” by each of the five members of the first-year postgraduate class at the University of California Division of Orthodontics. All of the judges had been trained to the same criteria, having just completed the same course of training in cephalometric diagnostics. At the time the presently reported evaluations were begun, the class had been in graduate training for 7 months. The intent of the study was to simulate as nearly as possible the range of operator and head film variation in general clinical orthodontic practice.
Since it is well known that the difficulty in locating a given landmark varies from head film to head film, each judge was asked to evaluate the relatively large number of twenty head films a single time, rather than to make replicated assessments of a smaller number of films. In a further attempt to simulate the range of variation which occurs in clinical practice, no conscious effort was made to increase interjudge reliability as to the manner in which landmarks were to be located except for the obtaining of consensual verbal agreement as to the definition for each landmark. On the other hand, it was considered desirable to minimize variation from all sources other than differences in judges’ opinions and differences in head film characteristics. For this reason, all films were evaluated using similar light boxes and tracing acetate and under the same conditions of general illumination. Each judge was asked to overlay each head film with a conventional translucent acetate sheet (0.003 inch matte cellulose acetate). He first recorded on the acetate a three-digit random number identifying the film and his own
two-digit identification number. Then he located, marked, and identified with a pencil sixteen standard cephalometric landmarks which had been consensually chosen and defined by the student group as follows:

1. S- Sella turcica, the midpoint of the pituitary fossa as determined by inspection.
2. Na- Nasion, [...] the nasofrontal suture.
3. Or- Orbitale, the inferiormost point on the orbital margin, using the more prominent of the orbital images.
4. A- Point A, the deepest point on the curvature of the surface of the maxillary bone between ANS and the alveolar crest of the upper central incisor.
5. UIA- Upper incisor apex, the point of intersection between the long axis of the more anteriorly positioned upper central incisor and the contour of that tooth’s root-end curvature.
6. UIE- Upper incisor edge, the tip of the incisal edge of the more anteriorly placed upper central incisor.
7. LIE- Lower incisor edge, the tip of the incisal edge of the more anteriorly placed lower central incisor.
8. LIA- Lower incisor apex, the point of intersection between the long axis of the most anteriorly positioned lower central incisor and the contour of that tooth’s root-end curvature.
9. B- Point B, the deepest point on the bony curvature of the mandible between pogonion and the alveolar crest of the lower central incisor.
10. P- Pogonion, the most anterior point on the contour of the chin [...].
11. M- Menton, [...] symphysis.
12. MBR- Mesiobuccal right, the occlusalmost point on the mesiobuccal cusp of the more mesially positioned lower first molar.
13. MBL- Mesiobuccal left, the occlusalmost point on the mesiobuccal cusp of the more distally positioned lower first molar.
14. GOU- Gonion (upper), the lowest point on the curvature of the angle of the mandible where the body of the mandible meets the ramus. Estimates were made for both sides of the mandible. The more superior value is considered to represent the left side.
15. GOL- Gonion (lower), same as 14 above but for the lower of the two angles. This value is considered to represent the right side.
16. PO- Porion, the superiormost point of the image of the cephalostat ear rod. As used in this study, porion is a machine point rather than an anatomic structure.
Paired values were recorded for gonion and for the lower permanent first molar. This was done because it was thought that best estimates of “mandibular plane” and “occlusal plane” could later be made by averaging these pairs of values. For the purposes of the present study, however, the location of planes is not a consideration, and the two sets of values for “gonion” and for “mesiobuccal cusp” may be considered to be redundant estimates of a single phenomenon.

The data for the study were thus derived from 100 acetate tracings, five for each of twenty head films. On each tracing, all twenty points (four reference and sixteen landmark) had been located. Therefore, for each of sixteen landmarks, 100 independently located estimates were available. The data-reduction problem was to relate the 100 estimates for each landmark to each other in such a way that meaningful conclusions could be drawn.

Since precise physical superimposition of five tracings upon each other was considered impossible, a computational solution was developed, employing a specially devised computer program. Because this computational solution has generalized applicability, its characteristics are to be reported in some detail elsewhere. For the present purposes, it is sufficient to say that the coordinates for each of the twenty points on each tracing were first determined with respect to arbitrary X and Y axes, using an Oscar K “coordinatograph.” The landmark coordinate values for all five tracings for each x-ray film were then mathematically superimposed, registered on the four reference points. The five sets of X and Y values for each landmark were then averaged, yielding a best estimate for that landmark for that film. For each head film, the X axis was next redefined as the line connecting the best estimates of S and N. There now existed for each head film sixteen small five-point scattergrams, oriented to the SN line (one scattergram for each landmark, one point in each scattergram from each tracing). Each of the twenty head films in the study yielded such a set of sixteen scattergrams. The twenty small five-point scattergrams for each landmark (one from each head film) were now superimposed mathematically with their origins and axes in common.
This yielded a set of sixteen “hundred-point scattergrams,” each representing the dispersion of estimating errors for a single landmark around the best estimate for that landmark and each oriented to the SN line.

Sources of error

Any factor capable of modifying the magnitude or distribution of observed deviations other than random differences in judgment as to landmark location must be considered a source of error in terms of the present study. Possible sources of error lay in the areas of (1) representativeness of head films, (2) representativeness of judges, (3) machine errors in point location, (4) errors in superimposition of tracings, and (5) errors in location of the major axes.

Representativeness of head films. As previously stated, the head films used were selected randomly from among those available in the University of California files. There is no reason to consider them unrepresentative of films that the average clinician would consider satisfactory for routine clinical use, although it is necessary to note that films from only two x-ray machines are involved.

Representativeness of judges. It was not possible to select judges randomly. Instead, all available judges were used. However, it is thought that there is no reason to believe that the judges employed are unrepresentative of competent clinicians with average training in cephalometrics.

Machine errors in point location. The Oscar K coordinatograph is a device which measures linear distances as a function of variations in the voltage of an electrical circuit. Coordinate values are obtained to the nearest 0.1 mm. In the mode in which we employed it, the machine is stated to have a range of error in reproducibility no greater than ± 0.1 mm.
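The data reduction described under Materials and methods (averaging the five estimates for a landmark, redefining the X axis as the line from the best estimate of S to the best estimate of N, and pooling the small scattergrams) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' computer program: the function names are hypothetical, and the tracings are assumed to have been superimposed already.

```python
import math


def mean_point(points):
    """Arithmetic mean of a list of (x, y) points: the 'best estimate'."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def sn_oriented_deviations(landmark, sella, nasion):
    """Deviations of each estimate from the best estimate, in SN axes.

    landmark, sella, nasion: lists of (x, y) estimates for one head film,
    one entry per tracing, already superimposed in a common frame.
    The returned X component lies along the S-to-N line.
    """
    s = mean_point(sella)
    n = mean_point(nasion)
    length = math.hypot(n[0] - s[0], n[1] - s[1])
    ux, uy = (n[0] - s[0]) / length, (n[1] - s[1]) / length  # unit vector S->N
    best = mean_point(landmark)
    deviations = []
    for x, y in landmark:
        dx, dy = x - best[0], y - best[1]
        # Project the deviation onto the SN axis and its perpendicular.
        deviations.append((dx * ux + dy * uy, -dx * uy + dy * ux))
    return deviations


def pooled_scattergram(per_film_deviations):
    """Stack the small per-film scattergrams on a common origin and axes."""
    return [d for film in per_film_deviations for d in film]
```

With twenty films and five tracings per film, `pooled_scattergram` would return the 100 points of one “hundred-point scattergram.”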
Fig. 1. A, Upper portion of a representative head film with hundred-point scattergrams for sella, porion, nasion, and orbitale superimposed semischematically at appropriate scale. B, Lower portion of a representative head film with hundred-point scattergrams for upper incisor apex, point A, lower incisor edge, upper incisor edge, point B, lower incisor apex, pogonion, menton, mesiobuccal cusp (of lower molar), and gonion superimposed semischematically at appropriate scale. Redundant sets of estimates for gonion and mesiobuccal cusp have been omitted for ease in visualization.
Fig. 2. Distribution of estimating errors, maxillary skeletal landmarks. (Scattergrams shown for porion, sella, orbitale, nasion, and point A; scale in millimeters.)
Errors in superimposition of tracings. Tracing values were mathematically translated, rotated, and superimposed on the four registration points using a “least squares best fit” program. Any tracing in which the total X + Y error for any registration point exceeded 1 mm. was rejected. The range of residuals from the true value for the four registration points for each tracing averaged 0.7 mm., with a standard deviation of approximately 0.3 mm. To the extent that the landmarks under study were located within the configuration of the registration point rectangle, errors in landmark location introduced by errors in tracing superimposition were, by reason of geometry, smaller than the superimpositional errors themselves.

Errors in major axis location. This study has in common with other orthodontic studies a susceptibility to generalized error for all landmarks of a given tracing as a result of chance severe errors in the location of the major axis landmarks (sella and nasion). A measure of security was afforded, however, by the fact that the nasion and sella values utilized in establishing the major axes for each head film were the means of five estimates.

With the exception of the problem of representativeness of judges, all these sources of error would tend to alter the size rather than the shape of scattergram distributions, since their effects would be random. No consequential errors are introduced by the mathematical computations of our computer program since its precision extends considerably beyond three decimal places.
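The superimposition step described above (translation and rotation onto the four registration points by least squares, with rejection of any tracing whose X + Y error at a registration point exceeds 1 mm.) can be illustrated with a short Python sketch. The closed-form two-dimensional fit used here is a standard technique and is only an assumption about how such a program might work; the original program is not described in the text, and all function names are hypothetical.

```python
import math


def fit_rigid(reference, traced):
    """Least-squares rotation and translation carrying traced onto reference.

    reference, traced: equal-length lists of (x, y) registration points.
    Returns (theta, traced_centroid, reference_centroid).
    """
    n = len(reference)
    ref_c = (sum(x for x, _ in reference) / n, sum(y for _, y in reference) / n)
    trc_c = (sum(x for x, _ in traced) / n, sum(y for _, y in traced) / n)
    dot = cross = 0.0
    for (rx, ry), (tx, ty) in zip(reference, traced):
        ax, ay = tx - trc_c[0], ty - trc_c[1]  # centered traced point
        bx, by = rx - ref_c[0], ry - ref_c[1]  # centered reference point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    return math.atan2(cross, dot), trc_c, ref_c


def apply_rigid(fit, point):
    """Move one traced point (registration point or landmark) into the reference frame."""
    theta, (sx, sy), (dx, dy) = fit
    c, s = math.cos(theta), math.sin(theta)
    px, py = point[0] - sx, point[1] - sy
    return (c * px - s * py + dx, s * px + c * py + dy)


def registration_ok(fit, reference, traced, limit_mm=1.0):
    """Apply the rejection rule: |X error| + |Y error| <= limit at every point."""
    for ref, trc in zip(reference, traced):
        fx, fy = apply_rigid(fit, trc)
        if abs(fx - ref[0]) + abs(fy - ref[1]) > limit_mm:
            return False
    return True
```

Once a tracing passes `registration_ok`, the same fit would be applied with `apply_rigid` to each of its sixteen landmark coordinates.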
Findings
Using an electronic plotting machine, each of the sixteen “hundred-point scattergrams” was generated automatically from the punch card output of the specially devised computer program. Fig. 1 is a semischematic montage intended solely to orient the reader physically with respect to the output data. In this figure, transparencies of fourteen of the sixteen “hundred-point scattergrams” are shown mounted in approximately their appropriate positions upon a single representative head film. The head film and the scattergrams are at the same scale and are correctly proportioned as to size. (Of course, each of the scattergrams actually represents combined data from twenty head films, rather than 100 estimates for this film. This accounts for the fact that some of the scattergrams, particularly that for gonion, do not “fit” too well on this particular head film.) Figs. 2, 3, and 4 are enlarged, precise representations of the fourteen scattergrams seen at smaller scale in Fig. 1. They are uniformly scaled and uniformly oriented with respect to the SN line.

Fig. 3. Distribution of estimating errors, mandibular skeletal landmarks. (Scattergrams shown for menton, pogonion, gonion, and point B; scale in millimeters.)

Fig. 5 tabulates cumulatively the number and per cent of estimates within stated distances from the head film
mean for each of the distributions shown in Figs. 2, 3, and 4. While it is not appropriate to overemphasize minor differences in rank, it may be seen clearly from Figs. 2 through 5 that there are large differences in reliability of estimation among the several landmarks and that gonion and lower incisor apex are clearly the least reliable landmarks.

Fig. 4. Distribution of estimating errors, dental landmarks. (Scattergrams shown for lower incisor edge, upper incisor edge, lower incisor apex, upper incisor apex, and mesiobuccal cusp of the lower first molar; scale in millimeters.)

Simple statistics for the data (means, standard deviations, and standard errors) are tabulated in Fig. 6. Here dental and skeletal landmarks are listed separately, arranged within groups in order of increasing magnitude of error. Column 1 of Fig. 6 lists the standard deviations for the error distributions for the various landmarks. However, examination of the scattergrams in Figs. 2, 3, and 4 will have revealed that the distributions of error for most landmarks vary in the X and Y directions. For example, in Fig. 3 it may be seen that the estimates of menton are distributed primarily along the horizontal (or X) axis, while the
estimates for pogonion are distributed primarily along the vertical (or Y) axis. For this reason, the X and Y components of the total variance for each landmark were isolated, yielding the separate standard deviations in the X and Y directions shown in columns 2 and 3 of Fig. 6. (This principle is represented graphically in Fig. 7 for the landmarks menton and pogonion.)

The estimating errors with which we have been dealing have both magnitude (represented by arithmetic value) and direction (represented by arithmetic sign). The question now arises: What is the mean magnitude of error when single estimates are made of a landmark’s position? This question is important because in the usual clinical situation, at least to date, only a single estimate is made of the location of each landmark. Column 4 of Fig. 6, therefore, lists the estimated mean errors for single estimates for each landmark, obtained as the mean of the absolute (that is, unsigned) values of each group of 100 estimates.

Fig. 5. Number and per cent of estimates within stated distances from head film mean for specified landmarks. (Table: cumulative listing of the number and per cent of errors less than, and equal to or greater than, specified values from 0.50 mm. to 5.00 mm. Landmarks are ordered by rank. A, Skeletal landmarks: 1 porion, 2 sella, 3 nasion, 4 menton, 5 point A, 6 pogonion, 7 orbitale, 8 point B, 9 gonion (U), 10 gonion (L). B, Dental landmarks: 1 upper 1 edge, 2 lower 1 edge, 3 upper 1 apex, 4 lower 6 cusp (L), 5 lower 6 cusp (R), 6 lower 1 apex.)

(A cautionary note is in order here concerning the process of stacking each group of twenty small scattergrams to make one large one. It will be remembered that the mean of each group of five estimates per landmark per film was considered the best estimate of the true value and was defined as lying at the scattergram origin. Actually, however, while the mean of the five estimates is the best estimate of the true value, there is no assurance that it is, in fact, the true value, that it, in fact, does lie at the origin. Indeed, the collection of means of the twenty small five-point scattergrams from each landmark may more properly be considered as a single sample estimating the true landmark position, each member of which is based on five replications. Considered in this way, the mean of the twenty small scattergrams for each landmark constitutes a single sample of N = 20 from a possible infinite population of such samples.
The precision of the estimate of the mean of these is, therefore, equal to $\sigma/\sqrt{20}$, rather than the more favorable $\sigma/\sqrt{100}$ which appeared at the outset to be the case, and the confidence interval around the best estimate must be widened accordingly. Approaching this idea in another way, we would note that while each of the 100 estimates for each landmark has been made independently, the coordinate values of the estimates are not independent, since each value depends on the values of the other four members of its group of five. Therefore, for each landmark there are properly only twenty completely independent groups of coordinate values. The computations for standard deviation and for sampling error of the mean recorded in Fig. 6 have been adjusted in such a way as to take into account this lack of complete independence among tracings. All standard deviations have been computed using the formula

$$SD = \sqrt{\frac{\sum d^{2}}{N(K-1)}}$$

where N = the number of head films, K = the number of tracings per head film, and each d = the deviation of an individual tracing value from the mean value for that landmark for that film. The denominator of the variance in the present study, therefore, becomes 20(5-1), or 80. The values for standard error of the mean given in Fig. 6, column 4, are computed as [...]. We may therefore say that at approximately the 95 per cent level of confidence the true best estimate of the average error with which each landmark is estimated lies within the indicated interval. In terms of the graphic representations of Figs. 2, 3, and 4, the fact that the means for the groups of five tracings should be distributed around the scattergram origin rather than at the origin itself implies that while the scattergrams are correct in size and shape relative to each other, they are all slightly constricted and all slightly understate the estimating error. The general configuration, scale, and relative sizes of the scattergrams, however, remain unaffected.)

Fig. 6. Estimating errors for selected head film landmarks.

                            MEASURES OF DISPERSION       MEAN ESTIMATING
A. SKELETAL LANDMARKS       SD(a)    SDx(b)    SDy(c)    ERROR(d)
 1. PORION                   .53      .36       .38       .39 ± .13
 2. SELLA                    .64      .44       .46       .48 ± .14
 3. NASION                  1.46      .60      1.33       .73 ± .52
 4. MENTON                  1.38     1.25       .59      1.00 ± .36
 5. POINT A                 1.41      .55      1.29      1.00 ± .37
 6. POGONION                1.44      .59      1.32      1.06 ± .36
 7. ORBITALE                1.91     1.03      1.61      1.09 ± .65
 8. POINT B                 1.97      .64      1.86      1.27 ± .60
 9. GONION (U)              4.71     3.33      3.34      3.48 ± 1.12
10. GONION (L)              5.21     3.71      3.53      3.75 ± 1.10

B. DENTAL LANDMARKS
 1. UPPER 1 EDGE            [...]
 2. LOWER 1 EDGE            [...]
 3. UPPER 1 APEX            [...]
 4. LOWER 6 CUSP (L)        [...]
 5. LOWER 6 CUSP (R)        [...]
 6. LOWER 1 APEX            [...]

a Standard deviation for total error.
b Standard deviation for error in horizontal direction.
c Standard deviation for error in vertical direction.
d Sample mean ± 2 SEM.
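The dispersion measures discussed above can be sketched in Python as follows. This is a hedged illustration, not the authors' program. It uses the denominator N(K - 1) given in the text (20(5-1), or 80, in the study) and assumes that the total variance is the sum of the X and Y components, which agrees with the tabulated values for porion and sella. Reading the “mean of the absolute values” as the mean radial distance from the best estimate is a further assumption, and the function name is hypothetical.

```python
import math


def dispersion(films):
    """Pooled dispersion for one landmark.

    films: list of N films, each a list of K (dx, dy) deviations of the
    individual tracings from that film's mean, in SN-oriented axes.
    """
    n_films = len(films)
    k = len(films[0])
    denominator = n_films * (k - 1)  # 20 * (5 - 1) = 80 in the study
    ss_x = sum(dx * dx for film in films for dx, _ in film)
    ss_y = sum(dy * dy for film in films for _, dy in film)
    # Assumption: "absolute value" of an error taken as its radial distance.
    mean_error = sum(
        math.hypot(dx, dy) for film in films for dx, dy in film
    ) / (n_films * k)
    return {
        "sd_x": math.sqrt(ss_x / denominator),
        "sd_y": math.sqrt(ss_y / denominator),
        "sd_total": math.sqrt((ss_x + ss_y) / denominator),
        "mean_error": mean_error,
    }
```

Reporting `sd_x` and `sd_y` separately preserves the direction of the error, which a single total figure cannot do for elongated distributions such as those of menton and pogonion.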
Discussion
Perhaps the most important inferences to be drawn from the foregoing data are the most obvious ones: First, that even when one is replicating assessments of the same head film, errors in landmark identification are too great to be ignored; second, that the magnitude of error varies greatly from landmark to landmark; and, third, that the distribution of errors for most landmarks is not random but is, rather, systematic, in the sense that each landmark has its own characteristic and usually noncircular envelope of error.
Fig. 7. Left, upper and lower: Distributions of estimating errors for menton and pogonion. Circles enclose areas one, two, and three standard deviations from the origin or best estimate and illustrate the total standard deviation represented in column 1 of Fig. 6. Right, upper and lower: Distributions of estimating errors for menton and pogonion. Rectangles enclose areas one, two, and three standard deviations from the origin or best estimate when the total variance is partitioned into X and Y components, as represented in columns 2 and 3 of Fig. 6. Note that the figures on the right encompass the dot configurations more efficiently and enclose less extraneous space than do those on the left. This illustrates the superiority of describing error in terms of the X and Y components over the use of a single statement for total error. Note, too, that the X and Y statements of columns 2 and 3 of Fig. 6 correctly reflect the fact that the X error component is larger than the Y component for menton, while the opposite is the case for pogonion.
It may be further said that the perceptual task involved in identifying landmarks varies from point to point. Most often the judge is asked to estimate the position of a point on an edge. The precision with which this operation is carried out is, in large part, a function of how sharply the edge folds in the region of the point being estimated. Where the edge folds very sharply (as at the upper incisor edge or the lower incisor edge), the estimates are very good indeed. However, where the edge is a gradual curve (for example, in the region of point A, point B, or gonion), the task is rendered more difficult and the errors tend to be proportionately larger and to be distributed along the edge itself (that is, along the surface of the skull). This condition also appears to hold true in the cases of pogonion and menton, both of which proved more variable than had been expected.
A corollary factor is the sharpness of the viewed edge, that is, the degree to which the edge contrasts with the surrounding area. Most of the points that orthodontists locate lie upon surfaces of the skull. Those structures which, on the contrary, lie within the confines of the skull have a greater likelihood of being confounded by “noise” from adjacent or superimposed structures. This consideration almost certainly accounts for the difficulty that we have in locating accurately the cusps of posterior teeth.

There are, of course, some landmarks for which the confounding “noise” from adjacent structures is so great that the judge is asked to estimate the position of a point for which there is frequently no direct physical evidence at all on the head film. Lower incisor apex is an example of such a landmark (as are, we believe, the apices of the roots of posterior teeth and the positions of various points on the condyle, neither of which was investigated in the present study). In the case of lower incisor apex, the judge is frequently forced to project the position of the point as a conceptual operation based upon his general knowledge of how long a tooth usually is and what the expected rate of taper is, given the perceived conformation of the crown and visible portion of the root. In making such conceptual judgments, the prior experience of the judge is an important factor, and it is in judgments of this sort that an experienced operator should tend to be more reliable than a novice. We do not contend that conceptual judgments are invalid, but one should not be surprised to find them more variable than judgments for perceived points.

A further problem appears to be rigor of definition. For example, orbitale had been defined by us in terms of the “more prominent” orbit. Retrospective examination of our data has established that some of the errors in location of this point involved differences in opinion as to which orbit was the more prominent.
Gonion is another example of augmentation of error due to weak definition. A detailed examination of the estimates for this point showed definite and consequential systematic interjudge differences resulting from differences in opinion as to where the ramus and the body of the mandible meet. As a result of examining our errors due to definition, we have, ourselves, now redefined gonion as a tangent swung from menton and have redefined orbitale in terms of the more anterior (rather than the more prominent) orbit.

Some remarks on porion, sella, and nasion seem in order. With respect to porion, it must be recognized that the observed high reliability is attributable in large measure to the fact that, in our study, porion is defined as a machine point rather than an anatomic one. Sella presents a unique problem among the points in the present study in that it involves visual estimation of the center of a structure. We should not be surprised to find that performance of such a task is quite good, since visual estimation of midpoints is a kind of mental averaging process yielding means of reduced dispersion. Had either the anterior or posterior clinoid processes been estimated, it is likely that the dispersion would have been greater. Nasion is an especially important point, since a very large number of clinically employed angular relationships are based on the line SN. For the most part, the estimates for this point were quite good, but there was a disquieting
number of gross errors, as may be seen in Fig. 1 and on line 3 of Fig. 4. These outliers, which produce the unexpectedly large standard deviations of line 3 in Fig. 5, are not the result of minor differences in opinion as to the contours of the nasofrontal suture. Rather, they are the result of identifications of entirely different structures. It is obvious that an entire clinical diagnosis can be distorted by one such misjudgment.
The question of the ramifying effects of a single bad estimate of nasion brings us to a consideration of the problem of outliers in general. There are those who would contend that the number of outliers for any given landmark is small and that we, therefore, can reasonably discard from consideration, as a rare and unlikely event, that portion of each scattergram of errors which lies more than, say, two standard deviations from the origin. The fallacy of this line of reasoning becomes strikingly apparent, however, the moment we recall that every standard system of head film analysis makes use of measures of a large number of landmarks. For example, the probability that a given estimate for a given landmark will differ by chance from the true value by more than two times the standard deviation for that landmark is approximately 0.05. Therefore, if we estimate the positions of two landmarks, the probability that we will be successful both times is 0.95 × 0.95. If we conduct a conventional cephalometric analysis involving estimations for sixteen landmarks (not an unreasonable number), the probability that we will locate all sixteen landmarks successfully (defined in the same sense) is 0.95^16, or 44 per cent. Thus, there are fifty-six chances in 100 that the analysis will contain, on the average, at least one landmark value which deviates by more than two times the standard deviation from the true head film value for that landmark.
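The arithmetic of the preceding paragraph can be checked with a short sketch. It assumes, as the text does, that each landmark estimate independently has a probability of about 0.95 of falling within two standard deviations of the true value. The function names are hypothetical and are introduced here only for illustration; they are not part of the program used in the study.

```python
def prob_all_within(n_landmarks: int, p_single: float = 0.95) -> float:
    """Probability that every one of n independent landmark estimates
    falls within two standard deviations of its true value.

    Assumption (from the text): errors are independent across landmarks
    and each estimate succeeds with probability p_single (about 0.95).
    """
    return p_single ** n_landmarks


def prob_at_least_one_outlier(n_landmarks: int, p_single: float = 0.95) -> float:
    """Complement of the above: at least one estimate is an outlier."""
    return 1.0 - prob_all_within(n_landmarks, p_single)


if __name__ == "__main__":
    # Two landmarks, then the sixteen-landmark analysis discussed in the text.
    for n in (2, 16):
        ok = prob_all_within(n)
        bad = prob_at_least_one_outlier(n)
        print(f"{n:2d} landmarks: all within 2 SD = {ok:.4f}, "
              f"at least one outlier = {bad:.4f}")
```

For sixteen landmarks the sketch gives roughly 0.44 and 0.56, in agreement with the figures quoted above. If the independence assumption did not hold, the numbers would differ, so the result is best read as an illustration of the argument rather than as a precise estimate.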
When we consider the fact that head film tracings are used in pairs for comparisons, the average chance that all thirty-two estimations in the pair will be located with errors of less than two standard deviations falls to 0.44 × 0.44, or 19.4 per cent. Thus, we establish that the probability of comparing tracings of two films (say, pre- and posttreatment) without having at least one value on at least one tracing in error by more than two standard deviations is slightly less than two chances in ten! It may then be seen that the vast majority of head film analyses based on a single estimate for each landmark cannot help but be flawed, the more so since we have absolutely no way of gauging which estimates are the bad ones.
It is sometimes contended that, while great precision may be required for research procedures, estimating accuracy of lower orders is sufficient for “routine clinical judgments.” Unfortunately, to the extent that clinicians base their actual clinical procedures on values from cephalometric analyses, precisely the opposite is the case. This is true because research conclusions are drawn on the basis of the means of samples of considerable size using statistical tests which have built-in controls and which assess penalties in the event of consequential measurement errors, while in traditional clinical analyses there are no controls whatever for error.
How can one introduce such controls into clinical head film analysis? The obvious answer is to replicate measurements. If each of the sixteen values for each head film is measured independently twice and the above two-standard-deviation criterion is applied, the probability that the mean of any pair of independent estimates will still be more than two standard deviations from the true value is the probability of erring in the same direction (or tail) from the true value in both estimates, namely, 0.025 × 0.025, or 0.0625 per cent. The likelihood, on the average, of completing a sixteen-point replicated analysis on a single film without undetectable error becomes 1 - 16(0.000625), or 99 per cent. The average likelihood of completing a comparison of two such films without error is increased to [1 - 16(0.000625)] × [1 - 16(0.000625)], or 98 per cent. Were one to use more than two replications, the likelihood of undetected error would undergo a further exponentially rapid decrease.
But how can one make replicated analyses of head films as a routine procedure? Obviously, the answer is that such procedures could be done practically only with the aid of automatic data-reduction systems. The procedure developed at the University of California in the course of the present study constitutes one such system. There are other systems already, and obviously there will be many more in the future. It is not too early, however, to conclude that the control and reduction of estimating errors through replication of analyses will provide a major portion of the rationale for the use of computer equipment in routine clinical head film diagnostics.
Summary
Using automatic coordinate-locating equipment and a specially devised computer program, assessments have been made of the reliability of identification of sixteen standard head film landmarks in replicated tracings of the same head film. The distribution of error for each landmark has been represented graphically, and simple statistical analyses have been tabulated and discussed. Large differences in magnitude and configuration of envelope of error were found among the different landmarks, and an attempt has been made to account for the differences observed. The suggestion is made that the impact of the observed errors in landmark location on clinical decisions can be reduced through the routine use of replicated estimates for each landmark. However, such a procedure is considered to be feasible only if automatic data-reducing equipment is employed. A subsequent article will report on the effects of the observed errors in landmark location upon the values of twenty-three conventionally used angular and linear measures. It is to be noted that this study does not address itself to the separate problem of the difficulty in identifying the same structure reliably on different head films of the same subject.
We wish to acknowledge the gracious assistance of Dr. Robert Elashoff, Chief, Research Systems Division, University of California, San Francisco. Dr. David Goheen, Research Systems Division, developed the program “xray” specifically for use in this study. The programming task was materially facilitated by the generous cooperation of Professor Paul Wolf, Department of Civil Engineering, University of California, Berkeley.
REFERENCES
1. Broadbent, B. H.: A new x-ray technique and its application to orthodontia, Angle Orthod. 1: 45-66, 1931.
2. Björk, A., and Solow, B.: Measurements on radiographs, J. Dent. Res. 41: 672-683, 1962.
3. Hixon, E. H.: Cephalometrics and longitudinal research, AM. J. ORTHOD. 46: 36-42, 1960.
4. Brodie, A. G.: Cephalometric roentgenology: History, techniques and uses, J. Oral Surg. 7: 185-198, 1949.
5. Salzmann, J. A.: Limitations of roentgenographic cephalometrics, AM. J. ORTHOD. 50: 169-188, 1964.
6. Adams, J. W.: Correction of error in cephalometric roentgenograms, Angle Orthod. 10: 3-13, 1940.
7. Wylie, W. L., and Elsasser, W. A.: Undistorted vertical projections of the head from lateral and posteroanterior roentgenograms, Am. J. Roentgenol. 60: 414-417, 1948.
8. Vogel, C. J.: Correction of frontal dimensions from head x-rays, Angle Orthod. 37: 1-8, 1967.
9. Downs, W. B.: Variations in facial relationships: Their significance in treatment and prognosis, AM. J. ORTHOD. 34: 812-840, 1948.
10. Steiner, C.: Cephalometrics for you and me, AM. J. ORTHOD. 39: 729-755, 1953.
11. Tweed, C.: The diagnostic facial triangle in the control of treatment objectives, AM. J. ORTHOD. 55: 651-667, 1969.
12. Reidel, R.: Analysis of dentofacial relationships, AM. J. ORTHOD. 43: 103-119, 1957.
13. Holdaway, R.: Changes in the relationships of points A and B, AM. J. ORTHOD. 42: 176-193, 1956.