American Journal of Orthodontics
Volume 60, Number 2, August, 1971

Original Articles

The reliability of head film measurements
1. Landmark identification

Sheldon Baumrind, D.D.S., M.S., and Robert C. Frantz, D.D.S.
San Francisco, Calif.

Ever since the work of Broadbent¹ first captured the imagination of the orthodontic specialty, measurements of head films have played a major role in orthodontic research and treatment planning. Measurements of head films are used for two general purposes: description and prediction.

Head film measurements may be used descriptively in several ways:

1. To categorize cases according to type. (For example, one may speak of “a high-angle case” or “a case with an ANB angle of 6 degrees.”)
2. To define the degree to which an observed case departs from some accepted norm. (For example, we may compare values in the observed case with the norms of Downs, Steiner, or Tweed.)
3. To indicate the extent of changes occurring during treatment. (For example, we may say that “the mandibular plane angle opened 3 degrees” or “the ANB angle was reduced 4 degrees.”)

Head film measurements have also been used by some practitioners in an attempt to predict growth trends. These workers believe, for example, that “vertical growers will continue to grow vertically” or that “steep mandibular planes tend to grow steeper.” Unfortunately, it must be said that no rigorous test has yet been successful in demonstrating the propriety of the use of head films in this fashion and that there are both theoretical and practical considerations which imply strongly that head films can never be of more than adjunctive use in growth prediction.

From the Division of Orthodontics, School of Dentistry, University of California, San Francisco.

This study was supported in part by United States Public Health Service Grant 5 SO1 FR-05305 (Pavone) and in part by funds from the Instruction and Research Grant of the Computer Center, University of California, San Francisco.


(It may not be inappropriate to consider briefly why this is so. It is a common observation that conventionally used angular and linear measures are highly intercorrelated. The consequence of this fact upon the predictive power of head films is not so generally understood. A high intercorrelation among head film measures means that many apparently discrete measures are, in fact, markedly overlapping so that two or more measures reflect the same underlying anatomic condition in slightly different terms. For this reason, even sophisticated statistical techniques which interplay large numbers of apparently different head film “variables” do not encompass (or, in statistical terms, “explain”) a sufficient portion of the total variation in the growth system to be clinically predictive. The conclusion appears inescapable that the total amount of information contained in head films is not sufficiently large to make clinically meaningful predictions possible from this source alone, even if we were able to assess all the contained parameters perfectly. No doubt information from head films would be predictively useful if we could combine it with information from other diagnostic tools which assess entirely different portions of the variance in the system (for example, model analyses, muscle studies, and case histories), but, unfortunately, at the present time we have no satisfactory mechanism for quantitatively interrelating these different kinds of information.)

Regardless of the use to which head films are to be put, it is important to know how accurately measurements on the films are being made and what the sources of measurement error are. Head film measurements, like all other measurements, involve error, as even the earliest investigators were aware. Two general classes of error occur in the estimation of cranial dimensions from head films.

The first may be termed “errors of projection.” These result from the fact that the head film is a two-dimensional shadow of a three-dimensional object. Since the rays which produce the shadow are nonparallel and originate from a very small source, head films are always distorted enlargements, the enlargement factor varying with the plane at which the estimated point lies. Head films are further distorted by foreshortening of distances between points lying in different planes and by radial displacement of all points and structures not on the principal axis (or, as orthodontists call it, the central ray). Björk,² Hixon,³ Brodie,⁴ and Salzmann⁵ have each commented on these projectional errors. Adams,⁶ Wylie and Elsasser,⁷ Vogel,⁸ and others have attempted to introduce correction factors for some of these sources of error, but the cumbersomeness of the necessary computations has militated against the general use of these adjustments. Meaningful systematic corrections for projectional errors could be obtained either by the use of stereo-head films or by the integration of information from lateral and frontal films (as in Broadbent’s method). At the moment, however, neither of these methods is considered practical for routine clinical use.

The second general class of errors in head film measurement may be termed “errors of identification.” These are the errors involved in the apparently straightforward process of identifying specific anatomic landmarks on head films.
It has long been intuitively recognized that the precision with which we identify the landmarks used in the standard clinical analyses of Downs,⁹ Steiner,¹⁰ Tweed,¹¹ Riedel,¹² Holdaway,¹³ and others varies from point to point. But while it seems obvious that, for example, “nasion is easier to identify than is point B,” the nature and magnitude of the differences in the precision with which we identify the different landmarks used in standard cephalometric analyses have never, to our knowledge, been quantitatively measured. The present article reports on the first of two studies directed toward quantitation of errors in landmark identification and toward assessing the effect of these errors upon subsequent angular and linear measurements.

Materials and methods

As part of a general project designed to develop rapid and precise new techniques for evaluating the effects of orthodontic treatment, a study was designed to check operator reliability in the identification of standard cephalometric landmarks. A sample of twenty lateral skull head films was selected at random from among those available in the records of 122 cases whose orthodontic treatment was started and completed at the University of California School of Dentistry between 1954 and 1964. All films were taken with the right side of the patient’s face oriented toward the x-ray tube and the left side nearer the film, with the putative central ray passing through the porion-porion axis.

A transparent plastic template having four needle-point perforations in a rectangular configuration at known distances from each other was constructed from a 60 mil. cellulose acetate sheet. The template was overlaid on each head film individually, and each of the four points was registered on each head film by perforating the head film with a fine sewing needle held in a dental broach holder. The template was placed on the head films in such a way that the rectangle defined by the four perforations contained all or almost all of the landmarks to be measured.

Each of the twenty films was then “traced” by each of the five members of the first-year postgraduate class at the University of California Division of Orthodontics. All of the judges had been trained to the same criteria, having just completed the same course of training in cephalometric diagnostics. At the time the presently reported evaluations were begun, the class had been in graduate training for 7 months. The intent of the study was to simulate as nearly as possible the range of operator and head film variation in general clinical orthodontic practice.
Since it is well known that the difficulty in locating a given landmark varies from head film to head film, each judge was asked to evaluate the relatively large number of twenty head films a single time, rather than to make replicated assessments of a smaller number of films. In a further attempt to simulate the range of variation which occurs in clinical practice, no conscious effort was made to increase interjudge reliability as to the manner in which landmarks were to be located except for the obtaining of consensual verbal agreement as to the definition for each landmark. On the other hand, it was considered desirable to minimize variation from all sources other than differences in judges’ opinions and differences in head film characteristics. For this reason, all films were evaluated using similar light boxes and tracing acetate and under the same conditions of general illumination.

Each judge was asked to overlay each head film with a conventional translucent acetate sheet (0.003 inch matte cellulose acetate). He first recorded on the acetate a three-digit random number identifying the film and his own two-digit identification number. Then he located, marked, and identified with a pencil sixteen standard cephalometric landmarks which had been consensually chosen and defined by the student group as follows:

1. S - Sella turcica, the midpoint of the pituitary fossa as determined by inspection.
2. Na - Nasion, the anteriormost point on the nasofrontal suture.
3. Or - Orbitale, the inferiormost point on the orbital margin, using the more prominent of the orbital images.
4. A - Point A, the deepest point on the curvature of the surface of the maxillary bone between ANS and the alveolar crest of the upper central incisor.
5. UIA - Upper incisor apex, the point of intersection between the long axis of the more anteriorly positioned upper central incisor and the contour of that tooth’s root end curvature.
6. UIE - Upper incisor edge, the tip of the incisal edge of the more anteriorly placed upper central incisor.
7. LIE - Lower incisor edge, the tip of the incisal edge of the more anteriorly placed lower central incisor.
8. LIA - Lower incisor apex, the point of intersection between the long axis of the most anteriorly positioned lower central incisor and the contour of that tooth’s root-end curvature.
9. B - Point B, the deepest point on the curvature of the anterior border of the mandible between pogonion and the alveolar crest of the lower central incisor.
10. P - Pogonion, the most antero-inferior point on the contour of the bony chin at the midline.
11. M - Menton, the inferiormost point on the mandible at the symphysis.
12. MBR - Mesiobuccal right, the occlusalmost point of the mesiobuccal cusp of the more mesially positioned lower first molar.
13. MBL - Mesiobuccal left, the occlusalmost point of the mesiobuccal cusp of the more distally positioned lower first molar.
14. GOU - Gonion (upper), the lowest point on the curvature of the angle of the mandible where the body of the mandible meets the ramus. Estimates were made for both sides of the mandible. The more superior value is considered to represent the left side.
15. GOL - Gonion (lower), same as 14 above but for the lower of the two angles. This value is considered to represent the right side.
16. PO - Porion, the superiormost point of the image of the cephalostat ear rod. As used in this study, porion is a machine point rather than an anatomic structure.

Paired values were recorded for gonion and for the lower permanent first molar. This was done because it was thought that best estimates of “mandibular plane” and “occlusal plane” could later be made by averaging these pairs of values. For the purposes of the present study, however, the location of planes is not a consideration, and the two sets of values for “gonion” and for “mesiobuccal cusp” may be considered to be redundant estimates of a single phenomenon.

The data for the study were thus derived from 100 acetate tracings, five for each of twenty head films. On each tracing, all twenty points (four reference and sixteen landmark) had been located. Therefore, for each of sixteen landmarks, 100 independently located estimates were available.

The data-reduction problem was to relate the 100 estimates for each landmark to each other in such a way that meaningful conclusions could be drawn. Since precise physical superimposition of five tracings upon each other was considered impossible, a computational solution was developed, employing a specially devised computer program. Because this computational solution has generalized applicability, its characteristics are to be reported in some detail elsewhere. For the present purposes, it is sufficient to say that the coordinates for each of the twenty points on each tracing were first determined with respect to arbitrary X and Y axes, using an Oscar K “coordinatograph.” The landmark coordinate values for all five tracings for each x-ray film were then mathematically superimposed, registered on the four reference points. The five sets of X and Y values for each landmark were then averaged, yielding a best estimate for that landmark for that film.

For each head film, the X axis was next redefined as the line connecting the best estimates of S and N. There now existed for each head film sixteen small five-point scattergrams, oriented to the SN line (one scattergram for each landmark, one point in each scattergram from each tracing). Each of the twenty head films in the study yielded such a set of sixteen scattergrams. The twenty small five-point scattergrams for each landmark (one from each head film) were now superimposed mathematically with their origins and axes in common.
This yielded a set of sixteen “hundred-point scattergrams,” each representing the dispersion of estimating errors for a single landmark around the best estimate for that landmark and each oriented to the SN line.

Sources of error

Any factor capable of modifying the magnitude or distribution of observed deviations other than random differences in judgment as to landmark location must be considered a source of error in terms of the present study. Possible sources of error lay in the areas of (1) representativeness of head films, (2) representativeness of judges, (3) machine errors in point location, (4) errors in superimposition of tracings, and (5) errors in location of the major axes.

Representativeness of head films. As previously stated, the head films used were selected randomly from among those available in the University of California files. There is no reason to consider them unrepresentative of films that the average clinician would consider satisfactory for routine clinical use, although it is necessary to note that films from only two x-ray machines are involved.

Representativeness of judges. It was not possible to select judges randomly. Instead, all available judges were used. However, it is thought that there is no reason to believe that the judges employed are unrepresentative of competent clinicians with average training in cephalometrics.

Machine errors in point location. The Oscar K coordinatograph is a device which measures linear distances as a function of variations in the voltage of an electrical circuit. Coordinate values are obtained to the nearest 0.1 mm. In the mode in which we employed it, the precision of this system is stated to have a range of error in reproducibility no greater than ± 0.1 mm.


Fig. 1. A, Upper portion of a representative head film with hundred-point scattergrams for sella, porion, nasion, and orbitale superimposed semischematically at appropriate scale. B, Lower portion of a representative head film with hundred-point scattergrams for upper incisor apex, point A, lower incisor edge, upper incisor edge, point B, lower incisor apex, pogonion, menton, mesiobuccal cusp (of lower molar), and gonion superimposed semischematically at appropriate scale. Redundant sets of estimates for gonion and mesiobuccal cusp have been omitted for ease in visualization.

Fig. 2. Distribution of estimating errors, maxillary skeletal landmarks (porion, sella, orbitale, nasion, and point A). Scale in millimeters.

Errors in superimposition of tracings. Tracing values were mathematically translated, rotated, and superimposed on the four registration points using a “least squares best fit” program. Any tracing in which the total X + Y error for any registration point exceeded 1 mm. was rejected. The range of residuals from the true value for the four registration points for each tracing averaged 0.7 mm., with a standard deviation of approximately 0.3. To the extent that the landmarks under study were located within the configuration of the registration point rectangle, errors in landmark location introduced by errors in tracing superimposition were, by reason of geometry, smaller than the superimpositional errors themselves.

Errors in major axis location. This study has in common with other orthodontic studies a susceptibility to generalized error for all landmarks of a given tracing as a result of chance severe errors in the location of the major axis landmarks (sella and nasion). A measure of security was afforded, however, by the fact that the nasion and sella values utilized in establishing the major axes for each head film were the means of five estimates.

With the exception of the problem of representativeness of judges, all these sources of error would tend to alter the size rather than the shape of scattergram distributions, since their effects would be random. No consequential errors are introduced by the mathematical computations of our computer program since its precision extends considerably beyond three decimal places.
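The superimposition and axis-redefinition steps described under Materials and methods and under Sources of error can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the authors' program, whose details are not given here. It assumes NumPy, a rigid (rotation plus translation, no scaling) least-squares fit computed by singular value decomposition, and the known template geometry as the common registration target; the names `rigid_fit`, `error_scatter`, `template`, `s_idx`, and `n_idx` are hypothetical.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation r and translation t mapping src onto dst (2-D, no scaling)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)           # 2x2 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # keep a proper rotation, no mirror image
    r = vt.T @ np.diag([1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t

def error_scatter(ref_pts, landmarks, template, s_idx=0, n_idx=1, tol_mm=1.0):
    """Deviations of each tracing's landmarks from the film's best estimate, in the SN frame.

    ref_pts:   (K, 4, 2) registration points digitized on K tracings of one film
    landmarks: (K, L, 2) landmark estimates from the same tracings
    template:  (4, 2) template geometry used as the common target (an assumption)
    """
    aligned = []
    for ref, lm in zip(ref_pts, landmarks):
        r, t = rigid_fit(ref, template)
        residual = ref @ r.T + t - template
        # Rejection rule from the text: |dx| + |dy| above 1 mm at any registration point.
        if np.abs(residual).sum(axis=1).max() > tol_mm:
            raise ValueError("tracing rejected: registration error exceeds tolerance")
        aligned.append(lm @ r.T + t)
    aligned = np.stack(aligned)
    best = aligned.mean(axis=0)                    # best estimate for each landmark
    sn = best[n_idx] - best[s_idx]
    x_axis = sn / np.linalg.norm(sn)               # unit vector along the SN line
    y_axis = np.array([-x_axis[1], x_axis[0]])     # perpendicular to SN
    dev = aligned - best
    return np.stack([dev @ x_axis, dev @ y_axis], axis=-1)
```

Stacking the arrays returned for all twenty films, landmark by landmark, would give the analogue of the “hundred-point scattergrams.” Averaging before rotating into the SN frame mirrors the order of operations given in the text.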

Fig. 3. Distribution of estimating errors, mandibular skeletal landmarks (menton, pogonion, gonion, and point B). Scale in millimeters.

Findings

Using an electronic plotting machine, each of the sixteen “hundred-point scattergrams” was generated automatically from the punch card output of the
specially devised computer program.

Fig. 4. Distribution of estimating errors, dental landmarks (lower incisor edge, upper incisor edge, lower incisor apex, upper incisor apex, and mesiobuccal cusp of the lower first molar). Scale in millimeters.

Fig. 1 is a semischematic montage intended solely to orient the reader physically with respect to the output data. In this figure, transparencies of fourteen of the sixteen “hundred-point scattergrams” are shown mounted in approximately their appropriate positions upon a single representative head film. The head film and the scattergrams are at the same scale and are correctly proportioned as to size. (Of course, each of the scattergrams actually represents combined data from twenty head films, rather than 100 estimates for this film. This accounts for the fact that some of the scattergrams, particularly that for gonion, do not “fit” too well on this particular head film.) Figs. 2, 3, and 4 are enlarged, precise representations of the fourteen scattergrams seen at smaller scale in Fig. 1. They are uniformly scaled and uniformly oriented with respect to the SN line. Fig. 5 tabulates cumulatively the number and per cent of estimates within stated distances from the head film
mean for each of the distributions shown in Figs. 2, 3, and 4. While it is not appropriate to overemphasize minor differences in rank, it may be seen clearly from Figs. 2 through 5 that there are large differences in reliability of estimation among the several landmarks and that gonion and lower incisor apex are clearly the least reliable landmarks.

Fig. 5. Number and per cent of estimates within stated distances from head film mean for specified landmarks. Landmarks are ordered in terms of greatest number of estimates within 1 mm. of the head film mean. A. Skeletal landmarks: 1, porion; 2, sella; 3, nasion; 4, menton; 5, point A; 6, pogonion; 7, orbitale; 8, point B; 9, gonion (U); 10, gonion (L). B. Dental landmarks: 1, upper 1 edge; 2, lower 1 edge; 3, upper 1 apex; 4, lower 6 cusp (L); 5, lower 6 cusp (R); 6, lower 1 apex. The table gives the number and per cent of errors less than 0.50, 1.00, and 1.50 mm. and a cumulative listing of the number and per cent of errors equal to or greater than specified values up to 5.00 mm. (The tabulated entries are not recoverable from this copy.)

Simple statistics for the data (means, standard deviations, and standard errors) are tabulated in Fig. 6. Here dental and skeletal landmarks are listed separately, arranged within groups in order of increasing magnitude of error. Column 1 of Fig. 6 lists the standard deviations for the error distributions for the various landmarks. However, examination of the scattergrams in Figs. 2, 3, and 4 will have revealed that the distributions of error for most landmarks vary in the X and Y directions. For example, in Fig. 3 it may be seen that the estimates of menton are distributed primarily along the horizontal (or X) axis, while the
estimates for pogonion are distributed primarily along the vertical (or Y) axis. For this reason, the X and Y components of the total variance for each landmark were isolated, yielding the separate standard deviations in the X and Y directions shown in columns 2 and 3 of Fig. 6. (This principle is represented graphically in Fig. 7 for the landmarks menton and pogonion.)

The estimating errors with which we have been dealing have both magnitude (represented by arithmetic value) and direction (represented by arithmetic sign). The question now arises: What is the mean magnitude of error when single estimates are made of a landmark’s position? This question is important because in the usual clinical situation, at least to date, only a single estimate is made of the location of each landmark. Column 4 of Fig. 6, therefore, lists the estimated mean errors for single estimates for each landmark, obtained as the mean of the absolute (that is, unsigned) values of each group of 100 estimates.

Fig. 6. Estimating errors for selected head film landmarks.

A. SKELETAL LANDMARKS

                        Measures of dispersion        Mean estimating
     Landmark           SD (a)   SDx (b)   SDy (c)    error (d)
  1. Porion              .53      .36       .38        .39 ± .13
  2. Sella               .64      .44       .46        .48 ± .14
  3. Nasion             1.46      .60      1.33        .73 ± .52
  4. Menton             1.38     1.25       .59       1.00 ± .36
  5. Point A            1.41      .55      1.29       1.00 ± .37
  6. Pogonion           1.44      .59      1.32       1.06 ± .36
  7. Orbitale           1.91     1.03      1.61       1.09 ± .65
  8. Point B            1.97      .64      1.86       1.27 ± .60
  9. Gonion (U)         4.71     3.33      3.34       3.48 ± 1.12
 10. Gonion (L)         5.21     3.71      3.53       3.75 ± 1.10

B. DENTAL LANDMARKS

  1. Upper 1 edge
  2. Lower 1 edge
  3. Upper 1 apex
  4. Lower 6 cusp (L)
  5. Lower 6 cusp (R)
  6. Lower 1 apex

(The numerical entries for the dental landmarks are not legible in this copy; the surviving fragments are .50, .59, 1.05, and 1.32.)

a. Standard deviation for total error.
b. Standard deviation for error in horizontal direction.
c. Standard deviation for error in vertical direction.
d. Sample mean ± 2 SEM.

(A cautionary note is in order here concerning the process of stacking each group of twenty small scattergrams to make one large one. It will be remembered that the mean of each group of five estimates per landmark per film was considered the best estimate of the true value and was defined as lying at the scattergram origin. Actually, however, while the mean of the five estimates is the best estimate of the true value, there is no assurance that it is, in fact, the true value, that it, in fact, does lie at the origin. Indeed, the collection of means of the twenty small five-point scattergrams from each landmark may more properly be considered as a single sample estimating the true landmark position, each member of which is based on five replications. Considered in this way, the mean of the twenty small scattergrams for each landmark constitutes a single sample of N = 20 from a possible infinite population of such samples. The precision of the estimate of the mean of these is, therefore, equal to σ/√20, rather than the more favorable σ/√100 which appeared at the outset to be the case, and the confidence interval around the best estimate must be widened accordingly.

Approaching this idea in another way, we would note that while each of the 100 estimates for each landmark has been made independently, the coordinate values of the estimates are not independent, since each value depends on the values of the other four members of its group of five. Therefore, for each landmark there are properly only twenty completely independent groups of coordinate values. The computations for standard deviation and for sampling error of the mean recorded in Fig. 6 have been adjusted in such a way as to take into account this lack of complete independence among tracings. All standard deviations have been computed using the formula

$$SD = \sqrt{\frac{\sum d^{2}}{N(K-1)}}$$

where N = the number of head films, K = the number of tracings per head film, and each d = the deviation of an individual tracing value from the mean value for that landmark for that film. The denominator of the variance in the present study, therefore, becomes 20(5-1), or 80. The values for standard error of the mean given in Fig. 6, column 4, are computed as [equation not legible in this copy].

We may therefore say that at approximately the 95 per cent level of confidence the true best estimate of the average error with which each landmark is estimated lies within the indicated interval.

In terms of the graphic representations of Figs. 2, 3, and 4, the fact that the means for the groups of five tracings should be distributed around the scattergram origin rather than at the origin itself implies that while the scattergrams are correct in size and shape relative to each other, they are all slightly constricted and all slightly understate the estimating error. The general configuration, scale, and relative sizes of the scattergrams, however, remain unaffected.)
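The dispersion measures described above can be sketched numerically. The following is a minimal illustration, not the authors' program. It assumes NumPy, deviations already expressed relative to each film's own five-tracing mean, and a mean estimating error taken as the mean radial (unsigned) distance from the origin; that last reading of “absolute values” is an assumption. The names `pooled_sd` and `summarize` are hypothetical.

```python
import numpy as np

def pooled_sd(dev):
    """Pooled within-film SD with denominator N(K - 1).

    dev: (N, K) signed deviations of each tracing from its own film mean.
    """
    n, k = dev.shape
    return np.sqrt((dev ** 2).sum() / (n * (k - 1)))

def summarize(dev_xy):
    """Dispersion measures for one landmark.

    dev_xy: (N, K, 2) X and Y deviations, each film's mean lying at the origin.
    Returns (sd_total, sd_x, sd_y, mean_error).
    """
    sd_x = pooled_sd(dev_xy[..., 0])
    sd_y = pooled_sd(dev_xy[..., 1])
    # Total variance is the sum of the X and Y components.
    sd_total = np.sqrt(sd_x ** 2 + sd_y ** 2)
    # Assumed definition: mean unsigned radial distance from the origin.
    mean_error = np.linalg.norm(dev_xy, axis=-1).mean()
    return sd_total, sd_x, sd_y, mean_error
```

With N = 20 films and K = 5 tracings the denominator is 20(5 - 1) = 80, as stated in the text. The tabulated skeletal values are consistent with the total standard deviation being the root sum of squares of the X and Y components (for example, porion: √(.36² + .38²) ≈ .52).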

Discussion

Perhaps the most important inferences to be drawn from the foregoing data are the most obvious ones: First, that even when one is replicating assessments of the same head film, errors in landmark identification are too great to be ignored; second, that the magnitude of error varies greatly from landmark to landmark; and, third, that the distribution of errors for most landmarks is not random but is, rather, systematic, in the sense that each landmark has its own characteristic and usually noncircular envelope of error.


Fig. 7. Left, upper and lower: Distributions of estimating errors for menton and pogonion. Circles enclose areas one, two, and three standard deviations from the origin or best estimate and illustrate the total standard deviation represented in column 1 of Fig. 6. Right, upper and lower: Distributions of estimating errors for menton and pogonion. Rectangles enclose areas one, two, and three standard deviations from the origin or best estimate when the total variance is partitioned into X and Y components, as represented in columns 2 and 3 of Fig. 6. Note that the figures on the right encompass the dot configurations more efficiently and enclose less extraneous space than do those on the left. This illustrates the superiority of describing error in terms of the X and Y components over the use of a single statement for total error. Note, too, that the X and Y statements of columns 2 and 3 of Fig. 6 correctly reflect the fact that the X error component is larger than the Y component for menton, while the opposite is the case for pogonion.
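The comparison between circles and rectangles in Fig. 7 can be made concrete with a small calculation. The sketch below is illustrative only. It takes the standard deviations for menton and pogonion as they appear in Fig. 6 (the menton Y value of .59 is reconstructed from a damaged table row) and compares the area of a circle of radius k SD with that of a rectangle extending k SDx and k SDy on either side of the origin. The function names are hypothetical.

```python
import math

def circle_area(sd_total, k=1):
    """Area (mm^2) of the circle of radius k * SD around the best estimate."""
    return math.pi * (k * sd_total) ** 2

def rectangle_area(sd_x, sd_y, k=1):
    """Area (mm^2) of the rectangle spanning +/- k * SDx by +/- k * SDy."""
    return (2 * k * sd_x) * (2 * k * sd_y)

# (SD total, SDx, SDy) in millimeters, from Fig. 6.
landmarks = {"menton": (1.38, 1.25, 0.59), "pogonion": (1.44, 0.59, 1.32)}

for name, (sd, sd_x, sd_y) in landmarks.items():
    print(name, round(circle_area(sd), 2), round(rectangle_area(sd_x, sd_y), 2))
```

For both landmarks the one-standard-deviation rectangle covers roughly half the area of the corresponding circle, which is one way of expressing the caption's remark that the rectangles enclose less extraneous space.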

It may be further said that the perceptual task involved in identifying landmarks varies from point to point. Most often the judge is asked to estimate the position of a point on an edge. The precision with which this operation is carried out is, in large part, a function of how sharply the edge folds in the region of the point being estimated. Where the edge folds very sharply (as at the upper incisor edge or the lower incisor edge), the estimates are very good indeed. However, where the edge is a gradual curve (for example, in the region of point A, point B, or gonion), the task is rendered more difficult and the errors tend to be proportionately larger and to be distributed along the edge itself (that is, along the surface of the skull). This condition also appears to hold true in the cases of pogonion and menton, both of which proved more variable than had been expected.


A corollary factor is the sharpness of the viewed edge (the degree to which the edge contrasts with the surrounding area). Most of the points that orthodontists locate lie upon surfaces of the skull. Those structures which, on the contrary, lie within the confines of the skull have a greater likelihood of being confounded by “noise” from adjacent or superimposed structures. This consideration almost certainly accounts for the difficulty that we have in locating accurately the cusps of posterior teeth. There are, of course, some landmarks for which the confounding “noise” from adjacent structures is so great that the judge is asked to estimate the position of a point for which there is frequently no direct physical evidence at all on the head film. Lower incisor apex is an example of such a landmark (as are, we believe, the apices of the roots of posterior teeth and the positions of various points on the condyle, neither of which were investigated in the present study). In the case of lower incisor apex, the judge is frequently forced to project the position of the point as a conceptual operation based upon his general knowledge of how long a tooth usually is and what is the expected rate of taper, given the perceived conformation of the crown and visible portion of the root. In making such conceptual judgments, the prior experience of the judge is an important factor, and it is in judgments of this sort that an experienced operator should tend to be more reliable than a novice. We do not contend that conceptual judgments are invalid, but one should not be surprised to find them more variable than judgments for perceived points. A further problem appears to be rigor of definition. For example, orbitale had been defined by us in terms of the “more prominent” orbit. Retrospective examination of our data has established that some of the errors in location of this point involved differences in opinion as to which orbit was the more prominent.
Gonion is another example of augmentation of error due to weak definition. A detailed examination of the estimates for this point showed definite and consequential systematic interjudge differences resulting from differences in opinion as to where the ramus and the body of the mandible meet. As a result of examining our errors due to definition, we have, ourselves, now redefined gonion as a tangent swung from menton and have redefined orbitale in terms of the more anterior (rather than the more prominent) orbit.

Some remarks on porion, sella, and nasion seem in order. With respect to porion, it must be recognized that the observed high reliability is attributable in large measure to the fact that, in our study, porion is defined as a machine point rather than an anatomic one. Sella presents a unique problem among the points in the present study in that it involves visual estimation of the center of a structure. We should not be surprised to find that performance of such a task is quite good, since visual estimation of midpoints is a kind of mental averaging process yielding means of reduced dispersion. Had either the anterior or posterior clinoid processes been estimated, it is likely that the dispersion would have been greater. Nasion is an especially important point, since a very large number of clinically employed angular relationships are based on the line SN. For the most part, the estimates for this point were quite good, but there was a disquieting
number of gross errors, as may be seen in Fig. 1 and on line 3 of Fig. 4. These outliers, which produce the unexpectedly large standard deviations of line 3 in Fig. 5, are not the result of minor differences in opinion as to the contours of the nasofrontal suture. Rather, they are the result of identifications of entirely different structures. It is obvious that an entire clinical diagnosis can be distorted by one such misjudgment.

The question of the ramifying effects of a single bad estimate of nasion brings us to a consideration of the problem of outliers in general. There are those who would contend that the number of outliers for any given landmark is small and that we, therefore, can reasonably discard from consideration, as a rare and unlikely event, that portion of each scattergram of errors which lies more than, say, two standard deviations from the origin. The fallacy of this line of reasoning becomes strikingly apparent, however, the moment we recall that every standard system of head film analysis makes use of measures of a large number of landmarks. For example, the probability that a given estimate for a given landmark will differ by chance from the true value by more than two times the standard deviation for that landmark is approximately 0.05. Therefore, if we estimate the positions of two landmarks, the probability that we will be successful both times is 0.95 × 0.95. If we conduct a conventional cephalometric analysis involving estimations for sixteen landmarks (not an unreasonable number), the probability that we will locate all sixteen landmarks successfully (defined in the same sense) is 0.95^16, or 44 per cent. Thus, there are fifty-six chances in 100 that the analysis will contain, on the average, at least one landmark value which deviates by more than two times the standard deviation from the true head film value for that landmark.
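The arithmetic above can be checked with a short sketch. It assumes, as the text does, that errors at different landmarks are independent and that each single estimate falls within two standard deviations of the true value with probability 0.95. The function names are illustrative only and are not part of the program used in the study.

```python
def prob_all_within(n_estimates: int, p_single: float = 0.95) -> float:
    """Probability that all n independent estimates meet the
    two-standard-deviation criterion (assumed per-estimate probability p_single)."""
    return p_single ** n_estimates


def prob_at_least_one_outlier(n_estimates: int, p_single: float = 0.95) -> float:
    """Complement: probability that at least one estimate misses the criterion."""
    return 1.0 - prob_all_within(n_estimates, p_single)


if __name__ == "__main__":
    # Two landmarks and the sixteen-landmark analysis discussed in the text.
    for n in (2, 16):
        print(n, round(prob_all_within(n), 4), round(prob_at_least_one_outlier(n), 4))
```

For sixteen landmarks the first function returns about 0.44 and the second about 0.56, matching the figures quoted. The same function applies to any number of independent estimates; for 32 estimations it returns about 0.19.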
When we consider the fact that head film tracings are used in pairs for comparisons, the average chance that all thirty-two estimations in the pair will be located with errors of less than two standard deviations falls to 0.44 × 0.44, or 19.4 per cent. Thus, we establish that the probability of comparing tracings of two films (say, pre- and posttreatment) without having at least one value on at least one tracing in error by more than two standard deviations is slightly less than two chances in ten! It may then be seen that the vast majority of head film analyses based on a single estimate for each landmark cannot help but be flawed, the more so since we have absolutely no way of gauging which estimates are the bad ones.

It is sometimes contended that, while great precision may be required for research procedures, estimating accuracy of lower orders is sufficient for “routine clinical judgments.” Unfortunately, to the extent that clinicians base their actual clinical procedures on values from cephalometric analyses, precisely the opposite is the case. This is true because research conclusions are drawn on the basis of the means of samples of considerable size using statistical tests which have built-in controls and which assess penalties in the event of consequential measurement errors, while in traditional clinical analyses there are no controls whatever for error.

How can one introduce such controls into clinical head film analysis? The obvious answer is to replicate measurements. If each of the sixteen values for each head film is measured independently twice and the above two-standard-deviation criterion is applied, the probability that the mean of any pair of independent estimates will still be more than two standard deviations from the true value is the probability of erring in the same direction (or tail) from the true value in both estimates, namely, 0.025 × 0.025, or 0.0625 per cent. The likelihood, on the average, of completing a sixteen-point replicated analysis on a single film without undetectable error becomes 1 - 16(0.000625), or 99 per cent. The average likelihood of completing a comparison of two such films without error is increased to [1 - 16(0.000625)] × [1 - 16(0.000625)], or 98 per cent. Were one to use more than two replications, the likelihood of undetected error would undergo a further exponentially rapid decrease.

But how can one make replicated analyses of head films as a routine procedure? Obviously, the answer is that such procedures could be done practically only with the aid of automatic data-reduction systems. The procedure developed at the University of California in the course of the present study constitutes one such system. There are other systems already, and obviously there will be many more in the future. It is not too early, however, to conclude that the control and reduction of estimating errors through replication of analyses will provide a major portion of the rationale for the use of computer equipment in routine clinical head film diagnostics.

Summary

Using automatic coordinate-locating equipment and a specially devised computer program, assessments have been made of the reliability of identification of sixteen standard head film landmarks in replicated tracings of the same head film. The distribution of error for each landmark has been represented graphically, and simple statistical analyses have been tabulated and discussed. Large differences in magnitude and configuration of envelope of error were found among the different landmarks, and an attempt has been made to account for the differences observed. The suggestion is made that the impact of the observed errors in landmark location on clinical decisions can be reduced through the routine use of replicated estimates for each landmark. However, such a procedure is considered to be feasible only if automatic data-reducing equipment is employed. A subsequent article will report on the effects of the observed errors in landmark location upon the values of twenty-three conventionally used angular and linear measures. It is to be noted that this study does not address itself to the separate problem of the difficulty in identifying the same structure reliably on different head films of the same subject.

We wish to acknowledge the gracious assistance of Dr. Robert Elashoff, Chief, Research Systems Division, University of California, San Francisco. Dr. David Goheen, Research Systems Division, developed the program “xray” specifically for use in this study. The programming task was materially facilitated by the generous cooperation of Professor Paul Wolf, Department of Civil Engineering, University of California, Berkeley.

REFERENCES

1. Broadbent, B. H.: A new x-ray technique and its application to orthodontia, Angle Orthod. 1: 45-66, 1931.
2. Björk, A., and Solow, B.: Measurements on radiographs, J. Dent. Res. 41: 672-683, 1962.
3. Hixon, E. H.: Cephalometrics and longitudinal research, AM. J. ORTHOD. 46: 36-42, 1960.
4. Brodie, A. G.: Cephalometric roentgenology: History, techniques and uses, J. Oral Surg. 7: 185-198, 1949.
5. Salzmann, J. A.: Limitations of roentgenographic cephalometrics, AM. J. ORTHOD. 50: 169-188, 1964.
6. Adams, J. W.: Correction of error in cephalometric roentgenograms, Angle Orthod. 10: 3-13, 1940.
7. Wylie, W. L., and Elsasser, W. A.: Undistorted vertical projections of the head from lateral and posteroanterior roentgenograms, Am. J. Roentgenol. 60: 414-417, 1948.
8. Vogel, C. J.: Correction of frontal dimensions from head x-rays, Angle Orthod. 37: 1-8, 1967.
9. Downs, W. B.: Variations in facial relationships: Their significance in treatment and prognosis, AM. J. ORTHOD. 34: 812-840, 1948.
10. Steiner, C.: Cephalometrics for you and me, AM. J. ORTHOD. 39: 729-755, 1953.
11. Tweed, C.: The diagnostic facial triangle in the control of treatment objectives, AM. J. ORTHOD. 55: 651-667, 1969.
12. Reidel, R.: Analysis of dentofacial relationships, AM. J. ORTHOD. 43: 103-119, 1957.
13. Holdaway, R.: Changes in the relationships of points A and B, AM. J. ORTHOD. 42: 176-193, 1956.