Evaluation of the peer assessment rating (PAR) index as an index of orthodontic treatment need


ORIGINAL ARTICLE

Allen R. Firestone, DDS, MS (a); F. Michael Beck, DDS, MA (b); Frank M. Beglin, DDS, MS (c); and Katherine W. L. Vig, BDS, MS, FDS, D Orth (d)
Columbus, Ohio, and Napa, Calif

The need for orthodontic treatment has an objective component based on occlusal traits and a subjective component based on the esthetic impact of the occlusion. An occlusal index that measures the objective deviation from normal or ideal occlusion might be sufficient to mirror the subjective opinion of orthodontists about treatment need. The objective of this study was to determine whether the American (US) and United Kingdom (UK) weightings of the peer assessment rating (PAR) index are valid instruments with which to determine treatment need. Fifteen orthodontists rated the need for orthodontic treatment of 170 casts. Their collective decision was compared with the PAR value for the cast determined by a calibrated examiner. A range of suggested treatment cutoff points from the literature was used to generate receiver operating characteristic (ROC) curves and optimized cutoff points. The cutoff points were 17 for both the US PAR and the UK PAR, and sensitivity, specificity, and kappa were 92%, 86%, and 0.77 for the US PAR and 92%, 89%, and 0.80 for the UK PAR. The area under the ROC curve was 97% for the US PAR and the UK PAR. Both the US PAR and the UK PAR scores were excellent predictors of orthodontic treatment need as determined by a panel of orthodontists. An occlusal index used to measure deviation from normal or ideal occlusion might perform as well as indexes of treatment need in predicting orthodontists’ evaluations of treatment need. (Am J Orthod Dentofacial Orthop 2002;122:463-9)

From The Ohio State University, Columbus.
a Associate professor, Department of Orthodontics.
b Associate professor, Department of Oral Biology.
c Former resident, Department of Orthodontics, now in private practice, Napa, Calif.
d Professor and chair, Department of Orthodontics.
Reprint requests to: Dr Allen Firestone, College of Dentistry, 305 West 12th Ave, PO Box 182357, Columbus, OH 43218-2357; e-mail, [email protected]
Submitted, November 2001; revised and accepted, April 2002.
Copyright © 2002 by the American Association of Orthodontists.
0889-5406/2002/$35.00 + 0 8/1/128465
doi:10.1067/mod.2002.128465

Traditional orthodontic diagnosis is a qualitative, descriptive procedure unsuited to quantitative evaluation of treatment need. As a result, several quantitative systems of assessing malocclusion and evaluating treatment need have been developed in the last 50 years.1,2 These indexes are systems of procedures that generate and summarize data about the malocclusion and return a numeric value. For indexes of treatment need, there is a point below which the deviation from normal or ideal occlusion, ie, the malocclusion, is so minor that there is no need for treatment; all values above that point indicate malocclusions for which treatment is needed. In effect, an index with a cutoff point functions as a diagnostic test for treatment need. The gold standard for determining treatment need has generally been considered to be the expert opinion of orthodontists. Most indexes constructed to measure treatment need measure various features of the occlusion, award points for each trait depending on the deviation from normal or ideal occlusion, multiply these points by a factor depending on the feature’s importance, and sum the points for all features of the malocclusion. Most indexes of treatment need are health based; ie, the underlying assumptions are that malocclusion and its features are associated with ill health later in life. Some other indexes are based on esthetic impairment because the assumed psychosocial consequences of malocclusion are the most significant sequelae.3,4 Some indexes combine the presumed dental health components of malocclusion with esthetic components.1,5 Representative indexes from each of these broad categories have been shown to agree with professional orthodontic opinion about the need for orthodontic treatment.6-8 To our knowledge, none of the indexes has been validated regarding its underlying principles; ie, persons with high treatment need who did not receive orthodontic treatment have not been shown to suffer

464 Firestone et al

more dental health or psychosocial problems than treated patients with no residual need for treatment. Orthodontic indexes have been categorized according to their stated purpose as diagnostic, epidemiologic, treatment need, treatment complexity, and treatment outcome.2,9 All, however, include measuring and weighting the occlusal features as a basic part of the process.3,5,10-12 Thus, it would seem that whatever the stated purpose of the orthodontic index, as long as it measures or categorizes occlusal features systematically, it can serve as an index of treatment need. In fact, indexes have been used for other than their stated purposes.13,14 The peer assessment rating (PAR) index is an occlusal index designed and validated as an instrument to measure how much a patient deviates from normal alignment and occlusion.15 The PAR index was designed to measure the success or the outcome of treatment by comparing the severity of the initial malocclusion with the result on pretreatment and posttreatment casts.15 The index measures maxillary and mandibular anterior alignment (crowding and spacing), buccal segment occlusion (anterioposterior, transverse, and vertical), overjet (including anterior crossbite), overbite, and midline discrepancies. 
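The weighted-sum construction described above (score each occlusal feature, multiply by a weight, add the products) can be sketched as follows. This is only an illustration: the component scores and both weight sets are hypothetical placeholders, not the published UK or US PAR weightings, which are given in references 15 and 16. The second weight set gives the mandibular anterior component a weight of zero only to mirror the idea of dropping a component.

```python
from typing import Dict

# Hypothetical component scores for one cast (NOT data from the study).
component_scores: Dict[str, int] = {
    "maxillary_anterior_alignment": 3,
    "mandibular_anterior_alignment": 2,
    "buccal_occlusion": 1,
    "overjet": 2,
    "overbite": 1,
    "midline": 0,
}

# Placeholder weights for illustration only; NOT the published weightings.
weights_a: Dict[str, int] = {
    "maxillary_anterior_alignment": 1,
    "mandibular_anterior_alignment": 1,
    "buccal_occlusion": 1,
    "overjet": 4,
    "overbite": 2,
    "midline": 3,
}

# Second placeholder set: one component is dropped by giving it weight 0.
weights_b: Dict[str, int] = dict(
    weights_a, mandibular_anterior_alignment=0, buccal_occlusion=2
)


def weighted_index(scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """Sum of (component score x component weight) over all components."""
    return sum(weights[name] * value for name, value in scores.items())


print(weighted_index(component_scores, weights_a))
print(weighted_index(component_scores, weights_b))
```

The same cast receives a different total under each weight set, which is why a cutoff point has to be established separately for each weighting.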
The index has been validated in the United Kingdom (UK PAR) and, with different weightings and eliminating the mandibular anterior alignment component, in the United States (US PAR).15,16 The PAR index has been used widely for evaluating the effects of treatment in a variety of circumstances.17-19 However, there is disagreement about using the PAR index in determining treatment need.14,20 One group of investigators concluded that it is unsuitable as an index of treatment need.20 Another group developed a model for the PAR index that was highly correlated with orthodontists’ subjective opinions of treatment need.14 There have been no reports of a large panel of orthodontic experts used as a gold or truth standard against which to compare the PAR index as an index of treatment need. The aim of the present study was to assess the validity of the PAR index used to determine treatment need when compared with the expert opinion of a panel of orthodontists.

MATERIAL AND METHODS

One hundred seventy casts were used in this study. They represented treated and untreated study cases from the orthodontic departments of the University of Pittsburgh and The Ohio State University. This set of casts has previously been used, in part or in whole, to validate other indexes in western Pennsylvania and

American Journal of Orthodontics and Dentofacial Orthopedics November 2002

central Ohio.6-8 These casts represent a full spectrum of malocclusion types and severities. A description of the severity of the malocclusions has been presented previously; 84 of the 170 casts were classified as having little or moderate need of treatment.6 Of the 17 casts in the mixed dentition, most were in the late mixed dentition and generally had only 1 to 4 deciduous second molars remaining. The remaining casts were in the permanent dentition. Volunteers were solicited from members of the American Association of Orthodontists whose practices were within a 20-mile radius of Columbus, Ohio. Those invited to participate had a minimum of 5 years experience as practicing orthodontists, and 15 volunteers were selected to be raters. The design of the study was approved by the Human Subjects Institutional Review Board of The Ohio State University. Sample size calculations were based on the validation study of the PAR index.15 The minimum number of study casts was 160, and the minimum number of raters was 11, to achieve a study with a power of 0.80 at an alpha level of 0.05. The 15 orthodontists rated the 170 casts and recorded the need for treatment of each as a score from 1 to 7 on an adjectival scale on which 1 equals no or minimal need and 7 equals very great need. At a second session, about 30 days after the first, each rater again assigned a score to each of 40 casts in a subset to test intrarater reliability. At the beginning of both sessions, the following verbal and written instructions were given to the raters: You are the orthodontic consultant for a private corporation for which a fund has been established to provide orthodontic treatment for personnel. You are to evaluate these study casts of personnel and answer the following question: In your opinion, to what extent does this occlusion need orthodontic treatment? Please circle the corresponding number (1 to 7). 
At the end of the second session, each rater was asked to answer the following question: “On the 7-point scale that you have used throughout this rating session, indicate the score at or above which you feel orthodontic treatment is indicated.” This score was termed the indicated treatment point (ITP) and was recorded for each rater. One examiner (A.R.F.), trained and calibrated in the PAR index, scored the 170 study casts using the PAR index. Values for the components of PAR were recorded, and both the US PAR and the UK PAR were calculated. A month later, the same 40 casts scored twice by the raters were scored again by the calibrated examiner to test intraexaminer reliability.


American Journal of Orthodontics and Dentofacial Orthopedics Volume 122, Number 5

STATISTICAL ANALYSIS

The simple kappa statistic was used to assess agreement of the PAR index with the expert panel. Weighted kappa statistics were used to assess both intrarater and interrater reliability.21 The kappa statistic is a measure of agreement that has been corrected for chance agreement.22 A kappa of 0 indicates no agreement beyond chance, and a kappa of 1 indicates perfect agreement. Interexaminer reliability was evaluated by the intraclass correlation coefficient.23 Intraexaminer and intrarater reliability was calculated with the initial and 1-month-later scores for a subset of the 170 casts scored by each member of the expert panel and the calibrated examiner. Interrater reliability was calculated by comparing all raters of the entire 170 sets of casts during the first round. The truth or gold standard was determined in the following manner. The mean ITP for the 15 raters was calculated, and the mean rater score for each cast was calculated. Finally, the mean rater score for each cast was compared with the mean ITP value, and, if the score was below the mean ITP score, the cast was assigned to the no-treatment category. If the mean rater score for a cast was equal to or greater than the mean ITP value, it was assigned to the treatment category. The PAR index has no cutoff point for treatment need. Initial estimates were calculated from data in the literature that a cutoff point between 10 and 20 would result in maximum accuracy.14,24 The cutoff point varied between 10 and 22. Each study cast was assigned to the treatment or the no-treatment category by comparing the calibrated examiner’s score with the cutoff point. For each cast, the mean decision of the raters, the gold standard, was compared with the decision assigned by the calibrated examiner using the PAR index. From these comparisons, the following values were calculated: sensitivity, specificity, positive and negative predictive values, accuracy (percentage of agreement), and kappa statistic. 
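The procedure just described (derive the gold standard from the raters, classify each cast by the index cutoff, then tabulate agreement) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' software: the function names and the six-cast example data are hypothetical, and kappa is computed with the standard two-category Cohen formula.

```python
from statistics import mean


def gold_standard(rater_scores, itps):
    """True (treat) when a cast's mean rater score is >= the mean ITP."""
    mean_itp = mean(itps)
    return [mean(scores) >= mean_itp for scores in rater_scores]


def classify(index_scores, cutoff):
    """True (treat) when the index score is at or above the cutoff."""
    return [score >= cutoff for score in index_scores]


def confusion(truth, predicted):
    """Return (TP, FP, FN, TN) with the gold standard as truth."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    return tp, fp, fn, tn


def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, accuracy, and simple kappa."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical example: 6 casts rated 1-7 by 3 raters, plus each rater's ITP.
rater_scores = [[1, 2, 1], [2, 3, 2], [4, 4, 4], [5, 6, 5], [3, 3, 4], [7, 6, 7]]
itps = [3, 4, 4]
par_scores = [5, 18, 16, 30, 12, 25]  # hypothetical index scores

truth = gold_standard(rater_scores, itps)
counts = confusion(truth, classify(par_scores, cutoff=17))
print(counts, diagnostic_summary(*counts))
```

As a plausibility check, counts of 99 true positives, 7 false positives, 9 false negatives, and 55 true negatives reproduce the UK PAR row at a cutoff of 17 in Table I; these counts are inferred from the reported percentages and group sizes and are not stated in the article.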
Sensitivity is the percentage of all cases needing treatment that the index so identified. Specificity is the percentage of all cases not needing treatment that the index so identified. Positive and negative predictive values are the percentages of cases that the index correctly identified as needing (positive) or not needing (negative) treatment. Accuracy was defined for this study as the percentage of agreement with the decisions of the expert panel; this measure did not take into account agreement due to chance. Correlation between each index and the expert panel’s score was evaluated with the Spearman coefficient (rho). An optimized cutoff point for the PAR index was determined by plotting a receiver operating characteristic (ROC) curve. Finally, several indexes of treatment need (the dental aesthetic index [DAI], the handicapping labiolingual deviation with the California modification [HLD(CalMod)], the index of orthodontic treatment need [IOTN] including the dental health [DHC] and the esthetic [EC] components, and the index of complexity, outcome, and need [ICON]) were used to evaluate the casts for treatment need, and those scores were compared with the decisions of the expert panel using the indicators of diagnostic accuracy outlined above.1,3,5,10 These data were derived when the indexes had been validated against the same gold standard of orthodontic treatment need as determined by the 15 orthodontists and the 170 casts used in this study.6,7

RESULTS

Both the calibrated examiner and the panel of orthodontic experts showed high levels of reliability. The calibrated examiner demonstrated high intraexaminer reliability for the 40 casts that had been evaluated twice. The intraclass correlation coefficient (lower confidence boundary, upper confidence boundary) was 0.988 (0.981, 0.993) and 0.990 (0.984, 0.995) for the UK PAR and the US PAR, respectively. The 15 raters had a high level of interrater reliability; the overall weighted kappa value was 0.81 (0.81, 0.82).6 For intrarater reliability, the 15 raters also achieved high levels of reliability. The overall weighted kappa value was 0.92 (0.90, 0.93) for the 40 casts that had been evaluated twice by each rater.6 The mean ITP for the 15 raters (mean ± SD) was 3.53 ± 0.74. Casts with a mean score equal to or greater than 3.53 were assigned to the treatment category, and the remaining casts, with scores below 3.53, were placed in the no-treatment category. There were 108 (64%) casts in the treatment category and 62 (36%) casts in the no-treatment category. Based on the index score as determined by the calibrated examiner and the cutoff point for the US PAR and the UK PAR (10-22), each of the 170 casts was assigned to treatment or no-treatment categories. The diagnostic performance of the indexes at various cutoff points is shown in Table I. Accuracy of the decisions to treat or not treat at the various cutoff points was 84.1% to 90.6% for the US PAR and 83.5% to 90.6% for the UK PAR. The overall agreement (simple kappa coefficient) with the gold standard (the orthodontists’ decisions) ranged from 0.622 to 0.788 for the US PAR and 0.670 to 0.793 for the UK PAR. For comparison, for 4 indexes of treatment need, the value of indicators of diagnostic accuracy at their

Table I. Performance of indexes at various cutoff points

Index    Cutoff point   Sens    Spec    PPV     NPV     Acc    Kappa   Misclassified
US PAR   10             100.0   56.5    80.0    100.0   84.1   0.622   27
         12             100.0   66.1    83.7    100.0   87.7   0.713   21
         13             100.0   74.2    87.1    100.0   90.6   0.785   16
         14              99.1   74.2    87.0     97.9   90.0   0.773   17
         15              98.2   77.4    88.3     96.0   90.6   0.788   16
         16              94.4   80.7    89.5     89.3   89.4   0.767   18
         17              91.7   85.5    91.4     85.5   89.4   0.772   18
         18              87.0   90.3    94.0     80.0   88.2   0.753   20
         19              87.0   90.3    94.0     80.0   88.2   0.753   20
         20              86.1   93.6    95.9     79.5   88.8   0.768   19
         21              83.3   93.6    95.7     76.3   87.1   0.734   22
         22              78.7   95.2    96.7     72.0   84.7   0.691   26
UK PAR   10             100.0   66.1    83.7    100.0   87.7   0.713   21
         12              99.1   69.4    84.9     97.7   88.2   0.729   20
         13              99.1   72.6    86.3     97.8   89.4   0.758   18
         14              96.3   79.0    88.9     92.5   90.0   0.777   17
         15              95.4   82.3    90.4     91.1   90.6   0.793   16
         16              94.4   82.3    90.3     89.5   90.0   0.780   17
         17              91.7   88.7    93.4     85.9   90.6   0.798   16
         18              88.9   90.3    94.1     82.4   89.4   0.776   18
         19              87.0   90.3    94.0     80.0   88.2   0.753   20
         20              81.5   91.9    94.6     74.0   85.3   0.698   25
         21              77.8   95.2    96.6     71.1   84.0   0.680   27
         22              76.9   95.2    96.5     70.2   83.5   0.670   28

Sens, Sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; Acc, accuracy.

Table II. Comparison of PAR with other indexes

Index             Cutoff point*   Sens   Spec    PPV     NPV    Acc    Kappa   Misclassified   Area   Rho†
US PAR            17              91.2   85.5    91.4    85.5   89.4   0.772   16              96.7   0.91
UK PAR            17              91.7   88.7    93.4    85.9   90.6   0.798   16              96.7   0.91
DAI‡              31              77.8   91.9    94.4    70.4   82.9   0.654   29              94.9   0.86
HLD (Cal Mod)‡    26              41.7   98.4    97.8    49.2   62.4   0.330   64              94.0   0.85
IOTN DHC‡         4               75.9   98.4    98.8    70.1   84.1   0.684   27              96.4   0.87
IOTN EC‡          8               48.2   100.0   100.0   52.5   67.1   0.404   56              94.4   0.87
ICON§             44              94.4   85.5    91.9    89.8   91.2   0.808   15              96.7   0.91

‡Values from Beglin et al.6
§Values from Firestone et al.7
*At or above which the patient requires treatment.
†Correlation with treatment need (expert panel cast score); all significant at P < .0001.
Sens, Sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; Acc, accuracy; Area, area under ROC curve; Rho, correlation coefficient.
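One way to read an optimized cutoff from Table I is to look for the point at which the sensitivity and specificity columns cross. The sketch below, using only the values printed in Table I, finds the pair of adjacent cutoffs between which sensitivity falls below specificity. The function name is hypothetical, and how a single cutoff is chosen within that interval is an assumption that the article does not spell out.

```python
# Cutoff points and sensitivity/specificity (%) as printed in Table I.
cutoffs = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
us_sens = [100.0, 100.0, 100.0, 99.1, 98.2, 94.4, 91.7, 87.0, 87.0, 86.1, 83.3, 78.7]
us_spec = [56.5, 66.1, 74.2, 74.2, 77.4, 80.7, 85.5, 90.3, 90.3, 93.6, 93.6, 95.2]
uk_sens = [100.0, 99.1, 99.1, 96.3, 95.4, 94.4, 91.7, 88.9, 87.0, 81.5, 77.8, 76.9]
uk_spec = [66.1, 69.4, 72.6, 79.0, 82.3, 82.3, 88.7, 90.3, 90.3, 91.9, 95.2, 95.2]


def crossing_interval(cutoff_points, sens, spec):
    """Return adjacent cutoffs (a, b) where sens >= spec at a and sens < spec at b.

    Returns None if the two columns never cross in the tabulated range.
    """
    for i in range(len(cutoff_points) - 1):
        if sens[i] >= spec[i] and sens[i + 1] < spec[i + 1]:
            return cutoff_points[i], cutoff_points[i + 1]
    return None


print("US PAR:", crossing_interval(cutoffs, us_sens, us_spec))
print("UK PAR:", crossing_interval(cutoffs, uk_sens, uk_spec))
```

For both weightings the crossing falls between 17 and 18, which is consistent with the cutoff of 17 listed in Table II.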

suggested cutoff points are given in Table II. These data were derived when the indexes were validated against the same gold standard of orthodontic treatment need as determined by the 15 orthodontists and the 170 casts.6,7 The area under the ROC curves for the US PAR and the UK PAR (Figs 1 and 2), 97%, indicates the high validity of the index—ie, the degree to which the index reflects the decisions of the gold standard panel of orthodontists. A perfect diagnostic test would have an area under the ROC curve of 1.0, or 100%.
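To make the area under the ROC curve concrete, the sketch below sweeps a cutoff across a small set of index scores, records the false-positive and true-positive rates at each cutoff, and integrates with the trapezoidal rule. The ten scores and labels are hypothetical and the function names are illustrative; the study's own ROC analysis may have been computed differently.

```python
def roc_points(scores, needs_treatment):
    """(false-positive rate, true-positive rate) for every possible cutoff.

    A case is called positive when its score is at or above the cutoff.
    """
    positives = sum(needs_treatment)
    negatives = len(needs_treatment) - positives
    points = []
    # One extra cutoff above the maximum gives the (0, 0) corner.
    for cutoff in sorted(set(scores)) + [max(scores) + 1]:
        tp = sum(s >= cutoff and y for s, y in zip(scores, needs_treatment))
        fp = sum(s >= cutoff and not y for s, y in zip(scores, needs_treatment))
        points.append((fp / negatives, tp / positives))
    return sorted(points)


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the sorted ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical index scores and gold-standard decisions (True = needs treatment).
scores = [4, 9, 12, 15, 17, 17, 21, 26, 30, 35]
labels = [False, False, False, True, False, True, True, True, True, True]

auc = trapezoid_auc(roc_points(scores, labels))
print(round(auc, 4))
```

A test whose scores separated the two groups completely would trace the left and top edges of the unit square and give an area of 1.0.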

The ROC curves can be used to locate an optimized cutoff point: the point most superior and to the left on the curve (Fig 1). This is equivalent to plotting sensitivity and specificity of the PAR index at all possible cutoff points; the PAR value where the specificity and sensitivity curves cross would be the optimum cutoff point. The optimized cutoff point was 17 for both the US PAR and the UK PAR. At these cutoff points, sensitivity, specificity, and kappa were 91.7, 85.5, and 0.77 for the US PAR and 91.7, 88.7, and 0.80 for the UK PAR. The number of misclassified cases was 18 and 16, respectively, for the US PAR and the UK PAR. When the decisions of the individual orthodontists were compared with the decision of the entire group, the mean number of cases misclassified was 16, with a range from 2 to 39 misclassified cases.

Fig 1. ROC curve for US PAR components and weightings.

DISCUSSION

The accuracy of the treatment need decisions with the PAR index was very good (Table I). A perfect diagnostic test would have an area under the ROC curve of 100%. The area under the ROC curves for the US PAR and the UK PAR was 97%, graphically demonstrating how well the index reflects the decisions of the panel of orthodontists. At the optimized value derived from the ROC curve in Figure 1, the PAR index was more accurate and in better agreement with the orthodontists’ opinion about the need for treatment than most of the other indexes. In fairness, the results achieved with the other indexes increased dramatically when the cutoff points were optimized for accuracy with the specific ROC curve for that index generated at various cutoff points.6,7 The area under the ROC curve for the other indexes was equal to or greater than 94%, demonstrating that they, too, can accurately reflect the decisions of the panel of orthodontists.6,7 These results confirm and extend the results of previous investigators concerning the accuracy of the PAR index as an index of treatment need.14,24 For our study, the gold or truth standard was determined by 15


Fig 2. ROC curve for UK PAR components and weightings.

orthodontists in private practice. Our results can apply generally to decisions made by orthodontists in the central Midwest. The optimal cutoff value for the US PAR, maximizing agreement with the decisions of the orthodontic experts, was 17; this differs somewhat from that indicated by previous investigators.14 In this earlier study, 4 orthodontists each evaluated a subset of patients clinically, and the PAR index value was obtained from plaster casts made from wax bites. A third of the sample was judged to require orthodontic treatment by the examining orthodontist. These factors differed from the present study in which all 15 orthodontists constituting the expert panel examined each cast and whose mean value determined treatment need. Furthermore, both the orthodontists and the PAR examiner evaluated the same plaster models of the complete dentition. Finally, the sample was selected to present a significant number of borderline cases.6,25,26 Thus, although both studies found the PAR index to be in excellent agreement with orthodontists’ subjective assessments of orthodontic treatment need, differences in the optimal cutoff point for determining treatment need are almost certainly the result of methodologic differences between the 2 studies. A study that compared 3 indexes, including the PAR index, in ranking orthodontic treatment need concluded that the PAR index was unsuitable for estimating treatment need.20 Neither of the other indexes, the indication index (II) and the modified index of the Swedish Medical Health Board (ISMHB), has


previously been validated with a large sample of practicing orthodontists.27,28 It is difficult to evaluate the results from this study because only the correlation between the decisions of the indexes was evaluated but not the validity of the decisions. Theoretically, it is possible to have perfect correlation and zero validity. Although the PAR index was shown to be as accurate as or more accurate than indexes of treatment need in the present study, all the other indexes were used at their recommended cutoff points. These indexes and their cutoff points have not been determined by a panel of orthodontic experts or have not been validated for American orthodontic opinion. When the cutoff points were optimized to most closely mirror the orthodontists’ expert opinion, the measures of the accuracy of the index increased greatly and were similar to the values for the PAR index in the present study.6,7 Several indexes’ suggested cutoff points were not at the point most superior and to the left on the ROC curve when compared with the opinion of the orthodontists.6 However, the area under the ROC curve best describes the validity of a diagnostic test in its most general terms. The significance of the area under an ROC curve in the present study can be described as representing the probability that a randomly chosen subject needing orthodontic treatment is correctly rated or ranked with greater need than a randomly chosen subject with no need for orthodontic treatment.29 It has been proposed that an ROC curve is a more meaningful measure of the value of a diagnostic test than accuracy, or the percentage of cases for which the index is correct. This is because, unlike accuracy, the ROC curve does not depend on the prevalence of a disease in the population, nor will 2 tests with the same accuracy but different sensitivity and specificity give the same ROC curves.30 In effect, a diagnostic test is described by its ROC curve. 
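The probabilistic reading of the area under the ROC curve, and its independence from prevalence, can be illustrated with a short sketch. The scores are hypothetical and the function names are illustrative. The area is computed directly as the fraction of (needs treatment, no need) pairs in which the case needing treatment has the higher score, with ties counted as one half.

```python
def auc_by_pairs(need_scores, no_need_scores):
    """Probability that a random case needing treatment outranks a random
    case not needing treatment (ties count one half)."""
    wins = 0.0
    for a in need_scores:
        for b in no_need_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(need_scores) * len(no_need_scores))


def accuracy_at(cutoff, need_scores, no_need_scores):
    """Fraction of all cases classified correctly at the given cutoff."""
    correct = sum(s >= cutoff for s in need_scores)
    correct += sum(s < cutoff for s in no_need_scores)
    return correct / (len(need_scores) + len(no_need_scores))


# Hypothetical index scores for the two gold-standard groups.
need = [15, 17, 21, 26, 30, 35]
no_need = [4, 9, 12, 17]

# Tripling the no-need group lowers the prevalence of treatment need.
low_prevalence_no_need = no_need * 3

print(auc_by_pairs(need, no_need), auc_by_pairs(need, low_prevalence_no_need))
print(accuracy_at(17, need, no_need), accuracy_at(17, need, low_prevalence_no_need))
```

In this example the area is unchanged when the proportion of cases needing treatment changes, whereas accuracy at a fixed cutoff shifts.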
By changing the cutoff point, one can move along the curve but not change the curve. Two people with the same index score near the cutoff point might differ in their malocclusions and in their need for treatment. This is true for all diagnostic tests for a disease: test values for healthy and diseased populations overlap. Some costs and benefits can be assigned to using any particular value as a cutoff point. Important considerations are the economic and the personal consequences of missing the disease (false negative) and the consequences of incorrectly identifying the disease (false positive).30,31 It is then a question of policy where the cutoff point will be placed. The condition of having a malocclusion, which is not a disease with a progressive deterioration if untreated, is very common and generally self-limiting.32
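The trade-off between false negatives and false positives as the cutoff moves can be made concrete with the UK PAR values in Table I. The sketch below converts the printed sensitivity and specificity into approximate counts, using the 108 treatment and 62 no-treatment casts reported in the Results. The counts are reconstructed by rounding and are therefore an inference, not figures reported by the authors.

```python
N_TREATMENT = 108      # casts in the treatment category (Results)
N_NO_TREATMENT = 62    # casts in the no-treatment category (Results)

# UK PAR rows of Table I.
cutoffs = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
sens = [100.0, 99.1, 99.1, 96.3, 95.4, 94.4, 91.7, 88.9, 87.0, 81.5, 77.8, 76.9]
spec = [66.1, 69.4, 72.6, 79.0, 82.3, 82.3, 88.7, 90.3, 90.3, 91.9, 95.2, 95.2]
misclassified_in_table = [21, 20, 18, 17, 16, 17, 16, 18, 20, 25, 27, 28]

# Reconstructed counts (rounded to whole casts).
false_negatives = [round((100 - s) / 100 * N_TREATMENT) for s in sens]
false_positives = [round((100 - s) / 100 * N_NO_TREATMENT) for s in spec]

for c, fn, fp in zip(cutoffs, false_negatives, false_positives):
    print(f"cutoff {c}: {fn} false negatives, {fp} false positives, {fn + fp} total")
```

Raising the cutoff steadily trades false positives for false negatives, and the reconstructed totals agree with the Misclassified column of Table I. Where the cutoff should sit then depends on the relative cost assigned to each kind of error.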


Only in the most extreme cases does a malocclusion have health consequences. On the other hand, malocclusion has a lengthy treatment with substantial direct and indirect burden-of-care costs. It is not unreasonable or surprising that cutoff points in indexes used to allocate limited resources are set to stringent criteria to limit the consumption of these resources. Previous studies have shown that several indexes are very good diagnostic instruments for determining treatment need.1,6,7,15 Based on the ROC curve for the index, the cutoff points can be set to closely mirror the decisions of a large panel of orthodontic experts.6,7 This in turn suggests that each index, regardless of whether it is based on dental health, esthetic concerns, or a combination of both, has a certain commonality that reflects the opinion of practicing orthodontists. This is true even for the DAI, an index that is based on the esthetic opinion of students and their parents. We suggest that because all the indexes rely on measuring components of malocclusion common to orthodontic diagnosis and give precedence to anterior components or components that reflect anterior status, all indexes measure the same components. An alternative explanation is that each index is unique, and all are equally good, but this explanation is unlikely to be true. Other investigators have shown the limitations of occlusal indexes.1,2,16 The limitations of validation studies have also been documented.25 Cutoff points could be set to represent national, geographic, or local differences in orthodontic opinion,6,16,33,34 or to include orthodontic opinion from large multinational areas.1,7 We therefore conclude that, at least in central Ohio and western Pennsylvania, several indexes, including the PAR index, are valid reflections of expert orthodontic opinion of treatment need. The practitioner should determine which index is easiest to use at an acceptable level of reproducibility. 
The PAR index was developed as an outcome measure to be applied to models and thus is not suitable for the clinic setting. Finally, it remains up to the user of the index to set the cutoff point that best matches his or her intentions. Setting more rigorous cutoff points reduces the number of people incorrectly assigned as needing treatment (false positives), but it also increases the number of patients who are incorrectly assigned to the no-treatment category (false negatives).

REFERENCES

1. Daniels C, Richmond S. The development of the Index of Complexity, Outcome and Need (ICON). Br J Orthod 2000;27:149-62.
2. Otuyemi OD, Jones SP. Methods of assessing and grading malocclusion: a review. Aust Orthod J 1995;14:21-7.
3. Cons NC, Jenny J, Kohout FJ. DAI: the dental aesthetic index. Iowa City: College of Dentistry, University of Iowa; 1986.
4. Howitt JW, Stricker G, Henderson R. Eastman esthetic index. N Y State Dent J 1967;33:215-20.
5. Brook PH, Shaw WC. The development of an index of orthodontic treatment priority. Eur J Orthod 1989;11:309-20.
6. Beglin FM, Firestone AR, Vig KWL, Beck FM, Kuthy RA, Wade D. A comparison of the reliability and validity of 3 occlusal indexes of orthodontic treatment need. Am J Orthod Dentofacial Orthop 2001;120:240-6.
7. Firestone AR, Beck FM, Beglin FM, Vig KWL. Validity of the Index of Complexity, Outcome and Need (ICON) in determining orthodontic treatment need. Angle Orthod 2002;72:15-20.
8. Younis JW, Vig KWL, Rinchuse DJ, Weyant RJ. A validation study of three indices of orthodontic treatment need in the United States. Community Dent Oral Epidemiol 1997;25:358-62.
9. Shaw WC, Richmond S, O’Brien KD. The use of occlusal indices: a European perspective. Am J Orthod Dentofacial Orthop 1995;107:1-10.
10. Draker HL. Handicapping labio-lingual deviations: a proposed index for public health purposes. Am J Orthod 1960;46:295-305.
11. Salzman JA. Handicapping malocclusion assessment to establish treatment priority. Am J Orthod 1968;54:749-65.
12. Parker WS. The HLD (CalMod) index and the index question. Am J Orthod Dentofacial Orthop 1998;114:134-41.
13. Pickering EA, Vig P. The occlusal index used to assess orthodontic treatment. Br J Orthod 1975;2:47-51.
14. McGorray SP, Wheeler TT, Keeling SD, Yurkiewicz L, Taylor MG, King GJ. Evaluation of orthodontists’ perception of treatment need and the peer assessment rating (PAR) index. Angle Orthod 1999;69:325-33.
15. Richmond S, Shaw WC, O’Brien KD, Buchanan IB, Stephens CD, Roberts CT, et al. The development of the PAR index (Peer Assessment Rating): reliability and validity. Eur J Orthod 1992;14:125-39.
16. DeGuzman L, Bahiraei D, Vig KW, Vig PS, Weyant MS, O’Brien KD. The validation of the peer assessment rating index for malocclusion severity and treatment difficulty. Am J Orthod Dentofacial Orthop 1995;107:172-6.
17. Willems G, Heidbuchel R, Verdonck A, Carels C. Treatment and standard evaluation using the Peer Assessment Rating Index. Clin Oral Investig 2001;5:57-62.
18. Pangrazio-Kulbersh V, Kaczynski R, Shunock M. Early treatment outcome assessed by the Peer Assessment Rating Index. Am J Orthod Dentofacial Orthop 1999;115:544-50.
19. Riedmann T, Berg R. Retrospective evaluation of the outcome of orthodontic treatment in adults. J Orofac Orthop 1999;60:108-23.
20. Bergström K, Halling A. Comparison of three indices in evaluation of orthodontic treatment outcome. Acta Odontol Scand 1997;55:36-43.
21. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intra-class correlation coefficient as measures of reliability. Educat Psychol Meas 1973;33:613-9.
22. Cohen J. A coefficient of agreement for nominal scales. Educat Psychol Meas 1960;20:37-46.
23. Fleiss JL. The design and analysis of clinical experiments. New York: Wiley; 1986.
24. Younis JW. Validation of the index of orthodontic treatment need in the United States [thesis]. Pittsburgh: University of Pittsburgh; 1995.
25. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926-30.
26. Swets JA, Getty DJ, Pickett RM, Seltzer SE, McNeil BJ. Enhancing and evaluating diagnostic accuracy. Med Decis Making 1991;11:9-18.
27. Lundström A. Need for treatment in cases of malocclusion. Trans Eur Orthod Soc 1977:111-23.
28. Swedish National Board of Health and Welfare. Kungl. Medicinalstyrelsens cirkulär den 21 febr. 1966 angående anvisningar för journalföringen inom folktandvårdens tandregleringsvård. Vol. MF nr 71. Stockholm; 1967.
29. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
30. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-98.
31. McNeil BJ, Keeler E, Adelstein SJ. Primer on certain elements of medical decision making. N Engl J Med 1975;293:211-5.
32. Proffit WR, Fields HW Jr, Moray LJ. Prevalence of malocclusion and orthodontic treatment need in the United States: estimates from the NHANES III survey. Int J Adult Orthod Orthognath Surg 1998;13:97-106.
33. Richmond S, Daniels CP. International comparisons of professional assessments in orthodontics: part 1 - treatment need. Am J Orthod Dentofacial Orthop 1998;113:180-5.
34. Richmond S, Daniels CP. International comparisons of professional assessments in orthodontics: part 2 - treatment outcome. Am J Orthod Dentofacial Orthop 1998;113:324-8.