Reliability of ratings in evaluation of crowns
M. Helft, D.M.D.,* R. Pilo, D.M.D.,** and H. Baharav, D.M.D.**
H. S. Cardash, B.D.S., L.D.S., R.C.S., ***
Tel Aviv University, Maurice and Gabriela Goldschleger School of Dental Medicine, Tel Aviv, Israel
*Head, Department of Oral Rehabilitation.
**Department of Oral Rehabilitation.
***Senior Clinical Lecturer, Department of Oral Rehabilitation.

THE JOURNAL OF PROSTHETIC DENTISTRY

Manual dexterity of dental students is assessed by faculty instructors during preclinical and clinical training. A student’s grade may sometimes depend more on the standard set by the instructor than on the quality of the product. Disagreement in evaluating students’ work usually falls into two main categories: the severity of the standard set by each instructor and the evaluation of the student’s performance compared with that of other students.

One of the tasks of the instructor of preclinical and clinical dentistry is to develop evaluation skills by teaching the student to apply the criteria learned to the end product. Effective learning requires clear and uniform standards of evaluation by the instructors. This, in turn, provides feedback for the students and improves their evaluation skills.

The problem of nonuniform evaluation by instructors has been known for some time.1 Lilley et al.2 investigated the correlation between the evaluations of different instructors in practical examinations in restorative dentistry. In three studies conducted at various times, they found an interinstructor correlation varying from 0.11 (lowest) to 0.7 (reasonable). Gaines et al.3 examined ratings of instructors in the Departments of Fixed Prosthodontics and Operative Dentistry. The products evaluated were wax carvings of complete crowns on stone dies. A five-point rating scale was used: 1 to 2, poor; 3 to 4, average; and 5, outstanding. A significant difference was found between the raters (F instructors = 13.71, p < .001). The intraclass correlation coefficient was 0.26.

Evaluation by instructors of the Department of Fixed Prosthodontics was examined by O’Connor and Lorey.4 Students were required to prepare a molar tooth with a chamfer finishing line. The products were evaluated according to six criteria on a five-point scale by 10 instructors. A low reliability estimate of 0.3 was found for the average of single pairs of raters.

The accuracy of fit of the margin of a crown is considered to be one of the principal factors in determining the prognosis of the restoration. Clinical methods suggested for margin examination include examination with an explorer, radiographic examination, and examination with the aid of an impression technique.5,6 Evaluation of the crown in the mouth is more difficult than on the die or on an extracted tooth, because in many patients the crown margin is subgingival, which limits access and visibility. Even supragingival margins are not usually visible interproximally. This study presents the results of examination of crowns on extracted teeth by instructors of the Department of Oral Rehabilitation at the Tel Aviv University dental school.

Table I. Comparison of differences in variability in mean scores assigned by raters

Criterion               F       p
Adaptation of margin    3.90    <.001
Thickness of margin     6.07    <.001

METHOD AND MATERIAL
Crowns cemented on 30 extracted teeth were evaluated by 17 instructors. The teeth were extracted for periodontal reasons 10 to 20 years after the patients were treated with the crowns. At the time of extraction, care was taken not to apply direct pressure on the crowns to avoid marginal distortion. The teeth were stored in 70% alcohol. Each tooth was placed in a separate container. No discussion was allowed before each instructor completed a questionnaire evaluating the crown margins according to two criteria: the adaptation of the margin to the tooth and the thickness of the margin. Each criterion was rated according to a five-point scale: 1 = very poor, 2 = poor, 3 = acceptable, 4 = good, and 5 = excellent. The ratings of the instructors were compared by using a one-way analysis of variance.

RESULTS

The average rating and standard deviation by the instructors are illustrated in Figs. 1 and 2. The results indicate that the instructors were approximately divided into two groups: those who were highly critical and those who graded more leniently. The largest average difference in the rating of the first criterion was between instructors No. 3 and No. 8 and, for the second criterion, between No. 1 and No. 8.

Fig. 1. Average rating and standard deviation of first criterion (adaptation of margin) by instructors.

Fig. 2. Average rating and standard deviation of second criterion (thickness of margin) by instructors.

The one-way analysis of variance for both criteria is shown in Table I. A significant difference was observed in the evaluation of the two criteria by individual instructors. The difference was more apparent in evaluating the thickness of the margin (F = 6.07) than in evaluating the adaptation of the margin (F = 3.90).

DISCUSSION

Since all instructors used the same rating scale, it was expected that the difference in the evaluation would be
negligible. This expectation was incorrect; the instructors used different standards of rating. The inconsistency may be related to two sources of unreliability: limited interinstructor correlation and an inadequate rating scale. A rating scale lacking objective criteria and standardization gives rise to different interpretations, measurement errors, and variance. In this study, the rating scale did not define the objective criteria and was therefore open to individual interpretation. This was most apparent in the rating of the thickness of the margins. A point scale based on numbers only encouraged subjectivity and failed in its main purpose, to provide feedback from the instructor to the student on his abilities.

NOVEMBER 1987 VOLUME 58 NUMBER 5

Research using point scales with clear, definitive criteria demonstrated that a greater uniformity of evaluation was possible. In the second part of the study by Gaines et al.,3 the five-point rating scale of excellent to weak was changed to a scale less open to individual interpretation. Thus, the wax patterns were evaluated as 5 = completely smooth, 3 = slight roughness, and 1 = major nicks and scratches. A one-way analysis of variance showed no statistically significant difference between the instructors’ evaluations (F = 1.74, p > .05), and the intraclass correlation coefficient for individual ratings increased from 0.26 to 0.56.

Other research suggested rating scales based on comparison stimuli. Products for which ratings have been established by consensus serve as standards or definition points for evaluation. O’Connor and Lorey4 used photographic slides of crown preparations as comparison stimuli that defined adequate attainment (3 on a five-point scale) of criteria. A rating of 4 or 5 was given if the product was better in quality than the preparation shown and 1 or 2 if inferior to the standard. The study showed that the estimate of agreement between pairs of raters increased from 0.30 in the first rating session to 0.66 in the second rating session, supporting the contribution of exemplars to interinstructor reliability.

O’Connor and Lorey4 also examined the effect of comparison stimuli on the rating of a central unilateral pinledge preparation. Surprisingly, the use of an exemplar did not improve interinstructor reliability on this occasion. The explanation given was that single exemplars are effective only for criteria for which errors are discrepancies in one direction from the ideal. Thus, when rating axial walls, it is obvious that the longer the wall the better. If it is shorter than that of the exemplar, a lower rating is awarded.
To evaluate a chamfer, however, errors are more likely to be deviations in two directions, too wide or too narrow. The authors of the study recommended the use of three exemplars to overcome this problem: one showing the ideal and two showing the limits of acceptability in the two directions.

Mezger et al.7 used color slides as standards for clinical evaluation of crowns and compared tarnish, corrosion, attrition, and the gingival condition buccal and lingual to the restoration. The comparison stimuli were effective in improving interinstructor reliability only for the first two parameters and less for the gingival condition.

Treatment carried out by dental students cannot be assessed accurately by an absolute scale, because dentistry is not an exact science. All methods used to obtain uniform evaluation at best reduce variance in rating but cannot eliminate it completely. A method providing interinstructor reliability will eliminate the relationship of the rating to the instructor. This study emphasizes the difficulty of assessing crown margins in the mouth not only because of limited access and visibility, but also because of a lack of defined criteria enabling uniform assessment.

CONCLUSION

To enhance interinstructor correlation, care must be taken to ensure that the rating scale is clear and definable. Such a rating scale should not be based solely on a general grade. Written, definable criteria or exemplars that will serve as standards or comparison stimuli should also be used.
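As an illustration of the two statistics referred to in this article, the one-way analysis of variance between instructors and the intraclass correlation coefficient for individual ratings, the following is a minimal sketch in Python. The ratings matrix is invented for illustration and is not the study's data. The function names are hypothetical, and the one-way random-effects form of the intraclass correlation is an assumption, since the studies cited do not state which form was used.

```python
from statistics import mean


def mean_squares(groups):
    """Return (ms_between, ms_within) for a one-way layout of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return ss_between / df_between, ss_within / df_within


def anova_f_between_raters(ratings):
    """F ratio with each instructor (column) treated as one group."""
    by_rater = [list(col) for col in zip(*ratings)]
    ms_between, ms_within = mean_squares(by_rater)
    return ms_between / ms_within


def icc_single_rating(ratings):
    """One-way random-effects ICC for a single rating (assumed form).

    Each crown (row) is treated as one group of k ratings.
    """
    k = len(ratings[0])  # number of ratings per crown
    ms_between, ms_within = mean_squares(ratings)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: 4 crowns (rows) rated by 3 instructors (columns), scale 1-5.
ratings = [
    [2, 3, 4],
    [1, 3, 4],
    [2, 4, 5],
    [1, 2, 4],
]

f_stat = anova_f_between_raters(ratings)
icc = icc_single_rating(ratings)
print(f"F (between instructors) = {f_stat:.2f}")
print(f"ICC (single rating)     = {icc:.2f}")
```

In this invented example the third instructor grades every crown more leniently than the first, so the F ratio between instructors is large while the intraclass correlation falls below zero. This mirrors, in miniature, the pattern of a significant instructor effect combined with low reliability of individual ratings. The sketch does not handle the degenerate case in which every rating is identical.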
REFERENCES

1. Brown RK. Research in the use of a rating scale as a means of evaluating the personalities of senior dental students. J Dent Res 1930;10:271-9.
2. Lilley JD, Ten Bruggen Cate HJ, Holloway PJ, Holt JK, Start KB. Reliability of practical tests in operative dentistry. Br Dent J 1968;125:194-7.
3. Gaines WG, Bruggers H, Rasmussen RH. Reliability of ratings in preclinical fixed prosthodontics: effect of objective scaling. J Dent Educ 1974;38:672-5.
4. O’Connor P, Lorey RE. Improving inter-rater agreement in evaluation in dentistry by the use of comparison stimuli. J Dent Educ 1974;42:174-9.
5. Assif D, Antopolski B, Helft M, Kaffe I. Comparison of methods of clinical evaluation of the marginal fit of complete cast gold crowns. J PROSTHET DENT 1985;54:20-4.
6. Weyns W, DeBoever J. Radiographic assessment of the marginal fit of cast restorations. J PROSTHET DENT 1984;51:485-9.
7. Mezger PR, Van T’Hof MA, Letzel H, Eschen S, Leempoel PJB, Snack PA, Vrijhoef MA. Methodological aspects in clinical evaluation of cast restorations with color slides. J Oral Rehabil 1985;12:435-42.
Reprint requests to:
DR. RAPHAEL PILO
THE MAURICE AND GABRIELA GOLDSCHLEGER SCHOOL OF DENTAL MEDICINE
TEL AVIV UNIVERSITY
TEL AVIV, ISRAEL
Contributing author B. Ben-Shemen, D.M.D., Instructor, Department of Oral Rehabilitation, Tel Aviv University, Maurice and Gabriela Goldschleger School of Dental Medicine, Tel Aviv, Israel