ORIGINAL ARTICLES
Quality of dental care: development of standards
Howard Bailit, DMD, PhD, Meni Koslowsky, PhD, and Joseph Grasso, MS, DDS, Farmington, Conn; Stanley Holzman, DDS, and Robert Levine, MS, DDS, West Hartford, Conn; Paula Valluzzo, BS, and Paula Atwood, BA, Farmington, Conn
Standards of dental care were developed by a group of nine private general dental practitioners and dental school faculty members. Five assessors were trained to use the standards, which were then tested for reliability, variability, validity, and practicality on a sample of 47 patients. The results showed high reliability for most items and moderate variability and validity. The time needed to train assessors and to evaluate individual patients was reasonable. The potential uses of this quality assessment system in continuing education, peer review committees, and dental program evaluation are discussed.
Recent passage of legislation mandating the development of professional standards review organizations (PSROs) and the increasing involvement of insurance companies in monitoring the quality of care make it apparent that providers of dental care eventually will be involved in some form of organized review system. A major challenge facing organized dentistry is to ensure that the system is acceptable to both dentists and the public. Many factors might affect the profession's support of a particular evaluation system. First, the dental profession must have a significant role in developing the system. Second, the method of evaluation must be objective. Third, the cost of evaluation must be reasonable in terms of time and money. Fourth, the basic dentist-patient relationship must be kept intact.
842 ■ JADA, Vol. 89, October 1974
With regard to the first factor, the American Dental Association and state associations have influenced, to some degree, the evaluation procedures of third parties. In general, however, most insured programs function independently of organized dentistry. With the expected increase in prepaid dental care, the dental profession might consider forming foundations that contract with third parties to monitor the cost and quality of dental care in a given state or area. A precedent for this type of organization has already been established in medicine.1 At the national level, dentists must have some role in determining PSRO policy; this organization will probably be responsible for evaluating the quality of dental services purchased with federal and state funds.
In terms of objectivity, one of the bases for an evaluation system must be reliable criteria of care; that is, similar judgments must be made by two or more assessors who independently evaluate the same patient. For reliability to be achieved, written criteria must be tested under controlled conditions. Some progress has been made in this area, notably the work of Abramowitz,2 Friedman,3 Schonfeld and co-workers,4 Soricelli,5 and Cons,6 although much remains to be done.
The cost must also be reasonable. Methods of evaluation must not rely solely on dentists, the most expensive form of dental manpower. Through the use of computers and nonprofessional raters, the review system can minimize the time practicing dentists must spend away from direct patient care.
Finally, although the evaluation system may have some effect on dentist-patient relationships, the mutual respect and trust that exist between most dentists and patients must be maintained.
This brief review provides general background for the present paper on an experimental quality review system for general dentistry, developed jointly by a group of community practitioners and dental school faculty, the latter serving mainly as technical advisers. This paper describes the process of development and the testing of the criteria. Except for examples, the actual criteria are not presented.
■ Framework for evaluating quality: For a common frame of reference, it is necessary to discuss briefly the concept of quality and the various dimensions of quality that can be evaluated. A commonly accepted definition of quality is that of Lee and Jones7: "Good medical care is the kind of medicine practiced and taught by the recognized leaders of the medical profession at a given time or period of social, cultural, and professional development in a community or population group." Several aspects of this definition need emphasis. First, the definition of quality clearly depends on the professional leaders in a given community, and although there will be general agreement on some standards in all communities, individual differences will exist. These regional differences should be incorporated into any national standards. A second point is that standards are not constant over time. Even within a single community, standards of care will change as new biologic or technical discoveries are made in the diagnosis and treatment of disease. The same is true of changes in social and cultural values, which often define what society and the profession consider abnormal conditions requiring treatment.
With this definition in mind, several general conceptual models for measuring the quality of care have been proposed, including those of Friedman,3 Roemer,8 and Schonfeld.9 All have many points in common, but the model of Donabedian10 has received wide acceptance and will be used in this study. Donabedian proposed that the quality of care can be evaluated from three different but related aspects: the structure, the process, and the outcome of care.
The outcome of receiving dental service is usually measured by the DMF (decayed, missing, and filled teeth), periodontal, and other morbidity indexes. Outcome is often considered the most important measure of quality, but since many factors other than dental care affect the patient's dental health, outcome measures have some limitations. For example, if the patient is not conscientious about oral hygiene, even the best dentistry may deteriorate.
The second dimension of quality is the process of care. Regardless of the outcome of care, did the dentist plan the proper treatment for the patient, and was the treatment adequately carried out? Criteria must be available for comparison to make judgments on the process of care.
The structural dimension of care focuses on the setting in which care takes place. Assessment can be made of such things as physical facilities and equipment, qualifications of providers, and administration of the practice.11 It is assumed that if the structure is adequate, the process of care is likely to be acceptable.
The three dimensions of care are related; though one does not necessarily cause another, they are related "as links in a chain."10 Within a proper structure, dentists can deliver adequate care that should result in healthy oral tissues. The present project is concerned with all three aspects of quality and especially the interrelationships among them; however, this paper is directed to the development and testing of standards for the "process of care."
Methods
■ Development of criteria and standards: Since the evaluation of the process of care depends on criteria developed by leaders in the dental community, the first step was to select a standards committee. On the advice of the Hartford (Conn) Dental Society's executive committee, a nine-member committee was formed: one dentist was a member of the Connecticut State Board of Dental Examiners, two were on the society's executive committee, four were active members of the society, and two were University of Connecticut dental school faculty in the department of general dentistry. All considered themselves general dental practitioners and had at least seven years' experience in practice. In addition, the community members of the standards committee were generally well known among their peers, and it was assumed that criteria and standards developed by this group would be acceptable to local dentists.
Bailit—others: QUALITY OF DENTAL CARE ■ 843
At the first meeting of the committee, a decision was made to try to reach unanimity whenever possible on criteria, but in areas of controversy a majority vote would be binding. It was also realized that the committee was rather large and that a subcommittee would be needed to do most of the staff work. The subcommittee consisted of one dental school faculty member and two community practitioners.
CRITERIA OF CARE. An extensive bibliography was available on evaluation procedures and criteria used in other investigations. From a general perspective, the work of Peterson and co-workers12 on general medical practitioners was particularly helpful, as was a manual of dental criteria and standards developed by Friedman.3 For the most part, the actual content of the criteria was based on the practical experience of the clinicians on the committee; reference was also made to dental textbooks. Another useful source, dealing with some of the conceptual problems that had to be resolved before criteria and standards were developed, was a book by Donabedian.13 Four of these conceptual problems deserve comment.
First was the question of whether the criteria should be normative or empirical. Should the criteria reflect what dentists should do in practice (normative) or what they actually do (empirical)? Obviously, these are not necessarily the same. The committee decided that criteria and standards should not be unrealistic. "Ideal dentistry" or the "best possible dentistry" are intangibles that are probably not definable and that, in any event, suggest a level of care that is seldom achieved.11 The more important issue is whether the care is adequate. Thus, the criteria are normative in that they are oriented to what the average dentist should do to provide adequate care.
The second major issue was whether to develop criteria for all aspects of general dentistry or for more limited parts of it.
Since general dental practitioners deal with many different types of problems, it would be impossible to develop criteria for every conceivable situation. One approach to this conceptual problem that has recently received attention is the "tracer method."14 Here, criteria are developed for certain specific disease entities, and providers are evaluated with respect to these conditions. It is assumed that the level of care provided for the tracer diseases approximates the level of care for all diseases. In dentistry, the tracer method could mean evaluating only the quality of amalgam restorations, on the assumption that the quality of endodontic or crown and bridge treatment would be comparable. Since there is little evidence about the relationships in quality among different treatment areas, limiting the criteria to four or five conditions did not seem appropriate. Rather, the committee decided to prepare criteria and standards for the common conditions that general dental practitioners treat. Then, after the standards had been evaluated clinically, it would be possible to determine the feasibility of a more limited approach.
A third conceptual problem was the division of the process of care into its components. There are many ways in which this could be done, but a logical method is by the sequence of events in the treatment of the average patient. Thus, the first component would be the quality of the history and examination; second, the diagnosis; third, the treatment plan; and fourth, the treatment.
The final issue was the methods available to collect information about patient care. To some extent, the methods used would define the areas of patient care that could be evaluated. Three approaches to data collection were considered: observation of the dentist, record audit, and patient examination. Of these three methods, observation of the dentist while he was treating a patient was discarded as being too expensive and probably unacceptable to most practitioners. The evaluation of dental records, including radiographs, seemed a more reasonable approach, since all dentists keep records and these records can be evaluated relatively inexpensively. The other major source of data was the clinical examination of patients. For evaluating the technical quality of most services related to the teeth per se, patient examination should provide an acceptable method, because teeth do not repair themselves and the treatment rendered leaves a permanent record.
This is not true for services involving the soft tissues, which can repair without any permanent change in size, shape, or color.
COMPONENTS OF CARE. With these general guidelines in mind, the subcommittee started to develop criteria and standards for each of the four components.
— History and examination: For simplicity, it was decided to limit the criteria initially to the examination of a new patient seeking comprehensive care. For these patients, the history and examination should be most complete. Five major elements were identified as necessary in an adequate history and examination: the chief complaint and, if associated with a specific problem, a description of the present illness; a personal history; the past medical history; the past dental history; and the dental examination (including use of diagnostic aids). For adequate care to be provided, some description of these five sections should be found in the dental records. The information does not have to appear in the categories or sequence suggested; the important issue is whether the appropriate data exist somewhere in the record. For each of the five areas, specific criteria were developed to assess the adequacy of the information (illustration). It became apparent that adequacy involved two aspects: the level of detail and its accuracy. Although it would be useful to determine whether the recorded information was accurate, in this report only the level of detail in the record is evaluated.
— Diagnosis: Evaluation of the diagnosis presented several practical problems because, in many instances, dentists do not specifically list diagnoses. In a few situations, such as soft tissue lesions, specific diagnoses might be in the record, whereas for most other conditions they are not. Also, if the treatment plan is done well, it is reasonable to assume that the diagnoses were adequate. Therefore, since it is almost impossible to obtain direct evidence on diagnoses and since the quality of the diagnoses is covered, to some extent, in the treatment plan assessment, a decision was made not to have a specific section on the diagnostic phase of care.
— Treatment plan: It is necessary to separate the evaluation of the plan of treatment from the actual treatment. The former includes the judgments of the dentist on what should be done for the patient, whereas the latter reflects the technical quality of the treatment provided.
This is an important issue, since the dentist should not be rated for the same performance in two different parts of the evaluation. Information on the treatment plan is elicited from the treatment progress notes and from the examination of the patient. A specific treatment plan in the record would be of great value, but except in the more complex and difficult situations, most dentists probably plan their treatment without recording it. The criteria for the treatment plan assessment are divided into ten categories, primarily on the basis of the information obtained in the history and examination. The categories are personal, medical, extraoral tissues, preventive services, restorative services, intraoral soft tissues, periodontal services, occlusion, and sequence of treatments (illustration).
— Treatment: The technical quality of treatment can be assessed with relative objectivity where there is a permanent change in hard tissues, as in restorative dentistry. However, the treatment of tissues that can repair, such as the periodontium, may be impossible to evaluate. Within several months after treatment, the tissues can return to their previous state for reasons that may be independent of the dentist's therapy; the therapy itself cannot be directly evaluated, only the outcome of care. Because of these technical limitations, criteria for assessing periodontal care were not developed. Endodontics and oral surgery presented somewhat similar problems, so their assessment was limited to just a few criteria.
Most of the information for assessing the technical quality of treatment came from the examination of patients. For a few areas, such as endodontics and oral surgery, radiographs and treatment notes were the main source of data. As previously implied, the treatment criteria are divided according to the primary categories of services commonly provided in general dental practice. These include restorations, endodontics, crowns and fixed bridges, partial dentures, complete dentures, and oral surgery (illustration).
With the subcommittee doing much of the developmental work, the meetings of the standards committee proved to be productive. Each set of criteria was discussed in depth and, although there was often controversy over particular issues, it was seldom necessary to have a formal vote. Most decisions were reached by consensus, and in six months a first draft of the criteria was completed.
■ Scoring system: Both quantitative and categorical (qualitative) scoring systems were designed.
Illustration. Examples of criteria used to assess history and examination, treatment plan, and treatment.
History and examination
Medical history: For an adequate medical history, these data should be in the record:
1. A description of the patient's general health, including past serious illnesses.
2. In addition to a general statement of the patient's health, specific references to these conditions: sensitivity to drugs and other allergies, rheumatic fever, bleeding problems, liver and kidney disease, heart disease, diabetes, and pregnancy status (if female of childbearing age).
3. The names of the patient's personal physicians (if any).
4. The date of the patient's last visit to a physician for a physical examination.
5. The medications being taken by the patient (if any) and the reasons for taking them.
Treatment plan
Sequence of treatment: In most instances there is an orderly sequence of priorities in planning a patient's care. Although some patients will not require certain treatments, such as referral to a physician, treatment plans should follow the general sequence listed here. As a broad guideline, the patient's chief complaint should be dealt with on the first visit. Of course, if the complaint concerns the need for dentures or other treatments that cannot be provided until extensive dental or medical care is completed, the chief complaint cannot be responded to at the beginning of treatment. This sequence of treatments is recommended:
1. Alleviation of acute conditions (pain, bleeding, or acute infection) or the chief complaint.
2. Consultation with or referral to a physician for systemic evaluation.
3. Placement of the patient on antibiotics or other premedications.
4. Institution of primary prevention programs such as oral hygiene instruction, prophylaxes, plaque control, or fluoride treatments.
5. Control of deep caries that may cause pulpal exposure.
(The sequence of items 6 through 10 is interchangeable.)
6. Extractions and soft tissue surgery.
7. Treatment of teeth endodontically.
8. Treatment of the periodontium.
9. Treatment of orthodontic problems.
10. Adjustment of occlusion.
11. Operative treatment of teeth (restorations).
12. Replacement of teeth prosthetically.
13. Recall examinations and maintenance treatment.
Treatment
Complete dentures: The factors considered in judging complete dentures are retention, stability, vertical dimension, extension of flanges, occlusion, placement of teeth over ridges, and appearance.
1. Retention. Retention in the maxillary denture should be sufficient to allow the patient to perform the normal mouth functions of talking, eating, and opening without dislodging the denture. Retention in the maxillary denture should be strong enough to resist light hand pressure in a downward direction.
2. Stability. For both the maxillary and mandibular dentures, there should be only slight movement in a plane horizontal with the ridge when light twisting pressure is placed on the denture by hand.
3. Vertical dimension. The teeth should not come in contact when the patient talks. There should not be excessive free-way space with overclosure when the teeth are in contact.
4. Extension of the flanges. The flanges of the denture should extend to the depth of the mucobuccal folds without displacement of tissue. On the lingual aspect of the mandibular denture, the flanges should make contact with the floor of the mouth at rest and should not dislodge the denture when the tongue is extended to moisten the surface of the lower lip.
5. Occlusion. There should be bilateral contact of all molar teeth. There should be no movement of denture bases when the teeth are in light occlusion. With repeated closure, the teeth should meet without sliding.
6. Placement of posterior teeth. The buccal cusps of the molars should be placed over the alveolar ridge in the mandibular denture.
7. Appearance. The shade of the teeth should blend with the patient's remaining natural teeth (if any) in the opposite arch. The labial position of the maxillary anterior teeth should provide adequate support to the lips.
QUANTITATIVE SCORE. Before the criteria could be used in the evaluation of patients, it was necessary to develop a reliable scoring system that was easily understood by the assessors. At first, each criterion was scored on a three-point scale: 1, unsatisfactory; 2, adequate; and 3, superior. A value of 9 was assigned when no decision could be made, usually because of lack of information. Preliminary attempts to use this system showed that assessors had difficulty making reliable judgments between the second and third classes, especially in evaluating the treatment plan and treatment. Therefore, for these latter two components of care, the scale was reduced by removing the third classification (superior).
In rating the treatment plan and treatment, a problem arose if the dentist had multiple opportunities to meet specific criteria. For example, treatment-planning criteria were developed on the materials used to restore teeth. Since the dentist usually restored more than one tooth and often met the standards for some teeth but not others, the question arose as to whether the treatment plan in this area was satisfactory. Arbitrarily, a general rule was established: if 90% or more of the decisions were correct, a satisfactory score was assigned; if less than 90% of the decisions were correct for that item, an unsatisfactory score was assigned. In practical terms, this meant that if a dentist used the correct materials in nine of ten teeth restored for a given patient, a satisfactory score was assigned for that criterion. If only five restorations were evaluated and one was not adequate (80% correct), an unsatisfactory score was assigned.
One other issue on scoring deserves mention. Many components of care involve more than one criterion. As an example, assessors were asked to evaluate the use of restorative materials. For this item, four or five specific criteria related to the material's biological effects on pulpal tissue, strength, esthetics, and so on are available. The issue then is how to score the use of restorative materials if three of the four standards are met but one is consistently unmet. It can be argued that in this situation use of materials should receive an adequate rating.
Yet, from a clinical perspective, if one important aspect of material use is consistently not achieved, it is likely that this aspect of the treatment plan is clinically inadequate. For example, for anterior teeth a dentist might use a synthetic filling material that has excellent strength and matches the color of the natural tooth but causes a permanent, adverse pulpal reaction even with a cavity liner or base. The use of materials should then be given a low score even though most of the criteria are met. Thus, an "adequate" score would be assigned to an item only if all criteria within that item were met 90% or more of the time.
CATEGORICAL SCORE. To validate the quantitative evaluation system just described, a five-point categorical (or qualitative) scale was devised to provide an overall, rather than an item-by-item, measure of quality. The assessors gave their general impression of the quality of care provided after examining the patient and briefly reviewing the record (treatment notes and radiographs). The quality of care was then rated in one of five general categories:
— Category 1: The dentist has met few of the standards commonly accepted for adequate care. The treatment is clearly of the lowest quality, technically and judgmentally.
— Category 2: The dentist has some areas where his work is adequate, but overall the level of care has not reached an acceptable level.
— Category 3: The dentist has achieved adequate care in all phases of the treatment plan and treatment.
— Category 4: The dentist has done superior work in most of the major phases of care. There are, however, one or two areas where the work is adequate but not superior.
— Category 5: The dentist has provided outstanding care in all phases of treatment.
The assessor who performed the overall evaluation was not involved in the detailed (quantitative) evaluation of that particular patient. In addition to its use as a validity check, a major advantage of the categorical approach is the speed with which an assessment can be completed: it requires about five minutes. Consequently, it is crucial to compare both systems and to judge whether one can be used as a substitute for the other.
■ Training of assessors: Two types of assessors were trained. For the history and examination, two research assistants without any background in dentistry or other health disciplines were taught to rate the dental records and to abstract them for use by the dental assessors.
The rationale for using nondental personnel for this task was that decisions were being made not on the appropriateness of treatment but on the amount of detail in the record, and nonprofessional record assessors could do this job at less cost than trained dentists. Three members of the standards committee (two from the community and one from the dental school) were trained as assessors to evaluate the treatment plan and treatment. Although the assessors were already familiar with the evaluation system, they were asked to take a test to examine their didactic knowledge of the criteria. After the assessors had successfully completed the test, patients were examined clinically, and preliminary data were collected on the criteria and scoring system. It soon became apparent that some of the criteria were not practical, could not be rated reliably, or for some other reason were not usable. The standards committee met several times to modify the criteria.
The usual procedure during the training period was for two or three patients to be scheduled for an afternoon; each patient was rated independently by the three assessors. Then, the assessors met to discuss any differences among them. This quickly led to the assessors' developing comparable interpretations of the standards and a better understanding of the entire process of evaluation. After two months of training and after modifications were made in the criteria, a manual was prepared outlining the method of evaluation in detail. At this time, two additional assessors were recruited, one a member of the standards committee, the other a community practitioner.
■ Examination of patients: With both sets of assessors trained, the criteria were evaluated for reliability. Approximately 50 patients were scheduled for examination. These participants were active dental patients from three sources: the private practices of community dentists, the dental school clinic, and the Veterans Administration hospital (these patients received their care in the private practices of local dentists). The patients selected were above the age of 18, had completed a course of treatment (were in a maintenance phase of care), and had been treated for more than one condition. In regard to the latter, most patients had, as a minimum, received care requiring restorative and periodontal therapy.
A description of the evaluation process follows. First, before the clinical evaluation, the research assistants rated the history and examination based on the dentist's records.
They then abstracted the record by writing a brief description of the treatment given at each visit, in sequence. This allowed the dental assessor to evaluate the plan of treatment without wasting time trying to organize the material in the original record; it also prevented the dental assessor from knowing the name of the dentist who treated the patient. Immediately before the clinical assessment, the patient was asked to complete a short questionnaire noting the history of dental treatments, any outstanding medical problems, and payment sources for care. Attitudinal questions were also used to assess the patient's willingness to practice good oral hygiene and to pay for needed services. Next, the patient was seen by the dental assessor, who evaluated the treatment plan and treatment. His sources of information were the abstracted treatment notes, the patient questionnaire, and his own clinical examination of the patient. After the first assessor had finished, the patient was seen independently by a second assessor, who did the same type of evaluation. Finally, a third clinical assessor evaluated the patient using the qualitative scoring system.
■ Data analysis: The process variables were examined in terms of their reliability, variability, validity, and practicality. The ideal evaluation system would be highly consistent and stable (reliable), would discriminate well between patients (variable), would be meaningfully interpretable (valid), and would be easily learned and administered (practical).
— Reliability: Two measures were used to determine interrater reliability. The first was simply a descriptive indicator: the percentage of agreement between any two judges across all items. This measure can range from 0 (perfect disagreement) to 100% (perfect agreement). Agreement expected by chance would be 25% in history and examination and 33% in treatment plan and treatment. Here, a chance result refers to the number of ways two judges can agree, divided by the sum of possible agreements and disagreements.
Another indicator of reliability was the comparison of mean scores (on items scored 1, 2, or 3 only) between judges for the same record or patient. The t statistic was used to test the null hypothesis that the means for the two judges were equal. A significant value indicated lack of agreement between judges on a particular patient or record.
— Variability: Before any practical meaning can be assigned to a criterion, variance must be present.
An item on which everyone agrees, and which is always assigned the same value (for example, 2), does not distinguish between dentists. Because evaluating such an item still takes some finite amount of time, it reduces the efficiency of the instrument. Variability was measured by the standard deviation, calculated for each item across all patients; items rated 9 (no judgment) were excluded. With these data, variability scores ranged from 0 to a value slightly greater than 1. A value of 0 meant that only one score had been assigned to the item (that is, either 1, 2, or 3), whereas a value near 1 signified that the scores were spread widely across 1, 2, and 3.

— Validity: The validity of an instrument answers the question, “Is the instrument measuring what it is supposed to measure?” Three measures of validity are of concern here: content (or consensual), predictive, and concurrent. The first refers to the appropriateness of the items in the instrument; this criterion was met by having all items reviewed and approved by the standards committee. The second investigates the ability of the items to predict other variables, such as outcome indexes. The only estimate of validity determined statistically in this paper, concurrent validity, describes the relationship between the quantitative and qualitative scores. It was assessed by calculating the product-moment correlation between the sum of the treatment plan and treatment scores (a quantitative measure) and the qualitative score assigned to each patient.

— Practicality: Practicality refers to several different measures, including the length of time required to learn the criteria and to evaluate a particular patient clinically. These considerations are crucial for wide dissemination of the system.
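The agreement, chance-agreement, variability, and concurrent-validity measures described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' procedure. It assumes scores are coded 1, 2, or 3, with 9 meaning "no judgment". It also assumes the sample (n - 1) form of the standard deviation, which the article does not specify. All function names are hypothetical. Chance agreement follows the definition in the text: with k equally likely scores, two judges agree in k of the k × k possible pairings, or 100/k percent. The 25% and 33% figures therefore correspond to four and three scoring categories.

```python
from statistics import mean, stdev

NO_JUDGMENT = 9  # assumed item code for "no judgment could be made"


def percent_agreement(judge_a, judge_b):
    """Percentage of items on which two judges assigned the same score."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty score lists of equal length")
    same = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * same / len(judge_a)


def chance_agreement(n_categories):
    """Chance agreement as defined in the text: the k ways two judges can
    agree, divided by all k * k agree/disagree pairings, as a percentage."""
    return 100.0 * n_categories / (n_categories * n_categories)


def item_sd(scores):
    """Sample standard deviation of one item across patients, ignoring 9s.

    Returns 0.0 when fewer than two usable scores remain.
    """
    rated = [s for s in scores if s != NO_JUDGMENT]
    if len(rated) < 2:
        return 0.0
    return stdev(rated)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists,
    e.g. summed quantitative score and qualitative score per patient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    print(percent_agreement([1, 2, 3, 2], [1, 2, 2, 2]))  # 75.0
    print(round(chance_agreement(4)), round(chance_agreement(3)))  # 25 33
    print(item_sd([2, 2, 2, 9]))  # 0.0: one distinct score after dropping the 9
    print(round(item_sd([1, 2, 3] * 10), 2))  # 0.83: equal numbers of 1s, 2s, 3s
    print(round(item_sd([1, 3] * 10), 2))  # 1.03: scores split between 1 and 3
```

The last two lines show how a sample standard deviation can slightly exceed 1. An even split between 1s and 3s gives just over 1, while equal numbers of 1s, 2s, and 3s give about 0.83.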
Results
■ Description of sample: The sample consisted of 32 men and 15 women, ranging in age from 20 to 65 years, with a mean age of 30.6. Because about half the patients were recruited from the Veterans Administration dental clinic, the age and sex distribution is clearly skewed toward men aged 20 to 25 years. Thus, the patient population cannot be considered representative of the total population of patients in the region.

Each patient was also classified according to the areas of treatment planning and treatment that could be assessed. For example, how many patients had restorative or endodontic treatment that could be evaluated? Table 1 shows that most patients had received restorative, periodontal, and some type of prosthetic services. However, few could be evaluated for intra- or extraoral soft tissue lesions, occlusal problems, or oral surgery treatment. This assessment could not be made for two main reasons. Either the patient never needed treatment (for example, for a periapical lesion), or, if he did, there was not enough information on the extent of the problem or on how the dentist handled it (for example, a soft tissue lesion). Often it was impossible to distinguish between these two reasons, but the end result was the same: no judgment could be made.

Table 1 ■ Percentage of patients assessed by category of evaluation: treatment planning and treatment.

Category of evaluation            % patients
Treatment plan
   Medical                           48.9
   Extraoral                         10.6
   Soft tissue                        2.1
   Preventive                        84.0
   Restorative                       89.4
   Periodontal                       63.8
   Occlusal                          10.6
   Prosthetics                       66.0
   Periapical                        21.3
Treatment
   Restorative                       89.4
   Crown and bridge                  53.2
   Removable partial dentures        21.3
   Complete dentures                 12.8
   Endodontics                       30.0
   Oral surgery                      12.8

■ Reliability: First, the reliability of the record audit of the history and examination was measured by comparing the audits of the research assistants with those of a dentist. There was little difference between the nondentist and dentist auditors. Ten records were therefore chosen at random and evaluated in detail, independently, by each of the two research assistants. The average agreement between assessors for all items was 95%, and the percentage agreement between assessors for individual items ranged from 80% to 100%. All the demographic items, such as sex, age, and occupation, were scored identically by both judges. The items relating to charting of caries, periodontal examination, and oral hygiene evaluation were the most troublesome and received a lower reliability of 80%. These results were supported by the finding of no significant differences in mean scores between judges for any of the records. Clearly, a high degree of reliability was achieved for the assessment of history and examination.

The analysis of reliability for the treatment plan showed similar results. The percentage agreement between assessors across all items ranged from 79% to 96% (Table 2). Some pairs of dentists saw very few patients in common. Others, such as D and E, who saw a total of 13 patients together, had a 92% agreement score. For the reliability between assessors for each item, only a fifth of the items had less than 90% agreement. The lowest of these was 59%, for a specific criterion related to the control of periodontal inflammation. As Table 3 shows, more than a third of the items were agreed on at least 90% of the time. Finally, a t test was computed for each patient by comparing the two assessments, for a total of 47 t tests. Of these, only four were significant, against 2.4 expected by chance. For the treatment plan, then, the reliability was high regardless of the measure used.

For the evaluation of treatment, the percentage agreement between assessors for all items ranged from 90% to 96% (Table 2). More than half of the items had agreement percentages higher than 90%, and only three items were agreed on less than 80% of the time (Table 3). These items were the criteria related to embrasure space in amalgams and to marginal ridge heights in crowns or fixed bridges; the agreement level for these items was 79%. Of the 47 t tests, 11 showed significance beyond the 5% level, about nine more than would have been expected by chance. A close examination of the data showed that judges C and D, who did the fewest evaluations, had the highest proportion of significant differences compared with their partners. Lack of practice probably contributed to their unreliability. Even so, close to 80% of the t tests were not significant. In general, therefore, treatment assessment for a particular patient was quite reliable.

Table 2 ■ Reliability between assessors (percentage agreement) for all items: treatment plan and treatment.

                                         % agreement
Pair no.   Assessors   No. patients   Treatment plan   Treatment
                                      (N = 29 items)   (N = 50 items)
 1         A/B          4             94               86
 2         A/C          4             96               85
 3         A/D          9             90               85
 4         A/E          6             94               86
 5         B/C          2             94               83
 6         B/D          2             92               83
 7         B/E          1             94               96
 8         C/D          4             92               81
 9         C/E          2             95               79
10         D/E         13             92               92

Table 3 ■ Proportion of items classified by percentage of agreement between assessors.*

                                        % agreement between assessors
Component of evaluation               <80%    80%-90%   90%-99%   100%
History & examination (31 items)      0       0.10      0.25      0.65
Treatment plan (29 items)             0.03    0.21      0.41      0.35
Treatment (50 items)                  0.06    0.28      0.48      0.18

* Each cell represents the proportion of individual items within a specified range of agreement.

■ Variability: Because each patient was seen twice, the scores from the two assessments were assigned to two separate groups. This procedure yielded two standard deviations for each item across all patients. These values were averaged to obtain a better indication of the “true” population standard deviation. It is quite difficult to define clearly an acceptable value for the standard deviation. For practical purposes, 0.25 was set as a lower limit, about 25% of the largest standard deviations observed. Items with standard deviations below this cutoff would serve as poor discriminators and predictors. As Table 4 shows, about a third of the items on history and examination and on treatment had standard deviations below 0.25. More than half the items in the treatment plan component failed to reach this arbitrary cutoff.

A close examination of the items in each component showed where the low standard deviations were most prominent. For example, in the history and examination, nearly all the demographic criteria had little or no variance. Similarly, the medical items in the treatment plan had very low standard deviations. Finally, in treatment, the items relating to extractions showed little, if any, variability.

Table 4 ■ Percentage of items classified by standard deviation.*

                                                        Standard deviation
Component of evaluation    No. items in component    ≤0.25    >0.25 to <0.75    ≥0.75
History and examination    31                         32%      26%               42%
Treatment plan             29                         58       31                11
Treatment                  50                         38       34                28

* Each cell represents the percentage of items within a specified standard deviation range.

■ Validity: As a measure of concurrent validity, the summed treatment plan and treatment scores were compared with the qualitative score for each patient. A product-moment correlation of 0.30 was found between the measures; this was significant at the 0.05 level. There is some indication that the two scores were tapping the same underlying variable. However, because 0.30² = 0.09, 91% of the variance is unaccounted for. The process standards contain unique variance not explained by the simpler qualitative measure.

■ Practicality: The training time for assessors, both professional and nonprofessional, averaged about 20 hours. Each clinical assessor spent about 18 more hours practicing the system with patients. The time needed to examine each patient decreased as the number of patients already seen increased. At the beginning of the evaluation sessions, about 40 minutes were required to assess the treatment plan and treatment. By the end of the sessions, very few patients required more than 20 minutes of the examiner’s time. The qualitative measure never required more than five minutes.

These times are all based on the particular assessors, recorders, and patients used in the present test run. However, these results are probably not unique and may generalize to most settings.
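The chance expectations quoted in the reliability and validity results can be checked with simple arithmetic. The sketch below rests on a simplification the article does not state: the 47 per-patient t tests are treated as independent, each run at the 5% level. Under that assumption, the number of tests that come out significant purely by chance follows a binomial distribution. All function names are hypothetical.

```python
from math import comb

ALPHA = 0.05  # significance level assumed for each per-patient t test


def expected_significant(n_tests, alpha=ALPHA):
    """Mean number of 'significant' tests when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha=ALPHA):
    """Binomial tail P(k or more significant tests), assuming independent
    tests and true null hypotheses throughout."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


def unexplained_variance(r):
    """Share of variance not accounted for by a correlation r (1 - r squared)."""
    return 1 - r * r


if __name__ == "__main__":
    print(f"{expected_significant(47):.2f}")  # 2.35, reported as 2.4
    print(prob_at_least(4, 47))   # treatment plan: 4 significant tests
    print(prob_at_least(11, 47))  # treatment: 11 significant tests
    print(round(unexplained_variance(0.30), 2))  # 0.91, i.e. 91% unaccounted for
```

Under these assumptions, four significant tests out of 47 is unremarkable, roughly a one-in-five chance. Eleven would be very unlikely (well under one in a thousand). That fits the article's reading that the excess reflects real disagreement rather than chance.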
Discussion

Perhaps the most important results of this study are the insights gained into the process of developing criteria and standards, rather than the content of the criteria per se. Private dental practitioners, without special expertise or training, were able to formulate criteria of care within a six-month period. Some of these same practitioners then used the criteria to assess patients. Technical advice was available from dental faculty with a background in tests and measurements. This type of advice is probably obtainable in most communities, which suggests that many dental groups could develop standards.

As noted in the introduction, there are many advantages to developing criteria at a local as well as a national level. First, there is a greater probability that the criteria will be acceptable to community dentists. A second advantage is the educational value of having a group of practitioners who know and respect each other discuss the elements of adequate care. Third, the development of standards may be the beginning of a community peer review system. Such a system would go beyond handling patient or third-party complaints and become a form of continuing education. Objective evaluation by colleagues of the quality of care provided in a practice gives the individual dentist feedback on his clinical strengths and weaknesses. This assessment should be carried out in an atmosphere of helpful concern rather than of sanctions and punishment, the usual connotations of peer review. Even the dentist doing the assessment will find peer review a learning experience. In fact, we recommend that all dentists serve as assessors at some time.

The preliminary results of testing the criteria for reliability, validity, and practicality were encouraging. Assessor reliability was relatively high for most items. For the few categories in which there were problems, it is our impression that the criteria were not explicit and the assessors were unsure of themselves. The standards committee has since modified these criteria, and the reliability for these items should improve.

In terms of the variability scores, the standard deviations were not as high as expected, which indicates that many items were always assigned the same value by the assessors. To some extent, this may have led to artificially high reliability values for some items.

Several reasons could account for the low variability of some items. The standards may have been set too low or too high. A more important reason, however, is probably the mix of patients evaluated. With a larger number of patients covering a wider range of clinical problems, greater variance would, in all likelihood, result.

In practical terms, the total time needed for the dentist to evaluate one patient was less than 25 minutes. Of equal significance, the time necessary to train a practicing dentist to be an assessor was also reasonable. These results suggest that this system, or a modified version of it, could be instituted on a larger scale.
It could serve as a form of continuing education, as a framework for patient assessment by peer review committees, and as one level of quality assessment for dental care programs. With reference to the last item, there must be different levels of evaluation when large numbers of patients are involved. It is unrealistic to suppose that a significant percentage of patients in a program could be examined as part of a quality monitoring system. The cost and inconvenience to both providers and patients would be too great. Instead, the initial or screening review will probably depend on information obtained from insurance claim forms or patient records. Selected dentists would then be subject to a more detailed review on the basis of these results. It is at this point that the evaluation system proposed here would be of value.

Problems associated with patient sampling were mentioned previously. This whole question needs considerable study before any organized evaluation system can be put into effect. Currently, little is known about the number and types of patients who must be examined to obtain a stable estimate of quality in the average dental practice. Assessment of two or three patients from an active list of 1,500 to 2,000 probably yields little information on the quality of the entire practice. This point should be emphasized. Some recent publications on the quality of dental care have suggested that a low but significant percentage of dentists are delivering inadequate care. Unfortunately, these results are based on samples of only a few patients from each practice.15,16 Although this conclusion might be correct, it is premature to make such statements without further evaluation.

A related issue that has both theoretical and practical significance is the relationship of quality among different treatment areas. Suppose, for example, that the average general dentist spends most of his time providing restorative, periodontal, and prosthetic services. Is the quality of care in each of these treatment areas then approximately the same within a particular practice? From a theoretical point of view, this problem is of interest because it leads to speculation about the reasons for any quality differences found. Are they a result of the dentist’s personal preferences? Does he enjoy one area but not another? Are they a product of his education and training or, perhaps, of the amount of time he spends providing specific services? Obviously, these are not mutually exclusive categories, and probably all these reasons are relevant.
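The sampling concern raised above, that two or three patients say little about a whole practice, can be made concrete with a 95% Wilson score interval. The interval is for the proportion of a practice's patients whose care would be judged adequate. This is an illustrative technique the article does not use. It treats each sampled patient as an independent adequate/inadequate outcome and ignores the finite-population correction, which is negligible when a handful of patients is drawn from an active list of 1,500 to 2,000. The function name and the sample sizes are hypothetical.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wilson_interval(adequate, n, z=Z_95):
    """Wilson score interval for the binomial proportion adequate / n."""
    if n <= 0 or not 0 <= adequate <= n:
        raise ValueError("need n > 0 and 0 <= adequate <= n")
    p = adequate / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical samples in which every patient examined is judged adequate.
    for n in (3, 10, 30, 100):
        low, high = wilson_interval(n, n)
        print(f"{n:>3} patients, all adequate: {low:.2f} to {high:.2f}")
```

Even if all three sampled patients are judged adequate, the interval runs from about 0.44 to 1.00. The data would thus fit anything from under half to all of the practice's patients receiving adequate care. With 30 patients, all adequate, the lower bound rises to about 0.89.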
In practical terms, the quality of care seen in one or two types of services may be indicative of quality across the entire range of services a dentist provides. If so, the evaluation process can be greatly restricted and limited to a few patients of a specific type. If this is not true, however, enough patients of different types must be seen to obtain a valid estimate of quality. Although these problems seem somewhat esoteric at this time, they may soon assume major importance.

We attempted to validate the criteria with an independent evaluation of the patient, rating the quality of care on a scale from one to five. This resulted in a low correlation (r = 0.30) between the qualitative and quantitative scores. In the five to ten minutes allowed for the qualitative assessment, it is our impression that the examiners tended to focus on one or two items that did not meet acceptable standards of care. They then weighed these items heavily in the final score they assigned. This suggests that examiners need an orderly schedule to follow in the examination of patients.

Of course, the value of the qualitative versus the quantitative assessment system cannot be decided finally until the predictive validity of each method is evaluated. Predictive validity here means the relationship between the process scores (qualitative and quantitative) and scores based on indexes of patient health. Although it is unlikely, the qualitative score may correlate more closely than the quantitative score with the number of DMF teeth or the periodontal index. This issue is currently under investigation.
Summary

This paper describes the process of developing and testing clinical criteria and standards for general dentistry. A standards committee made up of nine dental practitioners established criteria for three components of care: the history and examination, treatment planning, and treatment. The standards establish what the average general dental practitioner should do to provide adequate care. Criteria could not be developed for all services performed by dentists, so only the more common ones were considered.

After a limited pretest of the criteria, five dentists were trained to use them in the evaluation of patients. Two independent assessments of 47 dental patients were made to determine the reliability and variability of specific items and the practicality of the entire system. The validity of the criteria was also evaluated. The scores assigned on the quantitative review system were compared with scores obtained independently from a qualitative rating scheme.

The results indicated a high degree of reliability for most items. However, the variability was often quite low; this may be explained partially by the limited number and diversity of patients evaluated.
The time needed to train assessors and to examine patients was within practical limits; thus, this system is feasible on a larger scale. With further refinements in the review procedures, the time, and therefore the expense, of evaluation can probably be reduced even more.

The correlation coefficient between the quantitative and qualitative scoring systems was low but statistically significant. This indicates that the simpler qualitative method cannot be substituted for the quantitative approach.

The major point to be emphasized from this study is that criteria and standards of dental care can be developed and tested by local dental groups. These standards may serve as the basis for any government-sponsored evaluation system.

Author’s note: G. Ryge and M. Snyder published an investigation (“Evaluating the clinical quality of restorations”) on the development and testing of process standards for assessing the technical quality of restorations (JADA 87:369 Aug 1973).
This work was supported in part by contract NIH-72-4207 from the Division of Dental Health, National Institutes of Health, US Public Health Service. The authors acknowledge the assistance of Dr. Earle Yeamans and Robert Villanova; the members of the standards committee: Dr. Nathan Dubin, Dr. Stanley Holzman, Dr. Robert Levine, Dr. Calvin Mass, Dr. Sedrick Rawlins, Dr. Robert Villanova, and Dr. John Zazzaro; the Hartford Dental Society and the State Board of Dental Examiners; Dr. Michael Zazzaro, secretary of the board; and Ms. Joan Jannace and Ms. Sharon Zarcaro.

Dr. Bailit is professor and head of the department of behavioral sciences and community health, School of Dental Medicine, University of Connecticut Health Center, Farmington, Conn 06032.
Dr. Koslowsky is assistant professor, department of behavioral sciences and community health, and Dr. Grasso is assistant professor, department of general dentistry, School of Dental Medicine, University of Connecticut Health Center. Dr. Holzman and Dr. Levine are private practitioners and members of the Hartford Dental Society. Ms. Valluzzo and Ms. Atwood are research assistants in the department of behavioral sciences and community health, University of Connecticut Health Center.

1. Egdahl, R.H. Foundations for medical care. New Engl J Med 288:491 March 8, 1973.
2. Abramowitz, J. Planning for the Indian Health Service. J Public Health Dent 31:70 Spring 1971.
3. Friedman, J.W. A guide for the evaluation of dental care. Los Angeles, School of Public Health, University of California, 1972.
4. Schonfeld, H.K., and others. Professional dental standards for the content of dental examinations. JADA 77:870 Oct 1968.
5. Soricelli, D.A. Methods of administrative control for the promotion of quality in dental programs. Am J Public Health 58:1723 Sept 1968.
6. Cons, N.C. Method for posttreatment evaluation of the quality of dental care. J Public Health Dent 31:104 Spring 1971.
7. Lee, R.I., and Jones, L.W. The fundamentals of good medical care. Publication of the Committee on the Costs of Medical Care, no. 22. Chicago, Chicago University Press, 1933.
8. Roemer, M.I. Evaluation of health service programs and levels of measurement. HSMHA Health Rep 86:839 Sept 1971.
9. Schonfeld, H.K., and others. The content of good dental care: methodology in a formulation for clinical standards and audits, and preliminary findings. Am J Public Health 57:1137 July 1967.
10. Donabedian, A. Evaluating the quality of medical care. Milbank Mem Fund Q 44:166 July (Suppl) 1966.
11. Friedman, J.W. Study and appraisal guide for dental care programs. Berkeley, School of Public Health, Division of Public Health and Medical Administration, University of California, May 1963.
12. Peterson, O.L., and others. An analytical study of North Carolina general practice: 1953-1954. J Med Educ 31:1 Dec 1956.
13. Donabedian, A. A guide to medical care administration. Volume II: Medical care appraisal—quality and utilization. New York, The American Public Health Association, Inc., 1969.
14. Kessner, D.M.; Kalk, C.E.; and Singer, J. Assessing health quality—the case for tracers. New Engl J Med 288:189 Jan 25, 1973.
15. Bellin, L.E.; and Kavaler, F. Policing publicly funded health care for poor quality, overutilization, and fraud—the New York City Medicaid experience. Am J Public Health 60:811 May 1970.
16. Denenberg, H.S. A shopper’s guide to dentistry. Harrisburg, Pennsylvania Insurance Department, 1973.