Development of a sensitive clinical facial grading system

BRENDA G. ROSS, MSc(PT), GAETON FRADET, MD, FRCS(C), and JULIAN M. NEDZELSKI, MD, FRCS(C), Toronto, Ontario, Canada

Clinicians require a reliable and valid method of evaluating facial function after facial nerve injury. This tool should be clinically relevant and easy to administer, provide a quantitative score for reporting purposes, and be sensitive enough to detect clinically important change over time or with treatment. The proposed facial grading system has all essential information, including precise definitions for each item, presented on one page. The facial grading system is based on the evaluation of resting symmetry, degree of voluntary excursion of facial muscles, and degree of synkinesis associated with specified voluntary movement. Different regions of the face are examined separately with the use of five standard expressions. All items are evaluated on point scales, and a cumulative composite score is tabulated. Construct validity was addressed by comparing the proposed facial grading system to prerehabilitation and postrehabilitation treatment scores of 19 patients with varying degrees of facial nerve injury. All patients had documented change in a controlled study of feedback training. The proposed system reports results in a more continuous manner with a wider response range than the House-Brackmann grades. Each component of the grading system is sensitive to change and individually contributes to a change in the composite score. Tests of interrater reliability are currently near completion. (Otolaryngol Head Neck Surg 1996;114:380-6.)
From Sunnybrook Health Science Centre, Toronto (Ms. Ross and Dr. Nedzelski), the Centre Medical Quatre Bourgeois (Dr. Fradet), and the Department of Otolaryngology (Dr. Nedzelski), University of Toronto. Dr. Fradet is now at the Centre Medical Quatre Bourgeois.
Presented at the VII International Symposium on the Facial Nerve, Cologne, Germany, June 10, 1992.
Received for publication March 21, 1995; accepted July 17, 1995.
Reprint requests: Julian M. Nedzelski, MD, FRCS(C), Department of Otolaryngology, Sunnybrook Health Science Centre, A217, 2075 Bayview Ave., Toronto, Ontario, Canada M4N 3M5.
Copyright © 1996 by the American Academy of Otolaryngology-Head and Neck Surgery Foundation, Inc.
0194-5998/96/$5.00 + 0 23/1/67890

The lack of a universally accepted grading system for facial palsy is a widely recognized problem despite 20 years of proposals and debate.1-12 Existing systems vary dramatically in their approach. Inadequate attention has been paid to the question of validity and responsiveness of the different grading systems. Clinicians require an objective, reliable, and valid clinical tool to accurately describe a patient's facial function, to monitor status over time, and to assess the course of recovery and the effects of treatment. For a grading system to have clinical usefulness, it must be easy to administer, require little time or equipment, and be sensitive enough to detect clinically important change. One limitation of the commonly used House-Brackmann (H-B) facial grading system (FGS) is that the range of scoring does not reflect clinically important change.

The purpose of this study was to develop a clear, well-defined system that provides an accurate description of facial motor function and is responsive to clinically important change. Specific objectives were to identify the necessary components of an FGS, construct a measure, and carry out basic tests of validity and responsiveness. An additional objective was to test the scoring and weighting components of the instrument.

METHODS

Content Validity
Important content areas and underlying components of the FGS were identified by a literature review, by judgmental appraisal by clinicians, and by the clinical justification of its attributes.13 Construction of the final measure was achieved after use with a broad spectrum of patients with acute and chronic facial nerve paresis and by informal exchange with physicians, facial physiotherapists, and patients. Raters were asked to comment on the clarity of the proposed FGS, the suitability of the rating scales, the importance of each component, the appropriateness of the weighting of the variables in determining the composite score, and the quality of the basic data. Patients were also asked to rate the importance of selected items to be included in the measure.

Similarities of existing FGSs include evaluation of resting symmetry, scleral exposure, degrees of facial movements (including movement of landmarks), secondary effects of synkinesis, contracture, and hemifacial spasm.4 Important content for this proposed FGS was identified as (1) evaluation of the face at rest compared with the normal side, (2) degree of maximal excursion of facial muscles compared with the normal side, and (3) degree of synkinesis associated with specified voluntary movement (Fig. 1). The different regions of the face are examined separately with five standard facial expressions, and responses are graded with the use of point scales.

Resting symmetry is assessed by comparison of the palpebral fissure (normal, narrow, wide), nasolabial fold (normal, absent, less or more pronounced), and the corner of the mouth (normal, drooped, or pulled up/out) with the normal side (Fig. 1). For example, if the nasolabial fold is less or more pronounced than the normal side, a 1 is scored; if it is absent, a 2 is scored. Then different regions of the face are examined separately, with five standard expressions used to evaluate the symmetry of voluntary movement and the degree of synkinesis associated with the movement. The five standard expressions reflect the motor function of the five peripheral branches of the facial nerve. The symmetry of voluntary movement for each standard expression is graded on a scale from 1 to 5, depending on the degree of muscle excursion compared with that of the normal side (Fig. 1). To avoid the problem of a scale requiring discernments that are too difficult to make, a five-point scale was chosen.13 The degree of synkinesis associated with each standard expression is graded on a four-point scale from 0 (no synkinesis) to 3 (severe synkinesis).

Each dimension is totaled, and the components are combined to obtain one overall composite score. The scores were weighted to result in a composite score of 100 for normal facial function and a score of 0 for complete facial paralysis. The scores are weighted as follows: the resting symmetry score is multiplied by five, and the voluntary movement score is multiplied by four. The composite score is derived by subtracting the weighted resting symmetry score and the synkinesis score from the weighted voluntary movement score.
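The scoring rules described above can be sketched in code. The following Python fragment is a minimal illustration, assuming the item values printed on the form in Fig. 1 and the weights stated in the text (resting symmetry × 5, voluntary movement × 4, synkinesis unweighted). The function and dictionary names are hypothetical and are not part of the published instrument.

```python
# Illustrative sketch only: all names are hypothetical; item values follow Fig. 1.

# Resting symmetry: points per finding, compared with the normal side.
EYE = {"normal": 0, "narrow": 1, "wide": 1, "eyelid surgery": 1}
CHEEK = {"normal": 0, "absent": 2, "less pronounced": 1, "more pronounced": 1}
MOUTH = {"normal": 0, "corner drooped": 1, "corner pulled up/out": 1}

N_EXPRESSIONS = 5  # five standard expressions


def _check(ratings, low, high, label):
    """Require one rating per standard expression, each within low..high."""
    if len(ratings) != N_EXPRESSIONS:
        raise ValueError(f"{label}: expected {N_EXPRESSIONS} ratings")
    if any(not (low <= r <= high) for r in ratings):
        raise ValueError(f"{label}: ratings must lie in {low}..{high}")


def composite_score(eye, cheek, mouth, voluntary, synkinesis):
    """Composite = 4 * voluntary total - 5 * resting total - synkinesis total."""
    _check(voluntary, 1, 5, "voluntary movement")
    _check(synkinesis, 0, 3, "synkinesis")
    resting_total = EYE[eye] + CHEEK[cheek] + MOUTH[mouth]
    return 4 * sum(voluntary) - 5 * resting_total - sum(synkinesis)


# Anchor points named in the text, plus one invented intermediate case.
normal = composite_score("normal", "normal", "normal", [5] * 5, [0] * 5)
paralysis = composite_score("wide", "absent", "corner drooped", [1] * 5, [0] * 5)
partial = composite_score("narrow", "more pronounced", "normal",
                          [3, 4, 3, 3, 4], [1, 2, 1, 0, 2])
print(normal, paralysis, partial)
```

Under these assumptions, a face rated normal on every item yields 100, and a face with maximal resting asymmetry, minimal excursion on all five expressions, and no synkinesis yields 0, which matches the two anchor points described in the text.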
This scoring scheme reflects a common language for both clinicians and patients. It was tested by comparing individual dimension change scores (i.e., rest, voluntary movement, and synkinesis) to composite change scores and also by correlational tests of the individual components of the measure.

Construct Validity
Because there is no gold standard or reference criterion available for grading facial nerve paresis, the main focus of quantitative appraisal of the FGS was to evaluate the performance of the measure in discriminating and identifying change in a group of patients with previously documented changes in facial function. The second strategy for appraising the validity of the FGS was to compare the grading system with existing cognate measures, with the H-B system, and with linear measurement of facial movements.

Nineteen patients with unilateral long-standing facial nerve paresis made up our study group. Each patient was a participant in a previous prospective controlled study of facial rehabilitation14 and had undergone detailed facial assessment before and after treatment. The outcome measures for this previous study were (1) linear measurements, (2) a detailed visual assessment of videotapes by a blinded, independent appraiser, and (3) the H-B facial nerve grading system. Statistically significant improvement of facial function was documented in this patient group with respect to linear measurement of facial expressions (p < 0.01) and the detailed visual assessment (p < 0.03). For this validation study, an independent, blinded visual assessment of the same pretreatment and posttreatment videotapes was performed with the proposed FGS.

Pearson correlation coefficients were calculated for the relationships between voluntary movement scores of the FGS and linear measures and for the relationships between the individual components of the FGS. Student's t test was used to compare means of the FGS and its components and of the H-B grades before and after treatment.

RESULTS
Tests of correlation were carried out to determine the interrelationship of the different components of the FGS (i.e., rest, voluntary movement, and synkinesis). When change scores from one component of the grading system were correlated with every other component, we found that the components were relatively independent of one another (Table 1).
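The two analyses named in the Methods, Pearson correlation between component change scores and a paired Student's t test on pretreatment and posttreatment scores, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function names are hypothetical, the numbers are invented for demonstration and are not study data, and no p value is computed because the standard library does not provide the t distribution.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for (post - pre)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


# Invented composite scores for five hypothetical patients (not study data).
pre_scores = [30, 45, 52, 38, 60]
post_scores = [48, 55, 70, 50, 66]
t_stat, dof = paired_t(pre_scores, post_scores)

# Invented change scores for two components, to show the correlation call.
voluntary_change = [12, 8, 16, 10, 4]
synkinesis_change = [-3, -1, -2, -4, -1]
r = pearson_r(voluntary_change, synkinesis_change)
print(f"t = {t_stat:.2f} on {dof} df, r = {r:.2f}")
```

In practice a statistics package would be used to obtain the p values; the sketch only shows how the test statistics relate to the paired scores and change scores.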
Otolaryngology Head and Neck Surgery, Volume 114, Number 3, March 1996
Fig. 1. Facial grading system. (One-page scoring form with fields for patient's name, Dx, and date.)
Resting symmetry, compared to normal side:
  Eye (choose one only): normal 0; narrow 1; wide 1; eyelid surgery 1
  Cheek (nasolabial fold): normal 0; absent 2; less pronounced 1; more pronounced 1
  Mouth: normal 0; corner drooped 1; corner pulled up/out 1
  Resting symmetry score: total × 5
Symmetry of voluntary movement: degree of muscle excursion compared to normal side, rated 1 to 5 for each standard expression (legible entries include gentle eye closure and snarl)
  Voluntary movement score: total × 4
Synkinesis: degree of involuntary muscle contraction associated with each expression, rated 0 to 3
  Synkinesis score: total
Table 1. Pearson correlations between components of FGS

Component            Voluntary movement      Synkinesia              Rest                    Composite score
Voluntary movement   1.0000
Synkinesia           -0.0763 (p = 0.7492)    1.0000
Rest                 0.0255 (p = 0.9150)     0.2136 (p = 0.3659)     1.0000
Composite score      0.5595 (p = 0.0103)     -0.6681 (p = 0.0013)    -0.6470 (p = 0.0020)    1.0000
Each dimension makes a distinctive contribution to the grading system. There were no significant correlations between the individual components of the FGS; however, each of the components correlated significantly with the composite score. The multipliers were also tested by comparing the individual dimension change scores to composite change scores. Figure 2 illustrates that each component of the grading system is sensitive to change and contributes to a change in the composite score. The composite score behaves the same way as the individual dimension scores without the multipliers.

The FGS was sensitive to changes with rehabilitation intervention, as illustrated in Fig. 3. With Student's t test, the comparison of mean FGS composite scores before and after treatment detected highly statistically significant change (p = 0.0000). Highly significant change is detected similarly in each of the three components of the FGS before and after treatment (Fig. 2). The voluntary movement score demonstrated a statistically significant increase (p = 0.0000), whereas the resting asymmetry score and synkinesis score were significantly decreased (p = 0.0040 and p = 0.0010, respectively). No statistically significant difference was found with the H-B grade before and after treatment (p = 0.1036) (Fig. 4).

Fig. 2. Paired t test shows highly significant change in each component of the FGS before and after treatment. Panels (pretreatment vs. posttreatment): resting asymmetry (p = 0.0040), voluntary movement (p = 0.0000), synkinesis (p = 0.0010).

Fig. 3. Paired t test shows a highly statistically significant difference before and after treatment with the FGS composite score as an outcome measure (p = 0.0000).

Fig. 4. Paired t test illustrates no statistically significant difference found with the H-B grading system as an outcome measure before and after treatment (p = 0.1036).

Pretreatment and posttreatment linear measurement scores were compared with the voluntary movement scores of the FGS for forehead wrinkle, eye closure, smile, and snarl (Table 2). There is significant correlation between linear measures and the FGS voluntary movement scores only with forehead movement.

Table 2. Pearson correlations between linear measures and voluntary movement scores of FGS before and after treatment (n = 15)

Facial expression    Pretreatment    Posttreatment    Difference between pretreatment and posttreatment values
Forehead wrinkle     0.5464          0.5869           0.7480*
Eye closure          0.2087          0.4454           0.4321
Snarl                0.4638          0.1302           0.0946
Smile                0.3080          0.1916           0.1272

*p = 0.001.

The FGS succeeds in reporting results in a more continuous manner with a wider response range than the gross H-B grades. There was no definitive change in the majority of patients with the H-B grading system as an evaluative tool (Fig. 4). In fact, most of the patients were rated as grade III before and after rehabilitation treatment (Fig. 5). The range of FGS values is illustrated within each H-B grade, which highlights the sensitivity of the proposed FGS. The FGS detected a definitive change in all patients that was similar in direction and magnitude to changes previously documented. The FGS was also sensitive to the severity of dysfunction and to the gradient of change before and after treatment.

Fig. 5. Individual pretreatment and posttreatment FGS scores within each H-B grade. Open circles are pretreatment scores, and closed circles are posttreatment scores. Horizontal axis: House-Brackmann grade (I to IV).

Figure 6 illustrates the changes of the composite score before and after rehabilitation treatment in a positive direction. Figures 7, 8, and 9 illustrate the changes within each dimension of the FGS. The voluntary movement dimension behaves in a similar manner to the composite score (Fig. 7). There is noticeable improvement in the synkinesis dimension after treatment (Fig. 8). There is only nominal change in the resting symmetry dimension illustrated in Fig. 9, in which each data point represents more than one patient. These results confirm that varying levels of change are detected by the individual components of the FGS and, in turn, are reflected in the composite score.

Fig. 6. Pretreatment composite scores plotted against posttreatment scores depicting positive change. Horizontal axis: composite score (post).

Fig. 7. Voluntary movement dimension pretreatment scores plotted against posttreatment scores illustrating improvements in voluntary movement with treatment. Horizontal axis: voluntary movement (post).

DISCUSSION
A uniform method of evaluating and reporting facial motor function is fundamental. Investigators would agree that the basic terminology used to describe facial function has been standardized to some degree. When clinical results are described, the functional parameters of resting symmetry, voluntary movement, and synkinesia must be evaluated. An FGS should accurately describe a patient's facial function and monitor their status over time to assess the course of recovery and the effects of treatment but be brief enough to be compatible with clinical practice.
The results of this validation study support the use of the proposed FGS as a valid measure of change in facial motor function. The chosen external criteria demonstrate relationships with the FGS similar in strength and direction to those previously documented. The FGS detected significant change when change was believed to have taken place and showed sufficient sensitivity to detect varying levels of change. There is also evidence to support the scoring and weighting of the FGS.

The allotment of scores is important when constructing an FGS. Each expression in our FGS is assigned an equal value; therefore muscle groups of lesser and greater importance earn equal representation in the final computation. The aggregation and weighting of each dimension in determining the final score of a quantitative measure is an important consideration.15 The aggregation of different components into a single output scale is a distinctive clinimetric strategy that allows a complex phenomenon to be cited, with one rating for the entire phenomenon.13 An aggregated index is often used for clinical communication, with the main advantage being that a single overall rating is achieved with distinctive specifications for the components. Because the purpose of this measure is to evaluate treatment, an overall judgment is required as to whether a patient is the same, better, or worse. Therefore a summary score of facial function is of more value than individual dimension scores. A composite score also allows for the grouping of information from a number of subjects for statistical analysis.

Each component of this FGS has differential weighting in determining the final score (Fig. 1). The face validity of the multipliers is such that an individual with complete paralysis would have a composite score of 0, and an individual with normal facial function would have a score of 100. The resting symmetry score purposefully has a smaller point scale (0 to 4) than the synkinesis score (0 to 15), so its weighting (× 5) to the composite score is appropriate. The results of this study confirm that the multipliers do not obscure the component data because the composite score behaves the same way as the individual dimension scores. A summary score from 0 to 100 reflecting the continuum of complete paralysis to normal function is preferable for clinical communication and can be justified from these study results.

In this proposed grading system, the resting symmetry dimension is slower to change with rehabilitation treatment than the voluntary movement and synkinesis dimensions (Fig. 9). In a previous study that compared different facial grading systems, a major variable of disagreement was resting tone.4 Patients with relatively poor recovery may still have satisfactory tone, but differences in the eye, nasolabial fold, and corner of mouth compared with the normal side should be apparent. Patients will not receive top scores for only satisfactory tone. Resting symmetry is more a function of volume of facial reinnervation and is less influenced by rehabilitation treatment than are synkinesia and voluntary movement. There must be a significant increase in muscle strength or in viable motor units to see change in the resting dimension scores.

Fig. 8. Synkinesis dimension pretreatment scores plotted against posttreatment scores illustrating substantial improvements in synkinesia as a result of treatment. Horizontal axis: synkinesis (post).

Fig. 9. Resting symmetry dimension pretreatment scores plotted against posttreatment scores. Each data point represents more than one patient. There is only nominal change in the resting symmetry dimension as a result of treatment. Horizontal axis: rest (post).

Smith et al.5 compared the validity and consistency of nine leading FGSs, including the H-B system, in a small group of 10 patients evaluated by four clinicians. Validity was evaluated by comparing each grading system with the standard of the overall impression or best clinical judgment of the observers. The Stennert system was found to have the most agreement with best clinical judgment (r = 0.78), and the House system had the poorest agreement (r = 0.75); however, there was no significant difference between any of these systems. Best clinical judgment or overall impression is a poor test of criterion validity.

The H-B system was developed as a gross scale. Its basic purpose was to rate patients according to general categories and not to give specific details about facial functionY The advantage of this system is its wide application and its relatively good interobserver reliability. Although a scale with a small number of categories is easier to use, it does not discriminate well when used to detect changes with treatment14 or to detect differences between patients.12
The FGS showed similar relationships to the H-B system when compared with the Burres-Fisch linear measurement index.10 This mathematical analysis of anatomic features is an objective, quantifiable approach to assessing facial motor functionY 2 The indexes are continuous variables and therefore are feasible as an outcome measure for controlled study.14 Limitations of using linear measurement for evaluating facial motor function include the following. (1) Linear measurements of facial movement can be highly variable from test to test and from day to day.16 (2) Linear excursion of facial movement does not address the complexity of synkinesia. The excursion of facial movement may be in an opposite direction to the synkinetic movement, or the synkinesia may be a synergistic associated movement.7 (3) The methods are time-consuming, and the calculations of the linear measurement index are complex. (4) When linear measurement indexes were compared with H-B grades, there was only moderate correlation (0.63 to 0.77 with two observers of 18 patients),12 and measurements were consistently 20% less than the House evaluations.9 Linear measurement indexes and FGSs produce different information.9m

Computerized, quantitative analysis systems for facial motion have been developed for the purpose of producing equal-interval continuous data for the assessment of facial motion.7,8 The basis of these systems is the quantitative analysis of facial images as they change over short intervals of time (i.e., seconds). Neely et al.7 demonstrated reasonable correlations between the computer-generated parameters and the H-B grades. These systems are sensitive to changes and agree with qualitative human assessment,7,16 but they tend to be complicated and expensive and to produce variable results. In contrast, our proposed FGS is simple, concise, and inexpensive; requires no equipment; and is brief enough to be compatible with clinical practice.

CONCLUSION
Establishing validity is an ongoing process, and the more evidence that is gathered with different methods, the more certain the validity of the measure. Continued efforts to increase the database of information with this proposed FGS will allow a better understanding of what a change in score means clinically and what constitutes a clinically important change in function with spontaneous recovery after facial nerve injury or after medical treatments. The proposed FGS as an evaluative measure is simple and quick to administer, does not require any specialized equipment, and provides a quantitative score for reporting purposes. Most importantly, on the basis of our results, our FGS is responsive to clinically important change. It succeeds in reporting results in a more continuous manner with a wider response range than the gross H-B grades. Each component of the grading system is sensitive to change and individually contributes to a change in the composite score. Tests of interrater reliability will provide further evidence of the attributes of this proposed FGS.

We thank Jack Williams, PhD, Department of Clinical Epidemiology, for valuable guidance and assistance in the area of measurement and biostatistics.

REFERENCES

1. May M. Reporting recovery of facial function. In: May M, ed. The facial nerve. New York: Thieme, 1986.
2. House J. Facial nerve grading systems. Laryngoscope 1983;93:1056-68.
3. House J, Brackmann D. Facial nerve grading system. Otolaryngol Head Neck Surg 1985;93:146-7.
4. Burres S, Fisch U. The comparison of facial grading systems. Arch Otolaryngol Head Neck Surg 1986;112:755-8.
5. Smith IM, Murray JAM, Cull RE, Slattery J. A comparison of facial grading systems. Clin Otolaryngol 1992;17:303-7.
6. Buchwald C, Tos M, Thomsen J. Observer variations in the evaluation of facial nerve function after acoustic neuroma surgery. J Laryngol Otol 1993;107:1119-21.
7. Neely JG, Cheung JY, Wood M, et al. Computerized quantitative dynamic analysis of facial motion in the paralyzed and synkinetic face. Am J Otol 1992;13:97-107.
8. Johnson PC, Brown H, Kuzon, et al. Simultaneous quantification of facial movements: the maximal static response assay of facial nerve function. Ann Plast Surg 1994;32:171-9.
9. Croxson G, May M, Mester SJ. Grading facial nerve function: House-Brackmann versus Burres-Fisch methods. Am J Otol 1990;11:240-6.
10. Burres SA. Facial biomechanics: the standards of normal. Laryngoscope 1985;95:708-14.
11. Wood DA, Hughes GB, Secic M, Good TL. Objective measurement of normal facial movement with video microscaling. Am J Otol 1994;15:61-5.
12. Jansen C, Devriese PP, Jennekens GI, Wijnne HJ. Lip-length and snout indices in Bell's palsy. Acta Otolaryngol (Stockh) 1991;111:1065-9.
13. Feinstein AR. Clinimetrics. New Haven: Yale University Press, 1987.
14. Ross BG, Nedzelski JM, McLean JA. Efficacy of feedback training in long-standing facial nerve paresis. Laryngoscope 1991;101:744-50.
15. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. New York: Oxford University Press, 1989.
16. Neely JG, Jekel JF, Cheung JY. Variations in maximum amplitude between and within normal subjects. Otolaryngol Head Neck Surg 1994;110:60-3.