JOURNAL OF EXPERIMENTAL CHILD PSYCHOLOGY 23, 1-22 (1977)
Developmental Changes in the Representation of Faces
R. DIAMOND AND S. CAREY
Massachusetts Institute of Technology
Children from age 6 to 16 judged which of two photographs of unfamiliar faces showed the same person as an inspection photograph. Recognition accuracy improved markedly between ages 6 and 10 with little change thereafter. Six- and eight-year-old children were especially susceptible to error when certain disguises were provided, in both memory and simultaneous presentation conditions. In contrast, when the stimuli depicted familiar faces, six-year-old children made few errors and showed no susceptibility to confounding paraphernalia. We concluded that young children encode new faces in terms of striking, relatively isolated, features. By age 10 or 12 the adult capacity has emerged, enabling configurational representation of a face from very little exposure to it.
We are grateful to Dr. Hans-Lukas Teuber for first engaging our interest in the problem of face perception. We would also like to thank Marla Eby, Jean Melamed, John Samuelson, and Barbara Faulkener for assistance in testing and in the analysis of data. In addition, we wish to express thanks to the children who served as subjects, to Dr. Richard Goodman and Mr. Stephen Porter of the Wellesley, Massachusetts School System, to Mr. Richard Barnes, Mr. Charles Johnson, Mr. Eugene Sullivan and Mrs. Elaine Engelberg of the Lexington, Massachusetts School System, to Mrs. Ruth Stokes and teachers at the Cambridge Friends School in Cambridge, Massachusetts, and to Gene Gray of the Mason Rice School in Newton, Massachusetts. Support was provided by Grant Foundation, Inc. and Spencer Foundation grants to Hans-Lukas Teuber, and by NIH Grant 1-R01-HD09179-01 to the authors. Requests for reprints should be addressed to: Dr. Susan Carey, Department of Psychology, Massachusetts Institute of Technology, Cambridge, Mass. 02139.
Copyright © 1977 by Academic Press, Inc. All rights of reproduction in any form reserved. ISSN 0022-0965
We distinguish individual faces despite their high degree of physical similarity; we recognize particular faces unseen for many years; we identify persons whose appearance has been transformed by age or fashion. How are these feats accomplished? The answer to this question involves specifying what aspects of faces are encoded in memory, and how these internal representations are used in the process of face recognition. The representation of an individual’s face probably includes idiosyncratic isolated features: a scar, a bald head, particularly large ears, distinctive eyeglasses. But it appears unlikely that lists of such distinguishing features could constitute our entire internal description of a face: some do not survive the transformations which recognition survives, and the rest probably would not suffice to individuate the hundreds of faces each of us is capable of recognizing. However, individual faces also present another kind of distinguishing feature: the spatial relations among their various parts. For instance, “the ratio of the distance the hairline is above the chin to the distance the bridge of the nose is above the center of the upper lip” might serve to individuate a face. Represented in terms of a number of these spatial relations, each face forms a Gestalt as unique as a particular snowflake. In this paper we will document the use of these two kinds of information in face perception and suggest that representation of each has a separate developmental history.
The study of the effects of stimulus inversion provides one source of evidence that there are two different ways in which faces may be represented. If a normal adult is shown an inverted face and must later indicate which of two inverted faces he previously saw, he is much less successful than if all the faces are presented upright (Hochberg & Galper, 1967; Yin, 1970a). In Yin’s study, faces inspected and recognized upright had a 30% advantage over inverted faces. Other mono-oriented stimuli, such as houses, bridges, airplanes, costumes, and stick figures of men in motion, were also easier to recognize upright, but the average advantage was only 10-15%. The effect of inversion on face recognition was significantly greater than for any of these nonface stimuli (Yin, 1970a). Moreover, subjects who did well at recognizing upright faces did badly at recognizing inverted faces and vice versa. No such inverse relation was found for any of the other stimuli. These results suggest that faces are processed differently from the other mono-oriented stimuli which were tested. This is not true for all perceivers. Yin showed that for some subjects (patients with lesions of the right posterior cortex) recognition of faces was no more impaired by inversion than was recognition of houses. These patients performed much more poorly on upright faces than normal adults; their accuracy on inverted faces and on both upright and inverted houses was equal to that of normals.
For the right-hemisphere patients there was no inverse relation between success on upright and inverted faces. In summary, these brain-injured subjects appeared to treat faces just as they treated houses (Yin, 1970b). Thus, faces have the potential to be represented in two different ways, one of which is also used in representing nonfacial stimuli and remains available after injury to the right posterior sector of the brain. The other form of encoding is available to normal adults but is applied only to upright faces. We agree with Yin and others who have suggested that the two different ways of representing faces involve piecemeal versus configurational information. In this view, configurational encoding requires an intact right hemisphere while piecemeal encoding does not. The differential sensitivity of faces to orientation is also taken as evidence that, among the stimuli tested, only faces are represented in configurational terms, while all of the classes, including faces, may be represented in terms of piecemeal information. This inference rests, however, on a plausible account of why configurational encoding should be more vulnerable to stimulus inversion than piecemeal encoding. In situations in which an unfamiliar stimulus is presented, we assume that reference to a canonical, upright face is made in both kinds of encoding. To note a piecemeal distinguishing feature such as “bushy eyebrows,” one must first locate the eyebrows, which may involve a spatial predicate such as “above the eyes.” To the extent such predicates are involved, inversion will impair piecemeal encoding of faces. But we suggest that such spatial predicates are even more important in the representation of configurational features of the face. This is both because several points on the face must be located precisely and because the specification of distinctive relations among these points might also involve reference to the canonical face.
Do children represent the configurational aspects of faces? In a paired-associates task, 8-year-olds were not as impaired by inversion of faces as were adults (Goldstein, 1965). In the Yin paradigm, 6- and 8-year-olds recognized faces presented inverted almost as well as those shown upright. Further, inversion affected faces no more than houses, just as in patients with right posterior lesions (Carey & Diamond, 1977). The normal adult advantage for upright faces is found by the time subjects are 10. Thus, it appears that children under 10 represent unfamiliar faces using only the kind of information used for other mono-oriented stimuli, presumably piecemeal information.
The purpose of the present experiments is to provide direct evidence that children under 10 represent unfamiliar faces in terms of relatively isolated features. Results of a quite different paradigm, not involving face recognition per se, are suggestive in this regard. When asked to judge which two of four snapshots of the same person were most alike, young children chose on the basis of common paraphernalia while older children chose on the basis of similar facial expression (Levy-Schoen, 1964; see Example 1). The transition occurred at about age 10.
In another study using three photographs of the same person, 4- to 8-year-olds were equally accurate in pairing on the basis of paraphernalia or expression when only one of these cues was common to two of the items. For triads which could be solved on the basis of either expression or paraphernalia, the shift found by Levy-Schoen at age 10 appeared at age 5 (Savitsky & Izard, 1970). These problems involve resemblance, and the child’s understanding of what is relevant to such a judgment clearly reflects both instructions and context. Judgments of resemblance among photographs of the same person do not bear directly on the question at hand, namely, how are faces represented for the purpose of identifying an individual? Therefore, we adapted the Levy-Schoen paradigm for a study of person recognition (see Examples 2 and 3). The question “Which is the same person?” has a correct answer. Expression and paraphernalia are each confounding cues in some of our items, and exclusive reliance on either will lead to a distinctive pattern of errors. These patterns would provide direct evidence that the child does in fact base judgments of identity on isolated features, in this case on features which are not veridical for the recognition task at hand.
EXAMPLE 1. Stimuli of the type used by Levy-Schoen. Levy-Schoen presented a fourth face unlike the others in both paraphernalia and expression.
EXPERIMENT 1. SHORT-TERM MEMORY
Materials. Twelve pairs of young women were chosen as models. The two members of each pair were similar in general coloring, and their hair was alike in length and texture. We attempted to pair persons who did not otherwise resemble each other. For each model pair, four types of recognition problems were constructed (see Examples 2 and 3), in which expression and paraphernalia (hats, shirts, scarves, necklaces, eyeglasses, and wigs) were manipulated. The subject was shown the top photograph and then asked to indicate which of the bottom two was of the same person. In Type I problems (paraphernalia-to-fool, expression-equal), if the subject bases his judgment of identity on paraphernalia, he will make the wrong choice. All three photographs show the same expression, so no judgment can be based on expression. In Type II problems (paraphernalia-to-fool, expression-to-help), a judgment based on paraphernalia will be incorrect, while a judgment based on expression will be correct. In Type III problems (paraphernalia-equal, expression-to-fool), no judgment can be based on paraphernalia, while a judgment based on expression will be incorrect. And in Type IV problems (paraphernalia-to-help, expression-to-fool), a judgment based on paraphernalia will be correct and a judgment based on expression will be incorrect. Thus, there were two paraphernalia-to-fool items (Types I and II), one paraphernalia-equal item (Type III), and one paraphernalia-to-help item (Type IV). Similarly, there were two expression-to-fool items (Types III and IV), one expression-equal item (Type I), and one expression-to-help item (Type II).
Four alternate sets of problems were constructed. In each set, one problem was contributed by each of the 12 model pairs, and there were three examples of each problem type. Thus, over all four sets, each problem type was equally represented for each pair of models. The sequence of items in each set was independently randomized with the constraint that no two successive items be of the same problem type. In addition, four identical control stimuli were added in series positions 1, 2, 4, and 8. On these items the person shown twice was photographed in different costumes or with different expressions, but the distractor’s photograph did not match the target in any obvious way. That is, there was no attempt to fool the child with confounding paraphernalia or confounding expressions. The control stimuli were included as a check that the child understood the task. Also, the youngest children, especially, might be expected to gain a feeling of confidence from the ease with which they could make a choice.
Subjects. Equal numbers of boys and girls were tested within each subgroup of 12 subjects, at ages 6, 8, 10, 12, 14, and 16. Our subjects were from a number of middle-class suburbs of Boston. We requested children of normal ability or above, and tested subjects in the order that they volunteered.
Procedure. The subjects were tested individually and given the following instructions: “I’m going to show you a picture of a woman and then ask you to tell me which of two other pictures shows the same woman. Sometimes the clothing will be changed, or the eyeglasses, or even the hair might be different, because some of these women are wearing wigs. And the person might have a different expression than she had, too. So just try to say which of the two pictures is the same person as you first saw.” The subject was shown each item for 5 sec with the choice pair covered by a sheet of cardboard and the inspection photograph exposed. Then the inspection photograph was covered and the choice pair revealed. The question “Which is the same person?” was repeated for each item. The subject was permitted to take as much time as he wished to make a choice. Choices were recorded by hand by the experimenter.
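The set construction described under Materials can be sketched in code. The Python below is an illustration under stated assumptions, not the authors' procedure: the Latin-square rule (model pair p receives problem type (p + set) mod 4), the rejection-sampling shuffle, and the reading that the control items occupy positions 1, 2, 4, and 8 of a 16-item series are all our own choices. The sketch does satisfy the stated constraints: one item per model pair, three items of each type per set, each type once per pair across the four sets, and no two successive test items of the same type.

```python
import random

TYPES = ["I", "II", "III", "IV"]
N_PAIRS = 12                      # model pairs per set
CONTROL_POSITIONS = {1, 2, 4, 8}  # assumed 1-based positions in the full series


def assign_types(set_index):
    """Hypothetical Latin-square scheme: model pair p gets problem type
    (p + set_index) mod 4, giving three items of each type per set and
    every type once per pair across the four sets."""
    return [(pair, TYPES[(pair + set_index) % 4]) for pair in range(N_PAIRS)]


def order_items(items, rng):
    """Reshuffle until no two successive items share a problem type."""
    items = list(items)
    while True:
        rng.shuffle(items)
        if all(a[1] != b[1] for a, b in zip(items, items[1:])):
            return items


def build_series(set_index, seed=None):
    """Full series for one set; control items are marked ("control", "C")."""
    rng = random.Random(seed)
    test_items = iter(order_items(assign_types(set_index), rng))
    series = []
    for position in range(1, N_PAIRS + len(CONTROL_POSITIONS) + 1):
        if position in CONTROL_POSITIONS:
            series.append(("control", "C"))
        else:
            series.append(next(test_items))
    return series


for item in build_series(set_index=0, seed=1):
    print(item)
```

Rejection sampling is the simplest way to honor the "no two successive items of the same type" constraint; with three items of each of four types, a random shuffle meets it often enough that only a handful of retries are typically needed.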
Predictions. Consider Examples 2 and 3. These are difficult problems. For success in identifying the target person, both paraphernalia and expression cues should be ignored. The subject must abstract a representation of the target’s face which will enable him to predict how she would look with a different expression or with different paraphernalia. If such a representation involves what we have been calling configurational aspects of a face, then the results of the inversion experiments suggest that children under 10 will be unable to succeed; 10-year-olds should be able to abstract configurational representations more adequate to this task. Thus, our first prediction is that the error rate will decrease between ages 6 and 10, with relatively little change thereafter. If young children must instead rely on isolated features to represent these faces, which features will they be? Three considerations led us to predict that paraphernalia would be sources of isolated features used by young children as a cue to identity, while expression would not. First, paraphernalia are a more veridical cue. In many situations, we do encode the clothes someone is wearing, or hair style, and depend upon these not changing over some limited period of time. In contrast, we expect expressions to change from moment to moment. Second, the results of the resemblance task showed reliance on paraphernalia rather than expression until age 10, in the absence of specific instructions or a context which might have increased use of expression. And third, expression is intimately linked with the configurational aspects of a face; perhaps the young child ignores these aspects altogether when attempting to identify unfamiliar faces. It might also be supposed, on the basis of the resemblance judgments, that for 10-year-olds expression would supplant paraphernalia as a source of features to be used as cues to identity.
But if, as we have suggested, 10-year-olds are able to extract a configurational representation of a photographed face and do so in this task, this will not occur. That is, unlike pairing on the basis of expression in the resemblance experiments, we expect expression never to be the basis of judgments in this identification task. What patterns of errors would support these predictions? The inset on Fig. 1 shows the expected relative difficulty of the four problem types for 6- and 8-year-olds if paraphernalia, but not expressions, are a source of isolated features on which identity judgments are based. Of course, there are other isolated features, such as moles or bushy eyebrows, not manipulated in our experiment. Reliance on such features would lead to a correct judgment on any item. Therefore, evidence that paraphernalia are used as cues can come only from the relative difficulty of the different types. The inset on Fig. 1 summarizes the following predictions: Types I and II (both paraphernalia-to-fool) should produce the most errors, Type III (paraphernalia-equal) the next most, and Type IV (paraphernalia-to-help) the fewest. Note that Types I and II are predicted to be of equal difficulty. By hypothesis, expression will not be used as a cue to identity at these ages, so the expression-to-help aspect of Type II will have no effect.
FIG. 1. Pattern of errors, Experiment 1. [Figure not reproduced: errors (%) by problem type (I-IV); the inset shows the predicted pattern for 6- and 8-year-olds.]
If there were a point in development at which expression supplanted paraphernalia as a source of isolated cues to identity, Types III and IV (both expression-to-fool) would be hardest, Type I (expression-equal) the next hardest, and Type II (expression-to-help) the easiest. We do not expect to find this pattern at any age.
What if paraphernalia and expression were both used as cues to identity? No overall pattern can be predicted, because these cues are partially crossed in our design. However, two relationships among error frequencies should hold. (In the notation that follows, P and E stand for paraphernalia and expression, and f, h, and = mark a cue as to-fool, to-help, or equal.) Type III (P=/Ef) should be harder than Type IV (Ph/Ef), since expression is confounding in both and paraphernalia is equal in Type III but to-help in Type IV. Also, Type I (Pf/E=) should be harder than Type II (Pf/Eh), since paraphernalia is confounding in both and expression is equal in Type I but to-help in Type II.
Note that we do not expect to find evidence that expression is used as a sole cue, nor even as a cue along with paraphernalia. Rather, we expect the pattern of errors shown on the inset of Fig. 1 to obtain for 6- and 8-year-olds, and by age 10 significant differences in difficulty among problem types should be absent. By this age, as with adults, we expect errors to be largely determined by actual resemblances among models. Also, since both extreme expressions and paraphernalia that mask part of the face interfere with representation of permanent facial structure, all problem types should present some difficulty (see Examples 2 and 3). To the degree that our expressions or paraphernalia produce such interference on a particular item, that item will be difficult.
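The reasoning behind these predictions can be made explicit with a small, hypothetical scoring model. In the Python sketch below, each cue in a problem type is coded -1 (to-fool), 0 (equal), or +1 (to-help), and a child's assumed reliance on each cue is a nonnegative weight. The coding, the weights, and the additive score are our simplifying assumptions; they are meant only to reproduce the ordinal predictions stated above, not to model the data.

```python
# Cue coding per problem type: -1 = cue points to the distractor (to-fool),
# 0 = cue identical for both choices (equal), +1 = cue points to the target (to-help).
PROBLEM_TYPES = {
    "I":   {"paraphernalia": -1, "expression": 0},   # Pf/E=
    "II":  {"paraphernalia": -1, "expression": +1},  # Pf/Eh
    "III": {"paraphernalia": 0,  "expression": -1},  # P=/Ef
    "IV":  {"paraphernalia": +1, "expression": -1},  # Ph/Ef
}


def misleading_score(problem_type, weights):
    """Hypothetical difficulty index: the weighted pull of the cues a child
    relies on toward the distractor. Higher means harder."""
    cues = PROBLEM_TYPES[problem_type]
    return -sum(w * cues[cue] for cue, w in weights.items())


def predicted_ranking(weights):
    """Group problem types from hardest to easiest under one reliance strategy."""
    by_score = {}
    for problem_type in PROBLEM_TYPES:
        by_score.setdefault(misleading_score(problem_type, weights), []).append(problem_type)
    return [by_score[score] for score in sorted(by_score, reverse=True)]


print(predicted_ranking({"paraphernalia": 1}))  # [['I', 'II'], ['III'], ['IV']]
print(predicted_ranking({"expression": 1}))     # [['III', 'IV'], ['I'], ['II']]
```

With both cues given positive weight, the full ranking depends on the unknown weights. However, the score of Type III always exceeds that of Type IV, and the score of Type I always exceeds that of Type II. These are the only two relationships predicted for the case in which both cues are used.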
FIG. 2. Errors (%) on Problem Types I-IV combined and on control items, by age (years), Experiments 1 (memory condition) and 2 (simultaneous condition). [Figure not reproduced; curves: memory condition, Types I-IV; simultaneous condition, Types I-IV; memory condition, control stimuli; simultaneous condition, control stimuli.]
Results. Figure 2 shows the changes in error rates with increasing age for all four problem types combined. There is marked improvement between the ages of 6 and 10 and little subsequent change. This is a difficult task, and even our oldest subjects made errors on about 10% of the items. In contrast, error rates on the control stimuli are extremely low at all ages (Fig. 2). Figure 1 shows the patterns of errors across the problem types. At age 6, all the predicted differences are significant (Wilcoxon within-model-pairs comparison; see Table 1). At age 8, all predicted differences are significant except that Type III is no longer harder than Type IV. By age 10, only Type I is significantly harder than Types III and IV. Thus, 6-year-olds are highly susceptible to confounding paraphernalia cues, and this susceptibility has decreased markedly by age 10. However, it is not until age 12 that all of the asymmetries reflecting reliance on paraphernalia cues have disappeared. Errors due to expression do not replace errors due to paraphernalia at age 10: at no age are Types III and IV (expression-to-fool) harder than Types I and II (expression-equal and expression-to-help). Further, expression does not seem to be used as a cue along with paraphernalia.
TABLE 1
EXPERIMENT 1. SIGNIFICANCE LEVEL OF PREDICTED DIFFERENCES (WILCOXON SIGNED-RANKS ANALYSIS, WITHIN MODEL-PAIRS, ONE-TAILED): MEMORY CONDITION
Prediction   Age 6   Age 8   Age 10   Age 12   Age 14   Age 16
I > IV       .005    .01     .05      ns       ns       ns
II > IV      .005    .01     ns       ns       ns       ns
I > III      .005    .05     .025     ns       ns       ns
II > III     .005    .025    ns       ns       ns       ns
III > IV     .025    ns      ns       ns       ns       ns
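The comparisons in Table 1 are one-tailed Wilcoxon signed-ranks tests computed within model pairs. The sketch below shows one way such a test could be computed exactly in Python without external libraries. For each model pair, the error count on the type predicted to be harder is paired with the count on the type predicted to be easier, and the p-value is obtained by enumerating every sign assignment of the ranked differences (2^12 = 4096 cases for 12 pairs). The paper does not state how zero differences or tied ranks were handled; dropping zeros and using average ranks are our assumptions, and the error counts in the usage lines are invented for illustration.

```python
from itertools import product


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_one_tailed(harder, easier):
    """Exact one-tailed signed-ranks test of H1: 'harder' > 'easier' within pairs.
    Returns (W_plus, p_value). Zero differences are dropped (an assumption)."""
    diffs = [h - e for h, e in zip(harder, easier) if h != e]
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each ranked difference is equally likely to be positive or negative.
    total = count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus - 1e-9:
            count += 1
    return w_plus, count / total


# Invented error counts per model pair, for illustration only.
type_I_errors = [3, 2, 3, 2, 3, 1, 2, 3, 2, 3, 2, 1]
type_IV_errors = [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]
w_plus, p = wilcoxon_one_tailed(type_I_errors, type_IV_errors)
print(f"W+ = {w_plus}, one-tailed p = {p:.4f}")
```

Exact enumeration is practical here only because there are at most 12 model pairs; for larger samples a library routine with a normal approximation would be the usual choice.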
For 6- and 8-year-olds, Types I (Pf/E=) and II (Pf/Eh) are equally difficult (see Fig. 1). At ages 10 and 14, Type I appears harder than Type II, but at ages 12 and 16 the reverse is true. None of these differences is significant.
Conclusions. Our major predictions were confirmed. Accuracy of face recognition improved markedly between the ages of 6 and 10 and changed very little thereafter. However, there was some further decrease in error rate between ages 10 and 12 (see Fig. 2) and some continuing use of paraphernalia cues at age 10 (Table 1). The overall improvement from ages 6 to 10 was due mainly to a decreasing reliance on paraphernalia cues, that is, to a decrease in errors on problem Types I and II (both paraphernalia-to-fool). While paraphernalia are used by 6- and 8-year-olds as cues to identity, expressions are not so used at any of the ages studied. Thus, in Experiment 1, young children’s judgments of identity were often based on a class of isolated features, paraphernalia, that is not veridical for such decisions.
It might be thought that the youngest children simply did not understand the task. Perhaps they thought they were being asked to choose which two people were dressed alike or had the same hair styles. Evidence against this possibility is that even the youngest children made very few errors on control stimuli, even when the two photographs of the target person differed in paraphernalia. In addition, if our subjects had been attempting to match paraphernalia, they would have made 100% errors on Types I and II (paraphernalia-to-fool). Instead, only about 70% of the 6-year-olds’ and 50% of the 8-year-olds’ judgments were incorrect on the paraphernalia-to-fool stimuli. We conclude that the youngest children were in fact attempting to find “the same person” but were very much influenced by matching-paraphernalia cues.
In another memory paradigm, children under 10 appear to process faces as they do other mono-oriented stimuli (Carey & Diamond, 1977). This result, like that found in Experiment 1, was interpreted as evidence for young children’s reliance on relatively isolated cues as a basis for their judgments of identity. These two memory experiments leave completely open the stage in processing at which isolated cues come to dominate. Is it the initial coding of the stimulus? That is, are faces represented mentally in terms of such features? Or is it the recognition stage? Perhaps only such isolated features can be retrieved easily from the mental representation or used easily in matching against the new stimulus. Experiment 2 was designed to address this question.
FIG. 3. Pattern of errors, Experiment 2. [Figure not reproduced: errors (%) by problem type (I-IV), plotted separately by age group.]
EXPERIMENT 2. SIMULTANEOUS JUDGMENTS
The subjects and materials used in Experiment 2 were exactly the same as those in Experiment 1.
Procedure. Immediately after the completion of Experiment 1, the child was told that he had guessed wrong in a few cases and was asked whether he wanted to look at them all again to see if he wanted to change his mind about any of his judgments. The children found this task intriguing and all readily agreed to go through the set again. This time, the child was allowed to view all three photographs at the same time and to take as long as he liked to make his judgment.
Predictions. Experiment 2 should produce fewer errors than Experiment 1 no matter what kinds of representations the child is using. After all, the child can look back and forth from face to face, checking particular features so detailed as not to be represented in memory. But insofar as performance in Experiment 1 was determined by perceptual processes, that is, by limitations on the kind of encoding the young child is capable of, the pattern of errors in Experiment 2 should reflect reliance on paraphernalia just as in Experiment 1.
Results. Figure 2 shows that across all four problem types there are only slightly fewer (10%) errors in the simultaneous condition than in the memory condition for all age groups, implying a common basis for the two judgments. Analysis of the pattern of errors on the four problem types provides direct evidence that in this task perceptual organization and memorial representation take similar forms. Fig. 3 and Table 2 show that paraphernalia provide a source of features used in identity judgments in Experiment 2 just as in Experiment 1. All five predictions are borne out for 6-year-olds and three of the five for 8-year-olds. Only one of the five comparisons is significant for 10-year-olds, and by age 12 there are no significant differences in difficulty between any pair of problem types.
TABLE 2
EXPERIMENT 2. SIGNIFICANCE LEVEL OF PREDICTED DIFFERENCES (WILCOXON SIGNED-RANKS ANALYSIS, WITHIN MODEL-PAIRS, ONE-TAILED): SIMULTANEOUS CONDITION
Prediction   Age 6   Age 8   Age 10   Age 12   Age 14   Age 16
I > IV       .005    .01     .005     ns       ns       ns
II > IV      .005    .025    ns       ns       ns       ns
I > III      .005    .025    ns       ns       ns       ns
II > III     .005    ns      ns       ns       ns       ns
III > IV     .05     ns      ns       ns       ns       ns
The similarities in results between Experiments 1 and 2 might be due to the children remembering their previous choice and repeating it. This is unlikely, since on about 20% of the items the three youngest groups made a different choice in the simultaneous condition from the one they had made in the memory condition. This included changes of judgment from correct to incorrect (6%) as well as vice versa. The children often took a long time over their judgments in the simultaneous condition and often wanted to know which photograph they had chosen before (information which was not given). Thus, the comparability of frequency and type of error in the two judgment conditions suggests selective and rather inflexible encoding.
Just as in Experiment 1, expression never supplants paraphernalia as a source of isolated cues: Types III and IV (expression-to-fool) are never harder than Types I and II (expression-equal and expression-to-help). However, at ages 6, 8, and 10, and especially at age 8, Type II (Pf/Eh) appears to be easier than Type I (Pf/E=). At none of these ages is this difference significant, but the trend suggests that expression may be used as a source of isolated features, in a role secondary to paraphernalia. At ages 12, 14, and 16, by contrast, Type II is actually harder than Type I, as was the case at ages 12 and 16 in Experiment 1. Although none of these differences is statistically significant, the consistency of the pattern, especially in Experiment 2, suggests that expression confounding plays quite a different role at ages 10 and below than at ages 12 and above.
What might this role be for the older children? The task of encoding an unfamiliar face from a still photograph involves separating the “permanent facial configuration” from the “momentary configuration due to a particular expression” (Gombrich, 1972). When expressions are all the same (Type I), one can pick out which face is different from the target without having completely disentangled the two aspects. But when the distractor has a different expression (Type II), the older subject might have difficulty assessing how she would look with the “matching” expression. This source of difficulty, tending to make Type II harder than Type I, should be more pronounced in those stimuli with the most exaggerated expressions, for it is in these cases that the subject should find it hardest to disentangle differences due to expression from those due to facial structure. Table 3 shows that this is the case. Three judges inspected all 12 Type II (Pf/Eh) items and reached consensus on a division into two groups: 6 items with relatively great differences in the two expressions and 6 with relatively slight differences. The percentage of total errors on Type II problems contributed by the group of items with gross differences in expression was computed (50% is chance). With increasing age, errors on Type II problems become progressively more concentrated on items in which the two expressions are markedly different (Table 3). This is true both in the memory condition and in the simultaneous condition. To establish that this result is not due to an interaction of age with the confusability of particular pairs of models, a comparable computation was carried out for Type I (Pf/E=) items. There was no such age-related trend toward a higher concentration of errors (Table 3).
TABLE 3
EFFECTS OF EXTREME DIFFERENCES IN EXPRESSION ON ERRORS IN TYPE II (Pf/Eh) PROBLEMS(a)
Type II columns: percentage errors on those Type II items with extreme differences in expression (50% = chance). Type I columns: percentage errors on Type I (Pf/E=) items in which those same model pairs appear (50% = chance).
Age   Type II, Memory   Type II, Simultaneous   Type I, Memory   Type I, Simultaneous
6     42                43                      46               46
8     31                50                      65               61
10    67                60                      58               62
12    73                100                     50               100
14    78                100                     54               67
16    86                86                      60               50
(a) The number of errors on Type I and II problems after age 8 is very small, so none of the error rates is statistically different from chance. Nevertheless, as predicted, the percentages tend to increase with age for Type II but not for Type I.
Thus, among our older subjects, there seems to be a limited capacity to disentangle permanent facial characteristics from momentary expression. When coupled with the strategy of seeking a definite mismatch between one member of the choice pair and the inspection photograph, this limitation means that Problem Types I, III, and IV are all readily solved by older subjects, because the common expression of distractor and inspection face permits them to detect the mismatch of facial structure. Only in Type II, where distractor and target differ in expression, does ambiguity as to what is momentary and what is permanent produce error.
Conclusions, Experiments 1 and 2. The comparison among the three stimuli is, of course, mediated by the child’s perceptual representations. The results of Experiment 2 almost perfectly matched those of Experiment 1, both in error rate and in error pattern. This suggests that the child is relatively incapable of altering his perceptual representation of a face to take advantage of the opportunity for repeated checking in the simultaneous condition. Further, it appears that almost all of the child’s rather inflexible perceptual analysis is available in short-term memory, since he does almost as well when the inspection photo is not present as when it is. For 6- and 8-year-olds, that perceptual encoding included paraphernalia.
Ten-year-olds appear to be a transitional group. While in Experiment 2 only one comparison reflecting reliance on paraphernalia is significant, 10-year-olds do not yet show the pattern shared by the three older groups (Fig. 3). Insofar as expression confounding plays a role in these judgments, it serves as a secondary isolated cue for identity judgments at the three youngest ages and as an impediment to abstracting the true configuration of the face (disentangling it from momentary expression) in the three older groups. Although this experiment was designed primarily to assess directly young children’s reliance on isolated, piecemeal information, older children’s difficulty with Type II problems also permits us to infer the kind of encoding they are attempting. Any shift away from paraphernalia toward some other isolated features would not lead to Type II being harder than Type I. Only some use of configurational encoding can account for this result. Thus, this experiment supplements the experiment on inversion in providing evidence that the overall improvement with age on these face-recognition tasks implicates configurational encoding.
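The Table 3 measure, the share of Type II errors contributed by the six items judged to have gross expression differences (with 50% as the chance value when half of the items are in that group), reduces to a simple proportion. The Python sketch below assumes errors are tallied per item. The item grouping and counts in the usage lines are invented. Applying the same function to Type I errors for the same model pairs would give the control comparison described in the text.

```python
def percent_errors_from_subset(errors_by_item, subset):
    """Percentage of all errors that fall on items in `subset`.
    When the subset holds half of the items, 50% is the chance value."""
    total = sum(errors_by_item.values())
    if total == 0:
        raise ValueError("no errors to apportion")
    in_subset = sum(n for item, n in errors_by_item.items() if item in subset)
    return 100.0 * in_subset / total


# Invented error counts per Type II item (keyed by model pair), for illustration only.
type_II_errors = {pair: n for pair, n in enumerate([2, 1, 0, 3, 1, 2, 0, 1, 0, 0, 1, 1])}
extreme_items = {0, 1, 3, 4, 5, 7}  # hypothetical "gross difference" group (6 of 12)
print(percent_errors_from_subset(type_II_errors, extreme_items))
```

As the Table 3 footnote notes, with very few errors such percentages are unstable, so an age trend in them is suggestive rather than statistically reliable.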
We interpret these results as indicating that the 6- and 8-year-olds are not representing configurational aspects of the faces in these experiments. But what about familiar faces? How are they represented by young children? Does a 6-year-old recognize his mother from her hair style and glasses? As a first step, we decided to see whether 6-year-olds would be influenced by confounding paraphernalia in a recognition task involving familiar faces.

EXPERIMENT 3. FAMILIAR FACES
Materials. The members of a class of 5- and 6-year-old children were paired on the basis of sex and coloring. Each pair of models was photographed to produce one paraphernalia-to-fool problem and one paraphernalia-to-help problem; each child served once as target person and once as distractor. Two different sets of costumes were used for the two problems. Expression cues were not manipulated, but the children usually had slightly different expressions in the two photographs (see Example 4). The two items in which the same pair of children appeared were semi-randomly assigned to Set I or Set II so that each set comprised equal numbers of paraphernalia-to-fool and paraphernalia-to-help problems. The order of items in each set was independently randomized, and both sets were shown to all of the subjects. Half of the subjects saw Set I first and half Set II first.

Subjects. The subjects were the children who had served as models, 12 5-year-olds and 8 6-year-olds.

Procedure. Before testing, the children were reminded of the photographing session in which they had participated and of how accessories and clothing had sometimes been exchanged. Looking at the faces was emphasized as the means to identify and tell people apart. For each problem, the child was first asked who the target person was. If the name was not produced immediately or was wrong, the experimenter supplied it. Judgments of identity were made first according to the immediate-memory procedure used in Experiment 1, and then the entire series was shown again, using the simultaneous procedure of Experiment 2.

Predictions. There are several reasons why this task should produce few errors overall and little reliance on paraphernalia.
Since the target person is already familiar to the child, he need not rely on a representation of the inspection photograph to answer the question "Which one is Matts?" From viewing the choice pair alone he could answer correctly either by recognizing Matts or by recognizing the distractor as someone else. On the other hand, as Example 4 shows, these are extremely masking disguises, often covering hair and other obvious striking features. Nevertheless, if the child's representation of familiar faces includes configurational information, he may be impervious to these disguises, both in naming the inspection photograph and in choosing that person from the pair.

Results. The children's naming of the inspection photographs was good, but not perfect. The 5-year-olds named 83% of the 20 targets correctly; the 6-year-olds, 84%. The range of errors was 0-7 for 5-year-olds and 1-6 for 6-year-olds. Errors might reflect failure of facial recognition or difficulty in producing the name. The children often said, "I know who it is, I just can't think of her name." All readily agreed when the experimenter supplied the name. Further, when the children were asked to indicate the target from the choice pair, there were very few errors (Table 4). Thus, it seems that these children experienced some difficulty in accessing the names of classmates whose faces they recognized. Those errors which did occur were on paraphernalia-to-fool problems (Table 4). However, at both ages the differences between to-fool and to-help problems failed to reach significance (Wilcoxon, within-model-pairs comparison, 1-tailed). Thus, as predicted, there were few errors and no significant reliance on paraphernalia. In the memory condition a third of the incorrect choices were on items involving the child himself as target or distractor, although such items constituted only a tenth of the total trials. This suggests that 5- and 6-year-old children are not as familiar with their own faces as with their classmates' and must therefore rely more on their representation of the target photo in making their judgments.

TABLE 4
ERRORS (%)

                       Experiment 3:                  Experiment 4:
                       Familiar condition             Unfamiliar condition
Age  Condition        Para-to-fool  Para-to-help    Para-to-fool  Para-to-help
5    Memory                 3             1               75             4
     Simultaneous          10             0               77             5
6    Memory                 9             0               77             3
     Simultaneous           5             1               62             0

Conclusions. The child's familiarity with the faces appears to eliminate error due to paraphernalia confounding. However, familiarity of the faces was not the only difference between Experiment 3 and the earlier studies. In Experiment 3 the models were children, not adults. There was only one source of confounding, paraphernalia; expression was not systematically varied.
More important, since the child was reminded of the photographing session and the switching of costumes, he was explicitly warned against disguises. In the instructions for Experiment 1 the child was merely told that the target's clothes might have changed, not that the other person might be wearing the target's original costume. The experience of being a model could also have helped the child understand how the materials were constructed. Perhaps these variables, rather than familiarity, account for the decreased reliance on paraphernalia. Experiment 4 was designed to test this possibility.

EXPERIMENT 4
Materials. Same as Experiment 3.

Subjects. Subjects were 12 5-year-olds and 8 6-year-olds from a different community than those in Experiment 3. They were unfamiliar with the children who served as models for the stimulus set.

Procedure. The children participated in a preliminary photography session in which half the class at a time watched while Polaroid photographs of each of them were taken, in one or another costume chosen from our paraphernalia. The photographs were shown to the assembled group afterward for identification. Photographs in which the same paraphernalia appeared were compared, with appropriate discussion of how, regardless of costume, children could be told apart. Finally, each group watched while two of its members were photographed so as to produce one paraphernalia-to-fool and one paraphernalia-to-help problem. Testing occurred the day after the photography session. The two paraphernalia-to-fool and two paraphernalia-to-help items which had been constructed from snapshots of four of their classmates were shown just prior to the test session. No child had difficulty with these items. Then the child was told that he would see items of the same kind, made with photographs of children he did not know. He was reminded of the principles of construction of the stimuli once again and warned to look at the faces. The experimenter told the child the name of the target model in each test item (e.g., "Matts") and then asked which of the choice pair was Matts. Judgments of identity were made first according to the immediate-memory procedure. The entire series was then shown again, and judgments were made with all three photographs uncovered at once (simultaneous condition).

Predictions. The results of Experiments 1 and 2 were attributed to young children's limited and inflexible encoding of an unfamiliar face from a still photograph.
If this interpretation is right, then the children in Experiment 4 should rely on paraphernalia just as young children did in the earlier studies.

Results. Table 4 presents the error rates for Experiment 4. Items in which paraphernalia are confounding yield a high rate of errors, comparable to that found in the earlier experiments. In making immediate-memory judgments, 6-year-olds made errors on 72% of the paraphernalia-to-fool items (Problem Types I and II) in Experiment 1 and on 77% of these items in the present experiment. Similarly, in making simultaneous judgments, 6-year-olds made errors on 68% of the paraphernalia-to-fool items in Experiment 2 compared to 62% in the present experiment. Experiment 4 provides data from children a year younger than those tested in Experiments 1 and 2. The 5-year-olds err at a level identical to that of the 6-year-olds and do not benefit at all from simultaneous presentation (Table 4). At both ages, in both the simultaneous and memory conditions, the differences between paraphernalia-to-fool and paraphernalia-to-help items were highly significant (p < .001, Wilcoxon signed ranks, within-model-pairs).

Conclusions, Experiments 1-4. Experiment 4 replicates, with different subjects and different materials, the principal findings of Experiments 1 and 2. Therefore, we conclude that the low error rates and small influence of paraphernalia confounding in Experiment 3 are due to the subjects' familiarity with the faces and not to other procedural differences between it and the earlier studies. What can be learned from young children's ease in recognizing their classmates in Experiment 3? Clearly, young children are not simply so overwhelmed by the salient paraphernalia that they ignore faces altogether. When the faces are familiar, the paraphernalia are virtually ignored. Further, it appears that young children's representations of familiar faces are quite like adults'. That is, if the child has a representation of a face in his long-term memory, he is able to extract the aspects of a photograph that are veridical for person identification. This means that, just as for adults, variations in costume, angle of view, and momentary expression can be tolerated. All of these considerations apply to the child's ability to identify the person in the inspection photograph as well as to his ability to identify one (or both) members of the choice pair.
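The within-model-pairs Wilcoxon comparisons reported for Experiments 3 and 4 pair each paraphernalia-to-fool item with the paraphernalia-to-help item built from the same two models. A minimal sketch of an exact one-tailed Wilcoxon signed-rank test on such paired error counts follows. It is an illustration under stated assumptions, not the authors' computation: the function names and the per-pair error counts are hypothetical, and the exact p value is obtained by enumerating every sign assignment, which is practical only for a small number of pairs (a library routine such as SciPy's `scipy.stats.wilcoxon` would normally be used instead).

```python
from itertools import product


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_one_tailed(fool_errors, help_errors):
    """Exact one-tailed Wilcoxon signed-rank test of H1: fool > help.

    Returns (w_plus, p): w_plus is the sum of ranks of positive differences;
    p is the null probability (each difference equally likely to carry
    either sign) of a positive-rank sum at least as large as w_plus.
    """
    # Differences within each (hypothetical) model pair; zeros are dropped.
    diffs = [f - h for f, h in zip(fool_errors, help_errors) if f != h]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Enumerate all 2**n sign patterns (feasible only for small n).
    extreme = 0
    total = 0
    for signs in product((False, True), repeat=len(ranks)):
        total += 1
        if sum(r for r, positive in zip(ranks, signs) if positive) >= w_plus:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical error counts (out of 20 children) for ten model pairs.
fool = [15, 16, 14, 17, 13, 15, 16, 14, 15, 17]
help_ = [1, 0, 2, 1, 0, 1, 2, 0, 1, 0]
w_plus, p = wilcoxon_one_tailed(fool, help_)
print(w_plus, p)  # prints 55.0 0.0009765625
```

Because all ten hypothetical differences favor the to-fool items, only the all-positive sign pattern reaches the observed rank sum, giving p = 1/1024. With nine or fewer pairs, the smallest attainable exact one-tailed p under this test exceeds .001.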
In the case of familiar persons, there is no necessity to match the face in the inspection photograph with the face of the target in the choice pair. When such a match is required, as in Experiments 1, 2, and 4, this is precisely what young children are unable to do. But by the age of 10 to 12, children are able to form, from viewing a single photograph of an unknown person, a representation adequate for discriminating a new photograph of that person from a photograph of someone else. It is as though the older child were capable of making each new face familiar from very little experience with it. It appears that what develops between the ages of 5 and 12 is a schema for making new faces familiar with increasing efficiency.

DISCUSSION
These experiments provide evidence that young children represent unfamiliar faces in terms of isolated features. One class of such features, paraphernalia, often formed the basis for judgments of identity. Prior work had pointed in this direction; judgments of resemblance seemed to be made on the basis of features not used by adults (Levy-Schoen, 1964; Savitsky & Izard, 1970; Trombini, 1968). But since the subject's task in these studies was not person recognition, they provided no evidence about how facial identification is accomplished. Our change from resemblance to identification produced several differences in results. In all three resemblance studies, a shift from paraphernalia matching to expression matching occurred as children became older. In contrast, the present studies showed expression neither supplanting nor even significantly supplementing paraphernalia as a source of isolated features. Instead, aspects of the face actually veridical for recognition supplanted reliance on paraphernalia, the shift becoming complete between ages 10 and 12. In the resemblance studies, the use of a particular cue was very much influenced by contextual factors. In contrast, explicit instruction, participation in the session where sample stimuli were constructed, explication of the kinds of confounding, and practice with four items composed of familiar faces did not diminish reliance on paraphernalia in identifying unfamiliar faces. It appears that the young child cannot easily be brought to encode unfamiliar faces adequately, presumably because he lacks the ability to do so.

Saltz and Sigel (1967) have also shown developmental differences in a task involving person identification. At age 6, children were three times as likely as adults to deny that two photographs of the same person which differed in clothing or angle of view depicted the same person. Since their stimuli did not involve confounding cues, their materials are comparable to our control stimuli in Experiments 1 and 2. We too found that 6-year-olds made many more errors (13%) than our three oldest groups (average 1.3%) on control stimuli, confirming Saltz and Sigel's result.
Both results can be interpreted in terms of the development of a capacity for efficient representation of unfamiliar faces. Our data go beyond Saltz and Sigel's by establishing ages 10 to 12 as the point at which this development is complete. In addition, the demonstration that younger children's errors are specifically attributable to confounding paraphernalia permits the conclusion that judgments of identity are actually sometimes based on superficial isolated features.

What form of representation of a face supports correct recognition when paraphernalia change? In the introduction, we suggested that representation in terms of isolated features might be supplanted by representation of complex spatial relations among facial parts. But another possible interpretation is that, in the representation of familiar faces and in older children's representation of unfamiliar faces, paraphernalia as a source of isolated features is merely replaced by another source, yielding isolated features that are more reliable. For example, instead of recognizing a person by her hat, the older child might note freckles, moles, and bushiness of eyebrows. To the degree that the young children succeeded on paraphernalia-to-fool items in Experiments 1, 2, and 4, they probably chose veridical isolated cues of this kind, cues that were not confounding in our experiments. And perhaps older children select isolated features from these sources. But several lines of evidence support the view that by 10 to 12 the child also computes a different kind of representation of an unfamiliar face, a configurational representation, in addition to any isolated striking features he represents. First, it is at age 10 that representations of upright unfamiliar faces become markedly superior to representations of inverted faces for mediating recognition (Carey & Diamond, 1977). Although a picture of an inverted face contains the same spatial relations as does a picture of an upright face, it is plausible that encoding distinctive configurational features requires extensive reference to a canonically oriented standard. This would make configurational encoding of inverted faces difficult, if not impossible. Second, in Experiments 1 and 2, 12- to 16-year-olds had difficulty separating momentary expression from permanent facial configuration, a problem that arises only if an attempt to extract that configuration is being made. Finally, consider the problem of how expression can be read on a photograph of an unknown face. The contribution of the expressive gesture must be disentangled from the contribution of permanent facial structure. If young children cannot extract the facial configuration from a single still photograph, they should find it difficult to interpret facial expressions on unfamiliar faces. In support of this possibility, Izard (1971) found that consensus on the interpretation of a series of expressions increased until age 10, when it reached the adult level of 75%. Similar findings are reported by Honkavaara (1961).
Why might the schema which supports encoding of unfamiliar faces in configurational terms develop around ages 10 to 12? Performance in certain other tasks in which the relations among stimulus parts must be utilized does not seem to follow the same time course. For example, in a task dependent on recognizing an object drawing from fragmented contours, one study showed that 5-year-olds performed just as well as adults (Gollin, 1960). Perhaps the integration of parts involved in recognizing such materials as Gollin figures is comparable to what permits the recognition of a given stimulus as a face. This integration is presupposed by our tasks, which assess the ability to distinguish individual faces by the subtle configurational differences among them. Another line of research on perception of relationships among parts has followed Piaget's concept of decentering (Elkind, 1975; Piaget, 1969). In these studies age-related changes in performance seem to take place much earlier in childhood than age 10. Unlike the stimuli used in these studies, however, our materials do not involve ambiguous or overlapping contours, nor the possibility of seeing only the parts but not the wholes which the parts comprise. Again, although the factors which affect perceptual organization of any visual display must influence face perception, the skills tapped in the Piagetian tasks appear to be at a more primary level of whole-formation than those involved in individuating a face.

Face recognition is an important social tool. Why does an efficient means of representing unfamiliar faces emerge so late? Perhaps the child must make many different faces familiar before a face schema adequate for tasks such as ours can develop. The developmental course of face perception might also reflect maturation or commitment of the neural structures which subserve it. This possibility is discussed by Carey and Diamond (1977), who suggest that the development of face recognition might fruitfully be placed in the context of the development of other visuospatial right-hemisphere functions.

REFERENCES

Carey, S., & Diamond, R. From piecemeal to configurational representation of faces. Science, 1977, 195, in press.
Elkind, D. Perceptual development in children. American Scientist, September-October, 1975.
Goldstein, A. G. Learning of inverted and normally oriented faces in children and adults. Psychonomic Science, 1965, 3, 447-448.
Gollin, E. S. Developmental studies of visual recognition of incomplete objects: A comparative investigation of children and adults. Perceptual and Motor Skills, 1960, 11, 289-298.
Gombrich, E. H. In Gombrich, E. H., Hochberg, J., & Black, M. (Eds.), Art, perception and reality. Maryland: Johns Hopkins University Press, 1972. Pp. 1-46.
Hochberg, J., & Galper, R. E. Recognition of faces: I. An exploratory study. Psychonomic Science, 1967, 9, 619-620.
Honkavaara, S. The psychology of expression. British Journal of Psychology, 1961, No. 32.
Izard, C. E. The face of emotion. New York: Appleton-Century-Crofts, 1971.
Levy-Schoen, A. L'Image d'Autrui chez l'Enfant. Publications de la Faculté des Lettres et Sciences Humaines de Paris, Série Recherches, tome XXIII. Paris: Presses Universitaires de France, 1964.
Piaget, J. The mechanisms of perception. New York: Basic Books, 1969.
Saltz, E., & Sigel, I. E. Concept overdiscrimination in children. Journal of Experimental Psychology, 1967, 73, 82-93.
Savitsky, J. C., & Izard, C. E. Developmental changes in a concept formation task. Developmental Psychology, 1970, 3, 350-351.
Trombini, G. Identità e somiglianza nella percezione infantile del volto. In Canestrari, R. (Ed.), Ricerche di Psicologia Sperimentale in Onore di Giulio Cesare Papilli. Italy: Giunti, 1968.
Yin, R. K. Face recognition: A special process. Unpublished Ph.D. thesis, M.I.T. Psychology Department, 1970a.
Yin, R. K. Face recognition by brain injured patients: A dissociable ability? Neuropsychologia, 1970b, 8, 395-402.

RECEIVED: May 22, 1975; REVISED: May 19, 1976