Behavioural Processes 74 (2007) 27–32
Cross-modal representation of human caretakers in squirrel monkeys

Ikuma Adachi*, Kazuo Fujita

Department of Psychology, Graduate School of Letters, Kyoto University, Yoshida-honmachi, Sakyo, Kyoto 606-8501, Japan

Received 8 August 2006; received in revised form 13 September 2006; accepted 14 September 2006

* Corresponding author. Present address: Yerkes National Primate Research Center, 954 Gatewood Road, Atlanta, GA 30329, USA. Tel.: +1 404 727 9619; fax: +1 404 727 8088. E-mail address: [email protected] (I. Adachi).
Abstract

We tested whether squirrel monkeys have cross-modal representations of their human caretakers, using a 0-delay symbolic matching-to-sample procedure. We first trained the monkeys to match photographs of two of their caretakers. After they reached criterion, they were exposed to two test sessions. In these sessions, 32 all-reinforced test trials were interspersed among the training trials. In the test trials, a voice, either matching (congruent condition) or mismatching (incongruent condition) the sample photograph, was played back after the sample stimulus disappeared. The monkeys' matching accuracy in the incongruent condition was lower than in the congruent condition. Post hoc analyses revealed that the presentation of the primary caretaker's voice interfered with performance in test trials in which the secondary caretaker's face was presented (incongruent condition). This suggests that our subjects recalled their primary caretaker's representation upon hearing the appropriate voice.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Cross-modal representation; Natural concepts; Squirrel monkey
1. Introduction

A variety of non-human species have been shown to form natural concepts such as "trees" or "humans" (e.g. Herrnstein and Loveland, 1964; Herrnstein et al., 1976; Cerella, 1979; Yoshikubo, 1985). However, those studies are limited in several ways. First, because of the perceptual resemblance among stimuli in previously used categorical discrimination tasks, some researchers have proposed that accurate categorization of pictures can be mediated by non-conceptual processes (Fagot et al., 1999), in which the subject animals focus on a subset of perceptual features associated with each category, such as the presence of a distinctive color patch (D'Amato and van Sant, 1988). In this case, animals could discriminate the stimuli using perceptual characteristics of the images without necessarily recognizing the objects represented in the pictures (Fagot, 2000). Second, each exemplar of a natural concept that we humans hold may lead us to generate a specific or typical representation of that concept. This aspect of concepts, the interchange of information, has not received much attention. Comparative cognitive approaches to
such aspects are essential for understanding how abilities for categorization might have evolved.

In the present study, we focused on the cross-modal nature of concepts. Clearly, exemplars in different sensory modalities should not share any perceptual characteristics. For instance, our concept of "dogs" contains not only their various shapes but also their vocalizations, smells, etc., with no perceptual resemblance between their vocalizations and their appearance. Furthermore, when we hear the vocalization of a dog, we may activate visual images of dogs. This is the information-interchange aspect of concepts described above. Such interchange of information across sensory modalities would be useful to animals, because the modality available at one time may be unavailable at other times. For example, if an animal uses vision as the primary channel for controlling its behavior, transforming information from other modalities into vision would be advantageous, as seems to be the case for humans.

Recent reports have shown that some primates can at least associate auditory information with visual information. For instance, chimpanzees (Pan troglodytes) form intermodal associations of stimuli within a category (Hashiya and Kojima, 1999, 2001). This ability is not limited to apes; Ghazanfar and Logothetis (2003) reported that rhesus monkeys (Macaca mulatta) looked longer at videos of monkeys showing the facial expression that matched a simultaneously presented vocalization than at non-matching videos. Using a similar procedure, Evans et
al. (2005) showed that capuchin monkeys (Cebus apella) are also able to detect the correspondence between appropriate visual and auditory events. These results imply that those species generate visual images when they hear sounds or vocalizations. However, the tasks used involved simultaneous presentation of two visual stimuli, allowing the subject to choose one by judging which stimulus was more strongly associated with the auditory stimulus after comparing them. Thus, in the strict sense, it is still unclear whether the animals actually activate visual images on hearing a vocalization, before the visual stimuli appear. This aspect of intermodal transformation of representations remains to be tested directly. At the same time, further information is required on how widespread such intermodal transformation of representations might be in the animal kingdom.

In two recent reports, we demonstrated this intermodal transformation of representations in domestic dogs and Japanese macaques (dogs: Adachi et al., in press-a; Japanese macaques: Adachi et al., in press-b). We extended an expectancy violation procedure to test the animals' cross-modal representations. Briefly, we presented a voice and then a photograph of a face, either matching or mismatching the preceding voice. Our hypothesis was that if subjects recall the appropriate representation upon hearing the voice, they should be surprised when a mismatching face is presented. This should manifest itself as a longer looking time at the mismatching photograph than at the matching photograph. This result was obtained with both species tested. Thus, these animals not only associate auditory and visual information of familiar individuals, but also spontaneously convert information from one modality to the other.

This cross-modal version of the expectancy violation procedure is useful for comparisons across species because it requires no training of subjects. However, the procedure has several inherent shortcomings. First, looking time varies substantially across individuals and may be affected by extraneous factors, which means that many subjects need to be tested to obtain reliable data. Second, looking behavior may be affected by subjects' preferences for certain stimuli, which could overshadow any effect of the mismatch between stimuli. Third, the same animal cannot be tested more than a few times because of the likelihood of habituation to the stimuli. These problems make it difficult to analyze the nature of the animals' conceptual representations in detail. It is therefore desirable to establish another procedure for examining intermodal transfer of representations in non-humans, one that enables more detailed analyses using a small number of individuals.

Another important question not addressed in the previous studies concerns how flexibly non-human animals can form cross-modal representations. The cross-modal concept of "owner" we showed in dogs, and that of "conspecific" in Japanese macaques, could have some special, possibly genetic, basis. That is, these two cases of cross-modal representation might be special cases among conceptual categories in non-humans. For dogs, the long history of domestication and close cohabitation with humans has involved selection for sophisticated abilities to interact and communicate with humans. These processes may have enhanced dogs' abilities to
form cross-modal representations of their owners. In the case of Japanese macaques, their concept of conspecifics may have a genetic basis. We have found evidence of an innate predisposition in the development of recognition of biological motion (Adachi et al., 2003). In that study, we showed that recognition of biological motion was affected by visual experience: enclosure-reared macaques recognized the biological motion of a macaque but not that of a human, whereas cage-reared monkeys with extensive visual experience of humans showed the opposite tendency. However, there was a difference in how this recognition developed. Cage-reared monkeys came to prefer human biological motion between the ages of 8 and 15 weeks, whereas the enclosure-reared group preferred macaque biological motion at all ages tested, from 0 to 25 weeks old. This difference suggests that innate factors in the development of biological motion perception might interact with experience. It therefore seems important to examine whether cross-modal representations generalize to concepts that are unlikely to be influenced by any biological specificity of the subject species.

In the present study, we aimed to establish a new procedure and to examine whether non-human animals can form cross-modal representations of individuals without the influence of biological specificity. To this end, we used a modified 0-delay symbolic matching-to-sample procedure. Although this procedure requires training, it enabled us to overcome the problems raised above, such as inherent preferences for stimuli and variability in responses. At the same time, we aimed to increase the reliability of the data by repeatedly testing a small number of subjects. We focused on the recognition of humans by squirrel monkeys, a phenomenon presumably independent of any genetic predisposition in this species. In the training phase, subjects were trained to discriminate photographs of two caretakers. After reaching criterion, they were tested on trials in which they heard the voice of a human before making a choice between comparison stimuli. Our hypothesis was that if the subjects recall a visual image upon hearing the voice, the voice should interfere with their memory of the sample photograph when face and voice mismatch. It follows that the subjects' performance on mismatch trials should be worse than on match trials and training trials.

2. Method

2.1. Subjects

Subjects were two 8-year-old squirrel monkeys (Saimiri sciureus), named Homer (female) and Coboo (male). They were familiar with the experimental set-up and with symbolic matching-to-sample tasks using visual stimuli, but had never experienced cross-modal tasks or tasks with auditory stimuli. They were kept in an indoor cage system with other squirrel monkeys, and they had daily contact with humans. Two persons, hereafter referred to as P1 and P2, took care of the subjects. Although both persons were familiar to the subjects, their relationships with the subjects were not the same. P1 was the primary caretaker and P2 the secondary caretaker
for Homer, and vice versa for Coboo. The primary caretaker had fed the monkey almost every day for more than 4 years; the secondary caretaker had fed him/her only when the primary caretaker was absent.

2.2. Apparatus

The subjects were trained and tested in a transparent operant box (50 cm × 50 cm × 50 cm). One wall had an opening (18 cm × 25 cm), behind which a 15 in. CRT monitor with a capacitance touch sensor (Totoku CP151PJ1) was attached. A loudspeaker was located behind the monitor. Two levers were attached below the opening; Homer used the lever on the right and Coboo the one on the left. A universal feeder mounted on the operant box delivered pieces of food (apples, sweet potatoes and peanuts) into a food box on the left-side wall (see Fig. 1). The equipment was controlled by a custom-built personal computer (CPU: AMD Athlon XP2400+).

Fig. 1. A schematic drawing of the apparatus.

2.3. Stimuli

As stimuli, a frontal face photograph and a vocalization were prepared for each caretaker. We took a digital full-face photograph of each person against an ivory-colored background and stored it on the computer in JPEG format at 200 (W) × 200 (H) pixels, or ca. 5 cm × 5 cm on the monitor used. As the auditory stimulus, a vocalization of the Japanese word "oide" was recorded from each person. The word means "come on" and is commonly said to the monkeys during daily contact. Vocalizations from the two caretakers were recorded on a DAT recorder (Sony TCD-D100), then digitized and stored in the computer in WAV format. The sampling rate was 44,100 Hz
and the sampling resolution was 16 bits. The durations of the two caretakers' vocalizations were approximately equal. As comparison stimuli, we prepared two figures, a heart and a moon (100 × 100 pixels), drawn in white (see Fig. 2). One subject (Homer) was trained to associate P1 with the moon and P2 with the heart; the opposite pairing was used for the other subject.

2.4. Procedure

Fig. 2A and B illustrates the symbolic matching-to-sample task. In the training phase (Fig. 2A), each trial started when one of the levers was illuminated after an inter-trial interval of 1 s. After the subject held the lever down for 1 s, a photograph of one caretaker was presented at the center of the monitor as the sample stimulus. Five touches on the sample stimulus extinguished it and resulted in the appearance of the two comparison stimuli, located above the sample position. Touching the comparison stimulus that corresponded to the sample was reinforced with a piece of food, whereas touching the incorrect comparison stimulus was followed by an 8-s time-out. Sessions consisted of 100 trials. The monkeys proceeded to the test phase after performing at above 80% correct for each face in two consecutive sessions.

In the test sessions, 32 all-reinforced test trials were interspersed among 68 baseline trials, which presented the same task as the training sessions. In the test trials, a voice, either matching (congruent condition) or mismatching (incongruent condition) the sample photograph, was played back just after the sample stimulus disappeared (Fig. 2B). Four types of test trials were given. First, the primary caretaker's voice was presented after the primary caretaker's face (FacePri–VoicePri trial). Second, the voice of the secondary caretaker was presented after the secondary caretaker's face (FaceSec–VoiceSec trial).
Third, the primary caretaker's voice followed the secondary caretaker's face (FaceSec–VoicePri trial). Finally, the secondary caretaker's voice followed the primary caretaker's face (FacePri–VoiceSec trial). The voice and the photograph matched in the former two trial types and mismatched in the latter two. Each of the four types of test trials appeared eight times per session.

Fig. 2. A schematic diagram of the matching-to-sample task used for training (A) and the test (B).

We hypothesized that if the subject recalled the visual image of the caller upon hearing the caller's voice, the recalled visual representation should interfere with the memory trace of the sample stimulus in the mismatch condition but not in the match condition. It follows that subjects' performance in the incongruent condition should be worse than in the congruent condition and in baseline trials.
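For concreteness, the composition of a test session can be sketched as follows. This is a minimal illustration in Python of the trial counts and congruency logic described above, not the software actually used to run the experiment; all names are hypothetical.

    import random
    from dataclasses import dataclass
    from typing import Optional

    FACES = ("Pri", "Sec")  # primary / secondary caretaker photographs

    @dataclass
    class Trial:
        face: str                    # sample photograph shown first
        voice: Optional[str] = None  # voice after the sample; None = baseline

        @property
        def congruent(self) -> Optional[bool]:
            # Undefined for baseline trials; True when face and voice match.
            return None if self.voice is None else self.face == self.voice

    def build_test_session(n_baseline: int = 68, per_type: int = 8) -> list:
        # 68 baseline trials plus 8 trials of each of the four face-voice
        # combinations, interspersed at random (100 trials in total).
        trials = [Trial(face=random.choice(FACES)) for _ in range(n_baseline)]
        for face in FACES:
            for voice in FACES:
                trials += [Trial(face, voice) for _ in range(per_type)]
        random.shuffle(trials)
        return trials

    session = build_test_session()
    assert len(session) == 100
    print(sum(t.voice is not None for t in session))   # 32 test trials
    print(sum(t.congruent is False for t in session))  # 16 incongruent trials

In this sketch, FacePri–VoicePri and FaceSec–VoiceSec trials are the congruent cases, and FaceSec–VoicePri and FacePri–VoiceSec trials the incongruent cases.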
3. Results

Fig. 3A shows the proportion of correct choices in baseline trials in the two test sessions. Both subjects performed at accuracies significantly above chance for both caretakers' faces (binomial tests, p < 0.001 for each face; Fig. 3A).

Fig. 3. Percentage of correct choices for each subject on training trials (A) and test trials (B) during the test sessions. *** Indicates a highly significant difference from chance level (p < 0.001) by binomial test (A), and a highly significant difference between conditions (p < 0.005) by χ2-test (B).

Fig. 3B shows the proportion of correct choices for each type of test trial in the test sessions. The x-axis indicates which face was presented as the sample; the color of the bars indicates the condition. The number of correct responses in each type of test trial was analyzed with a binomial test. Performance in both trial types in which the primary caretaker's face was presented was significantly above chance (Fig. 3B, left two bars: p < 0.001). On the other hand,
performance in trials in which the secondary caretaker's face was presented as the sample was relatively low. Even with the matching voice, performance in FaceSec–VoiceSec trials was as low as 59.3% and not significantly above chance (binomial test, p > 0.1). This was not caused solely by the longer inter-stimulus interval relative to baseline trials, because performance in FacePri–VoicePri and FacePri–VoiceSec trials was significantly above chance. The subjects' memory trace for the secondary caretaker might be weaker than that for the primary caretaker and thus disrupted by the longer inter-stimulus interval created by the presence of the voice. More importantly, in FaceSec–VoicePri trials the subjects selected the comparison stimulus assigned to the primary caretaker significantly more often than expected by chance (p < 0.001). This indicates that they activated visual images of the primary caretaker on hearing the vocalization.

To assess the effect of the auditory presentation more directly, we compared accuracy between the two conditions in which the same face was presented, that is, between FacePri–VoicePri and FacePri–VoiceSec trials, and between FaceSec–VoiceSec and FaceSec–VoicePri trials, using χ2-tests with alpha conservatively set at 0.025 because there were two comparisons. The difference between FacePri–VoicePri and FacePri–VoiceSec trials (Fig. 3B, left two bars), in which the primary caretaker's face was presented, was not significant (χ2(1) = 0.097, p = 1.000). In the incongruent condition on these trials, the presented voice belonged to the secondary caretaker; this result indicates that the secondary caretaker's voice did not affect the subjects' performance. On the other hand, the difference between FaceSec–VoiceSec and FaceSec–VoicePri trials (Fig. 3B, right two bars), in which the secondary caretaker's face was presented, was highly significant (χ2(1) = 9.328, p < 0.005). In the incongruent condition on these trials, the presented voice belonged to the primary caretaker. This suggests that the primary caretaker's voice affected the monkeys' responses; specifically, the memory trace for the secondary caretaker's face was disrupted by the primary caretaker's voice. This result can be explained only by the relationship between the subjects and the caretakers: for Homer, P1 was the primary caretaker and P2 the secondary, and vice versa for Coboo. We can therefore reject the possibility that the results were due to perceptual features of the stimuli used. The monkeys seem to have formed a cross-modal representation of their primary caretakers in the context of close daily communication, but not of their secondary caretakers.
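Both analyses can be reproduced with standard routines. The sketch below uses SciPy and deliberately uses placeholder counts rather than the reported data; it assumes each trial type occurred 16 times over the two test sessions and that chance level in the two-choice task is 0.5.

    from scipy.stats import binomtest, chi2_contingency

    # Binomial test of accuracy in one trial type against chance (0.5).
    n_correct, n_trials = 15, 16  # placeholder counts, not the reported data
    res = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
    print(res.pvalue)

    # Chi-square comparison of correct/error counts between the two
    # conditions sharing the same sample face, evaluated against
    # alpha = 0.025 to correct for the two planned comparisons.
    table = [[15, 1],   # e.g. FaceSec-VoiceSec: correct, error (placeholders)
             [6, 10]]   # e.g. FaceSec-VoicePri: correct, error (placeholders)
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2({dof}) = {chi2:.3f}, p = {p:.4f}, significant: {p < 0.025}")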
4. Discussion

The main finding of the present experiment is that our squirrel monkeys performed significantly worse in FaceSec–VoicePri trials than in FaceSec–VoiceSec trials, although they saw the same picture as the sample stimulus. In the FaceSec–VoicePri trials, they selected the comparison stimulus assigned to their primary caretakers significantly more often than chance level. That is, the subjects' processing of the secondary caretaker's face was disrupted by the presentation of the primary caretaker's voice. This suggests that both subjects had a cross-modal representation of their primary caretakers and generated a visual representation of the caretaker upon hearing the corresponding voice.

The primary caretaker of each subject was the secondary caretaker of the other. Thus, perceptual characteristics of the persons do not seem to contribute to the observed cross-modal interference effect. This effect cannot be accounted for by feature theory (Lea and Ryan, 1993), prototype theory (Aydin and Pearce, 1994) or exemplar theory (Medin, 1989), which posit that categorization is exclusively controlled by the perceptual characteristics of the stimuli. Moreover, reward delivery was not correlated with the voice stimulus, which rules out learning of simple face–voice associations on the basis of reinforcement contingencies. Our subjects had received no training on cross-modal tasks prior to this study. Therefore, the conceptual representations shown in this study cannot be attributed to artifacts of the experimental procedure, and must have been established before the study started. These representations cannot be explained by innate predispositions; instead, the results suggest that squirrel monkeys spontaneously form cross-modal representations through experience.

In the present study, subjects' performance in FaceSec–VoiceSec trials dropped to chance level, although the matched
voice was presented. At the same time, although it mismatched the preceding face stimulus, the voice of the secondary caretaker did not harm accuracy in FacePri–VoiceSec trials. These results can be explained by a poor representation of, and a weak memory trace for, the secondary caretaker. We assume that any representation of the secondary caretakers, who fed the monkeys infrequently, would be much weaker than that of the primary caretakers. Thus, the voice of the secondary caretaker did not affect performance in trials where the primary caretaker's face was presented, and the weaker memory trace was easily disrupted by the longer inter-stimulus interval created by the presence of the voice in FaceSec–VoiceSec trials.

This result suggests that establishing a concept of an individual of another species requires extraordinarily rich contact with that individual. It may be much more difficult for non-human animals to form such a concept of a member of another species. For instance, Matsuzawa (1991) trained chimpanzees to discriminate photographs of five humans and five chimpanzees using a symbolic matching-to-sample procedure; the apes found it more difficult to name photographs of humans than photographs of chimpanzees. More recently, Dufour et al. (2006) demonstrated an own-species limitation in face processing in two primate species: both Tonkean macaques (Macaca tonkeana) and brown capuchin monkeys (Cebus apella) showed an advantage for recognizing faces of their own species and difficulty recognizing faces of the other species. These studies suggest that the difficulty of recognizing individuals of another species is shared widely among primates, at least from New World monkeys to apes. This difficulty could explain why our subjects formed a concept only of their primary caretakers: they could not overcome it through the limited contact they had with their secondary caretakers, but they could through extensive experience with their primary caretakers in daily close communication.

A weakness of our experiment is that we used only one photograph of each caretaker. Further tests with more exemplars would be needed to confirm the generality of the effect of interjecting the voice between the sample and comparison stimuli. However, the concept we have shown could not be a consequence of association learning during the task, because there was no pairing between stimuli that would support such learning.

Finally, we suggest that the procedure used here and the cross-modal expectancy violation procedure used in our previous studies are complementary. The former is useful for detailed analyses of the nature of any given cross-modal concept, while the latter is useful as a screening test, suitable for a variety of non-human species at various developmental stages.
Acknowledgements

This study was supported by a Japan Society for the Promotion of Science (JSPS) Research Fellowship for Young Scientists to Ikuma Adachi, by Grants-in-Aid for Scientific Research Nos. 13410026 and 14651020 from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, to Kazuo Fujita, and by the 21st Century COE Program, D-10, from MEXT to Kyoto University. We also thank Dr. James R. Anderson for his editing of the manuscript.

References

Adachi, I., Fujita, K., Kuwahata, H., Ishikawa, S., 2003. Perception of biological motion in infant macaques. In: Tomonaga, M., Tanaka, M., Matsuzawa, T. (Eds.), Development of Cognition and Behavior of Chimpanzees. Kyoto University Press, Kyoto, pp. 333–336 (in Japanese).

Adachi, I., Kuwahata, H., Fujita, K., in press-a. Dogs recall their owner's face upon hearing the owner's voice. Anim. Cogn.

Adachi, I., Kuwahata, H., Fujita, K., Tomonaga, M., Matsuzawa, T., in press-b. Japanese macaques form a cross-modal representation of their own species in their first year of life. Behav. Proc.

Aydin, A., Pearce, J.M., 1994. Prototype effects in categorization by pigeons. J. Exp. Psychol.: Anim. Behav. Proc. 20, 264–277.

Cerella, J., 1979. Visual classes and natural categories in the pigeon. J. Exp. Psychol.: Hum. Percept. Perform. 5, 68–77.

D'Amato, M.R., van Sant, P., 1988. The person concept in monkeys (Cebus apella). J. Exp. Psychol.: Anim. Behav. Proc. 14, 43–55.

Dufour, V., Pascalis, O., Petit, O., 2006. Face processing limitation to own species in primates: a comparative study in brown capuchins, Tonkean macaques and humans. Behav. Proc. 73, 107–113.

Evans, T.A., Howell, S., Westergaard, G.C., 2005. Auditory–visual cross-modal perception of communicative stimuli in tufted capuchin monkeys (Cebus apella). J. Exp. Psychol.: Anim. Behav. Proc. 31, 399–406.

Fagot, J. (Ed.), 2000. Picture Perception in Animals. Psychology Press, Hove, England.

Fagot, J., Martin-Malivel, J., Dépy, D., 1999. What is the evidence for an equivalence between objects and pictures in birds and non-human primates? Curr. Psychol. Cogn. (Cahiers Psychol. Cogn.) 18, 923–949.

Ghazanfar, A.A., Logothetis, N.K., 2003. Neuroperception: facial expressions linked to monkey calls. Nature 423, 937–938.

Hashiya, K., Kojima, S., 1999. Auditory–visual intermodal matching by a chimpanzee (Pan troglodytes). Primate Res. 15, 333–342.

Hashiya, K., Kojima, S., 2001. Acquisition of auditory–visual intermodal matching-to-sample by a chimpanzee (Pan troglodytes): comparison with visual–visual intramodal matching. Anim. Cogn. 4, 231–239.

Herrnstein, R.J., Loveland, D.H., 1964. Complex visual concept in the pigeon. Science 146, 549–551.

Herrnstein, R.J., Loveland, D.H., Cable, C., 1976. Natural concepts in pigeons. J. Exp. Psychol.: Anim. Behav. Proc. 2, 285–302.

Lea, S.E.G., Ryan, C.M.E., 1993. Featural analysis of pigeons' acquisition of discrimination between letters. In: Commons, M.L., Herrnstein, R.J., Wagner, A.R. (Eds.), Quantitative Analyses of Behavior, vol. 4. Ballinger, Cambridge, MA, pp. 239–253.

Matsuzawa, T., 1991. Chimpanzee Kara Mita Sekai [The World as Seen by Chimpanzees]. Tokyo University Press, Tokyo (in Japanese).

Medin, D.L., 1989. Concepts and conceptual structure. Am. Psychol. 44, 1469–1481.

Yoshikubo, S., 1985. Species discrimination and concept formation by rhesus monkeys (Macaca mulatta). Primates 26, 285–299.