Perceptual transformations in vision and hearing


Int. J. Man-Machine Studies (1981) 14, 123-132

RICHARD M. WARREN

Department of Psychology, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201, U.S.A.

(Received 8 June 1980)

This paper presents a history and comparison of the illusions occurring with three types of unchanging patterns of stimulation: (1) visually ambiguous figures; (2) stabilized retinal images; (3) repeated spoken words. Photographs are presented showing that perceptually unstable ambiguous figures of considerable complexity and subtlety were constructed as mosaics in classical times. It is suggested that the ancient practice of scrying may be related to changes observed with stabilized retinal images. While illusory changes heard when listening to a recording of a repeated word have been compared with changes occurring with ambiguous figures and with stabilized retinal images, it is argued that such comparisons have been misleading. While these three illusions have in common the replacement of one perceptual form by another with continued stimulation, the types of perceptual reorganization corresponding to these changes are specific for each of these illusions, and can provide information concerning the nature of special strategies used with the different stimuli. Examples are given of how verbal transformations can reveal mechanisms used for processing of speech. Finally, some comparisons are made concerning the usefulness of studying illusions in man and in machines.

It has long been known that continued exposure to certain patterns of stimulation can lead to illusory changes. These changes can be entertaining, and can also serve more serious purposes. This paper will deal with perceptual transformations occurring with three types of unchanging stimuli: visually ambiguous figures, stabilized retinal images and repeated words.

Visually ambiguous figures

When a pattern is viewed which has two or more plausible interpretations that are mutually incompatible (that is, it is physically impossible for two of the forms represented by the figure to be present simultaneously), then continued inspection can lead to an apparent change from one of the forms to another without conscious volition. Many of these ambiguous figures involve reversible perspective. Since Necker (1832) called attention to this phenomenon, a large number of reversible figures have been constructed. Perhaps the best known of these is the "Necker Cube", although Necker actually described a parallelopiped. Necker pointed out that reversible geometric figures appear in Euclid's solid geometry, and reasoned that the ancient Greeks must have noticed the illusory shifts in perspective which are so obvious to us (see Fig. 1). However, I have taken some photographs, which appear as Figs 2, 3 and 4, that provide more direct evidence of the appreciation of reversible figures in classical times.

0020-7373/81/010123 + 10 $02.00/0 © 1981 Academic Press Inc. (London) Limited


FIG. 1. Illustration of perspectively reversible outline cube and outline parallelopiped from Euclid's Elements, Book XI.

FIG. 2. Design of reversible stacked cubes forming the floor in the ruins of the Temple of Apollo at Pompeii.

FIG. 3. Mosaic forming reversible figures on the floor of a room in the ruins of Emperor Hadrian's Villa near Rome.


Reversible stacked cubes were used as the floor of the Temple of Apollo in Pompeii, and the volcanic ash produced by the great eruption of Vesuvius in AD 79 kept the mosaic intact until its recent excavation. Figure 2 is a photograph of this temple floor. Emperor Hadrian's Villa near Rome (built ca. AD 125) had intricate and clever reversible designs in the floors of rooms, as shown in Fig. 3. It can be seen that designs corresponding to Mach's Open Book Illusion (see Fig. 5(a)) and Necker Cubes (see Fig. 1) are used in this 2nd century floor.

FIG. 4. Photographs of reversible mosaic designs from the floors of St Mark's Cathedral, Venice (a, b, c, d) and Great St Mary's Basilica, Rome (e, f).


St Mark's Cathedral in Venice has complex, subtle and beautiful reversible mosaics. These unstable designs are at one's feet, and when the gaze is raised to the walls and vaulted ceilings, an awe-inspiring array of perceptually stable mosaics of devotional figures is seen (the mosaics were constructed from the 11th to the 14th centuries). Art books deal with the walls and ceilings, and to the best of my knowledge, there are no published accounts of the reversible mosaics which grace the floors. Ingenious and beautiful mosaic floors also can be found in Great St Mary's Basilica at Rome (built in the 5th century). Figure 4 presents some of the mosaics photographed in these churches.

Following Necker's description of reversible solid figures, many "new" ambiguous figures were reported. One of the simplest figures was Ernst Mach's Open Book Illusion, shown in Fig. 5(a), which can be seen with the book's spine either nearer or further from the viewer than the two outer vertical lines representing the edges of the book. A more complex figure is Schroeder's Staircase/Overhanging Cornice appearing in Fig. 5(b). The staircase is generally perceived more readily than the cornice: using this favored form for reference, it can be seen that the figure can be considered as multiple Mach books, with the intersection of risers and treads of the staircase forming the spines of books.


FIG. 5. Some reversible figures constructed in the 19th and 20th centuries: Ernst Mach's Open Book (a), Schroeder's Staircase/Overhanging Cornice (b), Rubin's Profile Faces/Vase (c), and Boring's Wife/Mother-in-Law (d).


Another class of ambiguous depth illusions was developed by Gestalt psychologists. These "figure-ground" illusions consist of outline figures which are in front of, and so occlude, a portion of a generally featureless background. The special figures are constructed so that both the lighter and darker portions can serve as either figure or ground. Perhaps the best known is Rubin's Profile Faces/Vase design illustrated by Fig. 5(c).

The last type of ambiguous figure which I will describe is represented by designs corresponding to two complex objects which do not represent changes associated with differences in relative distances from the viewer. Figure 5(d) represents one of the best known of these designs (Wife/Mother-in-Law Figure attributed to E. G. Boring).

The visual reversible figures which have been described all have these two features in common: (1) there is more than one plausible interpretation of the figure; and (2) each of the possible organizations precludes perception of the other because of portions of the design common to both.

Stabilized retinal images

While visual reversible figures are limited to relatively few ambiguous designs, any visual display becomes perceptually unstable and subject to perceptual reorganization if the optical image's movement on the retina is restricted. Greater restriction produces greater instability, and illusory changes are maximal when the effects of small involuntary eye movements (physiological nystagmus) are cancelled, as we shall see. However, changes can occur with voluntary fixation despite the slight trembling of the image, especially under fairly dim illumination.

The ancient procedure of divination through "scrying" used by cultures in Europe, Africa and Asia may be based in part on this phenomenon. Scrying involves a prolonged fixation of a visual display, leading to illusory (or hallucinatory) vision. In Egypt, a pool of ink or blood was used. In ancient Greece, a polished metal surface was fixated, and in Arab countries a polished fingernail has been used up to modern times. Queen Elizabeth I employed a crystal-ball gazer (see Rawcliffe, 1959). There was a resurgence of interest in the occult in the 19th century, and crystal-ball gazing was a popular pastime for conjuring up visions of the past or future during the Victorian era. However, during the same century, the effects of unchanging patterns of stimulation on the retina were also separated from myth and mysticism and studied in the laboratory.

Charles Wheatstone seems to have been the first to recognize the great perceptual instability resulting from cancellation of the effects of small eye movements. In considering Purkinje's observation that a candle in movement at one side of the central field of vision produces an image resembling a branched tree, Wheatstone concluded in the 1830's that shadows of retinal blood vessels were responsible for the image seen (Wheatstone, 1879).
He reasoned that since the blood vessels were but a short distance in front of the light-sensitive elements of the retina, any small eye movements would have a negligible effect upon the position of their shadows. However, gross movements of the candle did produce appreciable shifting of the shadows. Wheatstone used these observations to deduce that visual displays corresponding to objects outside the eye would quickly disappear unless small eye movements continually produced appreciable motion of the corresponding images on the retinas. This brilliant deduction has been verified: the continual involuntary movements of the eye during visual fixation have been measured, and experiments using stabilized retinal images have demonstrated that images with a fixed retinal location do indeed fragment and disappear. But they also reappear, often in incomplete form.

Experiments with devices capable of cancelling the effects of eye movements have shown that the fragments of the fixed retinal image which are visible generally follow certain organizational rules. When words were viewed, such as BEER, the fragments of the display which were visible seemed to disappear and reappear, and often corresponded to English words (such as PEER, PEEP, BEE or BE). Straight lines of geometric figures tended to act as units, but when lines intersected, a perceptual break usually occurred at points of intersection. When a profile drawing of a head was viewed, specific groupings of lines such as those forming the front of the face or the top of the head tended to persist or disappear together (Pritchard, Heron & Hebb, 1960; Pritchard, 1961).

Christopher Evans and his co-workers (Bennet-Clark & Evans, 1963; Evans, 1965; Evans & Marsden, 1966) utilized a much simpler method for producing stabilized retinal images using positive after-images. By presenting a brief bright flash, a spatially fixed image could be produced capable of being viewed for tens of seconds before fading completely. Rules similar to those observed for cancellation of the effects of eye movements were observed with portions of the design coming and going in a dynamic display. Some of the results obtained by Evans and his colleagues are shown in Fig. 6.

FIG. 6. Some of the forms reported frequently while viewing the positive after-image of a cross circumscribed within a circle (Bennet-Clark & Evans, 1963).

Repeated words and the verbal transformation effect

In the mid-1950s, it occurred to me that an auditory analog of the visual reversible figures might exist. I thought that if a word were repeated many times to produce an ambiguous stimulus, such as "say-say-say..." (which has the same sequence of phonemes as "ace-ace-ace...") or "tress-tress-tress..." (which has the same sequence of phonemes as "rest-rest-rest..."), perception might shift from one plausible interpretation of the sequence to the other. Saying a word over and over to oneself would not do as a way of preparing such stimuli, because verbal organization associated with motor commands could become confused with auditory perception. Richard Gregory and I were both at Cambridge University at the time, and we constructed loops of tape which repeated such words. We reported in a note that the illusory changes anticipated did indeed occur (Warren & Gregory, 1958). Neither of us then appreciated the significance of other perceptual changes corresponding to perceptual distortion of the stimuli, and we attributed such distortions to inadequacies in our technique for playback of tape loops.

However, further thought and a more carefully controlled study started the following year convinced me (at least) that the illusory changes in words were not as closely analogous to visual reversible figures as I had thought (Warren, 1961a). The auditory illusion, which I then named the "verbal transformation effect", did not require ambiguous stimuli (any repeated word would do), with the changes heard often involving considerable perceptual distortion (such as "ripe" played loudly and clearly being heard as the two-word repeated phrase "bright-light"). The forms heard often were quite different for different subjects, with individuals reporting perhaps a dozen words during 3 minutes of listening to a single word repeated twice a second. As an example, a subject listening to the word "tress" repeated loudly and clearly with no pauses between repetitions heard, within the course of a few minutes, such illusory forms as "dress", "stress", "Joyce", "floris", "florist" and "purse".

Not willing to abandon the concept of a visual analog, I suggested that changes associated with inspection of stabilized retinal images represented a visual "counterpart" of verbal transformations (Warren, 1961b). Chris Evans, working with positive after-images as a convenient means of generating stabilized retinal images, was unaware at that time of my suggestion relating verbal transformations and stabilized retinal images, and independently conceived the idea that listening to a stabilized verbal stimulus (a repeated word) might produce effects similar to those observed with positive after-images. Thus, he also was led to the discovery of verbal transformations by a consideration of the perceptual instability of vision with unchanging stimulation.
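
The ambiguity built into the original tape loops ("say"/"ace", "tress"/"rest") amounts to one phoneme sequence being a cyclic rotation of the other, so that the endless loop has no privileged starting point. The following is a minimal illustrative sketch of that idea only, and is not part of the original study; the function names and the informal phoneme labels are assumptions made for the example. It does not model the distorted forms (such as "dress" or "florist") that listeners actually reported.

```python
def rotations(seq):
    """Return the set of all cyclic rotations of a phoneme sequence."""
    seq = tuple(seq)
    return {seq[i:] + seq[:i] for i in range(len(seq))}


def is_cyclic_rotation(a, b):
    """True if sequence b is a cyclic rotation of sequence a."""
    a, b = tuple(a), tuple(b)
    return len(a) == len(b) and b in rotations(a)


# Informal phoneme labels (an assumption; not a standard transcription).
SAY = ("s", "ey")
ACE = ("ey", "s")
TRESS = ("t", "r", "eh", "s")
REST = ("r", "eh", "s", "t")
RIPE = ("r", "ay", "p")

print(is_cyclic_rotation(SAY, ACE))     # True
print(is_cyclic_rotation(TRESS, REST))  # True
print(is_cyclic_rotation(TRESS, RIPE))  # False
```

On this view, a looped "tress" and a looped "rest" are the same stimulus heard from different starting points, whereas "ripe" and "bright-light" are not related in this way, which is one reason the reversible-figure analogy does not cover the transformations actually observed.
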
Much as Gregory and I first called verbal transformations an "auditory analog of visual figures", Evans called the illusion "stabilised auditory images". Since this special journal issue is dedicated to the late Dr Evans, it is doubly appropriate to quote from a letter he wrote to me in April 1968 which summarizes the development of his ideas concerning illusory changes in repeated words:

"... I have been working quite vigorously with what I have come to term (if you will pardon it) 'the stabilised auditory image' for the past 18 months. This is the history of my interest. I have been working with stabilised images--I was one of Ditchburn's group at Reading--for some years and had long been pondering the possibility of producing an auditory analog. I had been familiar with the Warren and Gregory paper but it had gradually faded out of my mind because of the initial analogy that you had made with reversible figures. In due course I found myself approaching the problem of a 'stabilised auditory image' afresh as it were, and had performed some rather painful experiments in which subjects were afflicted with loud noises from high amplification speakers. I was of course attempting to replicate the brief intense stimulus in the auditory system which would correspond to the brief intense flash which you will see by the enclosed papers I now use to produce an after-image--a perfectly stabilised retinal image. In the midst of the experiments I realized that my logic was faulty, and that I was attempting to equate a system interested in temporal change with a system interested in spatial change. Clearly if in order to provide a stabilised visual image one restricted spatial change, then in order to provide a 'stabilised auditory image' it would be necessary to prevent, or cut down, temporal change. In this circuitous way I found myself back with the Warren and Gregory experiment and as I performed it, details of your papers came to mind."

I now believe that both Chris Evans and I were violating the aphorism attributed to Sherlock Holmes: "It is a cardinal error to theorize in advance of the facts. Inevitably it biases the judgment." Somewhat different hypotheses (or guesses) concerning the relation between auditory and visual perception had led us both to our experiments with repeated words; however, the characteristics of verbal transformations (which could not be predicted and were revealed only through experimental observation) indicated that aspects of each of these hypotheses were in error. The understanding of verbal transformations by both Chris Evans and myself was hindered by continuing to consider these illusory changes as analogs of visual illusions. Certainly, our hypotheses were indispensable to us as motives for listening to repeated words, but after completing the initial experiments these guesses in advance of the evidence became encumbrances for both laboratories, biasing interpretation of the new facts.

My current belief is that, while decay of a particular perceptual organization over time is a general phenomenon, transformation reflects special strategies for synthesis employed for the class of stimuli being used. With visual reversible figures, the availability of plausible alternative interpretations makes them particularly labile, permitting new forms to replace the old despite normal eye movements which generally stabilize perceptual organization. Paradoxically, the afferent instability enhances perceptual stability. But when retinal images are stabilized, all visual displays become potentially unstable, and we are allowed to glimpse stages of normal perceptual processing which are ordinarily hidden from view.
The evidence now on hand indicates that verbal transformations reflect special strategies which normally aid in the processing required for comprehension of speech. The complexities associated with understanding ordinary discourse have been made painfully clear to investigators working in automatic (machine) speech recognition, even when messages are drawn from a restricted lexicon and are pronounced clearly and distinctly. Yet how much more difficult is the task facing a listener under the noisy conditions usually accompanying speech! The characteristics of verbal transformations make it appear likely that, when presented with the same word stated over and over, mechanisms come into play which normally correct for any errors in the initial verbal organization. Let us consider some of the special characteristics which have been discovered for verbal transformations, and why they led to some novel suggestions concerning the perceptual processing of speech.

Roslyn Warren and I have found that verbal transformations were absent in children at the age of 5 years. By the age of 6 years, verbal transformations occurred for about half the children tested (the appearance of verbal transformations is all or none: either the children heard them at a rate corresponding to older children and young adults, or they did not hear verbal transformations at all). By the age of 8 years all children heard verbal transformations (Warren & Warren, 1966). Other experiments have shown that the rate and variety of verbal transformations remain at a high level through the 20's, begin to decline in the 30's and 40's, and are down to a very low level by the age of 60, with many of these older individuals hearing no changes at all (Warren, 1962, 1976).

The nature of verbal transformations at each age could not have been predicted on the basis of any available analogy, model or theory. But once the quantitative and qualitative characteristics of transformations were known, they could be linked with other information in the literature, and led to the hypothesis that these illusory changes correspond to corrective reorganization. Such revisions normally are invoked when perceptual organization of speech sounds into a particular word is not confirmed by the subsequent context. It is suggested that auditory input corresponding to a block of words can be subject to provisional linguistic organization, with the underlying neural information held in short term storage for possible perceptual reorganization if needed. Children below the age of 6 years have not yet achieved this strategy. The aged, if they are to perceive speech accurately, must give up such perceptual revision since they cannot effectively integrate ongoing speech with stored auditory information going back several words. Traditionally, it has been assumed that once mastery of speech is achieved, the same mechanisms are used for the remainder of the lifespan. But it appears that the maintenance of mastery may require a continual revision of perceptual strategies (see Warren, 1962, 1976).

There is another characteristic of verbal processing which has emerged from investigation of verbal transformations and which seems at first to be counterintuitive. When each ear is stimulated with the same repeated word, but the input to one ear is delayed by half the duration of the word, there is no fusion of the input to the two ears as with diotic stimulation, and separate asynchronous statements are heard on each side. Initially, as would be anticipated, the voice on each side is heard to say the same thing, but this equivalence is not maintained once verbal transformations start. Illusory changes occur independently on each side.
The existence of separate perceptual organizations for the same stimulus at the same time indicates that there is not a single cortical lexicon as is commonly assumed (Warren & Ackroff, 1976; Warren, 1977).
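
The timing of the dichotic arrangement described above (the same repeated word in each ear, with one ear delayed by half the word's duration) can be sketched as follows. This is a hypothetical illustration and not the procedure used in the cited studies; the function name, the parameter names and the 0.5 s word duration in the usage lines are assumptions chosen for the example.

```python
def dichotic_onsets(word_duration_s, n_repeats, delay_fraction=0.5):
    """Return onset times (in seconds) of each repetition for the
    leading ear and for the delayed ear.

    The word is assumed to repeat with no pauses, so successive onsets
    in one ear are word_duration_s apart. The other ear receives the
    same loop, shifted by delay_fraction of the word's duration.
    """
    if word_duration_s <= 0:
        raise ValueError("word_duration_s must be positive")
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    lead = [i * word_duration_s for i in range(n_repeats)]
    lag = [t + delay_fraction * word_duration_s for t in lead]
    return lead, lag


# Assumed example: a 0.5 s word, three repetitions.
lead, lag = dichotic_onsets(0.5, 3)
print(lead)  # [0.0, 0.5, 1.0]
print(lag)   # [0.25, 0.75, 1.25]
```

With a half-word delay, each onset in one ear falls midway through the word in the other ear, which is the asynchrony under which the two sides are heard as separate statements.
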

Conclusions

Any particular perceptual organization of an unchanging pattern of stimulation tends to become weakened with the passage of time and reorganized into a different perceptual form. The nature of these transformations depends upon the particular stimulus, and can provide information concerning specific details of perceptual processing not available through other techniques. Visual reversible figures can provide information concerning mechanisms for dealing with visual ambiguity. We have seen that stabilized retinal images provide information concerning the role of eye movements in perception (as Wheatstone first noted), and in recent years stabilized images have furnished information concerning special rules governing perceptual groupings of portions of complex nonambiguous displays. In hearing, verbal transformations have provided clues to special linguistic mechanisms for perceptual organization and reorganization of speech.

Can illusions be expected with machines? If by illusions we mean errors in identification or evaluation of data, the answer is certainly yes. Can such illusions provide information of value? Probably. Such illusions can inform us of inadequacies in programs we have designed for artificial intelligence. Illusions in humans provide information with value of another sort. Often we are quite ignorant of the programs used for perceptual analysis. Our skills leading to visual object recognition and speech comprehension are so well practised that perception appears immediate and direct. But this is most deceptive. Illusions used as experimental probes can allow us to examine perceptual processing in a way not otherwise possible. As with pathology in medicine, illusions represent normal processes laid bare.

Preparation of this paper was supported in part by a grant from the National Science Foundation,

References

BENNET-CLARK, H. C. & EVANS, C. R. (1963). Fragmentation of patterned targets when viewed as prolonged after-images. Nature, 199, 1215-1216.
EVANS, C. R. (1965). Some studies of pattern perception using a stabilized retinal image. British Journal of Psychology, 56, 121-133.
EVANS, C. R. & MARSDEN, R. P. (1966). A study of the effect of perfect retinal stabilization on some well-known visual illusions, using the after-image as a method of compensating for eye movements. The British Journal of Physiological Optics, 23, 242-248.
NECKER, L. A. (1832). Observations on some remarkable phaenomena seen in Switzerland; and on an optical phaenomenon which occurs on viewing of a crystal or geometrical solid. Philosophical Magazine (Series 1), 3, 239-337.
PRITCHARD, R. M. (1961). Stabilized images on the retina. Scientific American, 204 (June), 72-78.
PRITCHARD, R. M., HERON, W. & HEBB, D. O. (1960). Visual perception approached by the method of stabilized images. Canadian Journal of Psychology, 14, 67-77.
RAWCLIFFE, D. H. (1959). Illusions and Delusions of the Supernatural and the Occult. New York: Dover.
WARREN, R. M. (1961a). Illusory changes of distinct speech upon repetition--the verbal transformation effect. British Journal of Psychology, 52, 249-258.
WARREN, R. M. (1961b). Illusory changes in repeated words: differences between young adults and the aged. American Journal of Psychology, 74, 504-516.
WARREN, R. M. (1962). An example of more accurate auditory perception in the aged. In TIBBITTS, C. & DONAHUE, W., Eds, Social and Psychological Effects of Aging. New York: Columbia University Press.
WARREN, R. M. (1976). Auditory illusions and perceptual processes. In LASS, N. J., Ed., Contemporary Issues in Experimental Phonetics. New York: Academic Press.
WARREN, R. M. (1977). Les illusions verbales. La Recherche, 8, 538-543.
WARREN, R. M. & ACKROFF, J. M. (1976). Dichotic verbal transformations and evidence of separate processors for identical stimuli. Nature, 259, 475-477.
WARREN, R. M. & GREGORY, R. L. (1958). An auditory analog of the visual reversible figure. American Journal of Psychology, 71, 612-613.
WARREN, R. M. & WARREN, R. P. (1966). A comparison of speech perception in childhood, maturity and old age by means of the verbal transformation effect. Journal of Verbal Learning and Verbal Behavior, 5, 142-146.
WHEATSTONE, C. (1879). The Scientific Papers of Sir Charles Wheatstone. London: The Physical Society of London (pp. 221-222).