MATHEMATICAL BIOSCIENCES
How We Know Universals:
Retrospect and Prospect
MICHAEL A. ARBIB Department of Computer and Information Science University of Massachusetts Amherst, Massachusetts
Communicated by Donald Perkel
ABSTRACT A new perspective on three papers coauthored by Warren McCulloch emphasizes the paradigmatic role to be played in brain theory by action-oriented studies of distributed computation in somatotopically organized neural networks.
For Warren McCulloch (1898-1969) and Walter Pitts (1924-1969)
We honor the memory of great scientists by showing that their ideas are very much alive.
1. INTRODUCTION
It is well known that “What the Frog’s Eye Tells the Frog’s Brain” [9] showed that the frog retina extracted four behaviorally significant features of the visual scene and sent axons along the optic tract to form four spatially arranged maps, in register, in the optic tectum. It is reasonably well known that “Some Mechanisms for a Theory of the Reticular Formation” [7] suggested a way in which neural subsystems could be so connected that they reach a consensus committing the organism to a single mode of action, even though each subsystem has only partial information and different subsystems receive contradictory indications. What has not been clear is that these two ideas fit neatly together and play a crucial role in helping us elucidate the way in which brain structure subserves brain function. The conjunction became clear in a study of how the frog uses the information that Lettvin et al. [9] had shown to be available in its tectum to guide its behavior [5]. Here, however, we want to pay tribute to the insight of Warren McCulloch and Walter Pitts by sharing our “discovery,” made in the course of these behavioral investigations, that the essential ideas were at least implicit in their 1947 article [10], “How We Know Universals: The Perception of Auditory and Visual Forms.” To this end, we shall reverse the order in which this insight evolved and present, in historical order, what our current views suggest were the essential points of three articles [7, 9, 10] of which Warren McCulloch was a coauthor. In Section 3, we shall present the paradigms for future brain theory that this new perspective has brought forth.
Mathematical Biosciences 11 (1971), 95-107
Copyright © 1971 by American Elsevier Publishing Company, Inc.
2. RETROSPECT: THREE KEY PAPERS
A. Pitts and McCulloch on How We Know Universals
Consider a figure such as a square. We may recognize it as a square despite many changes in position and size. In a classic paper [10], “How We Know Universals: The Perception of Auditory and Visual Forms,” Walter Pitts and Warren McCulloch sought “general methods for designing nervous nets which recognize figures in such a way as to produce the same output for every input belonging to the figure.” (The Oxford English Dictionary tells us that a universal is “what is predicated of all the individuals or species of a class or genus; an abstract or general concept regarded as having an absolute, mental, or nominal existence; a general term or notion.”) Their paper has essentially two parts. The first shows how to average over a set of transforms of an input pattern to obtain an output that depends only on the figure to which the input belongs. The second, neglected but more important, provides a design for reflex mechanisms that secure invariance by computing a suitable transform to apply to the input pattern to bring it to “standard form.”
i. Averaging over a set of transforms
Let $\mathcal{M}$ be a cortical manifold, $\mathcal{T}$ the time set, and $Y$ the set of levels of neural activity that we take to encode information. Sensory stimulation then yields a function $\phi: \mathcal{M} \times \mathcal{T} \to Y$ such that $\phi(x, t)$ is the intensity at time $t$ at point $x$ of the cortical manifold. Let $\Phi$ denote the set of all such excitation patterns. Let now $G$ be the set of transformations (assumed to be finite) that carry patterns into other patterns corresponding to the same figure. Pitts and McCulloch study the case in which $G$ may be thought of as a group acting on the manifold $\mathcal{M}$, each $T$ in $G$acting on patterns by
$$T: \phi(x, t) \mapsto \phi(T^{-1}x, t).$$
Examples include the group of translations, whose action on $\mathcal{M}$ is given by the formula $x \mapsto x + a_T$, and the group of dilations, given by the formula $x \mapsto a_T x$. What we now call a feature is simply a map $f: \Phi \to Y$ that assigns some neural output level to each excitation pattern. An invariant $\phi_{\xi,G}$ of an excitation pattern $\phi$ is then associated with each feature $f_\xi$ by the formula
$$\phi_{\xi,G} = \frac{1}{N} \sum_{T \in G} f_\xi(T\phi),$$
where $N$ is the number of elements of $G$. Clearly, if $\phi$ and $\phi'$ belong to the same figure, that is, $\phi' = T_1\phi$ for some $T_1$ in $G$, we have $\phi'_{\xi,G} = \phi_{\xi,G}$. Because $G$ is a group, $TT_1$ runs over every element of $G$ exactly once as $T$ does, so the two sums contain the same terms. This is what we mean by an invariant. Let now $\Sigma$ be a manifold with one element $\xi$ for each invariant $\phi_{\xi,G}$. If the nervous system needs less than complete information in order to recognize figures, the manifold $\Sigma$ may be much smaller than $\mathcal{M}$. We may imagine $\Sigma$ to be appropriately activated by the network shown in Fig. 1.
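The averaging formula can be checked on a toy case. The following is a minimal sketch that assumes patterns are integer excitation levels on a ring of sites and that $G$ is the finite group of cyclic translations. It also assumes a deliberately position-dependent feature, `corner_feature`, which is our own hypothetical choice and not one from the paper. The loop visits one transform at a time and accumulates $(1/N) f_\xi(T\phi)$.

```python
from fractions import Fraction
from typing import Callable, Tuple

Pattern = Tuple[int, ...]  # excitation levels on a ring of len(pattern) sites (toy assumption)


def translate(pattern: Pattern, a: int) -> Pattern:
    """Cyclic translation T_a with T_a x = x + a, acting as (T_a phi)(x) = phi(x - a)."""
    n = len(pattern)
    return tuple(pattern[(x - a) % n] for x in range(n))


def group_average(f: Callable[[Pattern], int], pattern: Pattern) -> Fraction:
    """phi_{xi,G} = (1/N) * sum over T in G of f(T phi), with G = all cyclic shifts."""
    n = len(pattern)  # N, the number of elements of G
    total = Fraction(0)
    # One transform at a time: add (1/N) f(T phi) for each layer in turn.
    for a in range(n):
        total += Fraction(f(translate(pattern, a)), n)
    return total


def corner_feature(pattern: Pattern) -> int:
    """Hypothetical position-dependent feature: product of the first two sites."""
    return pattern[0] * pattern[1]


phi = (0, 3, 1, 4, 0, 0, 2)
shifted = translate(phi, 6)  # phi' = T_1 phi: same figure, different position
print(corner_feature(phi), corner_feature(shifted))  # the raw feature differs: 0 vs 3
print(group_average(corner_feature, phi) == group_average(corner_feature, shifted))  # True
```

Exact `Fraction` arithmetic is used so that the invariance holds with `==` rather than only up to rounding. Shifting $\phi$ permutes the terms of the sum but does not change them.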
[Fig. 1 diagram: $\mathcal{M}$ projects to the layers $\mathcal{M}_T$; background excitation cycles to one layer at a time; $\Sigma$ carries the resulting excitation.]
FIG. 1. The Pitts-McCulloch scheme for averaging over a set of transforms. (An alternative scheme might have adjustable parameters that can shift the effective connections from $\mathcal{M}$ to $\Sigma$ without intervention of the $N$ layers $\mathcal{M}_T$.)
$\mathcal{M}$ is connected to $N$ manifolds, one $\mathcal{M}_T$ for each element $T$ of $G$. Element $x$ of $\mathcal{M}$ is connected to element $Tx$ of $\mathcal{M}_T$. Thus the activity $\phi(x, t)$ at $x$ on $\mathcal{M}$ appears at the point $Tx$ of $\mathcal{M}_T$, so that $\mathcal{M}_T$ carries the transformed pattern $T\phi$. We set up subsidiary wiring, however, so that $\mathcal{M}_T$ can be so activated only when sufficient background excitation is provided. There is a computing element for each feature that generates $(1/N) f_\xi$ for each activated layer in turn. These components of $\phi_{\xi,G}$ are then summed at the appropriate $\xi$ neuron of $\Sigma$. Figure 1 exemplifies the idea of “exchangeability of time and space”: “any degree of freedom of a manifold or group can be exchanged freely with as much delay in the operation as corresponds to the number of distinct places along that dimension” [10, page 130]. Here we chose to have one layer of $(1/N) f_\xi$ computers and use it $N$ times to process the whole group, rather than having a separate layer of $(1/N) f_\xi$ computers for each $\mathcal{M}_T$ and thus processing the whole group simultaneously. Pitts and McCulloch suggest that the alpha rhythm may be the manifestation of the rhythmic sweep of excitation through the $\mathcal{M}_T$ layers, but it is not clear that the alpha rhythm is persistent enough during attention to fit this role. Further drawbacks of the model are discussed in [3].
ii. Transforming to Standard Form
In the latter half of their paper, Pitts and McCulloch present what they call “a uniform principle of design for reflex mechanisms which secure invariance under an arbitrary group G” [10, page 145]. We shall present it in a somewhat generalized form. Given a pattern $\phi$, we want to find some transformation $T$ that changes $\phi$ to a pattern $T\phi$ with a desired property (e.g., standard size or position). We associate with each pattern $\phi$ an $n$-dimensional “error vector” $E(\phi) \in \mathbb{R}^n$ with the property that $E(\phi) = 0$ if and only if $\phi$ is in standard form. We then introduce a mapping $\Psi$ that associates with each error some transformation that can reduce it, that is,
$$\Psi: \mathbb{R}^n \to G \quad \text{is such that} \quad \|E[\Psi(E(\phi)) \cdot \phi]\| \le \|E(\phi)\|$$
for all patterns $\phi$, with equality only in case $\phi$ is in standard form.
[Fig. 2 diagram: input pattern $\phi$ and transform $T$ enter the transform application box, which outputs the transformed pattern $T\phi$; the error computer outputs $E(T\phi)$; the transform computer updates $T$.]
FIG. 2. A generalization of the Pitts-McCulloch scheme for transforming a pattern to standard form.
Figure 2 shows a discrete-time system that will generate, for any $\phi$, a transformation $T_\phi$ that transforms it to standard form. The transform application box in Fig. 2 is memoryless: input pattern $\phi$ and transform $T$ at its inputs yield the transformed pattern $T\phi$ at its output. The error computer box is also memoryless: an input pattern at its input yields the corresponding error at its output. The transform computer box is a sequential machine. If its state at time $t$ is the transform $T$ and its input at time $t$ is the error vector $e$, then its new state and output at time $t + 1$ will both be the transform $\Psi(e) \cdot T$. Of course, the hard work in such a scheme lies in defining an appropriate error measure $E$. One must then find a mapping $\Psi$ that uses error feedback properly to control the system, so that it will eventually transform the input to standard form.
Pitts and McCulloch exemplify their general scheme with a plausible reflex arc from the eyes through the superior colliculus to the oculomotor nuclei. This arc controls the muscles that direct the gaze so as to bring the point of fixation to the center of gravity of the distribution of brightness of the visual input. (With our current knowledge of retinal “preprocessing,” we might now substitute a term such as “general contour information,” or any “feature” in the sense of Section 2A i, for “brightness” in this prescription. But that does not affect the model that follows.) Julia Apter [1, 2] showed that each half of the visual field of the cat (seen through the nasal half of one eye and the temporal half of the other) maps topographically upon the contralateral colliculus. In addition to this “sensory” map, she studied the “motor” map by strychninizing a single point on the collicular surface, flashing a diffuse light on the retina, and observing which point in the visual field became fixated through the resulting change in gaze. She found that these “sensory” and “motor” maps were almost identical (cf. insert in Fig. 3).
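The Fig. 2 loop, instantiated for gaze centering, can be sketched as follows. This is a minimal version under several assumptions of our own. Patterns are weighted points on a line, with the fixation point at $x = 0$. The transformations are translations. $E$ is the one-dimensional offset of the center of gravity from the fixation point. The function `psi` is a hypothetical choice of $\Psi$ that halves the error, which satisfies the strict-decrease condition above. The state of the transform computer is kept as the accumulated shift of a translation.

```python
from typing import List, Tuple

Pattern = List[Tuple[float, float]]  # (position x, intensity w); fixation point at x = 0 (assumption)


def apply_translation(a: float, pattern: Pattern) -> Pattern:
    """Transform application box (memoryless): T_a moves a point at x to x + a."""
    return [(x + a, w) for x, w in pattern]


def error(pattern: Pattern) -> float:
    """Error computer (memoryless): center-of-gravity offset from fixation, so n = 1."""
    total = sum(w for _, w in pattern)
    return sum(x * w for x, w in pattern) / total


def psi(e: float) -> float:
    """Hypothetical Psi: the translation that halves the error, so |E| shrinks unless e == 0."""
    return -0.5 * e


def to_standard_form(pattern: Pattern, tol: float = 1e-9, max_steps: int = 200) -> Tuple[float, int]:
    """Transform computer (sequential): state T_a, updated to Psi(e) . T_a on each step."""
    a = 0.0  # start from the identity transform
    for step in range(max_steps):
        e = error(apply_translation(a, pattern))
        if abs(e) < tol:
            return a, step
        a = psi(e) + a  # composing two translations adds their shifts
    raise RuntimeError("did not reach standard form within max_steps")


phi = [(2.0, 1.0), (5.0, 3.0), (-1.0, 1.0)]  # center of gravity at (2 + 15 - 1) / 5 = 3.2
shift, steps = to_standard_form(phi)
print(shift, steps)
```

The error here is the normalized first moment. It vanishes exactly when rightward and leftward weighted activity balance. That balance is the condition under which, in the Pitts-McCulloch reflex, eye movement ceases.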
On the basis of these data, Pitts and McCulloch erected the scheme shown in Fig. 3 (their Fig. 6) for centering the gaze in animals, such as the cat, that rotate their eyeballs to do so. They noted that excitation at a point of the left colliculus corresponds to excitation from the right half of the visual field, and so should induce movement of the eye to the right. Gaze will be centered when excitation from the left is exactly balanced by excitation from the right. Their model is arranged, for example, so that each motoneuron controlling muscle fibers in the left medial rectus and right lateral rectus muscles receives excitation summing the level of activity in a thin transverse strip of the left colliculus. These muscles contract to move the left and right eyeballs, respectively, to the right. This process provides all the excitation to the right lateral and left medial rectus, that is, to the muscles turning the eyes to the right. Reciprocal inhibition by axonal collaterals from the nuclei of the antagonist eye muscles, which are excited similarly by the other colliculus, serves to perform subtraction. The vertical coordinate of the quasi-center of gravity is computed similarly. (Of course, this computation might instead be performed by commissural fibers linking similar contralateral tectal points, rather than in the oculomotor nuclei.) Eye movement ceases when and only when the fixation point is the center of gravity.
FIG. 3. A simplified diagram showing ocular afferents to the left superior colliculus, where they are integrated anteroposteriorly and laterally and relayed to the motor nuclei of the eyes. A figure of the right superior colliculus mapped for visual and motor response by Apter is inserted. An inhibiting synapse is indicated as a loop about the apical dendrite. The threshold of all cells is taken to be one. (Figure 6 from Pitts and McCulloch [10].)
B. Lettvin et al. on “What the Frog’s Eye Tells the Frog’s Brain”
Pitts and McCulloch [10] sought to explain how we know universals. Lettvin et al. [9] turned to the frog for experimental answers to the questions set by the earlier paper. They noted that the frog is normally motionless, and that its visually guided behavior can be adequately described in terms of the recognition of two universals, prey and enemy. Light reaching the back of the retina stimulates rods and cones, whose axons impinge upon interneurons. These interneurons provide input to the ganglion cells, whose axons course along the optic tract to carry signals about visual stimulation to the tectum. (The tectum is the major visual center of the frog’s brain and is homologous to the superior colliculus of cat and man. In the latter, however, the visual cortex plays the key role in complex visual perception, and the superior colliculus is relegated more to the role of an orienting mechanism, as suggested by our discussion of Apter’s work in Section 2A ii.) Lettvin et al. found that the majority of the ganglion axons could be classified into one of four groups on the basis of their response to visual stimuli. Moreover, the axons of the four groups ended in four distinct layers of the tectum. These four layers are in registration, so that cells atop one another in the tectum signal the presence or absence of the four features in the same region of the visual field. The four feature detectors reported by Lettvin et al. were
1. sustained contrast detectors, which yield a prompt and prolonged discharge whenever the sharp edge of an object, either lighter or darker than the background, moves into the receptive field (or appears there when the light is turned on) and stops there;
2. net convexity detectors, which respond to a small object, or the convex edge of a large dark object, passed through the visual field; the response does not outlast the passage, and a smooth motion across the visual field has less effect than a jerky one;
3. moving-edge detectors, which respond to any distinguishable edge moving through the receptive field;
4. net dimming detectors, which respond to a sudden reduction of illumination by a prolonged and regular discharge.
The point is that [9, page 1950] “the eye speaks to the brain in a language already highly organized and interpreted, instead of transmitting some more or less accurate copy of the distribution of light on the receptor.” Further, the encoding is such as to aid the frog in finding food and evading predators, that is, in recognizing the universals prey and enemy. Their closing paragraph suggests this [9, page 1951]:
The operations thus have much more the flavor of perception than of sensation if that distinction has any meaning now. That is to say that the language in which they are best described is the language of complex abstractions from the visual image. We have been tempted, for example, to call the convexity detectors “bug perceivers.” Such a fiber [operation 2] responds best when a dark object, smaller than a receptive field, enters that field, stops, and moves about intermittently thereafter. The response is not affected if the lighting changes or if the background (say a picture of grass and flowers) is moving, and is not there if only the background, moving or still, is in the field. Could one better describe a system for detecting an accessible bug?
Thus they did indeed find the layers of feature detectors posited in the 1947 paper, but they go on to say [9, page 1950]: “The operations found in the frog make unlikely later processes in his system of the sort described by two of us earlier [in the 1947 paper], for example dilatations; but those were adduced for the sort of form recognition which the frog does not have.” This indicates that they were thinking only of the averaging mechanism of the first part, and not of the centering mechanism of the second part. We have argued [3, 5] that a mechanism akin to the latter is indeed likely to be present in the frog. Here, however, we note that a quick scan of the literature, including McCulloch’s own writings, showed that whenever the 1947 paper was alluded to, only the first part was ever discussed.
It seems an interesting item in the history of ideas that what we here suggest is the truly seminal portion of the 1947 paper had lain dormant for 23 years, until our own work on the frog led us to read the second part with new insight.
C. Kilmer et al. on “Some Mechanisms for a Theory of the Reticular Formation”
Flatworms avoid the light, but if signals indicating food come from the direction of a light, the animal must resolve the conflict between approach and avoidance if it is to act. A key question is thus, “How is the central nervous system structured to allow coordinated action of the whole animal when different regions receive contradictory local information?”
McCulloch suggested that the answer lay in the principle of redundancy of potential command, which states, essentially, that command should pass to the region with the most important information. He cited the example of a World War I naval fleet. The behavior of the whole fleet is controlled, at least temporarily, by the signals from whichever ship first sights the enemy, and this ship need not be the flagship, in which command normally resides. McCulloch further suggested that this redundancy of potential command in vertebrates would find its clearest expression in the reticular formation (RF) of the brain stem. Kilmer and McCulloch then made the following observations toward building a model of RF.
(a) They noted that at any one time an animal is in only one of some 20 or so gross modes of behavior (e.g., sleeping, eating, grooming, mating, urinating). They posited that the main role of the core of the RF (or at least the role they sought to model) was to commit the organism to one of these modes.
(b) They noted that anatomical data of the Scheibels [11] suggested that RF need not be modeled neuron by neuron. It could instead be considered as a stack of “poker chips,” each containing tens of thousands of neurons, and each with its own nexus of sensory information.
(c) They posited that each module (“poker chip”) could decide which mode was most appropriate to its own nexus of information. They then asked, “How can the modules be coupled so that, in real time, a consensus can be reached as to the mode appropriate to the overall sensory input, despite conflicting mode indications from local inputs to different modules?”
In this framework, Kilmer et al. [7] designed and simulated a model, called S-RETIC, of a system to compute mode changes. It comprised a column of modules that differed only in their input array and that were interconnected in a way suggested by RF anatomy. Besides its own partial sensory information, each module receives input from several others, with physically adjacent units more highly “information coupled” than those farther apart. (S-RETIC does not model the use of a structured environment to correlate the partial sensory inputs. Nor does it specify how its modal specification, described below, is to cause the appropriate change in the organism’s output routines.) Let there be $n$ modes. Then the state of a module at any time is a probability vector $\mathbf{p}$, where $p_i$ is the weight that the module currently assigns to the hypothesis that the $i$th mode is currently appropriate.
i. If the input to a module changes at all drastically, then the new $\mathbf{p}$ is almost entirely determined as a function $\mathbf{p}(s)$ of the new sensory input $s$. This locally decouples the module after an overall system input change. Another, global, form of decoupling can be added to the local variety following an input change if the entire set of modules had been heavily committed to an output mode immediately before.
ii. If the input to the module causes little change in its $\mathbf{p}(s)$, then the new $\mathbf{p}$ is obtained by operating on the average of $\mathbf{p}(s)$ and the old $\mathbf{p}$'s of its communicating modules.
A decision is (arbitrarily) said to be reached when a majority of the modules assign weight greater than 0.5 to any one mode. The overall effect of the scheme, then, is to decouple the modules initially after an input change, in order to accentuate each module’s evaluation of what the next mode should be. Successive iterations then couple them back together in order to reach a global consensus. Computer simulation showed that S-RETIC, at least with the coupling patterns studied, converged for every input in less than 25 cycles, and that once it had converged, it stayed converged for a given input. When the inputs strongly indicate one mode, convergence is fast; when the indication is weak, initial conditions and circuit characteristics play an important role.
3. PROSPECT: PARADIGMS FOR BRAIN THEORY
For Pitts and McCulloch, the point of their model of the colliculus (cf. Section 2A ii) was that it gave an implementation of their scheme, generalized somewhat in our Fig. 2, that did justice to neurophysiological data. We believe, however, that the scheme has far greater significance than this. It shows how to design a somatotopically organized network in which there is no “executive neuron” that decrees which way the overall system behaves. Rather, the dynamics of the effectors, with assistance from neuronal interactions, extracts the output trajectory from a population of neurons, none of which has more than local information as to which way the system should behave. It is our thesis that the study of such somatotopically organized networks must become a central paradigm (in the sense of T. S. Kuhn’s The Structure of Scientific Revolutions [8, page 10]) in brain theory. We may paraphrase our interpretation of the Pitts and McCulloch model of the superior colliculus as follows: it showed how “the organism can be committed to an overall action by a population of motoneurons none of which had global information as to which action is appropriate.” Put this way, we are struck by the similarity of the situation to that in our statement of the RF problem. We noted in Section 2B that the second part of “How We Know Universals” appears to have been completely neglected in the subsequent literature. That fact becomes even more incredible given that this commonality of problems has not been noted before, even in Kilmer and McCulloch’s own writings on the reticular formation.
Let us stress, however, that there are differences as well. If several flies are within the “snapping zone” of a frog, the frog will usually snap at one of them. Such a result could easily be explained by a serial scan of the tectum that stops at the first region in which the activity in the four layers signals the presence of a fly. At that stage the scanner would issue a command to snap in the direction indicated by the current address of the scan. Such serial processing is ruled out as a candidate for the frog’s neural machinery by the observation that the frog will sometimes snap midway between two flies. This is precisely the “center of gravity” effect one expects from an output system of the distributed-computation type suggested by Pitts and McCulloch for centering of gaze. The first point we make, then, concerns how this distinction between serial and distributed processing can be drawn. It could not be made by asking only the usual question of sensory physiology, “What information is relayed to the brain?” It required also asking, “How does the animal make use of such information to act?” Richard Didday has generated some insightful answers to the latter question for the frog in his 1970 Ph.D. thesis from Stanford [5], portions of which appear in [6]. The second point is that, while the Pitts-McCulloch model does yield integrated behavior, it does not explain the “usually-one-fly effect.” It turns out that the mechanism for this bears a great resemblance to the Kilmer-McCulloch RF model (for a fuller discussion and further references, see [3]). The observations on frog behavior suggest three layers of processing, each involving distributed computation.
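The contrast between a serial scan and a distributed center-of-gravity readout can be illustrated on a toy one-dimensional tectal map, as can the usually-one-fly effect. This sketch is not Didday’s model. The map, the activity levels, the threshold, and the mutual-inhibition rule in `suppress_all_but_one` are all hypothetical choices made for illustration. In that rule, each region is inhibited by the summed activity of all the others.

```python
from typing import List, Optional

Activity = List[float]  # hypothetical "fly-ness" level per region along one tectal axis


def serial_scan(activity: Activity, threshold: float = 0.5) -> Optional[int]:
    """Serial alternative: snap toward the first region whose activity signals a fly."""
    for region, level in enumerate(activity):
        if level > threshold:
            return region
    return None


def center_of_gravity(activity: Activity) -> float:
    """Pitts-McCulloch-style distributed readout: activity-weighted mean position."""
    total = sum(activity)
    return sum(region * level for region, level in enumerate(activity)) / total


def suppress_all_but_one(activity: Activity, beta: float = 0.5, steps: int = 20) -> Activity:
    """Hypothetical mutual inhibition: each region loses beta times the summed activity
    of all other regions, updated synchronously and clipped at zero."""
    a = list(activity)
    for _ in range(steps):
        total = sum(a)
        a = [max(0.0, level - beta * (total - level)) for level in a]
    return a


tectum = [0.0] * 12
tectum[3], tectum[9] = 1.0, 0.9  # two flies, the first slightly more salient

print(serial_scan(tectum))                              # 3: the serial scanner stops at the first fly
print(center_of_gravity(tectum))                        # lands between the two flies
print(center_of_gravity(suppress_all_but_one(tectum)))  # after suppression, aimed at region 3
```

With two flies of exactly equal strength, this symmetric rule never breaks the tie. Both activities decay together and the readout stays midway. This matches the observation only if the frog usually, but not always, snaps at a single fly. A much larger `beta` could drive every region to zero, and the readout would then be undefined.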
The first layer operates upon the four layers of retinal information to provide a measure of “foodness” for each region. The third layer does a Pitts-McCulloch-type computation (with certain refinements required to make the motion ballistic). This computation directs the frog’s motion to the position corresponding to the “center of gravity” of activity in the second layer. The task of the second layer is then very much akin to the task of the Kilmer-McCulloch RF. That model has an array of modules that must interact to get a majority favoring the same mode. By contrast, the second layer of our hypothetical tectum must turn down the activity of all but one region of (or from) the first layer. The essential mechanisms turn out to be very similar, and they provide an explanation for the “sameness” and “newness” neurons observed by Lettvin et al. [9]. The models differ in that one evaluates all modes in each module, while the other identifies each module with a mode. In any case, the study of frog behavior sheds new light on RF modeling and suggests alternative hypotheses.
To summarize, our attempt to follow up “What the Frog’s Eye Tells the Frog’s Brain” has drawn together the seemingly disparate contributions of “How We Know Universals” and “Some Mechanisms for a Theory of the Reticular Formation.” Our model is still a crude oversimplification of the complexities of a real frog brain. Nevertheless, we believe our partial successes show that the following paradigms, all too often neglected in the cybernetics literature, must play a crucial role in future brain theory. (For their further elaboration, see [3].)
1. Theory must be action-oriented; for example, studies of sensory processing must take into account the behavior of the animal and the classification of sensory input implied by its actions. (A preliminary account of the design of memory structures appropriate to this viewpoint appears in Arbib et al. [4].)
2. Computation must be distributed; for example, the organism is committed on the basis of interaction between whole populations of simultaneously active neurons, rather than as a result of serial processing by a localized group of “executive” neurons.
3. The brain is a layered computer, with somatotopic relations between layers. This third observation is common knowledge to neurophysiologists and neuroanatomists; the time has come to incorporate it in our theories.
All this is implicit in the 1947 paper of Warren McCulloch and Walter Pitts [10]. Their ideas are still very much alive.
ACKNOWLEDGMENT The preparation of this article was supported in part by United States Public Health Service research grant number 1 ROl NS09102ZOlCOM from the National Institute of Neurological Diseases and Stroke.
REFERENCES
1 J. Apter, The projection of the retina on the superior colliculus of cats, J. Neurophysiol. 8 (1945), 123-134.
2 J. Apter, Eye movements following strychninization of the superior colliculus of cats, J. Neurophysiol. 9 (1946), 73-85.
3 M. A. Arbib, The metaphorical brain, Wiley and Sons (in press).
4 M. A. Arbib, P. Dev, and R. L. Didday, Action-oriented memory subserving perception, Journal of Cybernetics 1 (1971) (in press).
5 R. L. Didday, The simulation and modeling of distributed information processing in the frog visual system, Information Systems Laboratory, Technical Report 6112-1, Stanford (August, 1970).
6 R. L. Didday, A method for simulating distributed computation in nervous systems, Intern. J. Man Machine Syst. (in press).
7 W. L. Kilmer, W. S. McCulloch, and J. Blum, Some mechanisms for a theory of the reticular formation, in Systems theory and biology (M. Mesarovic, ed.), pp. 286-375. Springer, Berlin, 1968.
8 T. S. Kuhn, The structure of scientific revolutions (2nd ed., enlarged), Vol. II, No. 2 of the International encyclopaedia of unified science. Univ. of Chicago Press, Chicago, 1970.
9 J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts, What the frog’s eye tells the frog’s brain, Proc. IRE 47 (1959), 1940-1951.
10 W. H. Pitts and W. S. McCulloch, How we know universals: The perception of auditory and visual forms, Bull. Math. Biophys. 9 (1947), 127-147.
11 A. B. Scheibel and M. E. Scheibel, Structural substrates for integrative patterns in the brain stem reticular core, in Reticular formation of the brain (H. Jasper et al., eds.), pp. 31-55. Little, Brown, Boston, 1958.