Neural Networks PERGAMON
Neural Networks 12 (1999) 791–801 www.elsevier.com/locate/neunet
Self-organization of shift-invariant receptive fields
Kunihiko Fukushima*
Department of Information and Communication Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
Received 12 January 1999; accepted 30 April 1999
Abstract
This paper proposes a new learning rule by which cells with shift-invariant receptive fields are self-organized. With this learning rule, cells similar to simple and complex cells in the primary visual cortex are generated in a network. To demonstrate the new learning rule, we simulate a three-layered network that consists of an input layer (or the retina), a layer of S-cells (or simple cells), and a layer of C-cells (or complex cells). During the learning, straight lines of various orientations sweep across the input layer. Here both S- and C-cells are created through competition. Although S-cells compete depending on their instantaneous outputs, C-cells compete depending on the traces (or temporal averages) of their outputs. For the self-organization of S-cells, only winner S-cells increase their input connections in a similar way to that for the neocognitron. In other words, the winner S-cells have LTP (long-term potentiation) in their input connections. For the self-organization of C-cells, however, loser C-cells decrease their input connections (LTD, long-term depression), while winners increase their input connections (LTP). Here both S- and C-cells are accompanied by inhibitory cells. Modification of inhibitory connections together with excitatory connections is important for creation of C-cells as well as S-cells. © 1999 Elsevier Science Ltd. All rights reserved.
Keywords: Shift-invariant receptive field; Self-organization; Competitive learning; Neural network model; Visual system; Complex and simple cells
* Tel.: +81-424-43-5362; fax: +81-424-43-5376. E-mail address: [email protected] (K. Fukushima)

1. Introduction
This paper proposes a new learning rule by which cells with shift-invariant receptive fields are self-organized. Namely, cells similar to simple and complex cells in the primary visual cortex are generated in a network trained by the new learning rule.

Although a variety of neural network models for complex cells have been proposed so far, models that discuss the self-organization of complex cells are limited. Most of them try to faithfully reconstruct the characteristics or behaviors of cells that have already matured.

The author previously proposed a neural network model of the visual system, called a neocognitron (Fukushima, 1988; Fukushima & Miyake, 1982). The neocognitron has a hierarchical multilayered architecture similar to the classical hypothesis by Hubel and Wiesel (1962, 1965), and acquires an ability to robustly recognize visual patterns through unsupervised learning. It consists of layers of S-cells, which resemble simple cells in the visual cortex, and layers of C-cells, which resemble complex cells. S-cells are feature-extracting cells. They have variable input connections that are strengthened through unsupervised learning with a winner-take-all process. The preferred features of S-cells are determined during the learning. C-cells, like complex cells, exhibit an approximate invariance to the position of the stimuli presented within their receptive fields.

Although input connections to S-cells are variable and are modified through unsupervised learning, input connections to C-cells are fixed and unmodifiable in the conventional neocognitron. Each C-cell receives excitatory input connections from a group of S-cells that have the same preferred features. Since the locations of the receptive fields of these S-cells differ slightly from each other, the C-cell can exhibit a shift-invariant response to stimuli presented in its receptive field.

With the new learning rule proposed in this paper, input connections to C-cells, as well as S-cells, can be created through learning. Shift-invariant receptive fields of C-cells can be automatically generated in a network if straight lines of various orientations sweep across the input layer during the learning.

The conventional neocognitron has an architecture called cell-planes from the beginning, before the learning starts. The cells in each cell-plane share the same set of input connections. The self-organization of the network always progresses while maintaining this condition of shared connections. In contrast to the conventional learning rule used in the
0893-6080/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S0893-6080(99)00039-8
Kunihiko Fukushima / Neural Networks 12 (1999) 791–801
[Fig. 1 diagram labels: layers U0 (photoreceptors), US (S-cells and VS-cells) and UC (C-cells and VC-cells); connections aS, bS, cS, aC, bC, cC; cell outputs u0, uS, vS, uC, vC.]
the cells are connected in the network. Cells are arranged in a two-dimensional array in each layer. The built-in architecture of cell-planes, which the conventional neocognitron has, does not exist in the network. The self-organization can start from a homogeneous network. Shift-invariant receptive fields are automatically generated without using an architecture like cell-planes.

2.1. S-cell layer
Fig. 1. Connections between cells in the network. Legend: connections are drawn as excitatory or inhibitory, and as variable or fixed.
neocognitron, the new learning rule requires neither the architecture of cell-planes nor the condition of shared connections to realize shift-invariant receptive fields of C-cells.

C-cells, as well as S-cells, are created through competition. Although S-cells compete depending on their instantaneous outputs, C-cells compete depending on the traces (or temporal averages) of their outputs. Incidentally, the use of traces in self-organization was suggested previously by Földiák (1991), but our learning rule is different from his.

This paper proposes to introduce inhibitory cells in the network for C-cells as well as in that for S-cells. Both S- and C-cells are accompanied by inhibitory cells. Modification of inhibitory connections together with excitatory connections is important for creation of C-cells as well as S-cells. Although it is possible to generate shift-invariant receptive fields without using inhibitory cells, the excitatory connections then have to be modified with a complicated learning rule. Incidentally, the old version of the learning rule that the author proposed previously (Fukushima & Yoshimoto, 1998) tried to create C-cells without using inhibitory cells. The new learning rule proposed in this paper is greatly simplified from the previous one by the introduction of inhibitory cells.

To test the new learning rule, we will simulate a three-layered network that consists of an input layer (or the retina), a layer of S-cells, and a layer of C-cells.
Each S-cell is accompanied by an inhibitory VS-cell as in the case of the neocognitron (or the cognitron) (Fukushima, 1988; Fukushima & Miyake, 1982; Fukushima, 1989). As shown in Fig. 1, the S-cell receives variable excitatory connections from photoreceptor cells of the input layer (or the retina), and a variable inhibitory connection from the VS-cell. The VS-cell has fixed excitatory connections from the same set of photoreceptor cells as the S-cell has, and always responds with the average (root mean square) intensity of the outputs of the photoreceptor cells. The location of the connectable area of each S-cell is determined so that the retinotopy can be maintained. Incidentally, a cell in the network receives variable connections only from the cells in the connectable area. Besides these connections, there is a built-in mechanism of recurrent lateral inhibition among S-cells, which is drawn with broken lines in Fig. 1.

In the mathematical descriptions below, the notation u and v is used to represent the output of a cell. Each cell is denoted by the location of its receptive field center on the input layer. For example, the output of an S-cell at time t is denoted by uS(t,x), where x is the two-dimensional coordinates of the receptive field center of the cell. We will also use the notation uS(x) to represent the cell. Since the densities of the cells differ in the three layers, the x- and y-coordinates of a cell do not necessarily take integer values.

The output of an S-cell is given by the following equation. In order to show only the essence, the effect of lateral inhibition among S-cells is neglected in this equation. Exact expressions including lateral inhibition appear in the Appendix.

$$u_S(t,x) = \frac{\theta_S}{1-\theta_S}\,\varphi\!\left[\frac{1+\displaystyle\sum_{|\xi-x|\le A_S} a_S(t,\xi,x)\,u_0(t,\xi)}{1+\theta_S\,b_S(t,x)\,v_S(t,x)} - 1\right] \tag{1}$$
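As an illustration only, the S-cell response of Eq. (1) can be sketched for a single cell as follows. The sketch assumes the definitions given in this section: φ is half-wave rectification, and the VS-cell output is the weighted root mean square of the photoreceptor outputs. The function names and the flat-list layout of the connectable area are hypothetical choices made for this example, and lateral inhibition is neglected as in Eq. (1).

```python
import math


def phi(x):
    """Half-wave rectification: returns x for x >= 0, otherwise 0."""
    return x if x >= 0.0 else 0.0


def vs_cell_output(c, u0):
    """Weighted root mean square of the photoreceptor outputs (VS-cell)."""
    return math.sqrt(sum(ci * ui * ui for ci, ui in zip(c, u0)))


def s_cell_output(a, b, c, u0, theta):
    """S-cell output following Eq. (1), without lateral inhibition.

    a, c and u0 are flat lists over the connectable area (hypothetical
    layout); b is the inhibitory connection; 0 < theta < 1 is the threshold.
    """
    excitation = sum(ai * ui for ai, ui in zip(a, u0))
    v = vs_cell_output(c, u0)
    ratio = (1.0 + excitation) / (1.0 + theta * b * v)
    return theta / (1.0 - theta) * phi(ratio - 1.0)


# A cell whose excitatory connections match the stimulus responds;
# the same cell stays silent for a stimulus that misses its connections.
a = [2.0, 0.0, 2.0]
c = [0.5, 0.5, 0.5]
matched = s_cell_output(a, 2.0, c, [1.0, 0.0, 1.0], theta=0.5)
mismatched = s_cell_output(a, 2.0, c, [0.0, 1.0, 0.0], theta=0.5)
```

In this toy setting the shunting inhibition from the VS-cell is what silences the cell for the non-matching stimulus, since the denominator grows while the numerator stays at 1.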
2. Network architecture
The network that is simulated in this paper has an input layer (U0) consisting of photoreceptor cells, a layer of S-cells (US), and a layer of C-cells (UC).¹ Fig. 1 illustrates how
where φ[ ] is a nonlinear function defined by

$$\varphi[x] = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{2}$$

¹ Although we can construct a model that has layers of contrast-extracting cells corresponding to retinal ganglion cells and lateral geniculate cells, we discuss the simplest model in this paper, because the aim of this paper is to explain the problem of self-organization of the connections to C-cells.

Here aS(t,ξ,x) ≥ 0 is the strength of the excitatory variable connection from photoreceptor u0(ξ) to S-cell uS(x), and bS(t,x) ≥ 0 is the strength of the inhibitory variable connection from VS-cell vS(x). The radius of the connectable area of an S-cell is AS. Parameter θS is a constant (0 < θS < 1) determining the threshold of S-cells. It controls the selectivity of the S-cell's response to a specific feature: the larger the threshold, the higher the selectivity (Fukushima, 1989).

The output vS(t,x) of the subsidiary inhibitory VS-cell, which inhibits the S-cell in a shunting manner, is given by

$$v_S(t,x) = \sqrt{\sum_{|\xi-x|\le A_S} c_S(\xi-x)\,\{u_0(t,\xi)\}^2} \tag{3}$$

Here cS(ξ) > 0 is a slightly bell-shaped, but almost flat, two-dimensional function. It determines the strength of the excitatory fixed connection from photoreceptors to VS-cells. As can be seen from this equation, the output of the VS-cell is proportional to the (weighted) root-mean-square of the outputs of the photoreceptors from which the S-cell receives variable excitatory connections.

2.2. C-cell layer
As shown in Fig. 1, the architecture of the network from S-cells to C-cells is the same as that from photoreceptor cells to S-cells. Each C-cell receives variable excitatory connections from S-cells of the preceding layer. Each C-cell is accompanied by an inhibitory VC-cell in the same way as the S-cells. The connectable area of each C-cell, like that of an S-cell, is predetermined so that the retinotopy can be maintained. The layer of C-cells also has recurrent lateral inhibition. The difference in characteristics between S- and C-cells is created, not by the difference in network architecture, but by the difference in learning rules by which the connections are modified.

Mathematically, the output of a C-cell is given by the same equations as those for S-cells:

$$u_C(t,x) = \frac{\theta_C}{1-\theta_C}\,\varphi\!\left[\frac{1+\displaystyle\sum_{|\xi-x|\le A_C} a_C(t,\xi,x)\,u_S(t,\xi)}{1+\theta_C\,b_C(t,x)\,v_C(t,x)} - 1\right] \tag{4}$$

This equation represents the response of the C-cell neglecting the effect of lateral inhibition. Exact expressions including lateral inhibition appear in the Appendix. Here aC(t,ξ,x) ≥ 0 is the strength of the variable excitatory connection from S-cell uS(ξ) to C-cell uC(x), and AC is the radius of the connectable area of a C-cell. Parameter θC is a constant (0 < θC < 1) determining the threshold of the C-cell. Similarly to the S-cells, the selectivity in the response of the C-cell is controlled by the value of the threshold. The C-cell also receives a variable inhibitory connection bC(t,x) ≥ 0 from the subsidiary inhibitory VC-cell vC(x), whose output is given by

$$v_C(t,x) = \sqrt{\sum_{|\xi-x|\le A_C} c_C(\xi-x)\,\{u_S(t,\xi)\}^2} \tag{5}$$

where cC(ξ) > 0 is a slightly bell-shaped, but almost flat, two-dimensional function. It determines the strength of the excitatory fixed connection from S- to VC-cells.

3. Learning
Training patterns that are presented during the learning phase are moving patterns that sweep across the input layer. In the computer simulation discussed in this paper, long straight lines of various orientations are mainly used because the aim of our simulation is to test the self-organization of a three-layered network in which line-extracting cells (S- and C-cells) are expected to be generated. Straight lines, which are long enough to cover the input layer completely, sweep across the input layer. Both S- and C-cells are created through competition. Although S-cells compete depending on their instantaneous outputs, C-cells compete depending on the traces (or temporal averages) of their outputs. For the self-organization of S-cells, only winner S-cells increase their input connections. In other words, the winner S-cells have LTP (long term potentiation) in their input connections. For the self-organization of C-cells, however, loser C-cells decrease their input connections (LTD long term depression), while winners increase their input connections (LTP). Both the S- and the C-cells are accompanied by inhibitory cells. Modification of inhibitory connections together with excitatory connections is important for creation of C-cells as well as S-cells. These processes will be discussed below in more detail. Initial values of excitatory input connections to each S- or C-cell have a weak and almost flat, but slightly bell-shaped, spatial distribution in their connectable area. These diffused initial connections coincide with anatomical observations, where in the developing nervous system, synaptic connections between neurons are overproduced initially and redundant axons are gradually eliminated afterwards. 3.1. Learning rule for S-cells The learning rule of the S-cells is similar to that for the conventional neocognitron (Fukushima, 1989) (and also for the training of the excitatory cells in a cognitron). 
Each cell (S-cell) competes with other cells in its vicinity (called the competition area of the cell), and the competition depends on the instantaneous activities of the cells. (We will use the term activity in almost the same meaning as output of a cell in this paper.) Only winners of the competition have their input connections increased. The increment of each connection is proportional to the presynaptic activity. Namely, LTP is induced in the input synapses of the winner S-cells.
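The competition described here can be sketched as a search for local maxima: a cell wins if its output is positive and is the largest within its competition area. This is a minimal sketch under stated assumptions, not the author's simulation code. The function name, the list-based layout and the treatment of ties (all tied cells win) are assumptions made for the example.

```python
def find_winners(outputs, positions, radius):
    """Return indices of cells that win the local competition.

    A cell wins if its output is positive and equals the maximum output
    among all cells whose centers lie within `radius` of its own center.
    """
    winners = []
    for i, (xi, yi) in enumerate(positions):
        if outputs[i] <= 0.0:
            continue  # silent cells never win
        local_max = max(
            outputs[j]
            for j, (xj, yj) in enumerate(positions)
            if (xj - xi) ** 2 + (yj - yi) ** 2 <= radius ** 2
        )
        if outputs[i] == local_max:
            winners.append(i)
    return winners


# Six cells on a line with unit spacing and a competition radius of 1.5.
positions = [(float(k), 0.0) for k in range(6)]
outputs = [0.1, 0.5, 0.2, 0.0, 0.3, 0.4]
winners = find_winners(outputs, positions, radius=1.5)
```

Because each competition area is limited in size, several winners can coexist in one layer as long as they are far enough apart, which is the property the learning rule relies on.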
Fig. 2. Learning rule for C-cells. This figure illustrates how the cells respond and how the connections are modified when a vertical line stimulus is presented at the center of the receptive fields of the upper three S-cells. The topmost S-cell responds strongly (++), because the orientation of the stimulus coincides with the preferred orientation of the cell. The second S-cell responds only weakly (+), because the cell has a slightly different preferred orientation. The third S-cell is silent (0), because the orientation of the stimulus line is completely different from the preferred orientation of the cell. The lower three S-cells are silent (0), because the stimulus is not presented at a proper location in the receptive fields of these cells. The lines between S- and C-cells represent the desired connections that are expected to be generated after finishing the learning. The marks beside the connections indicate that the connections are largely (++) or slightly (+) strengthened, or decreased largely (− −) or slightly (−), when the topmost C-cell becomes a winner. The connections without any mark are unchanged.

Therefore, the input connections of a winner cell will form a template that exactly matches the stimulus presented in its connectable area. Since the variable inhibitory connection from the accompanying VS-cell is also increased, the winner S-cell acquires a selective responsiveness to this line stimulus.

Mathematically, excitatory connection aS(t,ξ,x) from u0(ξ) to uS(x) is increased by an amount qS cS(ξ − x) u0(t,ξ) if cell uS(x) becomes a winner at time t within the competition area of radius DS. When cell uS(x) is a loser, the connections converging on the cell are unchanged. That is,

$$a_S(t+1,\xi,x) = a_S(t,\xi,x) + \Delta a_S(t,\xi,x), \tag{6}$$

where

$$\Delta a_S(t,\xi,x) = \begin{cases} q_S\,c_S(\xi-x)\,u_0(t,\xi) & \text{if } u_S(t,x) = \max\limits_{|\xi-x|\le D_S}\{u_S(t,\xi)\} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

and qS is a positive constant that determines the learning speed. cS(ξ) is the same function as that used in Eq. (3), and is used here as a spatial weight loaded into aS(t,ξ,x). The strength of the inhibitory connection is determined from the excitatory connections by the following equation (Fukushima, Nagahara, Shouno, & Okada, 1996):

$$b_S(t+1,x) = \sqrt{\sum_{|\xi-x|\le A_S} \frac{\{a_S(t+1,\xi,x)\}^2}{c_S(\xi-x)}} \tag{8}$$
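The S-cell update of Eqs. (6)-(8) can be sketched for a single cell as below. This is only an illustrative sketch: the function name and the flat-list layout of the connectable area are hypothetical, and the winner flag is assumed to have been decided beforehand by the competition within radius DS.

```python
import math


def update_s_cell(a, c, u0, q, is_winner):
    """One learning step for a single S-cell.

    a, c and u0 are flat lists over the connectable area (hypothetical
    layout). Returns the new excitatory connections and the inhibitory
    connection b derived from them.
    """
    if is_winner:
        # Eqs. (6)-(7): increment proportional to the presynaptic activity.
        a = [ai + q * ci * ui for ai, ci, ui in zip(a, c, u0)]
    # Eq. (8): the inhibitory connection follows from the excitatory ones.
    b = math.sqrt(sum(ai * ai / ci for ai, ci in zip(a, c)))
    return a, b


c = [0.25, 0.25, 0.25]
u0 = [1.0, 0.0, 1.0]
a_win, b_win = update_s_cell([0.0, 0.0, 0.0], c, u0, q=4.0, is_winner=True)
a_lose, b_lose = update_s_cell([0.0, 0.0, 0.0], c, u0, q=4.0, is_winner=False)
```

Only the connections from active photoreceptors grow, so the winner's connections take the shape of the stimulus weighted by cS, while a loser is left unchanged.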
If a straight line is presented during the learning phase, winners in the competition are generally distributed along the line with approximately equal intervals because of the limited size of the competition areas. As a result of modification of the input connections, the receptive fields of the winners become selectively responsive to this line (Fukushima, 1989). When the line is moved in parallel and comes to another location, another set of winners is chosen, and the new winners acquire receptive fields of the same preferred orientation but at different locations. Thus, after finishing a sweep of the line across the input layer, S-cells whose receptive fields have the same preferred orientation as the line are generated and become distributed over the layer US. S-cells of other preferred orientations are generated by sweeps of lines of other orientations. Thus, after finishing repeated presentations of lines of various orientations, S-cells of a variety of preferred orientations come to be distributed all over the layer.

The competition area (DS) for an S-cell has to be smaller than its connectable area (AS), where the areas are compared based on the size mapped onto the input layer. Otherwise, the distribution of the receptive fields of S-cells of the same preferred feature will become too sparse to cover every part of the input layer.

The recurrent lateral inhibition among the S-cells is not always essential, but is useful for choosing winner cells in some special situations. Suppose, for example, that a training pattern is a line stimulus whose intensity is maximum at the center and decreases monotonically towards both sides. In the S-cell layer at the initial state, only the cell located at the center of the line becomes a winner, and all other cells become losers because of the monotonic decrease of the stimulus intensity.
If the recurrent lateral inhibition exists, however, the output of the cells around the winner cell will be suppressed, once the winner cell has learned the stimulus and responds to it strongly. As a result, winners will also be chosen from among the cells situated outside of the inhibited cells. In the real world situation, the selection of winners will usually progress successfully even without lateral inhibition because of noise or irregularity in the network. The computer simulation discussed later shows that the characteristics of the generated S- and C-cells do not change greatly by the lateral inhibition, but that the number of silent cells is reduced. The initial values of connections aS(t,ξ,x) and bS(t,x) are given in the Appendix.

3.2. Learning rule for C-cells
C-cells also compete with each other for the self-organization. In contrast to S-cells, however, the competition is based on the traces of their activities, not on their instantaneous activities. A trace is a kind of temporal average (or moving average). Mathematically, the trace ũC(t,x) of the output uC(t,x) of a C-cell is defined by

$$\tilde{u}_C(t,x) = (1-\alpha_C)\,\tilde{u}_C(t-1,x) + \alpha_C\,u_C(t,x) \tag{9}$$
where αC is a constant (0 < αC < 1) determining the speed of decay of the trace.

Once the winners are determined by the competition, the excitatory connections to the winners increase. The increment of each connection is proportional to the instantaneous presynaptic activity. At the same time, the losers have their excitatory input connections decreased. The decrement of each connection is proportional to both the instantaneous presynaptic activity and the instantaneous postsynaptic activity. Fig. 2 illustrates this situation.

In other words, LTP is induced in the excitatory input synapses of the winner C-cells, and LTD is induced in those of the loser C-cells. The decrease of the connections to loser cells can be interpreted physiologically as meaning that the total amount of synaptic connections diverging from a single cell cannot increase without limit: an increase of some synapses from a cell is accompanied by a decrease of other synapses.

Mathematically, excitatory variable connection aC(t,ξ,x) from uS(ξ) to uC(x) is given by

$$a_C(t+1,\xi,x) = \varphi\left[a_C(t,\xi,x) + \Delta a_C(t,\xi,x)\right], \tag{10}$$

where ΔaC(t,ξ,x) changes depending on whether the C-cell is a winner or not:

$$\Delta a_C(t,\xi,x) = \begin{cases} q_C\,c_C(\xi-x)\,u_S(t,\xi) & \text{if } \tilde{u}_C(t,x) = \max\limits_{|\xi-x|\le D_C}\{\tilde{u}_C(t,\xi)\} > 0 \\ -q'_C\,c_C(\xi-x)\,u_S(t,\xi)\,u_C(t,x) & \text{otherwise} \end{cases} \tag{11}$$

and DC is the size of the competition area of the C-cell. Here cC(ξ) is the same function as that used in Eq. (5), and is used as a spatial weight. Positive constants qC and q′C determine the speed of learning.

The inhibitory variable connection from VC-cell vC(x) to C-cell uC(x) is increased in such a way that the total amount of the excitatory inputs Σ aC(ξ,x) uS(ξ), summed over |ξ − x| ≤ AC, never exceeds the inhibitory input bC(x) vC(t,x) to the C-cell. In other words, the strength of the inhibitory connection is regulated so as to balance the excitation and inhibition to the cell and to prevent runaway excitation. The activity of a C-cell is thus regulated so as not to exceed a certain value. Mathematically,

$$b_C(t+1,x) = \max\left(b_C(t,x),\ \frac{\displaystyle\sum_{|\xi-x|\le A_C} a_C(t+1,\xi,x)\,u_S(t,\xi)}{v_C(t,x)}\right) \tag{12}$$
As can be seen from this equation, an increase in some excitatory synapses to a C-cell causes an increase in the total amount of excitatory inputs to the C-cell, and hence usually causes an increase in the inhibitory connection to the C-cell. The initial values of connections aC(t,ξ,x) and bC(t,x) are given in the Appendix.

We will now discuss how the self-organization of the C-cell layer (UC) progresses under this learning rule. To simplify the explanation, we assume here that the self-organization of the preceding S-cell layer (US) has already been finished. (In the computer simulation discussed later, however, we deal with a more general case in which the self-organization of S- and C-cells progresses simultaneously.)

If a line stimulus sweeps across the input layer, S-cells whose preferred orientation matches the orientation of the line become active. The timings of becoming active, however, differ among S-cells. For the creation of shift-invariant receptive fields of C-cells, it is desired that a single C-cell obtains strong excitatory connections from all of these S-cells in its connectable area.

To prevent a redundant generation of C-cells that receive connections from the same set of S-cells, competition among postsynaptic C-cells that have receptive fields at nearly the same location is required. In this competition, however, the same C-cell has to continue to be a winner throughout the period when the line is sweeping across its receptive field. This condition can generally be satisfied by our learning rule, by which winners are determined based on traces of outputs of C-cells, and not on instantaneous outputs. If a C-cell once becomes a winner, the same C-cell will be apt to keep winning for a while after that because the trace of an output lasts for some time after the extinction of the output. Hence a larger number of S-cells will become connected to the C-cell. The larger the number of connected S-cells becomes, the larger the chance of winning becomes because the C-cell is excited repeatedly.
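The persistence of a trace after the output has stopped can be illustrated with the recursion of Eq. (9). The sketch below is illustrative only: the function names are hypothetical, the initial trace is assumed to be zero, and the decay constant 0.1 is the value quoted for the computer simulation.

```python
def update_trace(trace, output, alpha):
    """Eq. (9): exponentially weighted moving average of a cell's output."""
    return (1.0 - alpha) * trace + alpha * output


def run_trace(outputs, alpha, initial=0.0):
    """Apply Eq. (9) over a sequence of outputs and collect the traces."""
    traces = []
    trace = initial
    for u in outputs:
        trace = update_trace(trace, u, alpha)
        traces.append(trace)
    return traces


# The cell is active for three time steps and then falls silent.
traces = run_trace([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], alpha=0.1)
```

After the output vanishes, the trace shrinks by a factor of 1 − αC per time step instead of dropping to zero, so a C-cell that has just responded still has an advantage in the competition at the next position of the line.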
The trace of the output of the C-cell can thus continue longer than the time constant of the decay, and the C-cell will finally acquire connections from all relevant S-cells. This means that the self-organization of C-cells can be successful even if the trace has a time constant short enough to disappear before a stimulus has swept over the entire receptive field of the C-cell. On the contrary, the time constant of the decay should not be too long. Suppose two completely different learning stimuli sweep across the receptive field of a C-cell sequentially. If the trace of the response to the first stimulus does not vanish until the appearance of the second stimulus, the same C-cell might erroneously be trained to respond to both stimuli. Incidentally, the size of connectable areas (AC) to postsynaptic C-cells is larger than the size of competition areas (DS) for presynaptic S-cells. Hence, at each location of the moving line stimulus, a number of presynaptic S-cells usually create connections to a single postsynaptic winner C-cell. It should be noted here that an increase in some particular excitatory connections to a C-cell, which usually causes an
Fig. 3. Responses of the network that has finished self-organization: (a) the stimulus line is optimally oriented to the C-cell marked with an arrow; (b) the same line as in (a) is presented at a different location; (c) the line is rotated to another orientation.
increase in the inhibitory connection from the VC-cell, has a similar effect on the C-cell's response as the decrease of other excitatory connections. If an excitatory connection from a presynaptic S-cell is weak, the disynaptic signal through the inhibitory connection becomes stronger than the monosynaptic excitatory signal from the S-cell, and the S-cell comes to act in an inhibitory manner on the C-cell. The increase in the inhibitory synapses thus sharpens the orientation selectivity of C-cells. A desired state after the learning is that a C-cell comes to receive excitatory connections from all S-cells of a particular preferred orientation, but not from S-cells of any other preferred orientations. Once a C-cell happens to have had strong excitatory connections from S-cells of two completely different preferred orientations, however, the increase of the inhibitory connections from the VC-cells alone is not strong enough to eliminate the unnecessary connections from S-cells of one of the two orientations. The decrease of excitatory connections to loser C-cells from presynaptic active S-cells, as well as the increase to winner C-cells, is important for preventing a single C-cell
from receiving connections simultaneously from S-cells of completely different preferred orientations. (Incidentally, in the initial state before the start of learning, each C-cell has connections, although very weak, from S-cells of all preferred orientations.) Suppose that a C-cell has had connections from S-cells of two different preferred orientations. If the C-cell once becomes a winner for a line stimulus of one of the two orientations, the input connections from S-cells of that preferred orientation are strengthened. As a result, the excitatory effects from the other set of S-cells, which have a preference for the second orientation, will be reduced by the action of the increased inhibitory connection from the VC-cell. Hence, the C-cell now responds more strongly to lines of the first orientation, and responds more weakly, or ceases to respond, to lines of the second orientation. This increases the probability of the C-cell's becoming a loser to the second stimulus. If the C-cell becomes a loser for lines of the second orientation, the connections from S-cells of the second preferred orientation, which are now active, are reduced according to our learning rule. The C-cell thus increases the tendency to be a winner for the first orientation and to be a loser for the second orientation. Once an imbalance is produced in the strengths of the connections from S-cells of different preferred orientations, the imbalance is thus emphasized by our learning rule. Each C-cell will finally come to receive connections only from S-cells of one particular preferred orientation, and will acquire a single orientation tuning. As can be seen from Eq. (11), the amount of decrease of excitatory connections to a loser C-cell is proportional to the activity of the C-cell. This means that C-cells do not change their excitatory input connections when they are silent.
This is useful to prevent the destruction of connections that have already been completed, when line stimuli of irrelevant orientations are presented.
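The C-cell update of Eqs. (10)-(12) can be sketched for a single cell as follows. This is a sketch under stated assumptions, not the author's simulation code. The function name, the argument names (for example `q_loss` for q′C) and the flat-list layout are hypothetical, the winner flag is assumed to come from the trace-based competition, and leaving bC unchanged when the VC-cell is silent is an assumption made here to avoid dividing by zero.

```python
def phi(x):
    """Half-wave rectification, as in Eq. (2)."""
    return x if x >= 0.0 else 0.0


def update_c_cell(a, b, c, u_s, u_c, v_c, q, q_loss, is_winner):
    """One learning step for a single C-cell.

    a, c and u_s are flat lists over the connectable area (hypothetical
    layout); u_c and v_c are the current C-cell and VC-cell outputs.
    """
    if is_winner:
        # LTP: increment proportional to the presynaptic activity.
        delta = [q * ci * ui for ci, ui in zip(c, u_s)]
    else:
        # LTD: decrement proportional to pre- and postsynaptic activity.
        delta = [-q_loss * ci * ui * u_c for ci, ui in zip(c, u_s)]
    a_new = [phi(ai + di) for ai, di in zip(a, delta)]  # Eq. (10)
    if v_c > 0.0:
        excitation = sum(ai * ui for ai, ui in zip(a_new, u_s))
        b_new = max(b, excitation / v_c)  # Eq. (12)
    else:
        b_new = b  # assumption: no change when the VC-cell is silent
    return a_new, b_new


a0, c, u_s = [0.2, 0.2], [1.0, 1.0], [1.0, 0.0]
winner = update_c_cell(a0, 0.1, c, u_s, 0.5, 1.0, 0.5, 1.0, True)
active_loser = update_c_cell(a0, 0.1, c, u_s, 0.5, 1.0, 0.5, 1.0, False)
silent_loser = update_c_cell(a0, 0.1, c, u_s, 0.0, 1.0, 0.5, 1.0, False)
```

In this toy case the winner strengthens only the connection from the active S-cell, the active loser has that connection clipped to zero by φ, and the silent loser keeps its excitatory connections, which mirrors the protection of completed connections described above.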
4. Computer simulation
A three-layered network is simulated on a computer. It has input layer U0, S-cell layer US, and C-cell layer UC, connected in a cascade as shown in Fig. 1. Layer U0 has 27 × 27 photoreceptors, layer US has 87 × 87 S-cells, and layer UC has 31 × 31 C-cells. The cells are arranged in a rectangular array in each layer with a constant pitch. The pitches are different in the three layers: they are 1.0 in U0, 0.25 in US, and 0.5 in UC. The cell density is thus highest in US and lowest in U0.

As was discussed in Section 2, each cell in US and UC receives variable input connections from the cells within a small connectable area in the preceding layer. The locations of the connectable areas of the cells of a layer are retinotopically ordered. The location of each cell in layers US and UC is expressed by the location of the center of its receptive field on layer U0.

The sizes of layers differ among the three. If the connectable area of a cell near the periphery of a layer exceeds the boundary of the preceding layer, the connections from outside of the boundary can never be used, and self-organization of the cells might not progress properly. To avoid such a peripheral effect, a higher layer is designed to be smaller in size than a lower layer in the simulation.

Fig. 4. Typical orientation-tuning curves for S- and C-cells.

The sizes of the connectable areas and the competition areas for US and UC are AS = 3.75, DS = 0.4 AS, AC = 3.75 and DC = 0.5 AC, where the unit of length is taken as the pitch of the cells in U0.

Training patterns are moving lines of various orientations. The width of the lines used for the simulation is 1.4. Each photoreceptor has a square face of 1 × 1 in size, and its output is proportional to the degree of overlap between the stimulus line and the photoreceptor face. The lines are long enough to cover the input layer completely. The training set has lines of 16 different orientations with an interval of 11.25°. From this set of lines of various orientations, a line is randomly chosen and moved across the input layer at a constant speed in a direction perpendicular to the orientation of the line. Therefore, there are 32 moving directions, because each line can move in two opposite directions. The moving speed of a line is 0.3. In other words, the location of a line is shifted by 0.3 at each discrete time unit. The decay of the trace of C-cells at unit discrete time (Eq. (9)) is taken as αC = 0.1.

After finishing a single sweep of a line, another line of different orientation is swept across the input layer. The sweeps of lines are repeated many times during the learning phase. The self-organization of S- and C-cells progresses simultaneously. Fig. 3 shows responses of the network that has finished self-organization. The total number of line sweeps during the learning phase is 512 in this example.
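One way to generate such training frames is sketched below. It is an approximation written for illustration, not the procedure used in the paper: the overlap between the line and each 1 × 1 photoreceptor face is estimated by supersampling rather than computed exactly, and the function name, the coordinate convention (origin at the layer center, orientation measured from the x-axis) and the `samples` parameter are assumptions.

```python
import math


def line_stimulus(size, angle_deg, offset, width=1.4, samples=10):
    """Approximate photoreceptor outputs for a long straight line.

    `offset` is the signed distance of the line axis from the layer
    center, measured along the line's normal. Each output is the
    fraction of sample points of the face that fall inside the line.
    """
    theta = math.radians(angle_deg)
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the line
    center = (size - 1) / 2.0
    image = []
    for row in range(size):
        row_values = []
        for col in range(size):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    x = col - center - 0.5 + (sx + 0.5) / samples
                    y = row - center - 0.5 + (sy + 0.5) / samples
                    if abs(x * nx + y * ny - offset) <= width / 2.0:
                        hits += 1
            row_values.append(hits / (samples * samples))
        image.append(row_values)
    return image


# A vertical line of width 1.4 centered on the middle column of a 5 x 5 layer.
frame = line_stimulus(5, 90.0, 0.0)
```

A sweep can then be emulated by calling the function repeatedly with `offset` increased by 0.3 per time step, and the 16 training orientations correspond to `angle_deg` values of 0, 11.25, 22.5 and so on up to 168.75.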
The responses of photoreceptors of U0, S-cells of US and C-cells of UC are displayed in the figure by the size of the small dots (filled squares). If a line is presented to a location on the input layer as shown in Fig. 3(a), a number of S-cells respond to the line, and consequently C-cells that receive excitatory connections from these S-cells become active. Let us watch, for example, an active C-cell marked with an arrow in layer UC. The circles in US and U0 show the connectable area and the effective receptive field² of the C-cell, respectively. When comparing the sizes of these areas, note that layers US and UC, which have a larger number of cells, are displayed in this figure on a larger scale than layer U0. In fact, layer U0 is the largest in size, and UC is the smallest, if the sizes are compared based on the locations of receptive fields.

If the line shifts to a new location as shown in Fig. 3(b), other S-cells become active because S-cells are sensitive to stimulus location. In layer UC, however, several C-cells that were active in Fig. 3(a) are still active for this shifted stimulus. For example, the C-cell marked with an arrow continues responding. This shows that the C-cell exhibits a shift-invariance within the receptive field.

When the line is rotated to another orientation as shown in Fig. 3(c), this C-cell, which is marked with an arrow, is of course silent (the location of the cell is blank in the display of this figure) even if the line is presented in its receptive field. We can also see that the S- and C-cells that respond to this rotated line do not overlap the cells that are active in Fig. 3(a) or (b).

In order to observe orientation tuning and shift-invariance of the cells more precisely, we collect the statistics of the response of the cells. A long line of the same width as the one used for the training is presented to the network. The line is rotated over a 180° range of orientation in steps of 11.25°, and is swept across the input layer. In a sweep, the line is shifted by a distance of 0.3 at each discrete time, and the response to the line at each location is counted. (The
2 Here we have supposed that the size of the effective receptive field of a C-cell is the same as the size of the connectable area of the C-cell, assuming that each S-cell responds to stimuli presented at the center of its receptive field, and not at the periphery. The actual receptive field, however, is slightly larger than the circle drawn in the figure, because some S-cells respond to stimuli presented not only at the receptive field centers exactly, but also around the centers.
798
Kunihiko Fukushima / Neural Networks 12 (1999) 791–801
Fig. 7. Typical responses of S- and C-cells to the shift of an optimally oriented line. The responses are taken from the same cells as in Fig. 4. Fig. 5. Distribution of the sharpness of the orientation-tuning curves of the S- and C-cells.
moving speed and direction of the line have no sense in this test, because these values only influence the learning.) Fig. 4 shows typical orientation tuning curves for S- and C-cells. The ordinate of the curve is the total response of the cell during a sweep of the line in each orientation. Incidentally, the curve for the C-cell in this figure is taken from the C-cell marked with the arrow in Fig. 3. We have checked tuning curves of all cells and confirmed that all cells have
only one peak in their orientation-tuning curves. In other words, no cell has a double-peaked tuning curve.
To estimate the sharpness of the orientation-tuning curves, we count how many different orientations each cell responded to, and use this number as the measure of orientation tuning of the cell. The histograms of Fig. 5 show the number of cells for each width of orientation-tuning curve. The width ranges between 2 and 4 (2.37 on average) for S-cells, and between 2 and 5 (2.57 on average) for C-cells. Since the test line is presented in steps of 11.25°, we roughly estimate that the full width of the orientation-tuning curve of a cell is 11.25° times the number of orientations to which it responded. If we assume that the half-width of the tuning curve is 0.5 times the full responding range, S-cells have a mean half-width of 2.37 × 11.25° × 0.5 = 13.3°, and C-cells have 2.57 × 11.25° × 0.5 = 14.5°.
Fig. 6. Preferred orientations of the cells in layers US and UC.
Fig. 8. Distribution of the shift tolerance of the cells.
Fig. 6 shows the preferred orientations of the cells in layers US and UC. The preferred orientation of each cell is displayed by the orientation of the line in the figure. The preferred orientations are distributed over all orientations.
Fig. 7 shows typical responses of S- and C-cells to an optimally oriented line moved over the receptive fields. The responses are taken from the same cells as in Fig. 4. It can be seen from the figure that the C-cell responds to the line within a wide range of locations. In other words, the C-cell has a shift-invariant receptive field. We measure the breadth of the range in which each cell keeps responding when an optimally oriented line moves over the receptive field, and use this as the measure of the shift tolerance of the cell. Fig. 8 shows the distributions of the shift tolerance of the S- and C-cells. We can see that C-cells have wide responding ranges comparable to the size of the receptive fields. We have thus observed that C-cells have shift-invariant, orientation-sensitive receptive fields.
In the above simulation, there was a total of 512 line sweeps during the learning. The self-organization of the network, however, is almost completed with fewer sweeps. Actually, even after 64 sweeps (that is, two sweeps for each direction of line movement), the network shows almost the same response as that after 512 sweeps, except for a very small number of immature C-cells. We have also tested 1024 sweeps, but no appreciable change has been seen in the response of the network.
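The half-width estimate used in this section is simple arithmetic, and a short sketch makes the steps explicit. The function names and the example tuning curve are hypothetical. Only the step of 11.25° and the mean widths 2.37 and 2.57 come from the text.

```python
STEP_DEG = 11.25  # spacing between tested orientations, in degrees

def tuning_width(total_responses):
    """Count the tested orientations that evoked a non-zero total response."""
    return sum(1 for r in total_responses if r > 0.0)

def half_width_deg(width, step_deg=STEP_DEG):
    """Half-width estimate: 0.5 x (number of orientations) x (step)."""
    return 0.5 * width * step_deg

# Hypothetical tuning curve over 16 orientations (values are made up).
example_curve = [0.0] * 6 + [1.2, 3.4, 0.5] + [0.0] * 7
example_width = tuning_width(example_curve)

mean_half_width_s = half_width_deg(2.37)  # mean width reported for S-cells
mean_half_width_c = half_width_deg(2.57)  # mean width reported for C-cells
```

Rounded to one decimal place, the two mean half-widths reproduce the values 13.3° and 14.5° quoted in the text.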
5. Discussion
We have proposed a new learning rule by which cells with shift-invariant receptive fields are self-organized. During the learning, straight lines of various orientations sweep across the input layer. With the proposed learning rule, cells similar to simple and complex cells in the primary visual cortex are generated in a network.
To demonstrate the new learning rule, we have simulated a three-layered network, which consists of an input layer (or the retina), a layer of S-cells (or simple cells) and a layer of C-cells (or complex cells). The layer of S-cells and the layer of C-cells have the same network architecture. Only the learning rules differ between the two layers.
Both S- and C-cells are created through competition. Although S-cells compete depending on their instantaneous outputs, C-cells compete depending on the traces (or temporal averages) of their outputs. For the self-organization of S-cells, only winner S-cells increase their input connections (LTP) in a similar way to that for the neocognitron. For the self-organization of C-cells, however, loser C-cells decrease their input connections (LTD), while winners increase their input connections (LTP). Both S- and C-cells are accompanied by inhibitory cells. Modification of inhibitory connections together with excitatory connections is important for creation of C-cells as well as S-cells.
The author previously proposed a preliminary version of the learning rule for shift-invariant receptive fields (Fukushima & Yoshimoto, 1998). In the old version, the network did not have inhibitory VC-cells. The learning rule proposed in this paper has been greatly simplified from the old version by the introduction of inhibitory VC-cells. This paper has shown that a better performance can be obtained with a simpler learning rule, if inhibitory VC-cells and modifiable inhibitory connections from them are introduced into the network for C-cells.
In the computer simulation discussed in this paper, the training patterns have been moving lines that sweep across the input layer. In a real-world situation, however, animals usually see many objects other than straight lines.
If the receptive fields of the S- and C-cells are small enough to extract only local features of the objects, these cells will have a larger probability of observing line components rather than other features. To test this situation, we made a preliminary simulation, in which X-shaped patterns moving up and down are added to the conventional line stimuli. Orientation tuning of C-cells became slightly broader. Although some C-cells came to have double peaks in their orientation-tuning curves, their number was very small (only one or two C-cells in the entire network). Incidentally, the decay of the trace in our simulation (αC = 0.1) has already been adjusted roughly to satisfy the self-organization when X-shaped moving patterns are included in the training pattern set. If the training pattern set consists of only moving lines, a smaller decay (i.e. a longer time constant) of the trace is preferable because it can provide a faster learning speed. When the training pattern set contains X-shaped patterns, however, a smaller decay generates a larger number of double-peaked C-cells in the network and causes an unsatisfactory result. We need more detailed experiments in the future, but this
preliminary simulation suggests that our learning rule works successfully even if the training set contains not only lines but also some other patterns.
The use of traces of outputs for self-organization of shift-invariant receptive fields has been proposed previously by Földiák (Földiák, 1991; Oram & Földiák, 1996). He proposed a modified Hebbian rule (called trace-Hebbian), in which the amount of modification is proportional to the instantaneous presynaptic activity and the trace of the postsynaptic activity. He also proposed to introduce a kind of competition among cells, which differs from that used in our learning rule. (The trace of a postsynaptic activity in his modified Hebbian rule is not a trace of the actual postsynaptic activity. The traces are calculated, not from the actual activities, but from the results of competition among postsynaptic activities. In other words, only a winner can leave a trace.) In his simulation, lines of only four different orientations are used for the learning and testing of the network, and he assumes that the orientation selectivity of each “simple unit”, which corresponds to our S-cell, is sharp enough to respond to lines of only one orientation. He has not tested whether the self-organization is successful when lines of a variety of orientations are presented in the learning, in the case where each simple unit has an orientation-tuning curve broad enough to respond to lines of two or more adjacent orientations.
In a real-world situation where lines of a variety of orientations (not limited to only four) are presented, however, postsynaptic competition based on the traces is not enough by itself for a successful self-organization. The postsynaptic competition alone cannot prevent a single C-cell from receiving connections simultaneously from S-cells of completely different preferred orientations.
The inhibitory VC-cells introduced in our network play an important role in sharpening the orientation selectivity and preventing the generation of such troublesome C-cells.
Barrow and Bray (1992) proposed to use the trace of the presynaptic, not the postsynaptic, activity. Their learning rule also has some difficulties. It works satisfactorily if the number of postsynaptic cells in a competition area is equal to the number of different orientations of the training stimuli. If the number of the cells is larger than the number of the stimulus orientations, however, postsynaptic cells fail to acquire full shift-invariance. Each cell comes to respond only to stimuli presented in a certain restricted portion of the receptive field.
The three-layered network used for our simulation has a hierarchical architecture similar to the classical hypothesis by Hubel and Wiesel (1962). Hubel and Wiesel have hypothesized that rows of LGN cells are pooled to drive oriented simple cells, which are pooled in turn to drive complex cells. The shift-invariant receptive field of a complex cell is assumed to be created by excitatory input signals coming from a group of simple cells with receptive fields of the same preferred orientation but at different locations. The pure hierarchical model, however, has since been challenged by experimental results indicating that many complex cells receive excitatory monosynaptic input from LGN cells (Toyama, Maekawa, & Takeda, 1973; Stone &
Dreher, 1973) or do not depend on simple cell input (Movshon, 1975; Hammond & MacKay, 1977). Nevertheless, alternative accounts of shift-invariant receptive fields of complex cells still remain scant. On the other hand, there are recent physiological experiments supporting the hierarchical model. For example, Alonso and Martinez (1998) report that there are monosynaptic connections from simple cells to complex cells of similar orientation preference.
This paper does not necessarily insist that S-cells directly correspond to simple cells and C-cells to complex cells. The author believes, however, that the learning rule proposed in this paper represents an essence of the mechanism of generating shift-invariant receptive fields, and that the same rule will successfully work for the self-organization of shift-invariant receptive fields even in neural networks of different architectures.
Some complex cells respond to lines or edges of both contrast polarities (i.e. light and dark stimuli) (e.g. Hubel & Wiesel, 1968) (footnote 3). The C-cells in our model do not necessarily acquire this characteristic by self-organization. This problem will be solved, however, by our learning rule, if a layer of contrast-extracting cells, which have concentric on- and off-center receptive fields, is introduced between layers U0 and US. A line stimulus with some width has edges on both sides, and the two edges have opposite contrast polarities. If line stimuli of a variety of widths are presented, edge-extracting S-cells, as well as line-extracting S-cells, will be generated in the network. Since edges of opposite polarities always appear in a pair, C-cells will have excitatory connections simultaneously from S-cells extracting edges of opposite polarities.
We have demonstrated that our learning rule works successfully for a three-layered network.
A future problem is to apply the proposed learning rule to self-organization of a multilayered network and construct a neocognitron-like network, completely by self-organization.
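The outputs of the cells in the simulation include a single step of lateral inhibition (Eq. (13) in the Appendix). A minimal sketch of that step is given here in one dimension for brevity. The Gaussian kernel, its width and strength, the half-wave rectification used for the output function, and all function names are assumptions for illustration. The paper states only that the kernel is bell-shaped and that its radius equals the radius of the competition area.

```python
import math

def phi(v):
    # Half-wave rectification, assumed here as the output function.
    return v if v > 0.0 else 0.0

def bell_kernel(d, radius, strength):
    # Hypothetical bell-shaped kernel: a Gaussian with sigma = radius / 2.
    sigma = radius / 2.0
    return strength * math.exp(-(d * d) / (2.0 * sigma * sigma))

def lateral_inhibition(u_temp, radius=2, strength=0.2):
    """Single-step lateral inhibition in one dimension.

    u(x) = phi[ u'(x) - sum over |xi - x| <= radius of c(xi - x) * u'(xi) ]
    where u' is the temporary output computed without lateral inhibition.
    """
    n = len(u_temp)
    out = []
    for x in range(n):
        inhibition = 0.0
        for xi in range(max(0, x - radius), min(n, x + radius + 1)):
            inhibition += bell_kernel(xi - x, radius, strength) * u_temp[xi]
        out.append(phi(u_temp[x] - inhibition))
    return out

# Temporary outputs of six neighbouring cells (made-up values).
temporary = [0.0, 0.0, 1.0, 0.4, 0.0, 0.0]
final = lateral_inhibition(temporary)
```

With these made-up values the strongest cell stays the strongest, and the ratio of the weaker response to the stronger one drops below its original value of 0.4, so the single step sharpens the contrast between neighbouring cells.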
Footnote 3. Some investigators (e.g. Pollen & Ronner, 1983) suggest that a complex cell conveys information about a spatial frequency amplitude, neglecting the phase, over the receptive field (energy model). This may be a conclusion derived from the fact that complex cells respond to stimuli of both contrast polarities.
Acknowledgements
The author gratefully acknowledges the large contributions of his student Kazuya Yoshimoto and his colleague Hayaru Shouno, both at Osaka University, to this work. This work was supported in part by Grants-in-Aid #09308010 and #10164231 for Scientific Research from the Ministry of Education, Science, Sports and Culture of Japan.
Appendix. Lateral inhibition
Eqs. (1) and (4) represent the response of the cells neglecting the effect of lateral inhibition. Actual outputs including lateral inhibition are calculated by the following process in the simulation.
Let the right side of Eq. (1) be $u'_S(t, x)$. This is a temporary output of the S-cell, in which the effect of lateral inhibition is ignored. The final output $u_S(t, x)$ including lateral inhibition is then calculated from this temporary output. For the sake of economy of computation time, the iterative calculation for the recurrent lateral inhibition is limited to a single step in the simulation, because this approximation does not seem to affect the result of simulation qualitatively.
$$u_S(t, x) = \varphi\left[ u'_S(t, x) - \sum_{|\xi - x| \le A_{LS}} c_{LS}(\xi - x)\, u'_S(t, \xi) \right] \tag{13}$$
where $c_{LS}(\xi)$, which represents the strength of lateral inhibition, is a bell-shaped two-dimensional function similar to $c_C(\xi)$. $A_{LS}$ is the radius of the connections of lateral inhibition, and $A_{LS} = D_S$ in the simulation.
The output of a C-cell is also calculated with the same process. Let the right side of Eq. (4) be $u'_C(t, x)$, which is the response of the C-cell without lateral inhibition. The final output of the C-cell including lateral inhibition is then calculated from this temporary output.
$$u_C(t, x) = \varphi\left[ u'_C(t, x) - \sum_{|\xi - x| \le A_{LC}} c_{LC}(\xi - x)\, u'_C(t, \xi) \right] \tag{14}$$
where $c_{LC}(\xi)$ represents the strength of lateral inhibition, and $A_{LC}$ is the radius of the connections of lateral inhibition. We take $A_{LC} = D_C$ in the simulation.
Appendix. Initial connections
The initial values of the connections $a_S(t, \xi, x)$ are given by
$$a_S(0, \xi, x) = \begin{cases} q_{0S}\, c_S(\xi - x) & \text{if } |\xi - x| \le A_{0S}, \\ 0 & \text{otherwise,} \end{cases} \tag{15}$$
where $q_{0S}$ is a small positive constant. $A_{0S}$ is a positive constant smaller than the radius $A_S$ of the connectable area. Since the initial connection $a_S(0, \xi, x)$ thus has positive values only in the central part of the receptive field, S-cells in the initial state are insensitive to lines that pass only through peripheral parts of the receptive fields. As a result, S-cells show a strong tendency to be organized to respond to lines passing through central areas of their receptive fields.
The inhibitory connection $b_S(0, x)$, at the initial state, takes a smaller value than that determined from Eq. (8) and Eq. (15). That is,
$$b_S(0, x) = q'_{0S} \sqrt{\sum_{|\xi - x| \le A_{0S}} c_S(\xi - x)} \tag{16}$$
where $q'_{0S}$ is a positive constant smaller than $q_{0S}$. This means that the strength of the initial inhibitory connection $b_S(0, x)$ is less than the value calculated from Eq. (8) by the factor of $q'_{0S}/q_{0S}$. The reduction of inhibitory connection at the initial state is necessary when the threshold $\theta_S$ of S-cells is large. Otherwise, no S-cell may respond in the initial state because the shape of the initial excitatory connections determined by Eq. (15) is different from the shape of the line stimuli used for the learning.
The initial values of connections to C-cells are
$$a_C(0, \xi, x) = q_{0C}\, c_C(\xi - x) \tag{17}$$
$$b_C(0, x) = 0 \tag{18}$$
where $q_{0C}$ is a small positive constant.
References
Alonso, J. M., & Martinez, L. M. (1998). Functional connectivity between simple and complex cells in cat striate cortex. Nature Neuroscience, 1 (5), 395–403.
Barrow, H. G., & Bray, A. J. (1992). A model of adaptive development of complex cortical cells. In I. Aleksander & J. G. Taylor (Eds.), Artificial Neural Networks, 2: Proceedings of ICANN-92, (pp. 881). Amsterdam: Elsevier.
Földiák, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3, 194–200.
Fukushima, K. (1988). Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Networks, 1 (2), 119–130.
Fukushima, K. (1989). Analysis of the process of visual pattern recognition by the neocognitron. Neural Networks, 2 (6), 413–420.
Fukushima, K., & Miyake, S. (1982). Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15 (6), 455–469.
Fukushima, K., Nagahara, K., Shouno, H., & Okada, M. (1996). Training neocognitron to recognize handwritten digits in the real world. WCNN'96 (World Congress on Neural Networks, San Diego, CA), (pp. 21). INNS Press.
Fukushima, K., & Yoshimoto, K. (1998). Self-organization of shift-invariant receptive fields through pre- and post-synaptic competition. In L. Niklasson, M. Bodén, & T. Ziemke (Eds.), ICANN'98 (International Conference on Artificial Neural Networks, Skövde, Sweden), 2, pp. 955–960.
Hammond, P., & MacKay, D. M. (1977). Differential responsiveness of simple and complex cells in cat striate cortex to visual texture. Experimental Brain Research, 30 (2/3), 275–296.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160, 106–154.
Hubel, D. H., & Wiesel, T. N. (1965). Receptive fields and functional architecture in nonstriate areas (18 and 19) of the cat. Journal of Neurophysiology, 28 (2), 229–289.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195 (1), 215–243.
Movshon, J. A. (1975). The velocity tuning of single units in cat striate cortex. Journal of Physiology, 249 (3), 445–468.
Oram, M. W., & Földiák, P. (1996). Learning generalization and localization: competition for stimulus type and receptive field. Neurocomputing, 11 (2/4), 297–321.
Pollen, D. A., & Ronner, S. F. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Transactions on Systems, Man and Cybernetics, 13 (5), 907–916.
Stone, J., & Dreher, B. (1973). Projection of X- and Y-cells of the cat's lateral geniculate nucleus to area 17 and 18 of visual cortex. Journal of Neurophysiology, 36 (3), 551–567.
Toyama, K., Maekawa, K., & Takeda, T. (1973). An analysis of neuronal circuitry for two types of visual cortical neurones classified on the basis of their response to phasic stimuli. Brain Research, 61, 395–399.