Chapter 1
Pulse-Coupled Neural Networks
J. L. Johnson, H. Ranganath, G. Kuntimad, H. J. Caulfield
ABSTRACT A pulse-coupled neural network using the Eckhorn linking field coupling [1] is shown to contain invariant spatial information in the phase structure of the output pulse trains. The time domain signals are directly related to the intensity histogram of an input spatial distribution and have complex phase factors that specify the spatial location of the histogram elements. Two time scales are identified. On the fast time scale the linking produces dynamic, quasi-periodic, fringe-like traveling waves [2] that can carry information beyond the physical limits of the receptive fields. These waves contain the morphological connectivity structure of image elements. The slow time scale is set by the pulse generator, and on that scale the image is segmented into multineuron time-synchronous groups. These groups act as giant neurons, firing together, and by the same linking field mechanism as for the linking waves can form quasi-periodic pulse structures whose relative phases encode the location of the groups with respect to one another. These time signals are a unique, object-specific, and roughly invariant time signature for their corresponding input spatial image or distribution [3]. The details of the model are discussed, giving the basic Eckhorn linking field, extensions, generation of time series in the limit of very weak linking, invariances from the symmetries of the receptive fields, time scales, waves, and signatures. Multirule logical systems are shown to exist on single neurons. Adaptation is discussed. The pulse-coupled nets are compatible with standard nonpulsed adaptive nets rather than competitive with them in the sense that any learning law can be used. Their temporal nature results in adaptive associations in time as well as over space, and they are similar to the time-sequence learning models of Reiss and Taylor [4].
Hardware implementations, optical and electronic, are reviewed. Segmentation, object identification, and location methods are discussed and current results given. The conjugate basic problem of transforming a time signal into a spatial distribution, comparable in importance to the transformation of a spatial distribution into a time signal, is discussed. It maps the invariant time signature into a phase versus frequency spatial distribution and is the spatial representation of the complex histogram. A method of generating this map is discussed. Image pattern recognition using this network is shown to have the power of syntactical pattern recognition and the simplicity of statistical pattern recognition.
1
Introduction
The linking field model of Eckhorn et al. [1] was proposed as a minimal model to explain the experimentally observed synchronous feature-dependent activity of neural assemblies over large cortical distances in the cat cortex [5]. It is a cortical model. It emphasizes synchronizations of oscillatory spindles that occur in the limit of strong linking fields and distinguishes two major types: (1) forced, or stimulus-locked, synchronous activity and (2) induced synchronous activity. Forced activity is produced by abrupt temporal changes such as movement. Induced activity occurs when the pulse train structures of the outputs of groups of cells are similar [6]. The model is called "linking field" because it uses a secondary receptive field's input to modulate a primary receptive field's input by multiplication in order to obtain the necessary coupling that links the pulse activity into synchronicity. This paper is concerned with the behavior of the linking field model in the limit of weak-to-moderate linking strengths [2],[7]. Strong linking is characterized by synchronous bursts of pulses. When the linking strength is reduced, the neurons no longer fire in bursts but still have a high degree of phase and frequency locking. This is the regime of moderate linking strength. Further reduction continuously lowers the degree of linking to a situation where locking can occur only for small phase and frequency differences. This is the weak linking regime. A major result of this research is the finding that in the weak linking regime it is possible to encode spatial input distributions into corresponding temporal patterns with enough structure to have object-specific time series for each input pattern. The pulse phase patterns in the time series are often found to be periodic. In both simulations and in an optical hybrid laboratory demonstration system, periodicity is observed to be the rule rather than the exception.
The time series can be made insensitive to translation, rotation, and scale changes of the input image distribution by an appropriate choice of the structure of the receptive field weight patterns. Substantial insensitivity against scene illumination and image distortion has also been observed in simulations.
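[Figure 1: block diagram of the model neuron. The dendritic tree receives linking and feeding inputs from other neurons; the linking stage forms the factor 1 + βjLj; the pulse generator contains the threshold and a step function and sends its output to other neurons.]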
FIGURE 1. The model neuron. The model neuron has three parts: The dendritic tree, the linking, and the pulse generator. The dendritic tree is subdivided into two channels, linking and feeding. All synapses are leaky integrator connections. The inputs are pulses from other neurons and the output is a pulse. The linking input modulates the feeding input. When a pulse occurs in the linking input it briefly raises the total internal activity Uj and can cause the model neuron to fire at that time, thus synchronizing it with the neuron transmitting the linking pulse. (Reprinted with permission from [1]).
2
Basic Model
This section reviews the basic model as discussed in Eckhorn et al. [1], [5], [6], [8], [9], and [10]. The model neuron is a neuromime [11], modified with two receptive fields per neuron and a linking mechanism added. It is shown in Figure 1. There are three parts to the model neuron: the dendritic tree, the linking modulation, and the pulse generator. Each part will be described separately, and then the operation of the complete model will be discussed.
2.1
The Dendritic Tree
The dendritic tree is divided into two principal branches in order to make two distinct inputs to the linking part of the jth neuron. They are the primary input, termed the feeding input $F_j$, and the secondary input, which is the linking input $L_j$. These are given in equations 2 and 1, respectively, for the case of continuous time. For discrete time steps, the digital filter
model is used, as given in the appendix of Eckhorn et al. [1]. (The simulations reported here used the discrete model. The equations are given in Section 9.) Each input is a weighted sum from the synaptic connections on its dendritic branch. The synapses themselves are modeled as leaky integrators. An electrical version of a leaky integrator is a capacitor and a resistor in parallel, charged by a brief voltage pulse and decaying exponentially. Likewise, when a synapse receives a pulse, it is charged, and its output amplitude rises steeply. The amount of rise depends on the amplitude gain factor assigned to the synapse. It then decays exponentially according to its time constant. These postsynaptic signals are summed to form the total signal out of that branch of the dendritic tree, as indicated in Figure 1. The amplitude gain factors and the decay time constants of the synapses characterize the signals. The synapses in the feeding branch are assumed [1] to have smaller time constants than those of the linking branch. This assumption lets the feeding signal have a long decay tail on which the spikelike linking input can operate through the linking modulation process. The linking and feeding inputs are given by
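$$L_j = \sum_k L_{kj} = \sum_k \left(W_{kj}\, e^{-\alpha^{L}_{kj} t}\right) * Y_k(t), \tag{1}$$

$$F_j = \sum_k F_{kj} = \sum_k \left(M_{kj}\, e^{-\alpha^{F}_{kj} t}\right) * Y_k(t) + I_j, \tag{2}$$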
where $W_{kj}$ and $M_{kj}$ are the synaptic gain strengths, or weights, for the kth synapse of the linking and feeding receptive fields, respectively, to the jth neuron. $Y_k(t)$ is the input pulse, or pulse train, from the kth neuron; $\alpha^{L}_{kj}$ and $\alpha^{F}_{kj}$ are the time constants; and $f_1 * f_2$ denotes the convolution integral operation for any two functions $f_1$ and $f_2$. Note that both the feeding and linking fields can receive inputs from the kth neuron. $I_j$ is an analog feeding input to the jth neuron. It is shown here as a distinct single term but in general can be a weighted sum like the pulsed inputs. If the inputs $Y_k(t)$ are allowed to be arbitrary functions of time, then $I_j$ can be included in the weighted sum over the $F$'s as a step function in time, $\mathrm{Step}(t - t_0)$. Each neuron thus has two receptive fields, linking and feeding. Both fields are dendritic tree structures and can overlay the same areas around the neuron. However, their weighted sums enter the neuron via distinct channels and are combined internally by the linking, as discussed below.
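The leaky-integrator picture of a dendritic branch can be sketched in discrete time. The following is a minimal illustration, not the digital filter of Eckhorn et al.: it assumes a unit time step, a per-step decay factor exp(-alpha * dt), and pulses represented as 0/1 samples. The function names `leaky_integrator` and `branch_signal` and all numerical values are hypothetical.

```python
import math

def leaky_integrator(pulses, gain, alpha, dt=1.0):
    """Response of one synapse modeled as a leaky integrator.

    Each input pulse adds `gain` to the state; between samples the
    state decays by the factor exp(-alpha * dt).
    """
    decay = math.exp(-alpha * dt)
    state = 0.0
    response = []
    for y in pulses:
        state = state * decay + gain * y
        response.append(state)
    return response

def branch_signal(pulse_trains, gains, alphas, analog_input=0.0, dt=1.0):
    """Sum the postsynaptic signals of one dendritic branch.

    `analog_input` plays the role of the analog feeding term and is
    zero for a linking branch.
    """
    responses = [
        leaky_integrator(train, g, a, dt)
        for train, g, a in zip(pulse_trains, gains, alphas)
    ]
    return [sum(step) + analog_input for step in zip(*responses)]

# Two synapses on a linking branch, each receiving a single pulse.
linking = branch_signal(
    pulse_trains=[[1, 0, 0, 0], [0, 1, 0, 0]],
    gains=[1.0, 0.5],
    alphas=[math.log(2), math.log(2)],  # each state halves every step
)
```

Each synapse rises by its gain when a pulse arrives and then decays exponentially; the branch output is simply the sum of these decaying traces.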
2.2
The Linking
The linking modulation (see Figure 1) is obtained by adding a constant positive bias to the linking input and multiplying that by the feeding input. The bias is taken to be unity. This bias has many uses as we will see, and
one of them is obvious. The linking input cannot drive the internal activity to zero. The total internal activity $U_j$ of the neuron is

$$U_j = F_j(1 + \beta_j L_j), \tag{3}$$
where $\beta_j$ is the linking strength. For convenience, it is broken out separately here, but strictly speaking, it could be incorporated in the synaptic weights. $U_j$ is a function of time. Under the above assumption that the feeding input has a smaller time constant than that of the linking input, the general behavior of $U_j$ is that the linking inputs appear as spike-like modulations riding on a quasi-constant carrier formed by the feeding input. The internal activity $U_j$ thus is briefly raised above the feeding input level whenever a linking input occurs (Figure 2), and it can then trigger the neuron to fire. This effect is responsible for the synchronous activity found in the network as a whole. Equation 3 also establishes a correspondence between the linking field model and higher-order networks. If equations 1 and 2 are inserted into equation 3, there will be product terms of the form $M_{kj} W_{ij} Y_k Y_i$ within a double sum. This is a second-order network [12]. This implies that if a pulse output model rather than an average firing rate output model is used in higher-order nets, time-synchronous behavior should be observed. The work on adaptive higher-order nets [13] may be applicable to adaptation in pulse-coupled nets as well.
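The correspondence with second-order networks can be checked numerically. The sketch below is an illustration under a simplifying assumption: the time convolutions are left out, so the feeding and linking inputs are plain weighted sums of instantaneous inputs, and the analog term is zero. The weights and inputs are hypothetical values chosen only for the check.

```python
def internal_activity(feeding, linking, beta):
    """Linking modulation: U = F * (1 + beta * L)."""
    return feeding * (1.0 + beta * linking)

def second_order_expansion(M, W, Y, beta):
    """The same quantity written as first- plus second-order terms.

    Assumes F = sum_k M[k]*Y[k] and L = sum_i W[i]*Y[i], i.e. the
    instantaneous postsynaptic values without the time convolutions.
    """
    first_order = sum(m * y for m, y in zip(M, Y))
    second_order = sum(
        M[k] * W[i] * Y[k] * Y[i]
        for k in range(len(Y))
        for i in range(len(Y))
    )
    return first_order + beta * second_order

M = [0.5, 1.0, 0.25]   # hypothetical feeding weights
W = [0.2, 0.0, 0.4]    # hypothetical linking weights
Y = [1.0, 0.0, 1.0]    # instantaneous inputs
beta = 0.3

F = sum(m * y for m, y in zip(M, Y))
L = sum(w * y for w, y in zip(W, Y))
U_product = internal_activity(F, L, beta)
U_expanded = second_order_expansion(M, W, Y, beta)
```

The two values agree, which is just the statement that multiplying two weighted sums produces the double sum of pairwise products. The unit bias shows up as the first-order term: with no linking input the activity reduces to the feeding input.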
2.3
The Pulse Generator
The pulse generator uses a threshold discriminator followed by a pulse former, and a variable threshold that depends on the prior pulse output of the generator itself. When the neuron emits a pulse, a portion of it feeds back to the threshold, which is yet another leaky integrator, as shown in Figure 1. One or more output pulses recharge the threshold to a high level. This quickly raises it above the current value of the internal activity $U_j$, which in turn causes the threshold discriminator to turn off the pulse former, and the neuron stops emitting pulses. The recharged threshold then decays exponentially according to its time constant and amplitude gain factor until it drops below the internal activity again, triggering a new output pulse or pulse burst from the neuron (Figure 2). This is the pulse generator model illustrated in Eckhorn et al. [1], [8], [9], and [10], and given analytically in the appendix of [3]. One important result of the model is that under constant stimulation, the pulse former produces a train of uniformly spaced pulses. The spacing represents the refractory period $\tau_r$ of the neuron, within which a new pulse cannot occur. This gives an upper saturation limit to the output pulse frequency.

[Figure 2: threshold $\theta_j$ versus time $t$, showing the pulse period $\tau_j$, pulse capture by a linking pulse, and the output pulses $Y_j$.]

FIGURE 2. Pulse generation and linking. The threshold is recharged when it decays below the internal activity $U_j = F_j(1 + \beta_j L_j)$. The output pulse is formed as the threshold turns the step function of equation (5) on and then off as the threshold goes below $U_j$, starts recharging, and then rises above $U_j$. If a linking pulse occurs in the capture zone time, it causes the threshold to recharge sooner than otherwise, and the neuron fires a pulse synchronized with the arrival of the linking pulse. (Reprinted with permission from [3]).

The pulse generator is modeled by a leaky integrator threshold $\theta_j$ (equation (4)), a threshold discriminator in the form of a sigmoidal envelope (equation (5)), and a pulse former (equations (6) and (8)):

$$\theta_j = \left(V_T\, e^{-\alpha_T t}\right) * Y_j(t) + \theta_0, \tag{4}$$

$$Y_j(t) = \left(\mathrm{Sig}\big(U_j(t) - \theta_j(t)\big)\, P(t)\right) * e^{-\alpha_Y t}, \tag{5}$$

$$P(t) = \sum_n \mathrm{pulse}(t - n\tau_r), \tag{6}$$

where $\mathrm{Sig}(z)$ is a hyperbolic tangent sigmoidal envelope for the pulse train $P(t)$ out of the pulse former. The sigmoid function and the pulse function $\mathrm{pulse}(t - n\tau_r)$ are

$$\mathrm{Sig}(z) = \frac{1}{1 + e^{-Az}}, \tag{7}$$

$$\mathrm{pulse}(t - n\tau_r) = K \int_{-\infty}^{t} \big(\delta(t' - n\tau_r) - \delta(t' - n\tau_r - \tau_w)\big)\, dt'. \tag{8}$$
Equation (8) defines a square pulse of height $K$ whose leading and trailing edges are formed by two delta functions separated by width $\tau_w$. It has a constant area of $K$. $V_T$ and $\alpha_T$ are the amplitude gain and the time constant of the leaky integrator threshold, and $\theta_0$ is a threshold offset. $A$ is the scale of the sigmoid argument, and $\alpha_Y$ is the time constant for the convolution of equation (5). The number $n$ refers to the pulse number. In order to have a good dynamic range of pulse periods it is desirable to require

$$\alpha_T \tau_r < 1. \tag{9}$$
The system of equations (4)-(8) explicitly shows the causality in the pulse generator and that the pulses are finite. Now idealize it. First let $\tau_w$ go to zero. This makes $P(t)$ into a train of delta function pulses. Perform the convolution of equation (5) and take the limit of both $\alpha_Y$ and $K$ going to infinity, in such a way that their ratio is constant, to obtain yet another delta function limit, and finally, take $A$ approaching infinity to get a single equation for $Y_j$ that replaces equations (5)-(8):

$$Y_j(t) = \sum_n \delta(t - n\tau_r)\, \mathrm{Step}\big(U_j(t) - \theta_j(t)\big). \tag{10}$$
$\mathrm{Step}(\,)$ is defined as 1 when its argument is positive, and 0 otherwise. Equations (10) and (4) are the idealized pulse generator. Its input has a lower limit of $\theta_0$. An upper limit can be established by asking for the largest value of the input that will just barely recharge the threshold back to that level in a decay time $\tau_r$, the minimum pulse period. Equation (4) gives $U e^{-\alpha_T \tau_r} + V_T > U$, from which

$$U < \frac{V_T}{1 - e^{-\alpha_T \tau_r}} \equiv U_{\max} \approx \frac{V_T}{\alpha_T \tau_r} \tag{11}$$
under the dynamic range requirement of equation (9). Figure 3 summarizes the properties of the pulse generator. There, the digital filter form (equation (29)) of the time convolutions was used.
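The behavior of the idealized pulse generator under constant stimulation can be illustrated with a short discrete-time sketch. It is an assumption-laden simplification rather than the digital filter form used for Figure 3: the threshold is updated once per time step `dt`, a firing recharges it instantly by `V_T`, and no explicit refractory period is enforced. The helper `saturation_input` evaluates the exact upper limit V_T / (1 - exp(-alpha_T * tau_r)). All names and parameter values are hypothetical.

```python
import math

def simulate_pulse_generator(U, V_T, alpha_T, theta_0=0.0, dt=0.01, steps=2000):
    """Idealized pulse generator driven by a constant internal activity U.

    The threshold is theta_0 plus a decaying charge; the neuron fires
    whenever U exceeds the threshold, and each firing adds V_T to the charge.
    """
    decay = math.exp(-alpha_T * dt)
    charge = 0.0                      # decaying part of the threshold
    firing_times = []
    for n in range(steps):
        charge *= decay
        if U > theta_0 + charge:      # Step(U - theta) = 1
            firing_times.append(n * dt)
            charge += V_T             # recharge the threshold
    return firing_times

def saturation_input(V_T, alpha_T, tau_r):
    """Largest input the threshold can still follow at the minimum period."""
    return V_T / (1.0 - math.exp(-alpha_T * tau_r))

times = simulate_pulse_generator(U=1.0, V_T=5.0, alpha_T=1.0)
intervals = [b - a for a, b in zip(times, times[1:])]
```

After the first interval, which depends on the initial state of the threshold, the spacing between pulses settles to a uniform value close to the analytic decay time ln(1 + V_T / U) / alpha_T for theta_0 = 0, to within the time step.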
2.4
Pulse Periods
The firing rate of a single neuron is a sigmoidal function of the feeding input. This is shown by obtaining the pulse period $\tau_j$ of a neuron. It is the time required for the threshold to decay from its recharged initial height down to the internal activity level (Figure 2). Consider equation (4) when the threshold is recharged with a single pulse by an amount $V_T$. For a constant feeding input $F$ and a zero linking input, the decay time back down to $F$ is
$$\tau_j = \frac{1}{\alpha_T} \ln\left(1 + \frac{V_T}{F - \theta_0}\right). \tag{12}$$

[Figure 3: block diagram of the pulse generator, showing the threshold with its decay loop, the sigmoid envelope with its decay loop, and the pulse former. The low-pass filter decay loops correspond to the time constants in the convolution integrals.]

FIGURE 3. The pulse generator. The internal activity $U$ feeds a sigmoidal envelope. When $U > \theta$ the envelope becomes high, allowing the pulse former to make an output of uniformly spaced pulses. These are the cell's output. The envelope and pulse former are in a decay loop with a large time constant. This loop ensures causality, i.e., it gives a small time delay between the pulse output and the recharging of the threshold (upper feedback loop). The threshold is another leaky integrator, recharged by the pulse output. An idealization (see text) reduces the sigmoidal envelope to a step function and makes the pulse former's output into a train of delta function pulses. (Reprinted with permission from [26]).

The refractory period is added to the decay time to obtain the total pulse period. The pulse firing rate $f_j$ is then

$$f_j = \frac{1}{\tau_j + \tau_r}. \tag{13}$$
As shown in Figure 4, it is a sigmoid function [14]: it increases more slowly than linear up to $\theta_0$, then rises quickly (this is the center of the "S" shape), and finally goes to saturation. Its monotonically increasing behavior shows that the original input feeding distribution can be approximately recovered at any time by taking an average over many pulse periods, because the pulse frequency is faster for stronger (more intense) feeding inputs. The sigmoidal nonlinearity will cut off values below $\theta_0$ and act as a squashing function near saturation, so the overall function is a sigmoidal mapping of the internal activity to the output when pulse-averaging is done. When linking pulses are present, their strength defines a capture zone in the neuron receiving the linking pulse. From Figure 2, the capture zone
time interval is

$$\tau_c = \frac{1}{\alpha_T} \ln\left(1 + \frac{F}{F - \theta_0}\, \beta L\right), \tag{14}$$

where $\beta$ is the linking strength. If a linking pulse is received in this interval, it will briefly raise the internal activity level and cause the receiving neuron to fire at the arrival time of the linking pulse (Figure 2). The receiving neuron will frequency lock to the transmitting neuron if their pulse periods $\tau_1$ and $\tau_2$ are similar. If the neurons have the same frequency ($\tau_1 = \tau_2$), they will phase lock when their phase difference $\phi$ is small. The conditions are

$$|\tau_2 - \tau_1| < \tau_c, \qquad |\phi| < \alpha_T \tau_c. \tag{15}$$

[Figure 4: plot of the pulse frequency $f_j$ versus the internal activity $U_j$.]

FIGURE 4. Pulse frequency $f_j$ as a function of the internal activity $U_j$. The pulse frequency is a sigmoidal function of the internal activity. Addition of a refractory time period $\tau_r$ makes the frequency saturate at the refractory frequency. A bias offset $\theta_0$ will shift the curve's origin to that bias point. (Reprinted with permission from [3]).

There is also a forbidden zone immediately after each linking pulse. For $\alpha_L$ much greater than $\alpha_T$, the length of the forbidden zone is equal to that of the capture zone (see Figure 2). This completes the description of the basic model neuron. The threshold time constants used by Eckhorn are intermediate in value between the linking and feeding time constants. The pulse-coupled linking field model
contains synaptic weights but does not require any learning law. On the other hand, any learning law can be used. The frequency function of equation (12) gives the desired nonlinear response in the limit of averaging over many pulses, so this model reduces to the usual nonpulsed networks in that limit. It has the weighted interconnects, the internal sums, and the sigmoid nonlinearity. The simple pulse generator used in simulations by Eckhorn and others [1] corresponds to a two-cell oscillator [15], [16] where the threshold acts as an inhibitor cell with a slow response and the step function as an excitatory cell with a fast response. The three parts of the model (the dendritic tree, the linking, and the pulse generator) act together to weight and sum inputs in the receptive fields, modulate one input channel with a second input channel, and form the output pulses, which in turn are received by other neurons through their receptive fields. In the remainder of this paper the same threshold time constant $\alpha_T$ will be used for all neurons, the same linking time constant $\alpha_L$ used for all linking fields, the same feeding time constant $\alpha_F$ used for all feeding fields, and the same linking strength $\beta$ used for all neurons unless otherwise stated. The subscript $j$ will be suppressed except where necessary.
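The pulse period, firing rate, and capture zone can be evaluated directly. The sketch below assumes that the threshold decays exponentially toward the offset theta_0, is recharged by V_T when it meets the feeding level F, and that the refractory period tau_r is added to the decay time. Under those assumptions the decay time is ln(1 + V_T / (F - theta_0)) / alpha_T and the capture zone is ln(1 + beta * L * F / (F - theta_0)) / alpha_T. Function names and parameter values are hypothetical.

```python
import math

def pulse_period(F, V_T, alpha_T, theta_0=0.0):
    """Decay time of the recharged threshold back down to F."""
    if F <= theta_0:
        return math.inf               # below the offset the neuron never fires
    return math.log(1.0 + V_T / (F - theta_0)) / alpha_T

def firing_rate(F, V_T, alpha_T, tau_r, theta_0=0.0):
    """Pulse frequency with the refractory period added to the decay time."""
    tau = pulse_period(F, V_T, alpha_T, theta_0)
    return 0.0 if math.isinf(tau) else 1.0 / (tau + tau_r)

def capture_zone(F, L, beta, alpha_T, theta_0=0.0):
    """Time window before a natural firing in which a linking pulse captures."""
    return math.log(1.0 + beta * L * F / (F - theta_0)) / alpha_T

# Sweep the feeding input to see the sigmoid-like frequency response.
inputs = [1.5, 2.0, 5.0, 50.0, 1.0e6]
rates = [firing_rate(F, V_T=5.0, alpha_T=1.0, tau_r=0.1, theta_0=1.0)
         for F in inputs]
```

The rates increase monotonically with the feeding input and stay below the refractory frequency 1 / tau_r, while inputs at or below theta_0 give a rate of zero. The capture zone widens with the linking strength and vanishes when beta is zero.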
3
Multiple Pulses
Suppose that at time zero a cell receives linking pulses from $N$ other cells, all arriving at the same time, and that a single firing is inadequate to raise its threshold above the composite linking pulse. It will continue to fire until it exceeds the linking pulse height, as shown in Figure 5(a). Let $M$ be the number of pulses required. For simplicity take $\theta = F$ at $t = 0$. Then from equations (1), (3), and (4),

$$F + \int_0^{(M-1)\tau_r} V_T\, e^{-\alpha_T \left((M-1)\tau_r - t'\right)} \sum_{m=0}^{M-1} \delta(t' - m\tau_r)\, dt' > F\left(1 + N\beta\, e^{-\alpha_L \tau_r (M-1)}\right),$$

where $M - 1$ has been used because the time interval for the cell to fire $M$ times is $(M - 1)\tau_r$. The left-hand side yields a finite sum of exponential decays. Expressing this in closed form leads to the result that

$$V_T \left(\frac{1 - e^{-\alpha_T \tau_r M}}{1 - e^{-\alpha_T \tau_r}}\right) \ge \beta F N\, e^{-\alpha_L \tau_r (M-1)}. \tag{16}$$

This gives $M$ in terms of $N$. If $M$ is small enough so that all the exponentials can be expanded (see condition of equation (9)), then $M$ is approximately given by
$$M \approx \frac{\beta F N (1 + \alpha_L \tau_r)}{V_T + \alpha_L \tau_r \beta F N}.$$

But it is not that simple. Suppose that the $N$ pulses came from the same group containing the cell and they all had the same feeding input $F$. Then every cell in the group must send $M$ pulses to the others. The situation, shown in Figure 5(b), is that each cell receives $N$ pulses at a time, $N$ being the number of cells in the group, for $M$ times, with a separation of $\tau_r$ between times. The cells must pulse their way over a much larger linking pulse than in the previous case. Let $M'$ be the number of pulses required. The linking pulse is now, at $t = (M' - 1)\tau_r$,

$$L = N \sum_{m=0}^{M'-1} e^{-\alpha_L m \tau_r} = N\, \frac{1 - e^{-\alpha_L \tau_r M'}}{1 - e^{-\alpha_L \tau_r}}.$$

Applying the condition that the threshold must be greater than this gives, after some rearrangement,

$$\frac{1 - e^{-\alpha_T \tau_r M'}}{1 - e^{-\alpha_L \tau_r M'}} \ge \frac{\beta F N}{V_T} \left(\frac{1 - e^{-\alpha_T \tau_r}}{1 - e^{-\alpha_L \tau_r}}\right). \tag{17}$$

Unfortunately, since as shown in Figure 5(b) this condition depends on the gradual saturation of the envelope of the linking pulses, a first-order expansion may be inappropriate for the left-hand side. An asymptotic approximation comes from noting that the left-hand side is of order unity if $M' \gg 1$. This gives a rough upper limit of

$$1 > \frac{\beta F}{V_T} \left(\frac{1 - e^{-\alpha_T \tau_r}}{1 - e^{-\alpha_L \tau_r}}\right) N.$$

This is similar to equation (11) when equation (3) is used in it to explicitly include $F$ and $\beta$. The limit of equation (17) is above that of equation (11), which was the pulse saturation limit. This shows that the model can handle all multiple pulses under the pulse saturation limit. A somewhat better approximation is to assume that $\alpha_T \tau_r M'$ is small. This allows the expansion to first order of the numerator on the left-hand side of equation (17); expanding the right-hand side to first order in $\alpha_T \tau_r$ and $\alpha_L \tau_r$ as well gives

$$\alpha_L \tau_r M' > \frac{\beta F N}{V_T} \left(1 - e^{-\alpha_L \tau_r M'}\right),$$

which is of the form $x > a(1 - e^{-x})$, where $x = \alpha_L \tau_r M'$ and $a = \beta F N / V_T$. Finally, the value of $N$ can be related to the receptive field kernel (equation (2)) as

$$N \rightarrow N_{RF} = \iint W_L(\vec{r} - \vec{r}\,')\, Y(\vec{r}\,', t)\, d^2 r',$$
(a) A cell receives a composite linking pulse from an external group and fires M times for the threshold to exceed the internal activity U.
(b) A cell receives a composite linking pulse from its own group. It fires $M'$ times, as do all the other cells in the group, causing more linking pulses. The linking pulse envelope saturates, allowing the threshold to finally exceed the internal activity $U$.
FIGURE 5. Multiple pulses. Two cases are shown. In 5(a), a cell receives $N$ linking pulses simultaneously, as would occur when the cell is not part of the group of $N$ cells making the pulses. It must fire $M$ times to overcome the composite linking pulse. In 5(b), the cell is a member of a group of $N + 1$ cells. Since every member must fire multiple pulses, each fires $M'$ times, and each firing generates an additional linking pulse of size $N$, which the cell must attempt to overcome by firing again. It succeeds eventually because the linking pulse train envelope saturates more quickly than the threshold pulse train, allowing the threshold to catch up after $M'$ pulses. (Reprinted with permission from [26]).
which, with equation (11) or (17), shows that the integral of the receptive field kernel W needs to be finite if the slab is not bounded.
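The two multiple-pulse cases can be compared by direct counting. The sketch below is a simplified illustration and not the closed-form analysis: it assumes delta-like pulses spaced by the refractory period tau_r, a threshold that gains V_T per firing and decays at rate alpha_T, and a linking input that gains N per arriving volley and decays at rate alpha_L. The function name and all parameter values are hypothetical.

```python
import math

def pulses_to_overcome(beta, F, N, V_T, alpha_T, alpha_L, tau_r,
                       own_group=False, max_pulses=10_000):
    """Count firings until the threshold gain exceeds beta * F * L.

    own_group=False: a single volley of N linking pulses arrives at time zero.
    own_group=True:  every firing is accompanied by a new volley of N pulses.
    Returns None if the threshold never catches up within max_pulses.
    """
    threshold_gain = 0.0   # threshold height above F built up by the firings
    linking = 0.0          # composite linking input L
    for M in range(1, max_pulses + 1):
        if M > 1:
            # Both traces decay over one refractory period between firings.
            threshold_gain *= math.exp(-alpha_T * tau_r)
            linking *= math.exp(-alpha_L * tau_r)
        threshold_gain += V_T
        if M == 1 or own_group:
            linking += N
        if threshold_gain >= beta * F * linking:
            return M
    return None

params = dict(beta=0.2, F=1.0, N=100, V_T=5.0,
              alpha_T=0.1, alpha_L=1.0, tau_r=0.1)
m_external = pulses_to_overcome(**params)
m_group = pulses_to_overcome(**params, own_group=True)
```

With these illustrative values the cell needs only a few firings to climb over a single external volley, but many more when it belongs to the group and each firing brings a fresh volley. The count stays finite in the second case because the linking envelope saturates sooner than the threshold does.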
4
Multiple Receptive Field Inputs
The pulse-coupled neural network is a dendritic model. The inputs from the receptive fields enter along the length of the dendrite, and the linking modulation occurs at the point of entry. The dendritic signal flows toward the cell body. There can be many inputs. The internal activity $U$ is in general of the form

$$U = F(1 + \beta_1 L_1)(1 + \beta_2 L_2)(1 + \beta_3 L_3) \cdots (1 + \beta_n L_n). \tag{18}$$
This is for one dendrite. A cell can have many dendrites. They are summed to form the total input to the cell, and can be excitatory or inhibitory. If the products are carried out, the internal activity has all possible products of all the receptive fields. These are products of weighted sums of inputs, as shown in equations (1) and (2). It is seen, then, that these are general higher-order networks. Eckhorn argues that the inputs far out on the dendrite have small synaptic time constants, while those close to the cell body have large synaptic time constants, so there is a transition from "feeding" to "linking" inputs along the length of the dendrite. The receptive fields can overlap, they can be offset, and each one can have its own kernel defining its weight distribution. Now, a given weight distribution $W$ can give the same weighted response for more than one input distribution. This corresponds to a logical "OR" gate in that sense. The linking modulation uses an algebraic product, which corresponds to a logical "AND" gate. The inhibitory inputs give logical complementation. In this view (Figure 6), each neuron is a large multirule logical system. This property was used to achieve exact scale, translation, and rotation invariance as shown by the simulations discussed later.
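The "AND"-like role of the linking product can be illustrated with a small numerical sketch. It assumes a single dendrite with a constant feeding signal and two linking inputs that are either absent (0) or present (1); the function name, the linking strengths, and the comparison level are hypothetical choices, and the correspondence with a logic gate is only approximate.

```python
from math import prod

def dendrite_activity(F, linking_inputs, betas):
    """Internal activity of one dendrite: F times the product of (1 + beta*L)."""
    return F * prod(1.0 + b * L for b, L in zip(betas, linking_inputs))

F = 1.0
betas = [0.5, 0.5]

neither = dendrite_activity(F, [0.0, 0.0], betas)
one = dendrite_activity(F, [1.0, 0.0], betas)
both = dendrite_activity(F, [1.0, 1.0], betas)

# A comparison level placed between the one-input and two-input activities
# is exceeded only when both linking inputs are present together.
level = 2.0
```

Because the factors multiply, the activity with both inputs present is larger than the sum of the individual increases would suggest, which is what lets a suitably placed threshold respond to the conjunction.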
5
Time Evolution of Two Cells
This section shows how to follow the time evolution of the pulse outputs of a two-cell system. As each cell fires, it can capture the other cell and shift its phase. By constructing an iterative map of the phase shifts from one pulse to the next, the time of firing can be predicted. The map plots the current phase versus the next phase. The simplest form of the map, discussed here, is constrained to one-way linking. There are two cells. The first one has a feeding input $F_1$ and the second has $F_2$. The first cell sends a linking input to the second, but not vice versa. It is assumed that the linking pulses are Kronecker delta functions (0 or 1), with no decay tail. The threshold is assumed to recharge instantly by an amount $V_T$ from the point where it intersected the internal activity. In this case the forbidden zone is equal to the capture zone.

[Figure 6: receptive fields with inputs $Y_k$ feed a dendrite at successive points $n-1$, $n$, $n+1$ on the way to the cell body, with $U_{n+1} = U_n(1 + \beta_n L_n)$. Each dendrite is a logical rule: the receptive field weighted sums act as "OR" and the linking product acts as "AND".]

FIGURE 6. The linking field model neuron is a multirule logical system. A dendrite receives inputs from many receptive fields along its length. Each input modulates the dendritic signal by the factor $(1 + \beta_n L_n)$ for the nth input. The receptive fields can give the same signal for more than one input distribution and thus correspond to a logical "OR". The product term in the modulation factors corresponds to a logical "AND". These logic gate correspondences are not exact, but they can be used effectively, as shown by the example discussed in the text. (Reprinted with permission from [3]).

To form the map, first construct the threshold diagram of Figure 7. Pulses can intersect the internal activity outside the forbidden zone, including on the leading vertical edge of the linking pulse. This then defines an upper trace, where the recharged threshold can begin its decay back down to the internal activity. The upper trace is simply the lower one, raised up by a distance $V_T$. It is effectively a launch platform from which the recharged threshold begins its downward decay. When the threshold again intersects the lower trace, it recharges and comes to a new location on one of the upper traces at a later time. This generates a mapping from one upper trace to another, and it can be used to make the iterative map with which to follow the time behavior of the system. Let the total length along the trace be $X$. Note that this consists of a horizontal (H) section followed by a short vertical (V) section corresponding to the
leading edge of the linking pulse (Figure 7). Let the remapped length be $Y$. If the threshold launches from the horizontal part of $X$, it can hit either a horizontal or vertical part of $Y$, and the same is true for launch from the vertical part of $X$. The mapping accordingly will be linear (horizontal to horizontal, vertical to vertical), exponential (horizontal to vertical), or logarithmic (vertical to horizontal). There are five distinct cases, depending on where the mapping starts and ends. They are

Case I      HV - HH - VH - VV
Case II     HV - HH - VH
Case III    HH - HV - HH - VH
Case IV     HH - HV - VV - VH
Case V      HH - HV - VV
The iterative map for Case I is shown in Figure 8. It is piecewise continuous and has an upper section and a lower section. All the curve sections can be written parametrically in terms of the inputs $F_1$, $F_2$, the time constants $\alpha_T$, $\alpha_L$, the linking strength $\beta$, the linking period $\tau_L$ and pulse period $\tau_T$, the capture zone time period $\tau_c$ (which is also the forbidden zone in this case), and the number $N$ of linking periods spanned by the threshold pulse period. The map of Figure 8 can be followed, step by step, by reference to the traces shown in Figure 7. Suppose a pulse begins on the upper trace's horizontal region and maps to the next lower trace's vertical region, following the b-b decay curve of Figure 7, for example. This would be an HV transition in Figure 8. It is reset by $V_T$ to the corresponding upper trace. From there, it decays and hits the horizontal section of the next lower trace, as indicated by the e-e decay curve of Figure 7. This is a VH transition. It is again reset to the upper trace by $V_T$, decays to a horizontal section through an HH transition (the a-a decay curve of Figure 7), resets to the upper trace, again decays to another horizontal section (HH), resets, and this time maps from a horizontal section to a vertical section (HV) as shown in Figure 8. This follows the two-cell system through one cycle around the phase map of Figure 8. Note that although it has again reached an HV transition, it occurs at a different point than the first HV transition. If the system approaches a limit cycle in Figure 8, this means that the corresponding cell has a periodic pulse train output.
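The one-way linking arrangement can also be followed by brute-force simulation rather than through the iterative map. The sketch below is a minimal discrete-time illustration under the assumptions stated for this section: the linking pulse is 0 or 1 with no decay tail, and the threshold recharges instantly by V_T from the level at which it was intersected. The threshold offset is taken as zero, and the time step, the parameter values, and the choice of a receiver with a slightly weaker feeding input are all hypothetical.

```python
import math

def simulate_one_way_linking(F1, F2, beta, V_T, alpha_T,
                             theta2_start, dt=0.01, steps=6000):
    """Sender (cell 1) links to receiver (cell 2), but not vice versa.

    Returns the time-step indices at which each cell fires.
    """
    decay = math.exp(-alpha_T * dt)
    theta1, theta2 = 0.0, theta2_start
    sender_fires, receiver_fires = [], []
    for n in range(steps):
        theta1 *= decay
        theta2 *= decay
        link = 0.0
        if F1 > theta1:                       # sender has no linking input
            sender_fires.append(n)
            theta1 += V_T
            link = 1.0                        # delta-like pulse, no decay tail
        if F2 * (1.0 + beta * link) > theta2:
            receiver_fires.append(n)
            theta2 += V_T                     # recharge from the intersection
    return sender_fires, receiver_fires

sender, receiver = simulate_one_way_linking(
    F1=1.0, F2=0.95, beta=0.2, V_T=5.0, alpha_T=1.0, theta2_start=3.0)
```

In this configuration the receiver initially fires out of step with the sender and drifts slowly in phase because its natural period is slightly longer. Once a linking pulse arrives inside its capture zone, the receiver fires on the leading edge of that pulse and from then on fires at the same time steps as the sender.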
5.1
The Linking Decay Tail Is an Unstable Region
A geometrical argument can be used to show that the linking decay tail is an unstable region. Suppose there are two mutually linked cells, both fed by the same input F. Then they pulse at the same basic frequency. Now suppose that they are out of phase such that they link on each other's linking decay
tail, as shown in Figure 9(a): Each cell's threshold intersects the internal activity level of the other cell beyond the capture zone. Consider first cell #1. It links on the decay tail of the linking input from cell #2 at point $A_1$, recharges to the upper trace, decays, and links again at point $B_1$. The diagram shows a composite trace combining the upper and lower traces for cell #1, with points $A_1$ and $B_1$ both on it. A similar composite trace is true for cell #2. Now consider both cells, as shown in Figure 9(b). The difference $A_2 - A_1$ is the change in time separation between the firing of the two cells. Due to the difference in the height of the linking trace at points $A_1$ and $A_2$, $A_2 - A_1$ will in general not be zero. (There is a single point on the decay tail where this difference is zero, but it is an unstable point.)

[Figure 7: threshold diagram versus time, showing the lower trace and the upper trace raised above it by $V_T$.]

FIGURE 7. Two cells with one-way linking. The top figure shows the threshold diagram for the cell receiving an idealized linking pulse from the other cell. The second cell does not receive linking from the first cell (two-way linking is shown in Figure 9). The threshold recharges from the lower trace by $V_T$, defining an upper trace as well. When the threshold decays from the upper trace to the lower and then is recharged back to the upper trace, it defines a mapping between upper traces that can be used to track the time evolution of the pulse history of the system. (Reprinted with permission from [26]).
FIGURE 8. Iterative map. The horizontal axis is the total distance along the upper trace of Figure 7, from which the threshold can begin its decay, and the vertical axis is the distance along the upper trace where the pulse returns after it has recharged. There are five distinct cases, and each case is defined by the particular values of the two-cell system and its two feeding input strengths. For each case there are four possible transitions, HH, HV, VH, and VV, corresponding to the initial and final locations on the traces of Figure 7. H indicates horizontal, V indicates vertical. These transitions are discussed in the text. (Reprinted with permission from [26]).
It is clear from the diagram that the firing time $B_1$ of cell #1 will move closer to the leading edge of the linking pulse from cell #2 by an amount $A_2 - A_1$. The same is true for $B_2$. The cells constantly try to catch up with each other by firing more frequently, but each one's gain helps the other one gain more, and the overall effect is that they repel each other out of the decay tail region. After several cycles, one of the cells' thresholds will decay into the leading edge of the linking pulse from the other cell and thus will fire at essentially the same time as that cell. Since both have the same feeding input, they will be phase locked together from this time on. This shows how two cells with the same feeding input will always become phase locked together, regardless of their initial phase difference, due to the finite decay tails of the linking pulses.
6
Space to Time
Consider a group of weakly linked neurons. Suppose at time zero all the neurons fire together. As time goes by they will occasionally link in different combinations, as illustrated in Figure 10. Each neuron has its own basic firing rate due to its particular feeding input. Suppose further that at time $T$ the neurons' combined firing rates and linking interactions cause them all to fire together a second time. This duplicates the initial state at time zero. Then everything that happened during time $T$ will happen again in the same order, and all the neurons will fire together again at time $2T$. This will continue, resulting in periodic behavior of the group with period $T$. The assumption of a single exact repetition of a given state (all the neurons fire together, for example) leads to the conclusion that everything that happened between the repetitions must necessarily happen again in the same order, in a permanently periodic way, for every neuron in that group. If all the outputs of the group are linearly summed, the result will be a single periodic time series that is the signature of that spatial input distribution. This is the time series $S(t)$ for that group of neurons [7].
The length of time required for periodicity is primarily governed by the ratio $\tau_c/\tau_{typ}$, where $\tau_{typ}$ is the characteristic pulse period of the input image. (For large $\beta$ the ratio can be much greater than one, in which case the group links on every pulse and is completely periodic.) Two other factors that promote periodicity in a two-neuron system are linking in quasiharmonic ratios and linking on the decay tail of the linking pulses. For quasiharmonic pulse rates such that
$$|m\tau_2 - n\tau_1| < \tau_c, \qquad m, n \ \text{integers}, \tag{19}$$
the two neurons will periodically link about every $m\tau_2$ seconds. When two mutually linked neurons link on the decay tails of the linking pulses (Figure 9), the cycle-to-cycle behavior is that they actively expel each other from this region into the leading edge linking region. While both effects promote periodicity, they do not guarantee it. The time required to achieve periodicity, and the overall period length, can be large for large, weakly linked slabs.
(a) Threshold diagram for cell #1, showing origin of composite trace diagram.
(b) Interaction of cell #1 and cell #2. $B_2$ actually occurs in time on the next cycle, at the point ($B_2$).
FIGURE 9. Two cells each linking on the other's linking pulse decay tail. Upper and lower traces are defined for each cell, and a composite trace is constructed that shows for each cell its map points A and B from one recharging point to the next (a). Both cells have the same feeding input strength $F$. Figure 9(b) uses the composite traces for both cells to show their interaction. Each cell's second recharging point B shifts the linking pulse time for the pulse that it sends to the other, with the result that both cells' firing points steadily move closer to the leading edge of the linking pulses until one or the other locks in the capture zone. The cells are then phase locked. When finite linking decay exists, as assumed here, this interaction shows that two cells with the same feeding input strength will always become phase locked. (Reprinted with permission from [26]).
FIGURE 10. Formation of a periodic time series. Neurons 1-4 all fire together at $t = 0$. As time passes, they occasionally link in various combinations. If at time $T$ they again link so as to fire together, the situation will be the same as at $t = 0$. The system will repeat its behavior, generating a time series. The linear sum of the group's outputs is the periodic time signature of the input distribution to neurons 1-4. (Reprinted with permission from [3]).
The following interpretation of the time series relates it to the input image's intensity histogram. The network maps intensity to frequency. The size of an isointensity image patch determines how many neurons fire at that corresponding frequency, so patch size maps to amplitude. The image's intensity histogram counts the number of pixels with a given intensity, while the amplitude of a given frequency counts the number of neurons firing at that rate. The frequency spectrum of the time signal is the intensity histogram of the input image as mapped through the sigmoidal response. Although this argument holds exactly only for a system with zero linking, a linked system will generate an intensity-quantized histogram whose envelope generally follows that of the analog input image. This is true for discrete pulse models and for continuous oscillator models, and for any other model where the output frequency is proportional to the input signal strength.
For a linked slab, the coherent periodicity of the time signal suggests that there must exist phase factors as well as frequency and amplitude. Suppose that the time signal $S(t)$ is expressed as a sum of delta function pulses:
$$S(t) = \sum_{n} \sum_{k=1}^{K} a_k\, \delta(t - nT - \varphi_k), \tag{20}$$
where $T$ is the periodicity, $a_k$ is the amplitude of the $k$th subgroup, and $\varphi_k$ is the time offset of the subgroup of cells with amplitude $a_k$. The time offset is between zero and $T$, and there are $K$ subgroups that are linked into the overall repetition period $T$. If a Fourier transform is taken, it factors into a sum of complex phases and a sum representing the repetition period:
$$\mathrm{F.T.}(S) = \left[\sum_{k=1}^{K} a_k e^{i\omega\varphi_k}\right] \left[\sum_{n} e^{i\omega nT}\right]. \tag{21}$$
The corresponding "histogram" must in some form include the phases as well as the amplitudes. Other transforms may be more appropriate; the Fourier transform was used here for illustrative purposes. This discussion shows that the geometrical content of an image, as well as its intensity, is encoded in the time signal, and that distance-dependent linking action provides a way to include syntactical information. The time signals are object-specific. They are a signature, or identification code, that represents a two-dimensional image as a time-domain signal that can be processed by the neural net. The signatures have some degeneracy, but this can be an advantage rather than a drawback, since certain classes of degeneracy can also be viewed as invariance.
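The factorization of a periodic pulse train into a phase sum and a repetition sum can be checked numerically. The following is a minimal sketch, assuming a sampled, finite record of $N$ periods with integer offsets; the function names, the sample values, and the sign convention $e^{-i\omega t}$ of the discrete transform are choices of this sketch, not of the original text.

```python
import cmath
import math

def pulse_train(T, subgroups, n_periods):
    """Sampled pulse train: S[t] = sum_n sum_k a_k * delta[t - n*T - phi_k].

    T          -- repetition period in samples (integer)
    subgroups  -- list of (a_k, phi_k) with integer offsets 0 <= phi_k < T
    n_periods  -- number of repetitions N kept in the finite record
    """
    s = [0.0] * (T * n_periods)
    for n in range(n_periods):
        for a_k, phi_k in subgroups:
            s[n * T + phi_k] += a_k
    return s

def dft(signal):
    """Direct O(M^2) discrete Fourier transform (convention exp(-i...))."""
    M = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * m * t / M)
                for t, x in enumerate(signal))
            for m in range(M)]

def phase_sum(T, subgroups, j):
    """Sum of complex phases, evaluated at the j-th harmonic of 1/T."""
    return sum(a_k * cmath.exp(-2j * math.pi * j * phi_k / T)
               for a_k, phi_k in subgroups)

# Illustrative values: three subgroups locked into a period of 16 samples.
T, N = 16, 4
subgroups = [(3.0, 0), (5.0, 4), (2.0, 11)]
spectrum = dft(pulse_train(T, subgroups, N))
```

In this finite setting the repetition sum equals $N$ at every harmonic of $1/T$ and vanishes elsewhere, so `spectrum[N * j]` should equal `N * phase_sum(T, subgroups, j)` and all other bins should be numerically zero. The subgroup offsets therefore survive only as complex phases on the harmonics.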
7
Linking Waves and Time Scales
The linking pulses are transmitted very quickly as compared to the firing rates of the cells. If the receiving cells are within their capture zone, they will be induced to fire upon receipt of the linking inputs, and their output pulses can in turn capture other cells. This causes a wave of pulses to sweep across a region of the slab. The propagation of the wave will follow the underlying feeding input distribution, generally flowing down gradients and firing larger or smaller areas of cells according to how many are within their capture zones. The time profile of the firing history will reflect the shape of the underlying feeding spatial distribution and, for the case of the feeding input being an image intensity pattern, will be related to the geometry of the image, as shown in Figure 11. The repetition rate of a linking wave, i.e., how often it sweeps through an area, is determined by the intensity in that area. On a time scale that shows the linking wave profiles, the profiles can be taken as elementary signatures identifying their areas. On a time scale that compresses the linking wave profiles into a single time bin, the repetition period of each area can be used to segment that subregion of the total image. These segmented areas are in effect "giant neurons," i.e., synchronous groups. The linking still exists, and these groups transmit and receive composite linking pulses. They have their own group capture zones and behave like single neurons in many ways, with the exception that their output pulse is no longer a binary 1 or 0 but instead has an amplitude that is equal to the number of individual cells comprising the synchronous group.
Accordingly, group linking waves can exist. This is discussed in the next section. The time profile on this scale is the signature of the group of linked groups, and on yet another still-larger time scale the repetition period of the group of groups can be used to segment it into a supergroup. At this point the interpretation from an image processing standpoint is that the syntactic information of a large composite image has been encoded into an object-specific signature for that image. In principle, further time scales can be incorporated indefinitely in a self-similar manner, leading to groups of supergroups, supergroups of supergroups, and so on, each having its own time signature and segmentation time scale. This is indicated by Figure 12. It reduces the fundamental problem of image understanding to one of time correlation of time signatures, which may be a solvable problem. It has implications for how the brain works to send and receive signals. The Eckhorn linking field, and in general all higher-order networks when used with pulsed neuronal models, provide a specific mechanism to generate the essential time signals that carry syntactic information about arbitrary spatial distributions.
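The capture mechanism behind a linking wave can be illustrated with a toy model. The sketch below is a deliberately simplified assumption, not the chapter's full model: it uses a one-dimensional chain, freezes the thresholds during the sweep (fast time scale), lets each cell fire at most once per sweep, and counts only nearest neighbours in the linking input. The function name and the numerical values are hypothetical.

```python
def linking_wave(F, theta, beta, max_steps=1000):
    """Fast-time-scale capture cascade on a 1-D chain of cells.

    F     -- feeding inputs, one per cell
    theta -- thresholds, held fixed during the sweep
    beta  -- linking strength
    A cell fires when F * (1 + beta * L) > theta, where L counts the
    nearest neighbours that fired on the previous fast step.
    Returns the number of cells firing at each fast step (the wave profile).
    """
    n = len(F)
    fired = [False] * n
    # Step 0: cells whose feeding input alone exceeds the threshold.
    front = [i for i in range(n) if F[i] > theta[i]]
    profile = []
    while front and len(profile) < max_steps:
        for i in front:
            fired[i] = True
        profile.append(len(front))
        just_fired = set(front)
        front = []
        for i in range(n):
            if fired[i]:
                continue
            L = sum(1 for j in (i - 1, i + 1) if j in just_fired)
            # Capture: the linking input lifts the internal activity
            # above a threshold that the feeding input alone cannot reach.
            if L and F[i] * (1 + beta * L) > theta[i]:
                front.append(i)
    return profile

# A seed cell in the middle of a uniform chain launches a wave in both directions.
uniform = linking_wave([1.0] * 9, [1.1] * 4 + [0.9] + [1.1] * 4, beta=0.2)
# A cell outside its capture zone (threshold 1.5) blocks the right-going wave.
blocked = linking_wave([1.0] * 9, [1.1] * 4 + [0.9, 1.1, 1.5, 1.1, 1.1], beta=0.2)
```

The returned profile is the time history of the wave amplitude summed over the region, which is the quantity the text treats as the elementary signature of that region.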
8
Groups
On a time scale that segments groups of cells, multiple pulses occur even for very weak linking strengths. Consider an idealized situation (Figure 13) where there are two groups, A and B, containing $A$ and $B$ cells, respectively. Assume for simplicity that each group sends a linking pulse of amplitude $A'$ or $B'$ to the other. Look at a cell in group A. Let $M'_A$ be the number of multiple pulses of group A. Then equation (17) gives an estimate $M'_A = \beta F_A A'/V_T$ for large numbers of multiple pulses. The repetition period of group A is longer than that for an individual cell because its threshold must rise via multiple pulses within the group to overcome $A'$. Approximately, it can be obtained from equation (12) by substituting $M'_A V_T$ for $V_T$. Now look at the linking inputs, and write the total internal activity:
$$U_A = F_A\bigl(1 + \beta(A' Y_A + B' Y_B)\bigr). \tag{22}$$
The $Y$'s give the moments in time when the groups' pulses occur, each at its own characteristic period. The groups A and B rescale all their characteristic times in proportion to the group sizes. The capture zone for group A with respect to group B, for example, is now given by an expression with prefactor $1/\alpha_T$ (the remainder of this unnumbered equation is illegible in the source), and the decay time of group A is
$$\tau_A = \frac{1}{\alpha_T}\ln\!\left(1 + \frac{M'_A V_T}{F_A}\right) = \frac{1}{\alpha_T}\ln(1 + \beta A'). \tag{23}$$
[Figure 11 panel labels: "Signature on time scale $\Delta\tau$"; "Segmentation on time scale $\tau$".]
FIGURE 11. Linking waves. An elementary region generates a linking wave that sweeps through it. The time history of the wave amplitude as summed over the region depends on the geometry of the area and is its signature. The repetition rate of the wave defines a time scale on which the elementary area can be segmented. (Reprinted with permission from [26]).
[Figure 12 panel labels: Elementary image patch; Image feature; Composite object.]
FIGURE 12. Time scales. Linking waves for elementary areas make signatures for them. On a time scale where these areas are segmented, the signatures are compressed into a single time bin and become a composite pulse. The composite pulses link as groups (see Figure 13) to make linking waves on a group of elementary groups. The time history of the amplitude of these waves is the signature for the group of groups. Increasing the time scale so that these signatures are in turn compressed into a single time bin leads to supergroups, which in turn link together and form linking waves on that time scale. The process continues, leading to signatures for entire images as suggested by the figure. (Reprinted with permission from [26]).
The period of group A is the sum of the time required for the pulse burst and the decay time. This is a major change from the operation in the single pulse regime. There, the period depended on the individual cells' feeding inputs, while here it depends on the linking input from its own group. Since that linking input will be proportional to the area of the group and not its intensity, the behavior of a system of groups in the multiple pulse regime is driven by the sizes of the areas rather than only by their intensities. The intensity, however, will partially control the number of pulses in the bursts from each group (see equation (17)) and thus will enter into the period via M'. The size of the capture zone is still a function of the linking input, so the ratio of it to the group's period will determine the degree of linking among groups. This ratio can still be small, which defines the linking to be in a "weak linking" regime again. Even though the system emits multiple pulses and synchronous bursts, it is still in a "weak linking" mode on this larger time scale of group interactions. The system for groups is scaled in proportion to the number of cells in each group (with allowance for multiple pulses), giving a larger time scale on which linking among groups occurs, but in the same way as linking occurs for individual cells. This is illustrated in Figure 13.
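The group quantities discussed in this section can be evaluated with a few lines of code. This is a hedged sketch: it takes the estimate $M'_A = \beta F_A A'/V_T$ from the text and assumes that the group decay time has the logarithmic form $(1/\alpha_T)\ln(1 + M'_A V_T/F_A)$, which is this sketch's reading of the decay-time equation. The function names and numerical values are illustrative only.

```python
import math

def multiple_pulse_count(beta, F_A, A_link, V_T):
    """Estimated number of multiple pulses, M'_A = beta * F_A * A' / V_T."""
    return beta * F_A * A_link / V_T

def group_decay_time(alpha_T, F_A, M_A, V_T):
    """Decay time with the threshold gain V_T replaced by M'_A * V_T
    (assumed logarithmic form; see the lead-in)."""
    return math.log(1.0 + M_A * V_T / F_A) / alpha_T

# Hypothetical parameter values, chosen only to exercise the formulas.
beta, F_A, A_link, V_T, alpha_T = 0.2, 0.5, 2000.0, 20.0, 0.2
M_A = multiple_pulse_count(beta, F_A, A_link, V_T)
tau_A = group_decay_time(alpha_T, F_A, M_A, V_T)

# Substituting M'_A removes F_A: the decay time depends only on beta * A'.
tau_A_direct = math.log(1.0 + beta * A_link) / alpha_T
```

Under these assumptions the feeding input cancels out of the decay time, which matches the statement that group behavior in the multiple-pulse regime is driven by the linking amplitude (and hence the area) of the group rather than by its intensity alone.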
9
Invariances
If there are symmetries in a receptive field weight pattern such that any two parts of it are the same, then an exchange of the corresponding parts of an input image distribution will not change the sum of the product of the field and the image. The exchanged parts of the input image will still be multiplied by the same weight values, because the weight pattern was the same in those two regions. The exchange symmetry of the weight pattern makes the output of that field invariant against the corresponding exchange operation acting on the input image. This is because the neuron's output is determined by the internal activity $U_j$, which is a function of the feeding and linking inputs. They, in turn, are weighted sums. In general, if the image changes in a way that fits the symmetries of the feeding and linking receptive fields so that the internal activity doesn't change, then the neuronal output will be invariant against those changes. The utility of this is that the symmetries of the receptive fields then correspond to invariances of the time signal generated by the input image [7], because the time signal is driven by the internal activity. This is a very general principle. It can be used to make desirable time signal invariances by an appropriate choice of receptive field symmetries.
FIGURE 13. Group linking. Two groups A and B send linking pulses to each other. Their thresholds must recharge to a height that exceeds their own group action (Figure 5), and so they reach heights approximately equal to their group linking amplitudes. These are much greater than for single-cell recharging. But the inter-group linking pulses are also much larger, and as a result the relative heights of both the thresholds and the linking inputs scale with group size. The ratio of the capture zone and the group periods can still be small, giving effectively "weak linking" despite the presence of multiple pulses. The detailed structure of the amplitudes is shown in Figure 5; it is simplified here for clarity. (Reprinted with permission from [26]).
The pulse-coupled network produces time series that encode in their phase structure the two-dimensional spatial input distribution, including its geometrical connectivity relationships. Symmetries can be introduced in the receptive fields to make the time signature of an image invariant against translation, rotation, and scale. Simulation results also show that there is a significant insensitivity to scene illumination and distortion, and further that there is some limited insensitivity to changes in the overlying patterns (shadows) on a given image shape. The design objective is to make the internal activity invariant by introducing geometrical symmetries into the receptive field weight pattern. (1) For translational invariance use the same receptive field weight pattern at every neuron. (2) For rotational invariance make the receptive field patterns circularly symmetric. A translated and rotated image then covers a different set of neurons, but due to the translational and rotational symmetry of their receptive fields, sees the same receptive field patterns as before. The time signal is a sum over all the neurons, so it doesn't matter which neurons are used. (3) For scale invariance use an inverse square radial falloff. This does not make the internal activity invariant against distances $r$, but rather against scale changes as represented by the factor $k$ in the rescaled distance $kr$.
To see this, consider an optical image that is rescaled by a change in the object distance (Figure 14). In this case, the intensity per image patch is constant. The number of cells affected by the rescaled patch is changed, but not their output pulse frequency. A neuron receiving the input at the rescaled location of the original image patch is driven by the same intensity as the neuron at the original location. For a rescaling factor of $k$, $Y(kR) = Y(R)$. The linking input to that neuron, using an inverse square kernel, is
$$L(kR) = \int_0^{2\pi}\!\!\int_{\rho_0}^{\infty} \frac{1}{(k\rho)^2}\, Y\bigl(k(R+\rho)\bigr)\, k\rho\, k\, d\rho\, d\theta = \int_0^{2\pi}\!\!\int_{\rho_0}^{\infty} \frac{1}{\rho^2}\, Y\bigl(k(R+\rho)\bigr)\, \rho\, d\rho\, d\theta = L(R). \tag{24}$$
This removes the dependence on the scale factor $k$ from the integrand. The lower integration limit $\rho_0$ is fixed and does not scale, so the above relation is not an exact equality in some cases. This will be discussed below.
[Figure 14 panel labels: Image patch; Original Optical Image; Rescaled Optical Image.]
FIGURE 14. Geometry for scale invariance. A neuron at $R$ receives a linking contribution from a neuron at $\rho$. When the image is rescaled, the image patch at $R$ goes to $kR$ and the patch at $\rho$ goes to $k\rho$. Only the latter patch is shown. For the case of an optical image rescaled by a change in the object distance, the intensity per image patch is constant. The object is to design a linking receptive field such that $L(kR) = L(R)$. (Reprinted with permission from [3]).
If the feeding field is a single pixel (this is not essential and is done here only for simplicity), then $F(kR) = F(R)$. The internal activity of the rescaled image is thus the same as that for the unscaled image:
$$U(kR) = F(kR)\bigl(1 + \beta L(kR)\bigr) = F(R)\bigl(1 + \beta L(R)\bigr) = U(R). \tag{25}$$
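The first two design rules, a shared weight pattern for translation and circular symmetry for rotation, can be checked with a small numerical experiment. The sketch below is illustrative only: the weight profile $1/(1 + d^2)$, the sparse-dictionary image representation, and all function names are assumptions of the sketch, and only 90-degree rotations are tested because they map the square grid onto itself exactly.

```python
def circular_kernel(radius):
    """Weights that depend only on distance from the centre (circular symmetry)."""
    return {(dx, dy): 1.0 / (1.0 + dx * dx + dy * dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if dx * dx + dy * dy <= radius * radius}

def weighted_sum(image, centre, kernel):
    """Receptive-field sum at `centre`; pixels absent from the image count as zero."""
    cx, cy = centre
    return sum(w * image.get((cx + dx, cy + dy), 0.0)
               for (dx, dy), w in kernel.items())

def rotate90(image, centre):
    """Rotate a sparse image {(x, y): value} by 90 degrees about `centre`."""
    cx, cy = centre
    return {(cx - (y - cy), cy + (x - cx)): v for (x, y), v in image.items()}

def translate(image, shift):
    """Shift every pixel of a sparse image by `shift`."""
    sx, sy = shift
    return {(x + sx, y + sy): v for (x, y), v in image.items()}

kernel = circular_kernel(3)
image = {(5, 5): 1.0, (6, 5): 0.5, (5, 7): 0.25, (7, 8): 0.8}
centre = (5, 5)

base = weighted_sum(image, centre, kernel)
rotated = weighted_sum(rotate90(image, centre), centre, kernel)
# Same kernel at every neuron: the neuron at the shifted centre sees the same sum.
shifted = weighted_sum(translate(image, (3, -2)), (8, 3), kernel)
```

Both `rotated` and `shifted` should equal `base` up to rounding, because every pixel is multiplied by the same weight before and after the transformation.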
There is a problem that must be resolved before complete scale invariance is achieved. It appears to be less important for large images on a fine grid of cells, but when the isointensity patch size covers less than approximately 10 x 10 cells in the simulations, it has some effect. The problem is that the local group around a neuron also changes in scale. The linking input due to the local group accordingly varies with scale, making the internal activity change as well. The cause is the fixed inner edge $\rho_0$ of the linking field. It does not scale. External groups do not have this property because all their boundaries shift accordingly as the image scale is changed. For simplicity consider a neuron at the center of its local patch, which is surrounded by an external patch, making two concentric circles, as shown in Figure 15. Let $\rho_0$ be the fixed inner edge of the local patch, and $Y_1$ and $Y_2$ the pulse activities in the local and external patches, respectively. Then
$$L = 2\pi \int_{\rho_0}^{r} \frac{Y_1}{\rho^2}\,\rho\, d\rho + 2\pi \int_{r}^{R} \frac{Y_2}{\rho^2}\,\rho\, d\rho = 2\pi Y_1 \ln\frac{r}{\rho_0} + 2\pi Y_2 \ln\frac{R}{r}. \tag{26}$$
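The concentric-patch linking input can be evaluated directly to see how it responds to rescaling. The sketch below assumes the closed form $2\pi Y_1 \ln(r/\rho_0) + 2\pi Y_2 \ln(R/r)$, cross-checks it against a simple midpoint-rule integration, and then compares a rescaling with $\rho_0$ held fixed against one in which $\rho_0$ is (hypothetically) allowed to scale. Function names and numerical values are illustrative.

```python
import math

def linking_closed_form(Y1, Y2, rho0, r, R):
    """Closed form: L = 2*pi*Y1*ln(r/rho0) + 2*pi*Y2*ln(R/r)."""
    return 2 * math.pi * (Y1 * math.log(r / rho0) + Y2 * math.log(R / r))

def linking_numeric(Y1, Y2, rho0, r, R, steps=20000):
    """Midpoint-rule value of 2*pi * integral of (Y / rho^2) * rho d(rho)."""
    h = (R - rho0) / steps
    total = 0.0
    for i in range(steps):
        rho = rho0 + (i + 0.5) * h
        Y = Y1 if rho < r else Y2   # local patch inside r, external patch outside
        total += Y / rho * h
    return 2 * math.pi * total

Y1, Y2, rho0, r, R, k = 1.0, 0.5, 1.0, 5.0, 20.0, 2.0

L_original = linking_closed_form(Y1, Y2, rho0, r, R)
L_fixed_inner = linking_closed_form(Y1, Y2, rho0, k * r, k * R)        # rho0 does not scale
L_scaled_inner = linking_closed_form(Y1, Y2, k * rho0, k * r, k * R)   # hypothetical: rho0 scales too

excess = L_fixed_inner - L_original   # expected: 2*pi*Y1*ln(k)
```

With the inner edge fixed, the external term is unchanged and the whole excess comes from the local patch, growing as $2\pi Y_1 \ln k$. If the inner edge scaled as well, the linking input would be unchanged.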
Under a scale change, $r$ and $R$ become $kr$ and $kR$, but $\rho_0$ is fixed. The linking input to the center neuron then has a scale factor dependence proportional to $Y_1 \ln(k)$. This is the problem. The solution is to make the internal activity distinguish between the local and the external groups, and to make both scale-invariant. The local group can be made independent of scale by using a nearest-neighbor receptive field with a fixed outer limit so it fits in the image's characteristic isointensity patch size. To distinguish between local and external groups, however, it is necessary to use the generalized linking field model with multiple linking fields as well as excitatory and inhibitory dendritic inputs. The dendritic signals are summed in the cell body and can be either excitatory or inhibitory. The weighted sums in the receptive fields correspond to fuzzy OR-gates, while the products from the linking modulation correspond to fuzzy AND-gates. This view will be used to construct a "semi-exclusive OR" to let the neuron distinguish between the local and the external linking inputs. Use two dendrites, each having two linking inputs. One dendrite is excitatory, the other is inhibitory. The same linking inputs $L_1$ and $L_2$ are used on both, and both are fed by the same feeding input $F$, but the linking strength coefficients are all different:
$$\begin{aligned} U_{exc} &= +\alpha_1 F (1 + \beta_1 L_1)(1 + \beta_2 L_2),\\ U_{inh} &= -\alpha_2 F (1 + \beta_3 L_1)(1 + \beta_4 L_2),\\ U_{total} &= U_{exc} + U_{inh}. \end{aligned} \tag{27}$$
Choose the $\alpha$'s and $\beta$'s such that they are all positive and such that the total internal activity has the form
$$U_{total} = F\left(1 + \beta L_1 + \beta'\left[1 - \frac{L_1}{L_{1(max)}}\right] L_2\right). \tag{28}$$
FIGURE 15. Geometry used to show that the fixed inner radius $\rho_0$ of the local group $L_1$ causes a dependency on the rescaling factor $k$. The external group $L_2$ is in the annulus from $r$ to $R$, while $L_1$ extends from $\rho_0$ to $r$. (Reprinted with permission from [3]).
For the values $\beta = 0.2$, $\beta' = 0.3$, and $L_{1(max)} = 40$ used in the simulations, one possible set of coefficients is $\alpha_1 = 2$, $\alpha_2 = 1$, $\beta_1 = 1$, $\beta_2 = 219/640$, $\beta_3 = 1.8$, and $\beta_4 = 123/320$. $L_{1(max)}$ is the maximum possible value of the local-neighborhood linking input $L_1$, and $L_2$ is a linking input from a larger and more extended receptive field such as the inverse square field. $L_1$ gives the input from the local group, and $L_2$ gives the input from external groups that do not contain the neuron being linked. When the entire local group fires, $L_1 = L_{1(max)}$, and the neuron sees only its nearest neighbors. When the local group is quiet, $L_1 = 0$, and the neuron can receive the $L_2$ linking from the external groups. Suppose the rescaled image patch now makes several new adjacent groups out of the local group, all with the same frequency. If they are in phase, the neuron's local group will mask them. If they are not in phase, then they will link with the local group through the second linking input and be captured by the local group. Then they will be in phase, and the local group has effectively enlarged to include them but without altering the internal activity seen by a given neuron. When the outer limit of $L_1$ is chosen to overlap the inner limit on $L_2$, the inner boundary of the external group is always the outer boundary of the composite local group, as desired. The system's architecture has translation, rotation, and scale invariance. It is a third-order network, which has been shown [17] to be the minimum order necessary for achieving these invariances all at the same time. An open problem is to derive specific geometrical rules in terms of the synaptic weights through equations (1), (2), and the internal activity equation, for these invariances.
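The quoted coefficient set can be checked by expanding the excitatory and inhibitory dendrite products and comparing the result with the target form of the total internal activity. The sketch below uses exact rational arithmetic and reads the coefficients as $\alpha_1 = 2$, $\alpha_2 = 1$, $\beta_1 = 1$, $\beta_2 = 219/640$, $\beta_3 = 1.8$, $\beta_4 = 123/320$; the function name is hypothetical.

```python
from fractions import Fraction as Fr

def expand_two_dendrite(a1, a2, b1, b2, b3, b4):
    """Coefficients of (1, L1, L2, L1*L2) in U_total / F for
    U_total = a1*F*(1 + b1*L1)*(1 + b2*L2) - a2*F*(1 + b3*L1)*(1 + b4*L2)."""
    return (a1 - a2,
            a1 * b1 - a2 * b3,
            a1 * b2 - a2 * b4,
            a1 * b1 * b2 - a2 * b3 * b4)

coeffs = expand_two_dendrite(Fr(2), Fr(1), Fr(1),
                             Fr(219, 640), Fr(9, 5), Fr(123, 320))

# Target form: U_total / F = 1 + beta*L1 + beta'*(1 - L1/L1_max)*L2,
# whose L1*L2 coefficient is -beta'/L1_max.
beta, beta_prime, L1_max = Fr(1, 5), Fr(3, 10), Fr(40)
target = (Fr(1), beta, beta_prime, -beta_prime / L1_max)
```

Under this reading the two tuples agree exactly: the constant term is 1, the $L_1$ and $L_2$ coefficients are 0.2 and 0.3, and the cross term is $-3/400$, which is the masking term that switches off the external linking input when the local group fires.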
9.1
Invariance Simulation Results
This model was simulated [3] on a PC. The array size was 33 x 33, and the images were made of five blocks, each with its own intensity, with the blocks rearranged to form the different test images. A cross shape and a "T" shape were used. They differed only in their geometrical arrangement, or syntax, an observation that will turn out to be of vital importance in our discussion of pattern recognition. Each block contained from five to eleven cells on a side, depending on the scale factor, and the background was set to zero in all cases. No noise was added. Analysis of the grid size indicated that reasonable results could be expected down to a 5 x 5 block size for rotation, and the scale increments were chosen so that the blocks varied in size by 5, 7, 9, and 11 cells on a side. The nearest neighbor linking field for $L_1$ was a 3 x 3 square (center excluded), while the outer radius of the inverse square linking field for $L_2$ was fixed at 10 and the inner radius at 1. The simulation's equations were written for discrete time steps using the digital filter form from reference [1]. They are
$$\begin{aligned} F &= \mathrm{Image}(j,k)/255,\\ L_{local}(t+1) &= A_1 L_{local}(t) + V_L L_1(t),\\ L_{ext}(t+1) &= A_1 L_{ext}(t) + V_L L_2(t),\\ \theta(t+1) &= A_2 \theta(t) + V_T Y(t),\\ Y(t) &= \mathrm{Step}\bigl(U_{total}(t) - \theta(t)\bigr), \end{aligned} \tag{29}$$
where $U_{total}$ is given by equation (28). The parameter values were $A_1 = \exp(-1/t_1)$, $A_2 = \exp(-1/t_2)$, $t_1 = 1$, $t_2 = 5$, $V_L = 5$, $V_T = 20$, $\beta = 0.2$, $\beta' = 0.3$, $L_{1(max)} = 40$, and $\mathrm{Image}(j,k)$ was the input image. The results are shown in Figures 16 through 21.
The most important result was that the time signatures were object-specific. Each test image generated a distinct periodic time signal that would never be confused with the signal from the other class (cross or "T"). This showed that the pulse-coupled net encoded the images in accordance with their geometrical configuration, because both images were built of the same five blocks arranged in different geometrical configurations. Good invariance was achieved for translation, rotation, and scale. The time signatures of the two test images were easily distinguished in all cases except for the smallest rescaled "T" (Figure 17). Its patch size was 5 x 5. A grid coarseness analysis had indicated that below a 7 x 7 size the grid effects would be significant. The rotated "T" images, likewise, were sensitive to these effects, but their signatures were still distinct from those of the cross image (Figure 16) for patch sizes greater than 5 x 5. The rotated "T" images were translated, as well, to fit in the small slab grid of 33 x 33 cells, so Figure 17 also indicates translational invariance.
The images were tested with different scene illumination levels. It was found that their time signatures (Figure 18) were essentially invariant over a factor of two hundred in illumination. This was not expected, as the ratio of the capture zone time to the neuronal period changes in this case. What happens is that the signature period varies, as expected, but the signature itself remains the same. Detailed examination of these runs after the fact gives a possible explanation: the signatures reflect the propagation of linking waves through the scene object. These waves follow gradients, and changes in the overall scene illumination did not change the relative gradient patterns. There was less variation in the signatures due to scene illumination changes than for other image changes.
Figure 19 shows the effect of image distortion. A coordinate transform of the form $x' = x + 0.01xy$, $y' = y + 0.01xy$ was used to approximate an out-of-plane rotation of about 30 degrees with some perspective added. The signatures retained their characteristic forms sufficiently for the cross and the "T" images to still be correctly classified by their signatures.
Again, this suggests a close relationship between the image morphology and the time signature. The insensitivity to distortion is because the signature generation is more of an area effect than an edge or angle effect. Image intensity overlays were investigated next. The 9 x 9-scale "T" image was altered by transposing the two lower blocks. This would correspond to a shadow across the image, for example. The result, shown in Figure 20, is not invariant, but shows a distinct correspondence of the new signature to the original. Figure 21 shows the effect of combined image changes. Translation, rotation, scale, scene illumination, and distortional changes were made as indicated in the figure. The new signatures were similar enough to the originals for the altered images to be correctly classified as a cross or a "T" by using only the signatures. They are clearly not strictly invariant, but show a substantial insensitivity to the geometrical changes while retaining their object-specific character.
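The discrete-time update rules of equation (29) can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' simulation code: it takes $U_{total}$ in the form of equation (28), uses unit weights in the 3 x 3 local field, treats the inner and outer radii of the inverse-square field as inclusive, applies both decays with $A_1$, and clamps the local linking input at $L_{1(max)}$ so that the masking factor stays non-negative. The function name `pcnn_signature` and the small test image are hypothetical.

```python
import math

def pcnn_signature(image, steps, t1=1.0, t2=5.0, V_L=5.0, V_T=20.0,
                   beta=0.2, beta_p=0.3, L1_max=40.0, r_in=1.0, r_out=10.0):
    """Iterate the update rules on a square image (values 0..255) and
    return the summed output, sum over cells of Y(t), for each time step."""
    n = len(image)
    A1, A2 = math.exp(-1.0 / t1), math.exp(-1.0 / t2)
    # Local field: 3 x 3 square, centre excluded, unit weights (assumed).
    local = [(dj, dk, 1.0) for dj in (-1, 0, 1) for dk in (-1, 0, 1)
             if (dj, dk) != (0, 0)]
    # External field: inverse-square weights between the two radii (inclusive, assumed).
    R = int(r_out)
    ext = [(dj, dk, 1.0 / (dj * dj + dk * dk))
           for dj in range(-R, R + 1) for dk in range(-R, R + 1)
           if r_in ** 2 <= dj * dj + dk * dk <= r_out ** 2]

    def field(Y, j, k, kernel):
        # Weighted sum of pulses around (j, k); off-grid cells contribute zero.
        return sum(w * Y[j + dj][k + dk] for dj, dk, w in kernel
                   if 0 <= j + dj < n and 0 <= k + dk < n)

    F = [[image[j][k] / 255.0 for k in range(n)] for j in range(n)]
    L_loc = [[0.0] * n for _ in range(n)]
    L_ext = [[0.0] * n for _ in range(n)]
    theta = [[0.0] * n for _ in range(n)]
    signature = []
    for _ in range(steps):
        # Y(t) = Step(U_total(t) - theta(t))
        Y = [[0] * n for _ in range(n)]
        for j in range(n):
            for k in range(n):
                l1 = min(L_loc[j][k], L1_max)  # clamp is an assumption of this sketch
                U = F[j][k] * (1.0 + beta * l1
                               + beta_p * (1.0 - l1 / L1_max) * L_ext[j][k])
                Y[j][k] = 1 if U > theta[j][k] else 0
        signature.append(sum(map(sum, Y)))
        # State at t+1 from the pulses emitted at t.
        for j in range(n):
            for k in range(n):
                L_loc[j][k] = A1 * L_loc[j][k] + V_L * field(Y, j, k, local)
                L_ext[j][k] = A1 * L_ext[j][k] + V_L * field(Y, j, k, ext)
                theta[j][k] = A2 * theta[j][k] + V_T * Y[j][k]
    return signature

# Small example: one 3 x 3 block of full intensity on a zero background.
image = [[0] * 9 for _ in range(9)]
for j in range(3, 6):
    for k in range(3, 6):
        image[j][k] = 255
signature = pcnn_signature(image, steps=40, r_out=3.0)
```

In this example all nine block cells fire together on the first step, are then silenced while their thresholds decay from $V_T$, and fire again later. The zero background never fires because its internal activity is zero. The reduced outer radius and grid size are used only to keep the pure-Python run short.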
[Figure 16 panels: SCALE and ROTATION; signatures shown for SC = 1, AC = 0; SC = 0.82, AC = 0; SC = 0.64; SC = 0.46, AC = 0; SC = 0.82, AC = 30; SC = 0.82, AC = 45.]
FIGURE 16. Periodic time signatures and invariances for the cross image. The signatures are the periodic part of the total output time signal of the pulsed array. SC is the scale factor and AC is the rotation angle in degrees. Good scale invariance was found for scales over 1:0.46, and for large rotations of 30 and 45 degrees. The five blocks arranged to form the image were scaled from 11 x 11, 9 x 9, 7 x 7, to 5 x 5 block sizes. The 33 x 33 slab had a background intensity level of zero. Grid coarseness effects were expected for 7 x 7 and smaller block sizes in scale, and for 14 x 14 block sizes in rotation. Grid effects were not severe in this image. (Reprinted with permission from [3]).
[Figure 17 panels: SCALE and ROTATION; signatures shown for SC = 1, AC = 0; SC = 0.82, AC = 0; SC = 0.82, AC = 30; SC = 0.46, AC = 0.]
FIGURE 17. Periodic time signature and invariances for the "T" image. Same setup as for Figure 16, but with the five blocks rearranged to form a "T". The signature was very distinct as compared to the first case, showing that the net makes unique time signatures for different images even when they are rearrangements of the same components. The scale invariance was good down to the 7 x 7 block size. The rotated images' signatures still followed the overall "T" signature shape in contrast to the cross signature. Their variation from ideal is strictly due to grid effects. (Reprinted with permission from [3]).
[Figure 18 panels: ORIGINAL; BRIGHT ($I = 2 I_0$); DIM ($I = 0.01 I_0$); all at SC = 0.82, AC = 0.]
FIGURE 18. Intensity invariance. The 9 x 9 block size images were multiplied by an intensity factor $I_0$ corresponding to a change in scene illumination. From $I_0 = 2$ to 0.01 the signature was invariant in its shape, though the period of the signature varied from 13 to 40 time units. (Reprinted with permission from [3]).
10
Segmentation
Image segmentation, the task of partitioning an image into its component parts, may be defined as the process of decomposing a given image $F$ into disjoint nonempty regions, or subimages, $R_1, R_2, \ldots, R_k$ such that
• $R_1 \cup R_2 \cup \cdots \cup R_k = F$;
• $R_i$ is connected for all $i$;
• All pixels belonging to $R_i$ are similar, based on some meaningful similarity measure $M$;
• Pixels belonging to $R_i$ and $R_j$ are dissimilar based on $M$.
In general, image segmentation is a challenging problem. Intensity variations within regions, fuzzy and incomplete boundaries, changing viewing conditions, and the presence of random noise are a few of the factors that make image segmentation a difficult task. In the past, researchers have used classical deterministic and nondeterministic methods, knowledge and rule based systems, and trainable neural networks to build automatic image segmentation systems. A recent survey paper by N. R. Pal and S. K. Pal summarizes many image segmentation techniques reported in the literature [18]. It is obvious that fast and accurate image segmentation is essential to
[Figure 19 plot panels: each at SC = .82, AC = 0.]
FIGURE 19. Image distortion. A coordinate transform approximating a 30-degree out-of-plane rotation was used for both test images. Their signatures were still distinct and recognizable as belonging to the correct image classification. (Reprinted with permission from [3]).
obtain meaningful results from image analysis or computer vision systems. The next few sections describe how pulse-coupled neural networks (PCNN) may be used for segmenting digital images.
[Figure 20 plot panels: ORIGINAL (SC = 1, AC = 0) and NEW.]
FIGURE 20. Signature of "T" image with two blocks interchanged. The two lower blocks of the full-scale unrotated "T" image were interchanged, simulating the effect of a shadow moving down the image. The new signature is similar to that of the 7 x 7 block size "T" image and still has an initial peak followed by a valley and then a higher peak. In contrast, the cross image's second peak was lower than its first peak, so this signature would still be classified as a "T" and not a cross. (Reprinted with permission from [3]).
10.1
Modified Pulse-Coupled Neuron
An area is segmented by the PCNN when a linking wave sweeps through it in a time short compared to the overall repetition rate of that area, so the linking activity is the primary process in segmentation. In order to emphasize the linking action, the feeding inputs will be constrained to be small compared to the threshold gain $V_T$. Special attention will be given to the linking strength $\beta$ and the radius $r$ of the linking field, as well. The pulse generator and the dendritic tree are accordingly modified to reflect this emphasis. The number of neurons in the network is equal to the number of pixels in the image to be segmented. For each pixel in the image there is a corresponding neuron. Let $X_j$ and $N_j$ be the $j$th image pixel and its corresponding neuron, respectively. The segmentation model is as follows:
1. The feeding, or primary, input to $N_j$ is the intensity value of $X_j$ or simply $X_j$. There are no leaky integrators in the feeding branch of the dendritic tree. If desired, the average intensity of a local neighborhood centered on $X_j$ may also be used as the feeding input to $N_j$.
2. Each neuron receives a linking input from its neighbors. Let $S_j$ denote the group of neurons that are linked with $N_j$. Usually, a circular linking field of radius $r$ centered on $N_j$ is used: all neurons that are within a distance of $r$ from $N_j$ are linked to $N_j$. Other neurons are not linked to $N_j$. The outputs of all the leaky integrators in the linking branch of the dendritic tree decay at the same rate, as determined by the linking field decay time constant $\alpha_L$. The linking contribution of $N_k$ to $N_j$ is given by equation (1).
[Figure 21 plot panels: (18,18), SC = .64, AC = 45, I0 = .5, RD = 33; (20,14), SC = .64, AC = 45, I0 = .5, RD = 33; (16,16), SC = 1, AC = 0, I0 = 1, RD = 0 (two panels).]
FIGURE 21. Effect of combined image changes. The original images were located at coordinates (16,16) with scale factors of unity, unrotated, and with no distortion (RD is the approximate out-of-plane rotation). The signatures were sufficiently insensitive to the combined changes for the images still to be correctly classified. (Reprinted with permission from [3]).
Usually, the weights $W_{kj}$ are inversely proportional to the distance or the square of the distance between $N_j$ and $N_k$.
3. The feeding input $X_j$ and the linking input $L_j$ are combined via equation (3) to produce the total internal activity $U_j(t)$ for the neuron $N_j$. At present, the value of $\beta$ is the same for all neurons for a given image. However, it may be ultimately desirable to use different values of $\beta$ for different regions, based on the regional intensity distribution. Then $\beta$ can be viewed as an adaptive weight that adjusts to each image region for optimum segmentation.
4. The pulse generator of the neuron consists of a step-function generator and a threshold signal generator. The output of the step-function generator $Y_j(t)$ goes to 1 when the internal activity $U_j(t)$ is greater than the threshold signal $\theta_j(t)$. This charges the threshold according to equation (4). Since $V_T$ is much larger than $U_j(t)$, the output of the neuron changes back to zero. The pulse generator produces a single pulse at its output whenever $U_j(t)$ exceeds $\theta_j(t)$.
There are two major differences between this model and the original. The latter has the ability to produce a train of output pulses.
The model used here for segmentation produces only one pulse, which is approximated by a unit impulse function. The second difference is in the recharging of the threshold. Because the internal activity $U_j(t)$ is much smaller than the threshold gain factor $V_T$, the recharging is done by setting the threshold to $V_T$ rather than to $\theta_j(t) + V_T$. If two successive firings of $N_j$ occur at times $t_1$ and $t_2$, then
$$\theta_j(t) = V_T e^{-\alpha_T (t - t_1)}, \qquad t_1 \le t < t_2. \tag{30}$$
This new threshold mechanism is equivalent to the old one when the input signal level is much smaller than the threshold gain factor, as can be seen by looking at the pulse period $T_j$:
$$T_j = \frac{1}{\alpha_T}\ln\left(1 + \frac{V_T}{X_j}\right) \approx \frac{1}{\alpha_T}\ln\left(\frac{V_T}{X_j}\right), \qquad \text{for } X_j \ll V_T.$$
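As a rough numerical check of this equivalence, the following Python sketch compares the period under the original recharging rule with the period under the reset-to-$V_T$ rule of equation (30). The function names and the values chosen for $V_T$, $\alpha_T$, and $X_j$ are illustrative assumptions, not values taken from the chapter.

```python
import math

def period_original(x, v_t, alpha_t):
    """Period when the threshold is recharged to theta + V_T and decays back to x."""
    return math.log(1.0 + v_t / x) / alpha_t

def period_reset(x, v_t, alpha_t):
    """Period when the threshold is reset to V_T (equation (30)) and decays to x."""
    return math.log(v_t / x) / alpha_t

# Assumed illustrative parameters (not from the chapter).
V_T, ALPHA_T = 1000.0, 0.1

def period_gap(x):
    """Difference between the two periods; analytically ln(1 + x / V_T) / alpha_T."""
    return period_original(x, V_T, ALPHA_T) - period_reset(x, V_T, ALPHA_T)

for x in (1.0, 10.0, 100.0):
    print(f"X_j = {x:6.1f}  gap = {period_gap(x):.5f}")
```

The gap shrinks as $X_j / V_T$ decreases, which is the sense in which the two threshold mechanisms agree for weak feeding inputs.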
On the segmentation time scale, neurons corresponding to pixels of each image region are forced to pulse together periodically. The pulse rate of a region is determined by the feeding and linking inputs to its neuron group. Therefore, it is important to understand the mathematics associated with the firing rate of a neuron using the segmentation model approximation of equation (30). Consider first a totally unlinked PCNN. Such a network may be realized by setting the linking strength $\beta$ to zero. The activity internal to $N_j$ is then simply $X_j$. Initially, at time $t = 0$, $\theta_j(0) = 0$ for all $j$. Assuming that $X_j$ is greater than zero, all neurons fire at $t = 0$. From then on, each neuron fires periodically, and the period is determined by the feeding input, $V_T$, and $\alpha_T$. Since $V_T$ and $\alpha_T$ are constants, the period is a function of the intensity of $X_j$. The intensity $I$ and the corresponding period $T(I)$ are related by
$$T(I) = \frac{1}{\alpha_T}\left(\ln(V_T) - \ln(I)\right). \tag{31}$$
For a given $I$, $T(I)$ may be increased by increasing $V_T$ or decreasing $\alpha_T$. It is often convenient to express $T(I)$ as the number of decay time constants. The period in number of decay time constants is
$$\tau(I) = \ln(V_T) - \ln(I). \tag{32}$$
The plot of $\tau(I)$ as a function of $\ln(I)$ is a straight line with slope $-1$ and intercept $\ln(V_T)$. If $\tau(I)$ is known, one can compute $\tau(aI)$, $\tau(I + b)$, and $\tau(aI + b)$:
$$\tau(aI) = \tau(I) - \ln(a), \tag{33}$$
$$\tau(I + b) = \tau(I) - \ln(1 + b/I), \tag{34}$$
$$\tau(aI + b) = \tau(I) - \ln(a + b/I). \tag{35}$$
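Relations (33) through (35) can be checked numerically. The Python sketch below is only an illustration: the function name `tau` and the values of the intensity, $a$, $b$, and $V_T$ are assumptions made for the example.

```python
import math

def tau(intensity, v_t):
    """Period in units of the decay time constant, equation (32)."""
    return math.log(v_t) - math.log(intensity)

# Assumed illustrative values (not from the chapter).
INTENSITY, A, B = 40.0, 2.0, 10.0

def shifts(v_t):
    """Return tau(I) - tau(aI), tau(I) - tau(I + b), tau(I) - tau(aI + b)."""
    base = tau(INTENSITY, v_t)
    return (base - tau(A * INTENSITY, v_t),
            base - tau(INTENSITY + B, v_t),
            base - tau(A * INTENSITY + B, v_t))

for v_t in (500.0, 5000.0):
    print(v_t, [round(s, 6) for s in shifts(v_t)])
```

Under these assumptions the three printed shifts equal $\ln(a)$, $\ln(1 + b/I)$, and $\ln(a + b/I)$, and they come out the same for both values of $V_T$.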
It is interesting to note that $\tau(I) - \tau(aI + b)$ is independent of $V_T$. Also, $\tau(I) - \tau(aI)$ is independent of both $V_T$ and $I$. The approximation of equation (30) makes the system less dependent on the prior activity of the threshold, and its behavior is more strongly governed by the linking. Now consider the effect of linking. Let $t_1, t_2, t_3, \ldots$ mark the times at which the $j$th neuron fires. For $t_i < t < t_{i+1}$, let $I$ and $L_j(t)$ be the feeding and linking inputs to the neuron, respectively. The linking input increases the internal activity of the neuron from $I$ to $I(1 + \beta L_j(t))$. Accordingly, in the interval $t_{i+1} - t_i$ the period reduces from $T(I)$ to $T'(I)$:
$$T'(I) = T(I) - \frac{1}{\alpha_T}\ln(1 + \beta L_j(t_{i+1})). \tag{36}$$
If the decay rate of $L_j(t)$ is large and much greater than the decay rate of $\theta_j(t)$, the following statements can be made:
1. $L_j(t)$ may be approximated as an impulse train, whose magnitudes are proportional to the number of linking input pulses at time $t$.
2. If a subset of neurons belonging to $S_j$ fire at $t_j$ and fail to capture $N_j$ at that time, then the subset will not capture $N_j$ later in the interval $t_i < t_j < t_{i+1}$. In other words, there is no linking decay tail, and the receiving neuron's output is unaltered if the linking pulse is outside the capture zone (equation (14)).
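The period shortening of equation (36) and the impulse-linking picture above can be illustrated with a short Python sketch. The helper names, the capture test (linked activity compared with the current threshold value), and the numbers used are assumptions made for illustration only.

```python
import math

def period(intensity, v_t, alpha_t):
    """Unlinked period T(I), equation (31)."""
    return (math.log(v_t) - math.log(intensity)) / alpha_t

def linked_period(intensity, linking, beta, v_t, alpha_t):
    """Period T'(I) when a linking impulse raises the activity to I(1 + beta*L)."""
    return period(intensity * (1.0 + beta * linking), v_t, alpha_t)

def is_captured(intensity, linking, beta, threshold):
    """Assumed capture test: linked activity reaches the current threshold."""
    return intensity * (1.0 + beta * linking) >= threshold

# Assumed illustrative parameters (not from the chapter).
V_T, ALPHA_T, BETA = 1000.0, 0.1, 0.35
shortening = period(150.0, V_T, ALPHA_T) - linked_period(150.0, 2, BETA, V_T, ALPHA_T)
print(round(shortening, 4), round(math.log(1.0 + BETA * 2) / ALPHA_T, 4))
print(is_captured(150.0, 2, BETA, 250.0), is_captured(150.0, 1, BETA, 250.0))
```

With two simultaneous linking pulses the neuron of intensity 150 reaches a threshold of 250, while a single pulse is not enough and, with no decay tail, leaves its firing time unchanged.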
10.2
Image Segmentation
The image segmentation approach using pulse-coupled neural networks is described in this section. Figure 22 shows an image consisting of two regions $R_1$ and $R_2$. Spatially connected object pixels form $R_1$. Similarly, spatially connected background pixels form $R_2$. Perfect segmentation is possible if there exists a linking radius $r$ and a linking coefficient $\beta$ that will force all neurons belonging to $R_1$ to pulse together periodically with period $T_1$ and all neurons belonging to $R_2$ to pulse together periodically with period $T_2$. Of course, $T_1$ is not equal to $T_2$. If all pixels of $R_1$ are of intensity $I_1$ and all pixels of $R_2$ are of intensity $I_2$, the segmentation problem becomes trivial. A pulse-coupled neural network with $\beta$ equal to zero will do the job. Neurons of $R_1$ will fire together at times $t = nT(I_1)$, where $n$ is an integer greater than or equal to zero. In practice, image segmentation is not this simple. Images that consist of two regions will have bimodal histograms. Assume that $[I_1, I_2]$ and $[I_3, I_4]$ are the intensity ranges of the background ($R_2$) and object ($R_1$) pixels, respectively. If $I_3 > I_2$, simple thresholding can be used to achieve perfect segmentation. When $I_3 < I_2$, thresholding techniques do not produce a perfect result. Optimal thresholding techniques minimize or attempt to minimize the error. The error may be defined as the number of pixels incorrectly classified during segmentation. The presence of linking inputs
FIGURE 22. An example of a perfect image segmentation. (a) input image; (b) segmented object region; (c) segmented background region. (Reprinted with permission from [27]. © IEEE 1995.)
makes pulse-coupled neural networks fairly insensitive to noise and minor local intensity variations. As a result, the PCNN is expected to produce better segmentation results. Consider the segmentation of the digital image in Figure 22. Assume $I_2 > I_3$ and $I_1 > 0$. At $t = 0$, all neurons fire and charge the outputs of all the threshold units to $V_T$. The group of neurons corresponding to object pixels of intensity $I_4$ fire first at time $t_1 = T(I_4)$. This type of firing, which is mainly due to the feeding input, is called natural firing. The natural firing at $t_1$ leads to the following:
1. Object neurons for which the following inequality is true are captured at $t = t_1$:
$$X_j(1 + \beta L_j(t_1)) > I_4. \tag{37}$$
Subscript $j$ is used to represent object pixels and neurons.
2. Background neurons for which the following inequality is not true are also captured at $t_1$:
$$X_k(1 + \beta L_k(t_1)) < I_4. \tag{38}$$
Subscript $k$ is used to represent background pixels and neurons.
3. Object pixels not captured at $t_1$ fire in several groups after $t_1$. The number of groups and the exact time at which each group fires are determined by the intensity distribution of $R_1$, $\beta$, and $r$.
4. Neurons corresponding to background pixels of intensity $I_2$, which are not captured so far, fire at $t_2 = T(I_2)$. This primary firing has no effect on neurons that have already fired ($V_T$ is large compared to the image intensity). However, all background neurons that are in the capture zone of this primary firing will fire at $t_2$:
$$X_k(1 + \beta L_k(t_2)) > I_2. \tag{39}$$
Other background neurons organize into several groups and fire after $t_2$. If inequality (37) is true for all $N_j$ (object neurons), and inequalities (38) and (39) are true for all $N_k$ (background neurons), the input image is perfectly segmented even when $I_2 > I_3$. The value of the linking input to $N_j$, $L_j(t_1)$, depends on the composition of $S_j$ and the number of fired neurons at $t_1$. For pixels like $P_1$, where all members of $S_j$ are object neurons, $L_j(t_1)$ is relatively large. For pixels like $P_4$, where $S_j$ consists of mostly background pixels, $L_j(t_1)$ is small. Let $L_{min1} = \min L_j(t_1)$, $L_{min2} = \min L_k(t_2)$, and $L_{max2} = \max L_k(t_1)$. It is obvious that values of $L_{min1}$, $L_{min2}$, and $L_{max2}$ depend on $r$ and object-background boundary geometry. All three increase in value as $r$ increases. However, the rate of increase varies depending on the boundary geometry. Perfect segmentation of the input image is possible if there exist $\beta$ and $r$ such that the following inequalities are true:
$$I_3(1 + \beta L_{min1}(t_1)) > I_4, \tag{40}$$
$$I_2(1 + \beta L_{max2}(t_1)) < I_4, \tag{41}$$
$$I_1(1 + \beta L_{min2}(t_2)) > I_2. \tag{42}$$
The above conditions when satisfied guarantee a perfect result for the worst case.
However, the solution may not be unique, and perfect segmentation is not always possible. Inequality (40), when not true, leads to the fragmentation of $R_1$. Similarly, if inequality (42) is not true, $R_2$ gets fragmented. Some background neurons (perhaps those near the object boundary) fire with object neurons, making $R_1$ look larger than its actual size when inequality (41) is not true. A challenge is to find optimal parameters $\beta^*$ and $r^*$ that minimize the error. The determination of $\beta^*$ and $r^*$ requires adaptation and is not addressed in this chapter.
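The sequence of natural firing followed by capture can be sketched for a one-dimensional row of pixels. This Python sketch is a simplified illustration and not the simulation used by the authors. It assumes that all thresholds are equal after the common firing at $t = 0$, that linking is an impulse with unit weights and no decay tail, that the linking field is the two nearest neighbors ($r = 1$ in one dimension), and that capture uses a greater-than-or-equal test. The function name and the pixel values are hypothetical.

```python
def firing_groups(x, beta, radius=1):
    """Return the groups of pixel indices that fire together in the first cycle."""
    n = len(x)
    fired = [False] * n
    groups = []
    while not all(fired):
        # The common threshold decays until it meets the brightest unfired pixel.
        theta = max(x[j] for j in range(n) if not fired[j])
        wave = {j for j in range(n) if not fired[j] and x[j] >= theta}  # natural firing
        grew = True
        while grew:  # capture: repeat until the linking wave stops growing
            grew = False
            for j in range(n):
                if fired[j] or j in wave:
                    continue
                lo, hi = max(0, j - radius), min(n, j + radius + 1)
                link = sum(1 for k in range(lo, hi) if k != j and k in wave)
                if x[j] * (1.0 + beta * link) >= theta:
                    wave.add(j)
                    grew = True
        for j in wave:
            fired[j] = True
        groups.append(sorted(wave))
    return groups

# Hypothetical row: object pixels at indices 1-3, background elsewhere.
# The object pixel of intensity 160 is darker than the background pixels of 170.
row = [170, 250, 160, 250, 170, 130]
print(firing_groups(row, beta=0.35))  # [[1, 2, 3], [0, 4, 5]]
print(firing_groups(row, beta=0.0))   # [[1, 3], [0, 4], [2], [5]]
```

With linking, the dark object pixel is captured by its two bright neighbors and the row splits into object and background groups. Without linking, the same row fragments into four intensity levels.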
10.3
Segmentation Results
A pulse-coupled network was simulated on a SUN workstation. A number of real and artificial images were used. The study focused on the effects of intensity variation within regions, extent of intensity overlap, noise and smoothing, and boundary geometry. Each artificial test image, an array of size 64 x 64, consisted of two regions, an object and a background. The object was a 32 x 32 subimage located at the center of the image. The object's intensity range was $[I_3, I_4]$. The remaining pixels of the image formed the background, and its intensity range was $[I_1, I_2]$. The object intensity range overlapped the background intensity range: $I_4 > I_2 > I_3 > I_1$. Since the object was rectangular, the boundary geometry was simple to handle. For $r = 1$ only four pixels (top, bottom, left, right) were in the linking field. It can be shown for that case that $L_{min1} = 2$, $L_{min2} = 3$, and $L_{max2} = 1$. Perfect segmentation is possible if $\beta$ is in the range $[\beta_1, \beta_2]$, where
$$\beta_1 = \max\left[(I_4/I_3 - 1)/2,\ (I_2/I_1 - 1)/3\right], \tag{43}$$
$$\beta_2 = (I_4/I_2 - 1). \tag{44}$$
If $\beta_2$ is not greater than $\beta_1$, then perfect segmentation is not possible. Note that the solution range of $\beta$ changes with $r$. A number of artificial images were created by varying the object and background intensity ranges and the extent of overlap. Figure 22(a) shows an input for which $I_1 = 100$, $I_2 = 175$, $I_3 = 150$, and $I_4 = 250$. From equations (43) and (44) the solution range for $\beta$ is [1/3, 3/7]. The image was segmented using $r = 1$ and $\beta = 0.35$. The segmented image as determined by the synchronous firing of neurons is shown in Figures 22(b) and 22(c). The PCNN gave a perfect result because a solution range for $\beta$ existed. If the intensity distribution of the image is such that $\beta_1$ is greater than or equal to $\beta_2$, a perfect segmentation is not possible. Then the best $\beta$ can be determined by trial and error. The PCNN was tested using low-resolution TV and infrared (IR) images of tanks and helicopters for this case. Each image consisted of one target in a fairly noisy background. The network successfully segmented each image into background and target.
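The solution range quoted for the Figure 22(a) example follows directly from equations (43) and (44). The Python sketch below reproduces it with exact rational arithmetic. The function name and its keyword parameters are hypothetical, and the division by `l_max2` is an assumed generalization of (44) obtained from inequality (41); for $r = 1$ it reduces to (44). Intensities are assumed to be integers.

```python
from fractions import Fraction

def beta_range(i1, i2, i3, i4, l_min1=2, l_min2=3, l_max2=1):
    """Return (beta_1, beta_2) from inequalities (40)-(42) for integer intensities."""
    beta_1 = max((Fraction(i4, i3) - 1) / l_min1,   # from inequality (40)
                 (Fraction(i2, i1) - 1) / l_min2)   # from inequality (42)
    beta_2 = (Fraction(i4, i2) - 1) / l_max2        # from inequality (41)
    return beta_1, beta_2

low, high = beta_range(100, 175, 150, 250)
print(low, high)                       # 1/3 3/7
print(low < Fraction(35, 100) < high)  # True
```

The value $\beta = 0.35$ used for Figure 22 lies inside this interval, consistent with the perfect result reported above.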
It is obvious that wide and excessively overlapping intensity ranges have an adverse effect on image segmentation. The segmentation error can be greatly reduced by shrinking the object and background intensity ranges and also by reducing the extent of overlap in the intensity ranges. A reduction in the intensity range reduces the value of $\beta_1$. Now more image pixels satisfy the desired inequalities, increasing the number of pixels correctly classified. If the value of $\beta_2$ then exceeds the value of $\beta_1$, a perfect segmentation is possible. When the spread is due to noise, a smoothing algorithm can be used. Neighborhood averaging smooths regions but blurs edges. A median filter suppresses random noise and also preserves edges. The PCNN is also capable of smoothing images without blurring the edges. The technique is to run the net and adjust the feeding input intensity of the pixels based on the local neuronal firing pattern. If a neuron $N_j$ fires and a majority of its eight nearest neighbors do not fire, then the intensity is changed as follows:
1. If five or more neighbors are brighter than $X_j$, $c$ is added to the value of $X_j$, where $c$ is a small integer constant.
2. If five or more neighbors are darker than $X_j$, $c$ is subtracted from the value of $X_j$.
3. If five or more neighbors are of the same intensity as $X_j$, the threshold signal of $X_j$ is set to the threshold value of its neighbors. This compensates for the phase shift.
A 128 x 128 image of Bambi, shown in Figure 23(a), is smoothed using the neighborhood average, a median filter, and the PCNN algorithm. The smoothed images are shown in Figures 23(b), 23(c), and 23(d). The PCNN filtered the noise without affecting the edges. In comparison, the neighborhood average blurred the edges. The median filter broke some edges and merged parallel lines running close to each other by filling in the dark spaces that existed between them. The PCNN performed better than the other two methods.
Theoretical results and simulations show that pulse-coupled neural networks can be used for segmenting digital images. The possibility of obtaining a perfect result even when the intensity ranges substantially overlap is a new and exciting result. The net can also be used to filter random noise without blurring edges. Since the network is compatible with electronic and optical hardware implementation techniques, it is a strong candidate for real-time image processing.
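The three smoothing rules given earlier in this section can be written as a per-pixel update. The Python sketch below is a minimal illustration for a single neuron that has just fired. The function name, its return convention (new intensity plus a flag asking the caller to copy the neighbors' threshold), and the reading of "majority" as five or more of eight are assumptions. How the net is run and how thresholds are stored are left out.

```python
def smoothing_update(x, neighbors, neighbor_fired, c=1):
    """Apply the smoothing rules to one fired neuron.

    x              -- intensity of the pixel whose neuron fired
    neighbors      -- intensities of its eight nearest neighbors
    neighbor_fired -- booleans, True where the neighbor fired with it
    Returns (new_intensity, sync_threshold).
    """
    not_fired = sum(1 for f in neighbor_fired if not f)
    if not_fired < 5:              # no majority of silent neighbors: leave as is
        return x, False
    brighter = sum(1 for v in neighbors if v > x)
    darker = sum(1 for v in neighbors if v < x)
    same = sum(1 for v in neighbors if v == x)
    if brighter >= 5:              # rule 1: move toward brighter neighbors
        return x + c, False
    if darker >= 5:                # rule 2: move toward darker neighbors
        return x - c, False
    if same >= 5:                  # rule 3: same intensity, fix the phase only
        return x, True
    return x, False

# A bright noise pixel surrounded by a uniform darker neighborhood.
print(smoothing_update(200, [100] * 8, [False] * 8))  # (199, False)
```

Repeated over many pulse cycles, an update of this kind would walk an isolated outlier toward its neighborhood in steps of $c$, while a pixel on an edge with a four-four split among its neighbors is left unchanged.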
FIGURE 23. An example of image smoothing. (a) input image; (b) image after smoothing with PCNN algorithm; (c) image after neighborhood smoothing; (d) image after median filtering. (Reprinted with permission from [27]. © IEEE 1995.)
11
Adaptation
The Eckhorn linking field model contains synaptic weights but does not require a specific learning law. Any learning law, or none, can be used. (The Hebbian decay learning law is too rudimentary and is not considered here. It fails to retain the adapted weights after learning is complete. More realistic models such as the Grossberg competitive law [14] or a saturable law [19], either associative or causal, are more useful.) Any synaptic weight in the linking field model can be made adaptive, but for simplicity only the feeding field weights will be considered. The linking field weights will be fixed as the inverse square pattern in order to retain the invariance
properties discussed earlier. Suppose a wave of pulses sweeps over a region in which the feeding weights are adaptive (Figure 24). As the wave passes over a given cell, it is turned on and receives feeding input pulses. These weights adapt, memorizing the local pattern of the wave crest around the cell. The cells that had been active just prior to this time have been reset, and they are turned off. But the leaky integrator synapses connecting them to the currently on cells still have a residual signal on them, and those connections adapt to that strength. Likewise, the connections from the group of cells that had been active still earlier have an even more decayed signal strength, and the active cell will adapt to them as well. Each time the linking wave sweeps over the cell in question, more adaptation occurs. Whenever it is on, it sees the same pattern of active cells and decayed signals from the previously active cells due to the periodic nature of the established wave pattern. After adaptation is complete, suppose that a cell is stimulated and fires. It recalls the wave-crest pattern in its local neighborhood and also sends a pulse to the cells that had fired next as the wave passed over them after leaving the cell. These connections were adapted during training. The cell forward-biases them through the adapted feeding connections and further gives them an additional input through the linking field channel. This can cause them to fire next, just as the original linking wave had done. The process continues, each wave crest forward-biasing the next, and the slab not only recalls the wave pattern but also sets it in motion again [7]. A time average of the slab's pulse activity then approximately recovers the original spatial distribution that generated the linking wave. The waves are binary fringe-like patterns very similar in appearance to holographic fringes.
This suggests that it may be possible to store many wave patterns in an adaptive slab in the same sense that many holograms can be superimposed on a single photographic plate. It may be possible to have a slab with relatively few adaptive interconnects and to use the linking modulation to fill in the patterns when they are recalled. Figure 25 shows some wave patterns generated by a light square (lower left) and a light spot (lower right) on a light background. The network stores and recalls the traveling waves. It can also do the same for sequences of images. Use a distribution of feeding time constants such that some of the feeding synapses have very long decay times. Present one image of a sequence and allow its linking waves to become established and memorized, and then do the same for the next image of the sequence. Some of the synaptic connections will overlap the images in time. Now when the first image is recalled, those connections will also stimulate the wave pattern of the next image, and it will be recalled in turn. This is the mechanism used in the time sequence memory model of Reiss and Taylor [4], except that pulses are used here. In that model an intermediate slab with leaky integrator decay characteristics was used to provide the
[Figure 24 labels: linking wave; linking modulation; adaptive bias; distance; wave direction.]
FIGURE 24. Adaptation. (a) A linking wave sweeps over a cell, turning it on. Its feeding synapses adapt to the current wave pattern and also to the decayed inputs from previously on cells whose signal is still present on the leaky integrator synapses connecting them to the on cell. (b) After adaptation the cell fires. It recalls the wave-crest pattern and forward-biases the cells that need to fire next in order to recreate the wave motion. It also sends a linking modulation to them. The wave crest that should fire next can be stimulated in preference to the one that fired previously, and the wave motion as well as the wave-crest shape can be regenerated. (Reprinted with permission from [26]).
overlap in time, and then adaptively associated with the current input image. Then when the first few images of the sequence were applied to the adapted system, they formed the decaying time overlap image, which in turn recalled the next image in the sequence. It was then fed back to the intermediate slab to make the next overlap, and so on, until the entire sequence had been recalled. Consider a slab on which several wave patterns have been adapted, either superimposed or in different locations on the slab. Is it possible to
FIGURE 25. Linking waves from an optical hybrid laboratory demonstration system. The underlying image is a light square (lower left) and a light spot (lower right) on a light background. Coherent, locally periodic linking waves are generated as the system attempts to pulse at a frequency driven by the input intensity at each pixel while also attempting to obey the linking requirement. To satisfy both requirements the waves evolve and bifurcate into complex fringe-like patterns. (Reprinted with permission from [2].)
selectively recall a given pattern using only its time signal as input? This would mean that the slab could access any memory in parallel. Suppose the time signal of one of the encoded patterns is globally broadcast to the entire slab. It will stimulate all the patterns to attempt to regenerate their waves. As they start up, those that have different time signals will interfere with the incoming signal. The pattern with the same time signal will also interfere, since it will not generally be in phase with the incoming signal. None of the patterns will be able to establish themselves. They will continue to compete for resonance with the input. Eventually, the pattern with the matching signal may start up in the right phase. It will establish itself at the expense of the others because it will be locked in with the incoming signal and will proceed to generate its traveling wave pattern. A time average of the slab pulse activity then recovers the original input scene. This argument shows how a pulse-coupled adaptive neural network can in principle achieve parallel memory access. It is recognized that it must be verified before it can be claimed to be a viable mechanism for global recall,
but it is a specific possibility.
12
Time to Space
The pulse-coupled neural network generates a time signal that encodes a spatial distribution. Is it possible to make a network that forms a spatial distribution from a time signal? If so, then the cycle would be complete: space to time to space. The time signal is periodic and coherent. The intensity of the input maps to frequency in the time signal, while the geometrical relationships are encoded by the linking into phases in the time signal. The desired mapping should have a frequency coordinate and a phase coordinate for each amplitude component. Wavelet transforms [20], [21], [22] retain both phase and frequency information, so these transforms may be appropriate for the pulse-coupled time signals. Wavelet transforms can be done optically [23]. A way to do it with a third-order linking field is discussed below. It is not required that the resulting spatial distribution be identical to the original one that generated the time signal, but rather that it be reasonably object-specific. Then the time-to-space transform becomes the second half of a spatial remapping transform. The resulting spatial distribution can in turn make another time signal, and so on, so that an input is transmitted from one place to another as a time signal and at each place is operated on by spatial interactions. This is a parallel processor in one sense, but in another sense, it is a serial processor like a digital computer. It has the advantages of the parallel processing and adaptation inherent in a neural network, yet it can perform the sequential operations necessary for causal logic operations. It does not need predefined features. It generates its own syntactical features. These are very insensitive to geometrical distortions, yet they can be object-specific. The key is weak linking. In this linking regime it is possible to make periodic, coherent, object-specific time signals, and from them the rest follows.
12.1
A Model for Time-to-Space Mapping
This model uses a third-order pulse-coupled neural network. It consists of two slabs P and Q, as shown in Figure 26(a). The P-slab generates a spatial signal distribution of frequencies in the vertical direction and phases in the horizontal direction. The Q-slab receives a globally broadcast time signal at every cell and a one-to-one input from the P-slab. These are multiplied by a linking modulation in front of each Q-slab model neuron, making it a third-order node (Figure 26(b)). The product of the global time signal input and the P-slab signal input forms the feeding input to the Q-slab cell. The P-slab consists of rows of horizontally linked cells
[Figure 26 panels: (a) Time to space network architecture; (b) Q-slab third order cell; (c) P-slab second order cell.]
FIGURE 26. A time-to-space architecture. A two-slab system is used. The P-slab has one-way linking across each row. Just as the last cell in a row fires, the first cell fires again. The length of the row and the feeding input of the row are chosen such that each row has a repetition rate that increases with row number. The P-slab cells are second-order neurons. The Q-slab neurons are third-order cells. A time signal $S(t)$ is globally broadcast to the Q-slab and multiplied by the input from the P-slab at each point. A pulse in the time signal with a given frequency and phase will be coincident with one of the pulses from the P-slab at a location corresponding to its frequency and phase, giving a nonzero feeding input to the Q-slab cell at that location. This produces a distribution on the Q-slab whose geometry is a function of the frequency and phase content of the time signal. (Reprinted with permission from [26]).
with linking only in the forward direction as shown in Figure 26(c). When the leftmost cell in each row fires, a linking wave sweeps across its row. The length of the row is such that the wave reaches the other side at the same time that the leftmost cell fires again. The rows have a feeding input $I$ that increases with increasing row number. The result is that the P-slab sustains horizontally propagating waves along each row that have a repetition rate that increases with increasing row number. Each row represents a different frequency, and the distance along each row represents the phase at that frequency. Consider a time signal input $S(t)$ globally broadcast to the Q-slab. Suppose one of its frequency components $\nu$ has phase $\phi$. Then it will be coincident into the Q-slab cell with the P-slab's nonzero input on the $\nu$th row and at the $\phi$th distance along that row, and the linking product will be nonzero for that Q-slab cell. This construction satisfies the basic requirements for converting a time signal to a spatial distribution.
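The coincidence idea behind this construction can be illustrated with a discrete-time Python sketch. It is an idealization, not the model's dynamics. Time advances in unit steps, each P-slab row is assumed to carry a wave that moves one cell per step, the row length is set equal to the row's period, and each Q-slab cell simply counts coincidences between the broadcast pulses and the P-slab pulse at its location. The function name and the test signal are hypothetical.

```python
def q_slab_response(signal_times, periods, duration):
    """Count broadcast/P-slab pulse coincidences at every Q-slab cell.

    signal_times -- integer times at which the broadcast signal S(t) has a pulse
    periods      -- period (in time steps) of each P-slab row, one per row
    duration     -- number of time steps to simulate
    """
    pulses = set(signal_times)
    response = []
    for period in periods:
        row = [0] * period           # assumed: row length equals the period
        for t in range(duration):
            if t in pulses:
                row[t % period] += 1  # the P-slab cell at column t mod period is firing
        response.append(row)
    return response

# Hypothetical broadcast signal: one component with period 5 and phase 2.
signal = list(range(2, 60, 5))
periods = [6, 5, 4, 3]                # repetition rate increases with row number
resp = q_slab_response(signal, periods, duration=60)
best = max((count, r, c) for r, cells in enumerate(resp) for c, count in enumerate(cells))
print(best)                           # (12, 1, 2): row with period 5, column 2
```

Every pulse of the component lands on the same cell of the matching row, while in the other rows the coincidences are spread evenly over the columns, so the peak location encodes the frequency (row) and phase (column) of the component.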
13
Implementations
The nonadaptive pulse-coupled neural network has been implemented as a hybrid optical laboratory demonstration system [2], [7] and as a linear eight-element electronic array. The optical system used a liquid crystal television (LCTV) spatial light modulator from a commercially available projection television to perform the linking modulation. The scene was reimaged to an intermediate focal plane and then sent through the LCTV located slightly beyond the focus so that it was out of focus. This allowed each pixel of the LCTV to modulate a small local area of the input image, effectively forming the linking receptive field by the defocusing circle. The input image was then reimaged into a standard video camera and its signal sent to a framegrabber in a 386 PC. The signal was compared to the current value of the computed threshold in the computer, and an output array was formed that contained a one or a zero, depending on whether or not the input was below the threshold. This array represented the pulses. It was used to update the threshold array, recharging at those pixels that had a pulse output, and then sent through the framegrabber back to the LCTV. A bright pixel there indicated that the neuron for that pixel had fired, and it multiplied the incoming scene to perform the linking modulation for the next processing cycle. Each cycle took about ten seconds, which gave time to examine in detail the traveling linking wave patterns that formed. The electronic chip array had eight neurons in a linear array. Each was linked to its two nearest neighbors and had a feeding input as well. Four arrays were built. Two were entirely electronic, and two had photodetectors at each cell for the feeding inputs and ferroelectric spatial light modulator pads for outputs. Preliminary tests of the all-electronic arrays showed a
pulse output range from 2 Hz to 1 MHz and that the nearest-neighbor linking was active. Further tests are in progress at this time. The optical implementation is attractive in that it allows access to the linking wave patterns for study, but it suffers from the limit of video frame rates. The best that it can do is 30 Hz for the maximum pulse frequency. On the other hand, electronic two-dimensional array architectures are entirely within current technology. The linking field receptive weight pattern can be approximated by a resistive plane or grid that is common to all the cells. It can also have local 3 × 3 linking fields in addition to the larger resistive plane field. Electronic arrays have the major advantage of high pulse rates, at or above the 1 MHz rate already demonstrated. The time signal is the sum of all the pulse activity, so the output can be a single wire. The linking modulation is straightforward, and the pulse generator architecture is electronically simple.
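The processing cycle of the optical demonstration (compare with the threshold, form the pulse array, recharge the threshold, feed the pulses back as a multiplicative linking modulation) can be sketched in a few lines, together with the single-wire time signal formed by summing all pulse activity. This is a minimal sketch, not the laboratory code: the function names, the 3 × 3 neighborhood standing in for the defocusing circle, the multiplicative form `image * (1 + beta * linking)`, and the values of `beta`, `decay`, and `recharge` are all assumptions made for illustration.

```python
def neighbor_sum(pulses, r, c):
    """Sum of the pulses in the 8 cells surrounding (r, c); edges see zeros."""
    rows, cols = len(pulses), len(pulses[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                total += pulses[rr][cc]
    return total


def run_pcnn(image, n_cycles=50, beta=0.2, decay=0.8, recharge=20.0):
    """Run the cycle n_cycles times and return the summed pulse activity."""
    rows, cols = len(image), len(image[0])
    theta = [[0.0] * cols for _ in range(rows)]   # threshold array
    pulses = [[0] * cols for _ in range(rows)]    # pulses of the previous cycle
    time_signal = []
    for _ in range(n_cycles):
        new_pulses = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Last cycle's pulses modulate the incoming scene.
                linking = neighbor_sum(pulses, r, c)
                activity = image[r][c] * (1.0 + beta * linking)
                # A pulse is emitted where the modulated input exceeds the threshold.
                new_pulses[r][c] = 1 if activity > theta[r][c] else 0
                # The threshold decays and is recharged where a pulse occurred.
                theta[r][c] = decay * theta[r][c] + recharge * new_pulses[r][c]
        pulses = new_pulses
        time_signal.append(sum(map(sum, pulses)))  # single-wire output
    return time_signal


uniform = [[1.0] * 4 for _ in range(4)]
sig = run_pcnn(uniform)
print(sig)
```

For a uniform input every cell has the same threshold history, so in this sketch the cells fire together and the time signal alternates between zero and the full cell count.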
14
Integration into Systems
Two key features of the pulse-coupled neural network are, first, that it does not require training and, second, that it can operate very fast. This makes it suitable as a preprocessor because it can decrease the temporal complexity of many problems due to its high-speed parallel operation while producing an invariant output suitable for use by an adaptive classifier or by sequential iconic logical processors. The retina is an example of a preprocessor. It is nonadaptive and so can operate on any visual image. It is a hard-wired processor with parallel, high-speed action. It does immense bandwidth reduction, edge enhancement, noise reduction, and spectral decomposition and transmits the preprocessed results, all in real time. There is some evidence that the human vision preprocessor has further properties in terms of ability to tolerate significant distortions. For instance, a 1993 special issue of Science on human vision [24] states: "Recognition of objects from their visual images is a key function of the primate brain. This recognition is not a template matching between the input image and stored images like the vision in lower animals but is a flexible process in which considerable changes in images, resulting from different illumination, viewing angle, and articulation of the object, can be tolerated." If the retina does in fact produce the invariant time signals of the pulse-coupled net, a view supported by the simple symmetries in the nonadaptive receptive fields being the cause of the invariances, then the "tolerance" is in the preprocessor itself.
When viewed as an image preprocessor, the pulse-coupled neural network bridges the most fundamental division in pattern recognition: the division between the syntactical and the statistical approach. In statistical pattern recognition, the properties (features) of the
scene are measured and used to form a multidimensional feature vector in an N-dimensional hyperspace. Each set of measurements forms a vector in the space. If the features form groups (i.e., if they are "good" features), then surfaces in the hyperspace can be found that "optimally" separate the groups. Then a given input feature vector can be classified as belonging to one of the groups. The problem is that the features must be correctly defined, and this has been a major problem in statistical pattern recognition. Syntactical pattern recognition goes beyond statistical pattern recognition by considering, and indeed emphasizing, the relationships among features. Since the number of possible relationships is exponential in N, this is an incomparably richer, more powerful method. It is also much harder: the number of groups is also exponential! But if the geometrical relationships are made independent of the possible geometrical distortions, then the syntactical approach yields a natural grouping method in which the large number of possibilities becomes an advantage rather than a drawback. The pulse-coupled neural nets provide the invariances essential for syntactical pattern recognition. They do this in a surprising way. The features they use are not features of the input pattern. Rather, they are features of the pulse code generated by the net when the image is presented to it. The simulations using a cross and a "T" shape illustrate this. The features are the pulse phase patterns, and they are syntactical: "Where does the bar cross the post?" The image itself is no longer used, only the syntactically derived periodic time signal. This serves as the input to a statistical pattern classifier, and the pattern it classifies is the phase structure of the time signal, not the image pattern. When a time-to-space mapping is also possible, the pulse-coupled neural network becomes more than a preprocessor.
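One simple way to classify a periodic time signal by its phase structure is to compare one period of it against stored templates using normalized circular correlation, which ignores both the choice of time origin and the overall amplitude. The sketch below uses that technique as a stand-in for the statistical classifier mentioned above; it is not the method used in the chapter's simulations. The two template pulse trains labeled "cross" and "T" are invented numbers for illustration only, not measured signatures.

```python
import math


def normalise(x):
    """Remove the mean and scale to unit length."""
    mean = sum(x) / len(x)
    centred = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred]


def best_circular_match(a, b):
    """Largest normalised correlation of a and b over all circular shifts."""
    a, b = normalise(a), normalise(b)
    n = len(a)
    return max(sum(a[(i + k) % n] * b[i] for i in range(n)) for k in range(n))


def classify(signal, templates):
    """Return the best-matching template name and all correlation scores."""
    scores = {name: best_circular_match(signal, t) for name, t in templates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical one-period signatures (pulse counts per time bin).
templates = {
    "cross": [5, 0, 3, 0, 0, 3, 0, 0],
    "T":     [5, 3, 0, 0, 0, 0, 3, 0],
}

# The same "T" signature observed with a different time origin and amplitude.
t = templates["T"]
observed = [2 * v for v in t[-3:] + t[:-3]]
label, scores = classify(observed, templates)
print(label, scores)
```

Because the score is taken over all circular shifts, the classifier depends only on the relative timing of the pulses within a period, which is the phase structure the text refers to.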
A spatial input I_0 is first transformed into a time signal and then transmitted to another location where it is retransformed into a spatial distribution S_0 again. The new pattern will not necessarily be the same as the original, but since the time signal had invariances encoded into it, the new pattern will also be invariant against the same distortions and so will be of reduced dimensionality in the sense of information content. The information that is lost is information about the distortions. The syntactical information about the geometrical input pattern is preserved, so the new pattern is an idealization or generalization of the original. Now suppose the pattern is again transformed into another time signal, transmitted, and made into a second spatial pattern S_1. It will preserve the syntactical information of the preceding pattern. As an example, consider the information about the scale of an input image. The first transform pair (I_0, S_0) is scale invariant with respect to the pulse phase pattern, but the amplitude of the time signal connecting them was proportional to the area covered by the image I_0, and so the amplitude of S_0 still has an area dependence. However, the second transform (S_0, S_1) will be invariant with respect to amplitude, as shown in the discussions
earlier, so S_1 will not depend on the original image area either by phase structure or by amplitude and will be completely independent of any scale effect in the original image. Each successive transform (S_n, S_{n+1}) results in a more invariant pattern. If the time-to-space transform is poorly chosen, this could result in a final pattern that is invariant with respect to everything, including syntax. This is not desirable! On the other hand, it may be possible to choose a time-to-space transform that becomes stable yet still contains the fundamental syntactical information of the original image I_0. If so, then in the asymptotic limit the transform pair will become idempotent: S_N = S_{N+1}. This will be a point attractor, and all the distortions of I_0 that map to it will define its basin of attraction. It will be an idealized, or platonic, icon that represents the object itself rather than a view of the object. The existence of platonic icons is shown by this argument to be critically dependent on the choice of the time-to-space mapping. The repeated transformation process, however, will always make the resultant icon more and more invariant, and since it will always be an icon, there must always be at least some syntactical information in it.
Now, whenever there is a spatial distribution in a net, it is possible to perform spatial operations on it via weighted receptive fields. Thus the repeated iconic transforms can undergo processing each time they are mapped to a spatial distribution, making the pulse-coupled neural net into a full processor rather than a preprocessor. Further, since each iconic transform is sequential in time, the system possesses causality. This leads to the view of a powerful processing system combining the capabilities of parallel and serial processing techniques, where information is transmitted as time signals and operated on as spatial distributions.
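The idea of a repeated transform settling onto an idempotent pattern, with all distortions of one input falling into the same basin of attraction, can be made concrete with a toy fixed-point iteration. The map `toy_cycle` below is a hypothetical stand-in for one complete space-to-time-to-space cycle and is not derived from the pulse-coupled equations: it discards position in a single pass and discards amplitude only gradually, while leaving the relative structure of the pattern untouched. It assumes a pattern with a single positive maximum.

```python
import math


def toy_cycle(pattern):
    """Hypothetical stand-in for one space -> time -> space transform pair."""
    peak = pattern.index(max(pattern))
    shifted = pattern[peak:] + pattern[:peak]   # forget where the pattern sat
    scale = math.sqrt(shifted[0])               # forget part of the amplitude
    return [v / scale for v in shifted]


def iterate_to_icon(pattern, cycle=toy_cycle, tol=1e-9, max_iter=200):
    """Apply the cycle until two successive patterns agree within tol."""
    current = [float(v) for v in pattern]
    for n in range(1, max_iter + 1):
        nxt = cycle(current)
        if max(abs(a - b) for a, b in zip(nxt, current)) < tol:
            return nxt, n
        current = nxt
    raise RuntimeError("no fixed point reached within max_iter cycles")


base = [0, 1, 4, 2, 0, 0]
# A distorted view of the same pattern: shifted by two cells and three times brighter.
distorted = [3 * v for v in base[-2:] + base[:-2]]

icon_a, cycles_a = iterate_to_icon(base)
icon_b, cycles_b = iterate_to_icon(distorted)
print([round(v, 6) for v in icon_a], cycles_a)
print([round(v, 6) for v in icon_b], cycles_b)
```

Both inputs converge to the same pattern, whose relative values still match those of `base`. A poorly chosen map, for example one that also flattened the relative values, would instead converge to a pattern carrying no syntactical information, which is the undesirable case described above.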
15
Concluding Remarks
This work begins with the Eckhorn linking field model and then investigates the new regime of weak linking to find the existence of time signals that encode spatial distributions in their phase structure. The signals are generally periodic. They are a signature for the image that generated them. They are a syntactical signature, made by the network itself, and its temporal features are features that are about the image, not in the image. The pulse-coupled nets are a general higher-order network that provides an object-specific and reasonably invariant time signature for spatial input distributions. Multiple time scales exist, and for each time scale at which a signature exists, the next time scale permits segmentation of the part of the image generating that signature. Conditions for perfect segmentation are given and verified through simulations. The time signal may represent a possible means of communication within the brain, a way to transmit and
receive information. It is analogous to the characteristic acoustic tone of a given musical instrument, in a sense bestowing a different "sound" on each distinct two-dimensional input image. The musical analogy is reinforced by the observation that pulse frequency harmonics are more stable against noise when linked; i.e., the "harmony of thought" may be literally true [25]. The time signal can be transformed back into spatial distributions and operations performed on them, and these in turn generate another time signal to be sent to other processing areas of the brain. It reduces the basic problem of image understanding to that of correlation on an invariant time signal. Much research remains to be done, but the pulse-coupled model and its time signals are a significant step forward in the understanding of the brain.
16
REFERENCES
[1] R. Eckhorn, H. J. Reitboeck, M. Arndt, and P. Dicke, "Feature Linking via Synchronization Among Distributed Assemblies: Simulations of Results from Cat Cortex," Neural Computation 2, 293-307 (1990).
[2] J. L. Johnson and D. Ritter, "Observation of Periodic Waves in a Pulse-Coupled Neural Network," Optics Letters 18 (15), 1253-1255 (1993).
[3] J. L. Johnson, "Pulse-Coupled Neural Nets: Translation, Rotation, Scale, Distortion, and Intensity Signal Invariance for Images," Applied Optics 33 (26), 6239-6253 (1994).
[4] M. Reiss and J. G. Taylor, "Storing Temporal Sequences," Neural Networks 4, 773-787 (1991).
[5] R. Eckhorn, R. Bauer, M. Rosch, W. Jordan, W. Kruse, and M. Munk, "Functionally Related Modules of Cat Visual Cortex Show Stimulus-Evoked Coherent Oscillations: A Multiple Electrode Study," Invest. Ophthalmol. Visual Sci. 29 (12), 331 (1988).
[6] R. Eckhorn, "Stimulus-Evoked Synchronizations in the Visual Cortex: Linking of Local Features into Global Figures?" In Neural Cooperativity, J. Kruger (editor), Springer Series in Brain Dynamics. Springer-Verlag, Berlin (1989).
[7] J. L. Johnson, "Waves in Pulse-Coupled Neural Networks," Proc. World Congress on Neural Networks, Vol. 4, p. IV-299. INNS Press (1993).
[8] R. Eckhorn, H. J. Reitboeck, M. Arndt, and P. Dicke, "A Neural Network for Feature Linking via Synchronous Activity: Results from Cat Visual Cortex and from Simulations." In Models of Brain Function, R. M. J. Cotterill (editor), pp. 255-272. Cambridge University Press (1989).
[9] R. Eckhorn and T. Schanze, "Possible Neural Mechanisms of Feature Linking in the Visual System: Stimulus-Locked and Stimulus-Induced Synchronizations." In Self-Organization, Emerging Properties and Learning, A. Babloyantz (editor), Plenum Press, New York (in press).
[10] P. W. Dicke, "Simulation Dynamischer Merkmalskopplungen in Einem Neuronalen Netzwerkmodell," Inaugural Dissertation. Biophysics Department, Philipps University, Renthof 7, D-3550 Marburg (1992).
[11] A. S. French and R. B. Stein, "A Flexible Neural Analog Using Integrated Circuits," IEEE Trans. Biomed. Eng. BME-17, 248-253 (1970).
[12] C. Giles and T. Maxwell, "Learning, Invariance, and Generalization in High-Order Neural Networks," Applied Optics 26 (23), 4972-4978 (1987).
[13] C. Giles, C. Miller, D. Chen, H. Chen, G. Sun, and Y. Lee, "Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks," Neural Computation 2 (3), 393-405 (1992).
[14] S. Grossberg, Studies of Mind and Brain, Reidel Publishing Company, Dordrecht, Holland (1982).
[15] S. Grossberg and D. Somers, "Synchronized Oscillators During Cooperative Feature Linking in a Cortical Model of Visual Perception," Neural Networks 4, 453-466 (1991).
[16] N. Farhat and M. Eldefrawy, "The Bifurcating Neuron," Digest of the Annual Optical Society of America Meeting, San Jose, CA, p. 10 (1991).
[17] C. Giles, R. Griffin, and T. Maxwell, "Encoding Geometrical Invariances in Higher-Order Neural Networks," Proc. IEEE 1st Int. Neural Inf. Proc. Syst. Conf., Denver, CO, p. 301 (1987).
[18] N. R. Pal and S. K. Pal, "A Review on Image Segmentation Techniques," Pattern Recognition 26 (9), 1277-1294 (1993).
[19] J. L. Johnson, "Globally Stable Saturable Learning Laws," Neural Networks 4, 47-51 (1991).
[20] I. Daubechies, "The Wavelet Transform, Time-Frequency Localization, and Signal Analysis," IEEE Trans. Inf. Theory IT-10, 961-1005 (1990).
[21] S. Mallat, "Multiresolution Approximations and Wavelet Orthonormal Bases of L²(R)," Trans. Am. Math. Soc. 315, 69-87 (1989).
[22] C. K. Chui, An Introduction to Wavelets, Academic Press, Boston (1992).
[23] H. J. Caulfield and H. H. Szu, "Parallel Discrete and Continuous Wavelet Transforms," Opt. Eng. 31, 1835-1839 (1992).
[24] Keiji Tanaka, "Neuronal Mechanisms of Object Recognition," Science 262, 685-688 (1993).
[25] F. H. Rauscher, G. L. Shaw, and K. N. Ky, "Music and Spatial Task Performance," Nature 365, 611 (1993).
[26] J. L. Johnson, "Pulse-Coupled Neural Networks," SPIE Critical Review Volume CR-55, Adaptive Computing: Mathematics, Electronics, and Optics, S. S. Chen and J. H. Caulfield (Eds.), pp. 47-76, Orlando, FL (1994).
[27] H. S. Ranganath, G. Kuntimad, and J. L. Johnson, "Pulse-Coupled Neural Networks for Image Processing," Proc. IEEE Southeastcon 95, IEEE Press, Raleigh, NC (1995).