JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 14, 286-305 (1992)

Parallel Implementation and Capabilities of Entropy-Driven Artificial Neural Networks

TOM TOLLENAERE

Laboratorium voor Neuro- en Psychofisiologie, Katholieke Universiteit te Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium

MARC M. VAN HULLE

Departement Electrotechniek, Divisie ESAT, Katholieke Universiteit te Leuven, Kardinaal Mercierlaan 94, B-3001 Heverlee-Leuven, Belgium; and Laboratorium voor Neuro- en Psychofisiologie, Katholieke Universiteit te Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium

AND GUY A. ORBAN

Laboratorium voor Neuro- en Psychofisiologie, Katholieke Universiteit te Leuven, Campus Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium

It is shown how the recently introduced Entropy-Driven Artificial Neural Network model (EDANN) can be implemented on a parallel transputer array, using a simulation environment which makes all decomposition and mapping issues transparent. Then, using the parallel simulator, the EDANN's capabilities are exemplified in the case of orientation extraction from retinal images. By means of simulations on the parallel machine, it is shown that the EDANN is able to adapt itself optimally to the stimulus it receives, and that the same network topology is able to accomplish both 1-D and 2-D orientation inference tasks. (c) 1992 Academic Press, Inc.

1. INTRODUCTION

To a visual system, curves and boundaries are important characteristics of structure, since they support not only the segregation of objects in a visual scene, but also their recognition. Curve and boundary detection based on contrast information is not an easy task for artificial vision systems and, up to now, there is no final solution to it. In computer vision, detection is traditionally divided into two steps: (1) the application of local line-detection operators to extract orientation, and possibly curvature, from an image, and consequently, (2) the process of fitting global, analytical functions to the responses of these operators, so as to correct them. In natural vision, by analyzing the structure and function of biological visual systems, researchers have gained insights which are important for the design of artificial vision systems. In particular, this research work has made an

important contribution to the development of detection operators for step 1, and, in general, to the development of massively parallel network architectures called Artificial Neural Networks (ANNs). Following this lead we have, in a previous article, proposed an ANN model which shares certain structural and functional features of the visual cortex of monkeys and cats. The laminar and modular organization of the visual cortex has inspired the construction of the Entropy-Driven Artificial Neural Network (EDANN) model [31]. The initial EDANN model implemented a two-step detection process of contour orientations. In the first step, orientation information is extracted from a retinal image using local line-detection operators, the representation of which is adapted and improved in the second step. The EDANN model has been generalized to process visual parameters other than orientation. An EDANN now serves as a building block of a more complex, modular system, in which every module is an EDANN processing a particular set of visual parameters, such as, e.g., orientation of contrast and texture edges, direction and speed of motion, or wavelength. The hierarchical structure of this complex system is inspired by neuroscientific evidence. A detailed description of the generalized model is beyond the scope of this article (cf. [32] for more details). Since even a single EDANN is a large-scale network model, considerable computation power is required to simulate it. Parallel computers can offer this power, and are now available at a reasonable price. In addition, neural networks have always been a favorite application on


parallel machines, since the neural network paradigm of parallel and distributed computing fits perfectly onto the structure of such machines [5, 6, 10, 18, 22]. Notwithstanding, parallel computers have not gained wide acceptance among ANN researchers. The main reason for this is the fact that parallel programming is still seen as an insurmountable task for noncomputer scientists. This gap may be bridged by providing good programming environments, tailored to meet the needs of ANN researchers, and hiding all parallelism. In this article, we present the results of our work toward implementing the EDANN model on a transputer array and its use for edge-detection tasks. The article starts with an overview of previous work: Section 2 briefly outlines the theory of generalized EDANN networks, a detailed description of which can be found in [30, 31, 32], and Section 3 summarizes our approach toward the development of a simulation environment for EDANN networks on transputer arrays [28, 29]. Following this overview, in Section 4 the numerical aspects of simulating the EDANN model are explicated, and it is demonstrated how they are encoded into the simulation environment. Then, in Section 5, the capabilities of the EDANN model are shown for the case of orientation extraction, by means of simulations obtained on the parallel machine. The article concludes with a discussion on both the implementation and the capabilities of Entropy-Driven Artificial Neural Networks.

2. ENTROPY-DRIVEN ARTIFICIAL NEURAL NETWORKS

An EDANN consists of three layers: D1, D2, and C (Fig. 1). D1 is the input layer: D1 neurons take their input either from the "retina" (i.e., the input image the modular network "sees") or from output neurons of EDANNs somewhere lower in the system's hierarchy. The activations of D1 neurons reflect the presence of certain visual parameters in the stimulus: they may be sensitive to, e.g., orientations of edges, or wavelength. In addition to these feedforward inputs, D1 neurons also receive feedback input from neighboring neurons. The differential equation describing the state dynamics of the D1 layer is

    τ ∂φ(ξ, t)/∂t = -φ(ξ, t) + η₁(ξ, t) + c₁(ξ, ξ') ⊗ ν(ξ', t),    (1)

in which φ(ξ, t) is the activation of a neuron with time constant τ, at locus ξ and time t, and ν is the neuron's output: the output is a nonlinear monotonically increasing function of the neuron's activation. η₁ is the input into the D1 neurons, called the interlayer contribution; ⊗ denotes spatial convolution and c₁(ξ, ξ') denotes the connections within the layer D1 through which D1 neurons receive their intralayer inputs. Hence, c₁(ξ, ξ') ⊗ ν(ξ', t)


FIG. 1. The structure of an EDANN network. Layer D1 receives input from the retina, and D2 receives input from D1. The flow between D1 and D2 is modulated by neurons in layer C. Layer C receives input from layer D1, and an entropy production signal from layer D2.

denotes the intralayer contribution to the D1 neurons. D2 is the output layer; its neurons receive input from D1 neurons, the efficiency of the input being modulated multiplicatively by neurons from the intermediate control layer C. The purpose of this modulation is to control the parameter selectivity of D2 neurons by changing their tuning bandwidth. Note that this is not a gain control [31], as is explained in Fig. 2 in the case of orientation extraction. In order to perform selectivity control effectively, the parameter selectivity of the C neurons must be orthogonal to that of the D2 neurons they modulate [31, 32]. C layer neurons receive feedforward input from layer D1, detecting configurations of activations in D1, and feedback input from D2, representing the "thermodynamic entropy production" of layer D2 activations. The basic operation of the EDANN is as follows: the D2 neurons are driven by the direct input from D1 neurons which, in turn, are driven by the input into the EDANN (i.e., the "retina" in the case of the lowest-level module). A change in the input results in a change in the activation of D1 and subsequently of D2 neurons. The latter induces a change in entropy production, which may in turn lead to a change in the D2 neurons' selectivity by the control mechanism. Thus the change of connection strengths is, unlike, e.g., the weight changes in the training cycles of back propagation networks, not a permanent learning effect: it is adaptive in this respect. This control mechanism is modeled as follows: let Σ be the thermodynamic entropy production, defined as

    Σ(t) = ∫_{𝒴_{D2}} σ(ξ, t) dξ,    (2)

where

    σ(ξ, t) = A (∇γ(ξ, t))² / (γ(ξ, t))²,    (3)


FIG. 2. Modulating tuning curves. The neuron's tuning curve arises from the tuning of its response to an optimal value of a parameter such as orientation. Shown are orientation tuning curves tuned to orientation 0 deg. (A) The effect of gain control: the stippled curve is changed into the dotted one when the gain increases. (B) The effect of selectivity control: the default curve (stippled) is changed into the dotted one (sharpening due to shunting inhibition), or into the solid one (broadening due to on-path facilitation).

in which A > 0 is a constant, and γ is proportional to the activation φ(ξ, t) in D2. 𝒴_{D2} is an "area" in D2 over which to integrate. Then the state equation of the C neurons becomes

where S(ξ, t) denotes the activation of a C neuron at locus ξ and time t, τ_C is the C neurons' time constant, and X is the input coming from D1. ⟨·⟩ denotes the (time) average of its argument; α and β are functions depending on the entropy production. The state equation for the D2 neurons resembles the one for D1 neurons, except for its input η₂:

    η₂ = S*(ξ, t) · Y(ξ, t)    (5)

and

    S*(ξ, t) = s(ξ, ξ') (1 + δ S̄(ξ, t)),    (6)

in which Y is initially homeomorphic with the activation distribution in D1, and S* is the synaptic weight function, which depends both on the default configuration, i.e., the interlayer connections from D1 to D2, denoted by s, and the activations of the C neurons; S̄(ξ, t) represents the (time) averaged activation of the C neuron at locus ξ and time t. Adaptation of the default tuning bandwidth may physiologically be implemented by shunting inhibition [17]. In our network model, shunting inhibition is modeled by setting δ < 0, which causes a sharpening of the default tuning curve (Fig. 2). The reverse operation is called on-path facilitation and is modeled by δ > 0. This causes a broadening of the default tuning curves, as seen in Fig. 2B.
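In the implementation described later in Section 4.2, this multiplicative modulation of the default D1-to-D2 input reduces to a one-line update. A minimal sketch in C, where d2_default, d2_input, c_act, and delta are hypothetical names for the default input, the modulated input, the (time-averaged) C activation, and δ:

    /* Selectivity control on the default D1 -> D2 input: delta > 0 broadens
       the tuning curve (on-path facilitation), delta < 0 sharpens it
       (shunting inhibition). */
    d2_input[i] = d2_default[i] * (1.0f + delta * c_act[i]);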

A modular EDANN network consists of a hierarchy of three-layer networks as described above. Figure 3A shows module i in such a hierarchy. The selectivity control mechanism in module i of such a modular system is not only mediated by feedback (entropy production) from D2 of module i but also by feedback from D2 layers in higher-level modules (Fig. 3B).

3. A SIMULATION ENVIRONMENT FOR MODULAR NETWORKS

Simulations of neural networks on message-passing multiprocessors are usually done as follows: neural networks are often considered as two-dimensional grid problems (cf. [1, 2, 8]). Geometric decomposition divides the network into as many parts, or jobs, as there are processors, and neighboring jobs are mapped onto neighboring processors. If neurons connect only to neighboring neurons, communication between processors is needed only to update the neurons on the "border" of a job, and local communications lead to efficient parallel programs. For our hierarchical EDANN network, the situation is somewhat different. We have shown in [29] that for a particular modular, hierarchical network, one can always successfully apply a particular decomposition and mapping strategy, but there is no decomposition and mapping strategy optimally applicable to every modular network. Therefore, since the decomposition and mapping for an arbitrary modular EDANN network is not known a priori, a general simulator for such networks should be capable of handling any decomposition and mapping strategy. We have shown that this is possible by performing the decomposition and mapping off-line [29]. By providing a programming environment for the parallel machine, which makes all parallelism, and hence all mapping and


FIG. 3. Modular EDANN networks. (A) Outline of a single EDANN network, called a module; shown here is module i. (B) A hierarchy of EDANN modules. Note that within the hierarchy, feedback is modularly organized: entropy production is fed back from D2 layers of higher-level modules, and not between individual neurons.

decomposition issues, transparent to the programmer, a general simulation environment for modular EDANN networks can be constructed. We have shown [28] how such a simulation environment can be implemented on a transputer array. A transputer has four communication links, hence it is easy to configure a transputer array as a two-dimensional grid. We have shown that for the applications we are interested in, a simulation environment such as the one described above runs efficiently [27] on transputer arrays configured as a grid. The simulation environment we have developed for our simulations consists of three parts: a compiler, a splitter, and a parallel simulator. An overview of this system is shown in Fig. 4. In summary: the compiler produces network data structures, which are decomposed and mapped for simulation onto a physical machine by the splitter, and finally the decomposed network is read in and simulated by the parallel simulator. We have developed a special-purpose language, named i [26, 28, 29], in which (EDANN) networks and their interconnections can be described at a high level. Interconnections are specified by moving interconnection templates over windows in source and target layers. An interconnection template is a generic set of interconnections for one neuron. By moving templates, all neurons in the target window get a dendritic tree as specified by the template, and this dendritic tree connects neurons in the source window to neurons in the target window. For examples of templates cf. infra; a sketch of how such a template expands into individual synapses is given below. The exact design of the splitter and simulator has been described elsewhere [28].
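To make the template mechanism concrete, the following is a minimal C sketch (not the i language itself) of how a square interconnection template could be expanded into individual synapses by moving it over source and target windows of equal size. The struct layout, function name, and toroidal wrapping are illustrative assumptions, not the simulator's actual data structures.

    /* One synapse: a weighted connection from a source to a target neuron. */
    struct synapse { int src; int dst; float w; };

    /* Expand a (2r+1) by (2r+1) template over an nx by ny window: every
       target neuron receives a dendritic tree whose weights are given by
       the template, with toroidal wrapping at the window borders.
       Returns the number of synapses written into syn. */
    static int apply_template(struct synapse *syn, int nx, int ny,
                              const float *tmpl, int r)
    {
        int x, y, dx, dy, n = 0;
        for (y = 0; y < ny; y++)
            for (x = 0; x < nx; x++)                  /* every target neuron */
                for (dy = -r; dy <= r; dy++)
                    for (dx = -r; dx <= r; dx++) {    /* its dendritic tree */
                        int sx = (x + dx + nx) % nx;
                        int sy = (y + dy + ny) % ny;
                        syn[n].src = sy * nx + sx;
                        syn[n].dst = y * nx + x;
                        syn[n].w = tmpl[(dy + r) * (2 * r + 1) + (dx + r)];
                        n++;
                    }
        return n;
    }

With the 5 by 5 template of Table II and nx = ny = 20, this would generate the 400 x 25 = 10,000 retina-to-D1 synapses of one leaflet's receptive fields in the example network of Section 5.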

how the networks’ state equations are programmed into the simulator. Therefore we need to understand the overall structure of the parallel program: we make abstraction of both implementational details and the physical structure of the machine. The parallel simulator is actually a programming environment into which state equations of the EDANN model can be incorporated. Basically, it consists of a number of worker processors, controlled by a master


FIG. 4. Overview of the simulation environment. The i compiler translates an i network description into an object network. Object networks are transformed into load networks by the splitter, after which they can be loaded into the parallel simulator.


Every worker is responsible for one or more "parts" or jobs of one or more EDANN modules. The structure of the process running on a worker processor is

    while (running == TRUE) {
        exchange_neighbor_information();
        do_update_step();
    }

in which the variable running is controlled by the master processor. The exchange of neighbor information is handled in a completely transparent way by the simulator kernel, which is embedded in exchange_neighbor_information(). The programmer who wants to implement or alter the code that solves a particular network model's state equations only has to write the code that performs the update steps in an iteration process. For this purpose, a number of abstract data structures (ADTs) are at his or her disposal. As long as the programmer uses the ADTs in accordance with the definitions of their interfaces, the code will transparently take care of decomposition and mapping, synchronization, consistency, exchange of information between neighbors, etc. We now outline the ADTs the programmer has access to and, as an example, describe how the input into a layer is calculated. In the next section, we describe how the ADTs can be accessed to solve the network's state equations.

Basically, there are three types of ADTs: parts, interconnections, and workspaces.

Parts contain neuron activation values for pieces of the networks. There are two kinds of parts: private and public (cf. Fig. 5). The public parts contain "border information" which is to be communicated between processors. The private parts contain the neuron activation values calculated by the worker processors. The values of neurons on the border of a job are calculated in a private part, and then copied into public parts for communication. Hence, every public part is available on two processors: in the example in Fig. 5, public part j is calculated as the right-hand border of part a on processor 1, and it is sent to processor 2, where it is used to calculate the input into the neurons in private part b.

Interconnections between neurons: every processor is responsible for the calculations within its private parts. In order to perform these calculations it needs to know the connections between neurons. All the connections within a private part are stored in private interconnection ADTs. Connections which extend over processor boundaries are stored in public interconnection ADTs. In the example of Fig. 5, if we assume that all connections are bidirectional, then processor 1 needs interconnections f, which connect the neurons on the left edge of part b to the neurons on the right edge of part a.


FIG. 5. Some of the ADTs built by the splitter. In this example, the splitter performs a geometric decomposition of the network over four processors. The network is split into four private parts, labeled a, b, c, and d, which are placed on processors 1, 2, 3, and 4, respectively; the public parts have been labeled j, k, l, m, n, o, p, q, r, s, t, and u; and the public interconnections have been labeled e, f, g, h, and i. The parts have been separated from the connections by a thick broken line.

In order to calculate the inputs into the neurons on the right edge of part a, processor 1 must consult part k, which contains the neuron activations of all the neurons in part b which connect to neurons in part a. Similarly, processor 2 also needs interconnections f, because of the symmetry of the interconnections.

Workspaces are labeled blocks of memory, needed to perform the calculations for the neurons' state equations. In the example of Fig. 5, every processor is responsible for 16 neurons. This means that every processor needs to allocate memory for the calculations of 16 neurons. The size of this memory will always be a multiple of the number of neurons in the private part(s), in this case multiples of 16. The private parts have a default workspace: for the D1 and D2 layers, the private parts are connected by default to the workspaces containing the neurons' outputs; for the C layer the private parts are connected by default to the workspaces containing the elementary entropy production values (cf. infra). The number of workspaces can be adapted by the user. For example, if for every neuron we want to store the interlayer input, the intralayer input, the neuron activation, and the neuron


output, we would need four workspaces. The following section elaborates on the use of workspaces for the calculations of EDANNs. The following is the code to calculate a particular type of input into a particular workspace:

    do_input_layer(w, ctype, layer)
    workspace w;
    connection_type ctype;
    layer_type layer;
    {
        interconnection ic;
        synapse s;

        for (ic loops over all interconnections)
            if (connection_type(ic) == ctype) and (target(ic) == w)
                if (source_layer_type(ic) == layer) {
                    part ip;
                    ip = source_part(ic);
                    for (s loops over synapses in ic)
                        w[target(s)] += ip[source(s)] * weight(s);
                }
    }

In the parameter list of this code, w is the workspace into which input of type ctype is calculated. ctype can take three values: intra, inter, or entropy, denoting the type of connections to be evaluated. layer denotes the type of layer the connections in question must originate from. To calculate the input into a layer, all connections to that layer are considered (outer loop). If an interconnection ic is of the appropriate type, a loop over all synapses in that interconnection is performed (inner loop). Every interconnection connects a part source_part(ic), either private or public, to workspace w. Interconnections consist of a number of synapses s, and every synapse connects a source neuron source(s) to a target neuron target(s) by a weight weight(s). Hence the code for do_input_layer() merely sums all inputs (entropy, inter-, or intralayer, depending on the values of the parameters ctype and layer) into a particular workspace.

4. PARALLEL IMPLEMENTATION

For a particular EDANN, a system of three continuous differential equations, one for each layer, is to be solved. These continuous equations need to be discretized not only in time (processing steps), but also in space (over neurons). Indeed, the interconnections between neurons in (1) yield a spatial discretization of the intralayer contribution in the D1 and D2 state equations; the discretization of the interlayer contribution is straightforward, also for layer C. Hence, these differential equations become sets of partial difference equations, one set for each layer and one equation for each neuron, written as a function of time and the two spatial coordinates. We solve these three sets of difference equations by means of the forward Euler rule (forward differencing rule). This rule is computationally simple and easy to implement on a parallel machine, but needs small time steps in order to converge. This is not a disadvantage: since we want to monitor the network dynamics, the use of a more powerful rule, e.g., one which immediately gives us the stable state of the network, would not be appropriate for our purpose anyway. In this section we outline the code we use to simulate EDANNs. First we present the code, which makes extensive use of workspaces, without any reference to parallelism. When all aspects of the code have been explained, we discuss the consequences of running the code on a parallel machine.

4.1. Layer D1

Let us first look at the state equation for layer D1. If we discretize φ̇(ξ, t) as (φ(ξ, t + Δt) - φ(ξ, t))/Δt and substitute this in Eq. (1), we get

    φ(ξ, t + Δt) = (η₁(ξ, t) + c₁(ξ, ξ') ⊗ ν(ξ', t) - φ(ξ, t)) Δt/τ + φ(ξ, t).    (7)

From now on, we set τ = 1 without loss of generality. Equation (7) is implemented as follows: every processor that is responsible for a layer D1 job of an EDANN module creates four workspaces, called D1_input, D1_intra, D1_act, and D1_output. D1_input is used to store the interlayer input into the D1 layer, and D1_intra contains the intralayer input into the D1 neurons. D1_act contains the activations of the D1 neurons, and D1_output stores the outputs of those neurons. The code to simulate the D1 state dynamics has the following structure:

    declare default workspace D1_output;
    declare workspace D1_input, D1_intra, D1_act;

    transfer(w1, w2, gain, bias)
    workspace w1, w2;
    real gain, bias;
    {
        index i;
        for (i loops over w1)
            w2[i] = 1 / (1 + exp(-gain * w1[i] + bias));
    }

    update_D_layer(balance, w_inter, w_intra, w_act, dt)
    workspace w_inter, w_intra, w_act;
    real dt, balance;
    {
        index i;
        for (i loops over w_inter)
            w_act[i] = ((1 - balance) * w_inter[i] + balance * w_intra[i]) * dt
                       + w_act[i] * (1 - dt);
    }

    simulate_D1()
    {
        do_input_layer(D1_input, inter, D2);    /* input */
        do_input_layer(D1_intra, intra, D1);    /* neighbor contributions */
        update_D_layer(a1, D1_input, D1_intra, D1_act, dt);
        transfer(D1_act, D1_output, g1, b1);
    }

The function do_input_layer() is used to calculate the inter- and intralayer inputs. Note that the input into a D1 layer must originate from the "retina" or from a layer D2 of a module at a lower hierarchical level (recall Fig. 3B). The transfer() routine simply applies a nonlinear input/output function to the neuron activations to obtain the neuron outputs. The i/o function used here is the classic sigmoid function output = 1/(1 + exp(-gain * activation + bias)). The parameter gain determines the "steepness" or degree of nonlinearity of the i/o function [23]. The routine update_D_layer() implements the actual Euler rule, assuming the appropriate values are available in the workspaces D1_act, D1_input, and D1_intra. Recall that the time constant for the neurons is 1. The parameter balance, ranging between 0 and 1, denotes the ratio between the inter- and intralayer contributions. In summary, the parameters controlling the layer D1 simulation are: a1, the inter-/intralayer ratio; b1, the bias for layer D1 neurons; g1, the gain for layer D1 neurons; and dt, the simulation time step for layer D1 neurons, i.e., the ratio Δt/τ in (7).
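For readers who want to exercise the update rule outside the simulator, the following is a minimal, self-contained C version of transfer() and update_D_layer(), operating on plain float arrays instead of the simulator's workspace ADTs. The array-based interface, the _c suffixes, and the parameter n (the number of neurons in a part) are assumptions made for this sketch only.

    #include <math.h>

    /* Sigmoid i/o function applied element-wise:
       w2 = 1 / (1 + exp(-gain * w1 + bias)). */
    static void transfer_c(const float *w1, float *w2, int n,
                           float gain, float bias)
    {
        int i;
        for (i = 0; i < n; i++)
            w2[i] = 1.0f / (1.0f + expf(-gain * w1[i] + bias));
    }

    /* One forward Euler step of Eq. (7), with tau = 1: the activation moves
       toward a weighted mix of inter- and intralayer input. */
    static void update_D_layer_c(float balance, const float *w_inter,
                                 const float *w_intra, float *w_act,
                                 int n, float dt)
    {
        int i;
        for (i = 0; i < n; i++)
            w_act[i] = ((1.0f - balance) * w_inter[i]
                        + balance * w_intra[i]) * dt
                       + w_act[i] * (1.0f - dt);
    }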

4.2. Layer D2

Layer D2 is handled in a very similar way. The only difference between D1 and D2 neurons lies in the way their input is calculated. Indeed, from (5) and (6) it can be seen that the total input into D2 depends both on the default input into D2 and the activations of the C neurons. If we assume a workspace C_act has been declared elsewhere (cf. infra) and assume it contains the activations of the C neurons, the code for the D2 neurons becomes

    declare default workspace D2_output;
    declare workspace D2_input, D2_intra, D2_act;

    do_D2input_layer(D2_input, C_act, delta)
    workspace D2_input, C_act;
    real delta;
    {
        index i;
        do_input_layer(D2_input, inter, D1);    /* default input */
        for (i loops over D2_input)
            D2_input[i] += D2_input[i] * (delta * C_act[i]);
    }

    simulate_D2()
    {
        do_D2input_layer(D2_input, C_act, delta);
        do_input_layer(D2_intra, intra, D2);    /* neighbor contributions */
        update_D_layer(a2, D2_input, D2_intra, D2_act, dt);
        transfer(D2_act, D2_output, g2, b2);
    }

In this code, we use the parameters a2, b2, and g2, indicating the ratio, threshold, and gain for D2 layer neurons, respectively. dt serves the same purpose as in the previous subsection. In addition, we use a parameter delta, which corresponds to δ in (6).

4.3. Layer C

Finally, we come to the entropy production calculation and the state equation for the C neurons. In [31], as well as in Eq. (2), it was only indicated that the entropy production measure is to be a "global" measure, i.e., an integral over an area 𝒴_{D2}. In this implementation, in contrast to the simulations reported in [31], we use a semilocal entropy production measure: the integral over 𝒴_{D2} in (2) is discretized as a sum of elementary entropy production contributions, denoted by σ_e, taken over the area 𝒴_{D2}; these σ_e are discretized versions of σ in (3). Hereto, each C neuron first calculates σ_e, based on the activations of a restricted number of D2 neurons, using connections from layer D2 to layer C. Then, for each C neuron, Σ is calculated by summing these individual σ_e's over a pool of surrounding C neurons, using layer C interconnections. This way, every C neuron is not only connected to D1 neurons (cf. X(ξ, t) in (4)) and to a number of D2 neurons, but also to a pool of neighboring C neurons.

Several entropy production implementations are possible; since we intend to present images to our network, we assume that all layers are at least spatially two-dimensional. Let Z(x, y, t) be the output of a D2 neuron at locus (x, y) at time t; we discretize A (∇γ(ξ, t))²/(γ(ξ, t))² = A (∇ ln γ(ξ, t))² and set A = 1, without loss of generality, so that the elementary entropy production centered around (x, y) becomes


    σ_e(x, y, t) = ( ln [ Z(x-1, y, t) Z(x+1, y, t) Z(x, y-1, t) Z(x, y+1, t) / Z(x, y, t)⁴ ] )²
                 = ( ln Z(x-1, y, t) + ln Z(x+1, y, t) + ln Z(x, y-1, t) + ln Z(x, y+1, t) - 4 ln Z(x, y, t) )²    (8)

and the semilocal thermodynamic entropy production becomes

    Σ(x, y, t) = Σ_{(x', y') ∈ 𝒴_{D2}(x, y)} σ_e(x', y', t),    (9)

where 𝒴_{D2}(x, y) denotes the "area" in C, defined by the intralayer connections of the C neuron at (x, y). Note that only derivatives with respect to the spatial parameters (x and y) are considered. Basically, to calculate (8) every C neuron at (x, y) receives inhibitory input from the D2 neuron at (x, y) and excitatory input from that D2 neuron's four nearest neighbors. The template can be found in Table I. Following Eq. (4), we set α = 1 and β(x, y, t) = power · Σ(x, y, t), so that the code becomes

    declare default workspace C_entropy;    /* communicate sigma_e's */
    declare workspaces C_act, C_intra, C_input;

    do_input_layer_ln(w)
    workspace w;
    {
        interconnection ic;
        index i;
        synapse s;

        for (ic loops over all interconnections)
            if (connection_type(ic) == entropy) and (target(ic) == w)
                if (source_layer_type(ic) == D2) {
                    part ip;
                    ip = source_part(ic);
                    for (s loops over synapses in ic)
                        w[target(s)] += ln(ip[source(s)]) * weight(s);
                }
        for (i loops over w)
            w[i] = w[i] * w[i];
    }

    update_C_layer(w_inter, w_intra, w_act, power, tc, dt)
    workspace w_inter, w_intra, w_act;
    real tc, power, dt;
    {
        index i;
        for (i loops over w_inter)
            w_act[i] = power * w_inter[i] * w_intra[i] * dt / tc
                       + w_act[i] * (1 - dt/tc);
    }

    simulate_C()
    {
        do_input_layer_ln(C_entropy);          /* get sigma_e's */
        do_input_layer(C_intra, intra, C);     /* sum sigma_e's */
        do_input_layer(C_input, inter, D1);    /* input from D1 */
        update_C_layer(C_input, C_intra, C_act, power, tc, dt);
    }

The semilocal entropy production is calculated in workspace C_intra, and is for every C neuron the result of a summation over its intralayer inputs. The exact extent of the area 𝒴_{D2} is specified by the size of the C neuron's intralayer template. To understand that this can indeed be done by do_input_layer(), we should recall that, for the C neurons, the workspace the private parts are connected to is C_entropy. This means that for the C layer, elementary entropy production values are exchanged between processors. Recall that the C layer also gets input from D1, the calculation of which proceeds in the standard way. The update rule for the C neurons is forward Euler. The parameter power scales the contribution of Σ, and the parameter tc indicates the time constant for C neurons. The control mechanism can only start improving D2 results when these are available. Therefore the feedback effect must run at a slower speed than the rate at which D2 converges, hence the time constant for C neurons must be larger than that of the D1 and D2 neurons (cf. also the discussion section). Since the time constants for D1 and D2 neurons were taken as 1, tc represents a factor by which the C time constant is scaled (cf. also the simulation results, next section). Finally, note that the C neurons' i/o function is the identity function, so C_act can be used directly as the output.
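To make Eqs. (8) and (9) concrete, the following is a small C sketch that evaluates the elementary and semilocal entropy production for one D2 leaflet stored as a plain 2-D array. The array size, toroidal wrapping, and the 25-neuron pool radius are assumptions borrowed from the example network of Section 5; they are not part of the simulator itself.

    #include <math.h>

    #define NX 20
    #define NY 20

    /* Elementary entropy production of Eq. (8): the Table I template applied
       to ln Z, squared.  Z is the D2 output (assumed strictly positive);
       indices wrap toroidally. */
    static double sigma_e(const double Z[NY][NX], int x, int y)
    {
        int xm = (x + NX - 1) % NX, xp = (x + 1) % NX;
        int ym = (y + NY - 1) % NY, yp = (y + 1) % NY;
        double v = log(Z[y][xm]) + log(Z[y][xp])
                 + log(Z[ym][x]) + log(Z[yp][x])
                 - 4.0 * log(Z[y][x]);
        return v * v;
    }

    /* Semilocal entropy production of Eq. (9): sum of sigma_e over a
       (2r+1) by (2r+1) pool of C neurons; r = 2 gives the 25-neuron pool
       used in the example of Section 5. */
    static double semilocal_entropy(const double Z[NY][NX], int x, int y, int r)
    {
        double sum = 0.0;
        int dx, dy;
        for (dy = -r; dy <= r; dy++)
            for (dx = -r; dx <= r; dx++)
                sum += sigma_e(Z, (x + dx + NX) % NX, (y + dy + NY) % NY);
        return sum;
    }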

4.4. Parallel Update Scheme

In a parallel implementation, physical communications between processors will be needed. Due to the fact that layers are decomposed and mapped onto different processors, and since templates may extend over job borders,

TABLE I
The Template Used to Calculate the Entropy Production Components σ_e

    0.0    1.0    0.0
    1.0   -4.0    1.0
    0.0    1.0    0.0


communication steps are needed to calculate the inter- and intralayer contributions. We claimed that our simulation environment makes all parallelism transparent. To demonstrate this, let us follow the flow of calls in the code, and analyze between which calls processor communications should occur. The flow of calls is depicted in Fig. 6. Note that a processor communication step, i.e., a call to exchange_neighbor_information(), exchanges all public parts within the entire system. This implies that the simulator performs both the interlayer and the intralayer communications simultaneously. The D1 layer is simulated first. Assuming that the input into the EDANN module is available, the first call (do_input_layer()) does not give rise to any problems. The second call, however, calculating the intralayer contributions to the input of the D1 neurons, causes a problem: communications are needed for connections that fall over the edges of private parts. However, if we initialize all workspace values to zeros, the very first calculation of the intralayer contribution for the D1 neuron inputs will be zero. If we then calculate the D1 neurons' activations for the first time, and transfer these activations into their output values, the first layer simulation has started correctly. Indeed, by the next call of simulate_D1(), a communication will have happened, and the simulator kernel will have communicated the public parts needed to calculate the intralayer input. After the first call to simulate_D1(), we want to


calculate the input into C and/or D2. However, this is only possible after a communication step, since the C and the D2 neuron input templates may stretch over processor boundaries. Hence the simulation of C and D2 should only start after their inputs are available on every processor, i.e., after the first communication step. There is very little point in calculating the input into C at this stage, since to calculate the C neurons' activations we also need some feedback from D2. Hence we first initialize D2 in a sensible way. For this we need the input into D2, which depends on the C neurons, the activations of which we do not yet know. However, if the parameter δ were zero, we could calculate the D2 input directly, since in that case the D2 input would simply be determined by the default configuration: after the first communication step, the routine do_D2input_layer() is sure that the private parts needed to perform the call to do_input_layer() contain the appropriate values, calculated in D1. The workspace C_act has not yet been updated, and hence contains zeros, implementing the effect of δ = 0. The argument we made about the D1 intralayer contributions also holds for the D2 intralayer contributions, hence we can safely calculate the D2 neuron activations and outputs in simulate_D2(). Thus far, we have performed one update for D1; a processor communication step, providing the input for D2; and an update step for D2, which is followed by the second processor communication step. At this point, the C neurons have all the information necessary to calculate


FIG. 6. Gantt chart of the calculations/communications in the parallel code. The clock represents interprocessor communication. It can clearly be seen that the system comprises a four-stage pipeline; the entropy production calculation runs three clock ticks behind the D1 layer calculations. The arrow indicates the fact that update step i for the C neurons uses the information of update i + 2 of the D1 neurons. The discrepancy of 2 instead of 3 is due to the backward order in the code to simulate a network, as explained in Section 4.4. For an evaluation of the validity and consequences of this update order, cf. Section 6.1.


σ_e(x, y, t). In order to calculate (9), another communication step is needed, since the area 𝒴_{D2}(x, y) over which to integrate may extend over different processors. This means that the code in simulate_C() must be interrupted after the first call, to perform a communication step. Now, since the feedforward input from D1 into C was already available after the very first processor communication step, we have all the information we need to calculate the C neurons' activations. From this point onward, inputs into D2 will be tuned by the multiplication delta * C_act[index] in do_D2input_layer(). The sequence of operations constitutes a pipeline (running mode in Fig. 6); we need a number of update/communication steps before all neurons have been properly updated (start-up in Fig. 6). This gives rise to the following code:

    do_update_step()
    {
        if (TIME_STEP > 3) {
            do_input_layer(C_intra, intra, C);     /* sum sigmas */
            do_input_layer(C_input, inter, D1);    /* input from D1 */
            update_C_layer(C_input, C_intra, C_act, power, tc, dt);
        }
        if (TIME_STEP > 2)
            do_input_layer_ln(C_entropy);
        if (TIME_STEP > 1)
            simulate_D2();
        simulate_D1();
    }

Careful analysis of this code will reveal that this pipeline performs the operations in the order described above. Assuming that the simulator kernel keeps the value of the variable TIME_STEP up to date, the reader can easily verify this. For a discussion of the consequences and the validity of this update order we refer the reader to Section 6.
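For concreteness, and assuming TIME_STEP counts completed update/communication cycles starting at 1 (the initial value is not stated explicitly in the text), the start-up phase unrolls as follows: at step 1 only simulate_D1() runs; at step 2, simulate_D2() and simulate_D1(); at step 3, the elementary entropy production calculation do_input_layer_ln() is added; and from step 4 onward the C layer update runs as well, so that all four stages of the pipeline in Fig. 6 are active.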

5. CAPABILITIES OF EDANN NETWORKS

This section is organized as follows: first we build an example EDANN, to a large extent similar to the one used in [31], with which to perform edge detection in images. A detailed overview of network topology and connections is given, with a focus on reproducibility. Then a number of simulation results are presented, which were obtained using the parallel simulator described above. Finally, a parameter strategy is developed.

5.1. Example

A lower-order EDANN module for orientation extraction (i.e., a module which receives input from the retina, and hence is situated at the lowest level in a hierarchical network), consisting of a three-layer network as described above, receives input from a retina sized 20 by 20 pixels. Every layer in the EDANN module has three dimensions. The first two dimensions are spatial, and are organized retinotopically; i.e., nearby positions in the retina correspond to nearby neurons in these dimensions. The third dimension is the orientation dimension, and is discretized into four regions centered around 0, 45, 90, and 135 deg. To each of these regions corresponds one sublayer, called a leaflet (cf. also [31]). A particular neuron in a leaflet then has a coordinate (x, y, θ), with x and y the two spatial coordinates and θ one of the four central orientations (see Fig. 8).

Layer D1. The receptive fields of the D1 neurons are discretized Gabor functions. Daugman [3] has shown that such a receptive field type can extract orientation information. In addition, it is known to be physiologically plausible [21]. A Gabor function is a two-dimensional Gaussian modulating an oriented sine or cosine. Figure 7 shows an example of a 90 deg Gabor function, and Table II presents a discretized version. This discretized Gabor function constitutes the weights of the 25 connections in the receptive field of the D1 neurons for the 90 deg orientation. Each of the four D1 leaflets consists of 20 by 20 neurons, hence the D1 layer consists of 1600 neurons. For example, neuron (3, 5, 135) of layer D1 receives input from the 5 by 5 area around pixel (3, 5) on the retina, and is sensitive to orientations centered around 135 deg. Within the D1 layer, we place some cross-orientation inhibition connections [20]. In particular, every neuron (x, y, θ) is connected to neurons (x, y, θ + 45) and (x, y, θ - 45) by a connection with strength -0.25, and to (x, y, θ') by a connection with strength -0.5, wherein θ' denotes the orientation orthogonal to θ (see Fig. 8). This cross-orientation inhibition is known to perform a winner-take-all mechanism [25], and is used to increase contrast [4] in the responses of the D1 layer across the orientation dimension.

TABLE II
A Two-Dimensional Gabor Function, Discretized as a 5 by 5 Template, from (x - 2, y - 2) to (x + 2, y + 2) in the Center Portion of the Function Shown in Fig. 7

    -0.153421  -0.082696   0.548405  -0.080295  -0.154303
    -0.291399  -0.171072   0.889303  -0.169053  -0.292141
    -0.356134  -0.211399   1.048407  -0.211399  -0.356134
    -0.292141  -0.169053   0.889303  -0.171072  -0.291399
    -0.154303  -0.080295   0.548405  -0.082696  -0.153421
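As an illustration of how a template like Table II can be produced, the following C sketch fills a 5 by 5 array with a vertically oriented Gabor function (a 2-D Gaussian modulating a cosine along x). The Gaussian width and spatial frequency used here are illustrative assumptions, since the exact values behind Table II are not listed in the article.

    #include <math.h>
    #include <stdio.h>

    /* Fill a 5 by 5 template with a Gabor function tuned to 90 deg:
       Gaussian envelope times a cosine carrier along the x direction.
       sigma and freq are assumed values chosen for illustration only. */
    static void make_gabor_template(double t[5][5], double sigma, double freq)
    {
        int i, j;                               /* i: y offset, j: x offset */
        for (i = -2; i <= 2; i++)
            for (j = -2; j <= 2; j++) {
                double gauss = exp(-(i * i + j * j) / (2.0 * sigma * sigma));
                t[i + 2][j + 2] = gauss * cos(2.0 * 3.14159265358979 * freq * j);
            }
    }

    int main(void)
    {
        double t[5][5];
        int i, j;
        make_gabor_template(t, 1.5, 0.4);
        for (i = 0; i < 5; i++) {
            for (j = 0; j < 5; j++)
                printf("%9.5f ", t[i][j]);
            printf("\n");
        }
        return 0;
    }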


FIG. 7. A two-dimensional Gabor function. Neurons with this receptive field are sensitive to vertically oriented contrast edges in the image. A discretized version of this function can be found in Table II.

Finally, to prevent the occurrence of boundary effects, D1 is toroidally connected in every dimension.

Layer D2 consists of the same number of neurons as layer D1. The connections between D1 and D2 neurons are formed by a discretized 3 by 3 two-dimensional Gaussian. These connections simply convey a blurred version of the D1 responses into the D2 layer. As was the case for D1, D2 is toroidally connected. The intralayer connections in the D2 leaflets implement a diffusion process performing a smoothing along the leaflet's preferred orientation. For example, in the 45 deg leaflet, neuron (x, y, 45) is connected to itself by a connection with strength 0.4, to (x + 1, y + 1, 45) and (x - 1, y - 1, 45) with strength 0.2, and to (x + 2, y + 2, 45) and (x - 2, y - 2, 45) with strength 0.1.

Layer C. The C neurons receive input from the D1 layer. To implement selectivity control in the orientation domain, the preferred orientation of the C neuron must be orthogonal to that of the D2 neuron it modulates [31, 32]: a C neuron at position (x, y, θ) in layer C is connected to D1 neurons (x, y, θ + 45) and (x, y, θ - 45) by a connection with strength 0.25, and to (x, y, θ') by a connection with strength 0.5. This C neuron will then control the bandwidth of the tuning curve of the D2 neuron at position (x, y, θ) (see Fig. 8). The σ_e values are calculated by means of connections from D2 to C. To yield Σ, every C neuron sums σ_e's over its 25 spatial neighbors.


FIG. 8. Leaflet structure of an EDANN module for orientation. Each layer consists of four leaflets, labeled by θ values. Note that the C neurons of a particular leaflet have an orthogonal orientation preference to the ones in the corresponding D1 and D2 leaflets, as indicated by the small oriented bar next to each leaflet. In accordance with Fig. 1, the neurons of the shaded C leaflet modulate the flow from the shaded D1 leaflet into the neurons of the shaded D2 leaflet.


In order to do this, every C neuron is connected to each of its 25 neighbors with a connection strength of 1.

5.2. Simulation Results


FIG. 9. Input stimuli used in the simulations. The first stimulus represents two touching triangles, and is called “butterfly;” the second stimulus represents two triangles separated by a gap and is called “butterfly with gap;” the third stimulus is a noisy version of the “butterfly,” generated by disrupting every pixel in the original image by Gaussian noise with standard deviation 0.4. The fourth stimulus consists of horizontal and vertical lines, comprising a “grid.”


The example described here results in a system comprising 5200 neurons, 1600 for each layer plus 400 for the retina, and 126,400 interconnections, simulated on a 16-processor transputer system. The network was decomposed and mapped onto the machine by means of stacking decomposition [29, 28]. Every transputer was responsible for about 300 neurons and about 8000 interconnections. The various stimuli presented to the network are depicted in Fig. 9. For the network described above and subjected to the "butterfly" stimulus, Fig. 10 presents the activations of all leaflets in both the D1 and D2 layers. It can clearly be seen that in layer D1, the network reveals no activation for neurons receiving input from the center of the retina, and that this erroneous absence of activation


FIG. 10. Result of a simulation of the network as described in the text. The "butterfly" was used as a stimulus. Shown are the activations in the D1 and D2 leaflets (see Fig. 8), ordered vertically from 0 to 135 deg. In addition, for the 45 deg leaflets, X, the input from D1 into C; Σ, the entropy production from D2; and S, the C neuron activations are shown. The arrows in the 45 deg leaflets label cross sections (see Fig. 11). The parameters were dt = 0.3; g1 = 2; g2 = 2; a1 = 0.1; a2 = 0.3; b1 = 0.7; b2 = 0.4; delta = 1; tc = 0.5; power = 2. All values have been scaled between 0 and 1, except for the entropy production, which has been scaled between 0 and 150.


FIG. 11. The effect of feedback in the case of on-path facilitation. Both curves are horizontal cross sections in the Dl, D2 45 deg leaflets of Fig. 10. In Dl there is no distinct response, whereas in D2 the activation at the center of the leaflet is “blown up.” The left and right “hills” in Dl on the other hand are suppressed in D2. This qualitative difference between Dl and D2 activation distributions is solely due to the control mechanism.

is corrected in the D2 layer: in D2 the edges of the figure in the retina are extracted as a full "line." In addition, in Fig. 10, the activation distributions are detailed for the 45 deg leaflets. The orthogonal tuning of the C neurons can clearly be seen: the C neurons in the 45 deg leaflet receive D1 input which is tuned to 135 deg. The figure also shows that the control mechanism truly adapts itself to the stimulus: combining the D1 input and the entropy production yields the C neuron activations, which are significantly higher in the region to be filled in in D2, compared to other regions.


The arrows in the 45 deg leaflets of Fig. 10 indicate cross sections in D1 and D2, and these are detailed in Fig. 11. This figure clearly shows the selectivity of the adaptation mechanism: only the central "hill" in the D1 activation is "blown up"; the other ones are suppressed. The simulation took 83 steps to converge, which resulted in an execution time of 7.359 s and a parallel efficiency of about 83%. In agreement with [8], we define parallel efficiency as the ratio of problem calculation time over total execution time. A complete analysis of the performance and efficiency of the parallel simulator can be found in [27]. The same network is now presented with the stimulus "butterfly with gap." In this case, there are four edges present in the retina, instead of two. As is apparent from Fig. 12, both D1 and D2 do not reveal any activity around their center regions. This clearly illustrates the selectivity of the filling-in done by the network. The gap is genuine, and not dependent on the choice of the parameter values: indeed, increasing the values of the parameters controlling the action of the C neurons (parameters power and delta) will create spurious activations in D2, rather than filling in the gap. We now illustrate the network's performance in the presence of noise in the stimulus. Hereto, the butterfly was corrupted with Gaussian noise, and presented to the network used in Figs. 10 and 12. Figure 13 depicts the resulting activation distributions in the D1 and D2 leaflets. For this stimulus, it took significantly longer for the network to converge: 336 steps were needed, which is four times more than in the examples of Figs. 10 and 12. As explained in the previous sections, reversing the sign of δ will reverse the network operation from on-path facilitation to shunting inhibition or vice versa.


FIG. 12. Activation distributions in the D1 and D2 leaflets when the butterfly with gap is presented. The parameters and layout are as in Fig. 10.


FIG. 13. The influence of noise. The network structure, parameters, and layout are identical to those used in Fig. 10, with the same parameter values. In this example, however, the stimulus was the noisy butterfly.


FIG. 14. Network operation in the case of on-path facilitation and shunting inhibition. The stimulus used here is the grid, and the network structure and layout are the same as in Fig. 10. The first series of responses shows the D1 and D2 leaflets for an example of on-path facilitation (δ > 0); the second series shows an example of shunting inhibition (δ < 0). The network structure used was identical to that of Fig. 10. The parameters were dt = 0.3; g1 = 2; g2 = 2; a1 = 0.1; a2 = 0.3; b1 = 0.45; b2 = 0.6; delta = 0.75; tc = 0.5; and power = 1 for facilitation, and dt = 0.3; g1 = 4; g2 = 4; a1 = 0.1; a2 = 0.3; b1 = 0.3; b2 = 0.4; delta = -0.02; tc = 0.5; and power = 0.1 for shunting inhibition.


In Fig. 14, the effect these two operations have on network performance is illustrated for the same network as the one used in Figs. 10, 12, and 13. When the "grid" is presented as a stimulus, in the case of shunting inhibition the network extracts the edges of the grid, whereas in the case of on-path facilitation it performs a spatial filling-in of the grid. This should be contrasted with the network operation on the "butterfly with gap," as shown in Fig. 12, where missing information is not filled in. The reason why filling-in occurs here is the presence of significant orthogonal orientation information: indeed, if either the horizontal or the vertical lines in the grid were left out, no filling-in would occur. Note that the inhibitory effect is much stronger than the facilitatory effect, as is reflected by the magnitudes of δ (see figure caption): δ = 0.75 for on-path facilitation, but only -0.02 for shunting inhibition.


5.3. Parameter Strategy and Robustness

The simulations described above are controlled by 10 parameters. Reproducibility of neural network simulation results is often considered a problem [35, 15]. One may wonder how the parameter values for the examples were chosen, and how robust the network is to changes in these values. In order to obtain the parameter values, a parameter strategy has been designed, i.e., a set of simple rules which will lead to appropriate parameter values. The parameter strategy has two versions: one for shunting inhibition and another for on-path facilitation.

First, let us consider the case of on-path facilitation. Examples are Fig. 10 and the first example in Fig. 14. If δ is positive, we know that the D2 layer will certainly contain all activations in the D1 layer, possibly somewhat smoothed by the receptive fields of the D2 neurons. The feedback effect can only increase activations, never decrease them. Hence we must ensure that the default input into D2, i.e., the input in the case δ = 0, does not generate excessive activation which the control mechanism would amplify and spread out in D2. Hence the D1 layer parameters a1, g1, and b1 must be chosen so that D1's activation is dampened, however without losing too much of its discontinuity information. This explains why the bias for the D1 neurons in the example of Fig. 10 is relatively high, and the gain is relatively low. Also the parameters in D2 must be chosen with this argument in mind. An additional argument for dampening the D1 response is the following: the C neurons take their input in the D1 layer; therefore, C neurons can only "blow up" existing activations. If there are any spurious activations in D1, the C neurons may fortify these. If the D1 activations are damped, the C neurons will hardly detect anything, and hence cannot generate spurious activations in D2. Once the parameters for D1 and D2 have been chosen, it is only a matter of gradually increasing the values of the parameters controlling the feedback (delta and power) to obtain sufficient activations in D2.

In the shunting inhibition case, exactly the opposite arguments hold. Since the feedback effect will suppress activations in the D2 layer, one must ensure that enough input reaches D2 for the control mechanism to operate. This calls for higher gains and lower biases in both D1 and D2. An additional argument for using high gain and low bias in D1 is the fact that the feedback effect can only suppress activations which are clearly detectable in D1. Again, once the parameters for D1 and D2 have been chosen, carefully increasing the values of the parameters controlling the feedback effect will lead to an appropriate parameter set.

A few examples may clarify the parameter strategy: consider the example of Fig. 10. If, e.g., g1 is set to 1 instead of 2, the contrast of the D1 response becomes less; hence, the input into D2 is less powerful and no response appears in D2. Increasing delta to 2 or increasing power to 4 increases the effect of the C neurons and yields, qualitatively, the same result as was obtained with the parameter values listed in Fig. 10. Another example would be decreasing g2 from 2 to 1. In that case the contrast of the response in D2 is lessened. Since entropy production is basically a sum of spatial derivatives in D2, the total entropy production seen by the C neurons will decrease, and this will weaken the effect of the C neurons. This can be compensated by increasing the other parameters controlling the action of the C neurons, e.g., increasing delta to a value between 4 and 6 and power to 6.

It should be stressed that all parameter sets yield qualitatively similar results in D1 and D2. Furthermore, D1 will never be capable of filling in the gap in Fig. 12, irrespective of the chosen parameter values. Hence, a unilayer network without specialized connections will not be able to solve this problem (cf. Discussion).

6. DISCUSSION

In a previous article [31], the theory behind the EDANN model has been elucidated. The focus was on a formal treatment of the control mechanism and the two operations it implements: sharpening tuning curves, using shunting inhibition, and broadening these curves using on-path facilitation. The workings of the control mechanism were exemplified in the case of orientation extraction, a low-level visual task, albeit using a limited number of simulations, without actual visual stimuli. Since the EDANN model goes beyond the paradigm of relatively simple ANN models, its dynamics is rather complex. Observing this dynamics in simulations often demonstrates the network's capabilities more clearly than a theoretical analysis can. Such simulations require large computational power, such as that provided by parallel computers.

In this article, an example EDANN extracting orientation information from a retinal input has been implemented on a parallel computer, a transputer system. With this system, several simulations have been carried out. These simulations clearly demonstrate the EDANN model's capabilities. To come to an efficient implementation, two modifications to the original EDANN model have been made. First, the area 𝒴_{D2} over which to integrate the entropy production in Eq. (2) is semilocal in the implementation described here (cf. Eq. (9)). There are two reasons for this: it allows a more precise adaptation of the feedback to the stimulus and, since a local summation results in local communication structures on the parallel machine, it results in a more efficient parallel implementation. Second, in the present implementation, the mechanism used for the entropy production calculation has been realized completely with "neural hardware," i.e., using intralayer C connections. Since, in the original model, intralayer connections were not required in the state equation of the C layer, these connections are available for implementing the discretized calculation of the entropy production integral (8). Note that the global entropy production calculation of the original model (cf. Eq. (2)) may also be performed in this way, albeit at the cost of a loss of local connectivity in layer C. We now continue with a discussion of the implementation we have developed, followed by a discussion of the EDANN model's capabilities.

6.1. Parallel Implementation

The parallel simulator is actually a programming environment into which the iteration rule which simulates a network's state dynamics can be incorporated. All code has been written in C, and runs on a Meiko Computing Surface, which is a transputer array embedded in a SUN workstation. The code uses the Meiko CSTools libraries for its communications, but this fact is completely transparent. In order to implement a particular network, the programmer merely needs to add some C code, like that described in Section 4. Although the simulation environment has been developed primarily for simulations of EDANN networks, it can be used to simulate other networks as well. In [29] we describe the implementation of a Hopfield network [12, 13, 6] for Bayesian image restoration [9]. We claimed here and elsewhere [27, 29, 28] that our simulation environment makes parallelism, decomposition, and mapping transparent. Indeed, the code in Section 4 does not contain any reference to any of these issues. It should be stressed, however, that although the programmer does not need to write any parallel code in order to simulate the networks' state dynamics, he or
We now continue with a discussion of the implementation we have developed, followed by a discussion of the EDANN model's capabilities.

6.1. Parallel Implementation

The parallel simulator is actually a programming environment into which the iteration rule that simulates a network's state dynamics can be incorporated. All code has been written in C and runs on a Meiko Computing Surface, a transputer array embedded in a SUN workstation. The code uses the Meiko CSTools libraries for its communications, but this fact is completely transparent. In order to implement a particular network, the programmer merely needs to add some C code, like that described in Section 4. Although the simulation environment has been developed primarily for simulations of EDANN networks, it can be used to simulate other networks as well. In [29] we describe the implementation of a Hopfield network [12, 13, 6] for Bayesian image restoration [9].

We have claimed here and elsewhere [27, 29, 28] that our simulation environment makes parallelism, decomposition, and mapping transparent. Indeed, the code in Section 4 does not contain any reference to any of these issues.
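To give an impression of the kind of code the programmer supplies (the actual interface is the one shown in Section 4; the names below are hypothetical), an iteration rule is essentially a per-neuron state equation applied to a workspace of activations, with no reference to processors, decomposition, or message passing:

    /* Sketch of the kind of iteration rule a user adds to the simulator
     * (hypothetical names; the real interface is the one of Section 4).
     * The rule only expresses the state equation of one neuron.          */
    #include <stdio.h>

    #define N_NEURONS 16

    /* One Euler step of a leaky-integrator state equation:
     * tau * dx/dt = -x + input                                           */
    static double update_neuron(double x, double input, double dt, double tau)
    {
        return x + (dt / tau) * (-x + input);
    }

    int main(void)
    {
        double state[N_NEURONS] = {0.0};
        double input[N_NEURONS];
        int i, step;

        for (i = 0; i < N_NEURONS; i++) input[i] = (i % 4 == 0) ? 1.0 : 0.0;

        for (step = 0; step < 100; step++)          /* the main loop applies   */
            for (i = 0; i < N_NEURONS; i++)         /* the rule to the whole   */
                state[i] = update_neuron(state[i],  /* workspace               */
                                         input[i], 0.1, 1.0);

        printf("state[0] after 100 steps: %.3f\n", state[0]);
        return 0;
    }

Everything concerning the decomposition and mapping of such a workspace is handled by the environment itself.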

It should be stressed, however, that although the programmer does not need to write any parallel code in order to simulate the networks' state dynamics, he or she should still be aware that the code runs on a parallel machine, as was illustrated by the Gantt chart of Fig. 7. On the other hand, the programmer need not be aware of problem decomposition and mapping at all. The approach outlined here could provide an answer to industry's need for accessible interfaces to parallel machines. In addition, all too often the performance of a (parallel) programming environment is inversely proportional to its ease of use. However, we have analyzed the parallel efficiency of our simulation environment and have found it to be extremely satisfactory [27].

Let us now reexamine the parallel update scheme which was put forth in Section 4.4. First, note that the pipelining implies only an initialization problem: once the workspaces have been updated for the first time, i.e., when the pipeline has been filled, simulation proceeds without further ado. To see this, compare in Fig. 7 the difference between "start-up" and "running" modes.

Next, we have to consider state dynamics on three different time scales. First, there is the neuron time scale. We have mentioned before that C neurons should evolve on a time scale larger than that of the D1 and D2 neurons. The reason for this is that the D1 and D2 neurons themselves evolve in time, so that it will take some time constants τ before they have reached their stable states. Since the C neurons have a time constant larger than τ, they will react more slowly than the D2 neurons. Therefore, they start detecting entropy production in D2 only when the D2 neurons have (almost) reached their stable states. If this were not the case, the C neurons might not detect any activation in D2, and hence no entropy production would be fed back. This also explains why the time discrepancy between the input from D1 and the entropy production contribution into the C neurons will not cause any numerical problems, as long as the input into D1 remains stationary.

From this follows a second time scale: the module time scale. Since the C neurons have a larger time constant than the other neurons, they largely determine the time scale on which the module as a whole evolves. The time needed for convergence of the module will be a multiple of the C neurons' time constant. In the example of Fig. 10 this was about 25τ, i.e., about 12.5 C neuron time constants. Recall that this was a relatively simple example: the noisy input example of Fig. 13 took four times longer to converge. Due to the existence of a "module time constant," in case a module is subjected to temporally changing inputs, the rate at which these inputs change must be proportionally lower than the rate of convergence of the C neurons. Otherwise, the control mechanism will not be able to adapt the D2 neurons to the changing input conditions.

A third time scale is the hierarchy time scale, which applies when a hierarchy of multiple EDANN modules is simulated. In that case, intermodule feedback is communicated between modules, as was shown in Fig. 3B. This implies that the time needed for the hierarchy to converge will be a multiple of the module "time constant." Evidently, for a hierarchy of modules, a pipelining strategy similar to the one explained in Fig. 6 will be needed.
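The figures above imply a C neuron time constant of about twice that of the D neurons (25τ is about 12.5 C time constants). The toy integration below, with invented step sizes, merely visualizes the resulting lag of the C activation behind the D activation for a stationary step input:

    /* Toy illustration of the two neuron time scales: a D neuron with time
     * constant tau and a C neuron with time constant 2*tau (the ratio
     * implied by "25 tau = 12.5 C time constants"). Both are driven by a
     * step input; the C neuron clearly lags behind the D neuron.          */
    #include <stdio.h>

    int main(void)
    {
        const double dt = 0.05, tau_d = 1.0, tau_c = 2.0, input = 1.0;
        double d = 0.0, c = 0.0;
        int step;

        for (step = 0; step <= 120; step++) {      /* 6 time units of tau     */
            if (step % 20 == 0)                    /* report once per tau     */
                printf("t = %4.1f   D = %.3f   C = %.3f\n", step * dt, d, c);
            d += (dt / tau_d) * (-d + input);      /* fast D dynamics         */
            c += (dt / tau_c) * (-c + d);          /* slow C dynamics, driven */
        }                                          /* by the D activation     */
        return 0;
    }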

6.2. Capabilities

Although it may seem more convenient to use networks that are tailored to meet the needs of the task at hand, we have proposed a nonspecialistic network model for various visual processing tasks, exemplified here by one module performing orientation extraction, an early vision task. In the general EDANN network, each module implements a certain visual task by processing one or more visual parameters or parameter combinations. To specify these, only the receptive fields (RFs) of the D1 neurons have to be specified. Hence, altering the task a module is to perform does not amount to changing the topology of the module's connections: these connections are generic in this regard.

Three types of generic connections have been used. First, the RFs of the D2 and C neurons are simply blurred versions of those of layer D1 from which they receive input. Hence, no specialistic RFs other than those already residing in the first layer have been used in the EDANN. In this respect our approach differs from that of Zucker et al. [40], where specialistic RF types were used in order to detect orientation discontinuities. Second, for the intralayer connections, only homogeneously distributed connection weights have been used. Still, layer D2 has the ability to selectively fill in orientation information that was missing in layer D1 in Fig. 10 but not in Fig. 12. Wörgötter and Koch [36], e.g., were only able to solve this problem by introducing excitatory intralayer connections between neighboring neurons with colinear preferred orientations, in addition to inhibitory connections between the other neighbors. Hence, their model has more specialistic intralayer connections. Third, the entropy production feedback is diffuse ("patchy") and homogeneously organized: C neurons receive gradient information from a group of D2 neurons centered around the corresponding D2 neurons, all in the same way. Albeit a scalar, and thus not very detailed, the feedback is still specific enough to modify the degree of orientation selectivity of the D2 neurons, rather than to control their gain or, simply, to excite or inhibit them. In this respect our feedback differs from that of [33, 7, 11, 5], among others.
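As an illustration of the first type of generic connection, a D2 or C receptive field can be pictured as nothing more than a Gaussian-smoothed copy of the underlying D1 receptive field. The 1-D profile, the kernel radius, and the width sigma below are invented for the illustration only; no RF type beyond those of layer D1 is introduced:

    /* Toy construction of a D2 receptive field as a blurred (Gaussian-
     * smoothed) version of a D1 receptive field profile.                 */
    #include <stdio.h>
    #include <math.h>

    #define N 15

    int main(void)
    {
        double d1_rf[N] = {0}, d2_rf[N] = {0}, kernel[5];
        const double sigma = 1.0;
        double norm = 0.0;
        int i, k;

        /* a crude odd-symmetric step profile for the D1 neuron */
        for (i = 0; i < N; i++)
            d1_rf[i] = (i < N / 2) ? -1.0 : (i > N / 2 ? 1.0 : 0.0);

        /* normalized Gaussian kernel of radius 2 */
        for (k = -2; k <= 2; k++) norm += exp(-(k * k) / (2.0 * sigma * sigma));
        for (k = -2; k <= 2; k++)
            kernel[k + 2] = exp(-(k * k) / (2.0 * sigma * sigma)) / norm;

        /* D2 RF = D1 RF convolved with the Gaussian (borders clipped) */
        for (i = 0; i < N; i++)
            for (k = -2; k <= 2; k++)
                if (i + k >= 0 && i + k < N)
                    d2_rf[i] += kernel[k + 2] * d1_rf[i + k];

        for (i = 0; i < N; i++)
            printf("%2d  D1 % .2f   D2 % .2f\n", i, d1_rf[i], d2_rf[i]);
        return 0;
    }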

The EDANN model is also different from the unilayer network models that have been used to solve early vision tasks. In those approaches the task is formulated as a cost function which is to be minimized; this cost function is mapped onto a network which implements it as its energy function. To prevent smoothing over discontinuities, such a network implements two processes: one which smooths and another which explicates the discontinuities ("line processes") (for an exposition, cf. [37]). This requires specialistic connections which need to be fine-tuned to prevent dominance of one of the two processes. In the EDANN model, D1 extracts continuity information as well as discontinuity information; the distinction is made in D2. In other words, the decision is postponed one layer further, and it is entirely driven by modulating the orientation selectivity of the D2 neurons. To see this, compare Fig. 10 with Fig. 12. The decision to fill in the missing orientation information leads to a continuation of the edge of the butterfly in D2, whereas in the other example this decision is not made, which implies that the presence of the discontinuity in D1 is confirmed.

It is generally acknowledged that orientation-selective neurons yield a vector field of tangents that is quantized in space and orientation: each neuron approximates orientation within the aperture formed by its RF. Zucker and co-workers [38, 39, 40] have argued that the task of inferring the trace of a curve or an edge is inherent to the inference of the tangent field of orientations. The inference process is guided by the dimensionality of the tangent field: in both the one-dimensional and the two-dimensional cases, additional orientation information may be needed (filled in) or unnecessary information may be suppressed to arrive at an appropriate representation. Albeit generic, the EDANN model is able to extract 1-D and 2-D tangent vector fields with remarkable efficiency, using the same sign for δ (δ > 0).

In the case of the butterfly, with and without the gap (Figs. 10 and 12), we clearly have a 1-D inference task. In the case of the butterfly without the gap, additional orientation information is filled in, notwithstanding that the trace does not form a straight line, and unnecessary information is suppressed selectively (cf. Fig. 11). On the other hand, filling-in is not performed when a gap is present on the retina, even though this discontinuity is only one pixel wide. Note that this hyperacuity-like [34] performance is accomplished without using dedicated energy functionals or curve-fitting processes as, e.g., in [40, 16]. To see that the EDANN model truly performs a 1-D inference task here, and not some other process, observe the activations in the leaflets other than those representing the edge traces. In the region where both edges cross (Fig. 10), or where the edges are close enough to be covered by several RFs with different preferred orientations (Fig. 12), orientation information becomes distributed over several leaflets simultaneously. This is a common phenomenon whenever 1-D tangent vector fields are inferred, and it simply indicates the presence of a high-curvature portion in the trace or, simply, a discontinuity (cf. [40]). Hence, since this is also the case here, even in D2, we may conclude that the EDANN model indeed implements a 1-D inference process.

In the case of the grid, we have a 2-D inference process in the first series of responses shown in Fig. 14: missing orientation information is filled in laterally in D2's leaflets. The lateral filling-in stops at the border of the grid, since a discontinuity in orientation is clearly present there. Note that this is possible despite the fact that the D2 neurons receive a blurred version of layer D1's activation distribution. Hence, a 2-D tangent field is made explicit in D2, since contextual information indicated the presence of constant orientation information.

One could argue, however, that the 1-D and 2-D inference processes are simply the result of diffusion processes laterally spreading out activation in D2, just as temperature spreads out in a metal plate. That this is not true can easily be verified as follows. Consider first the butterfly example (Fig. 10). Here, the neuron whose activation is filled in is more strongly active than the other neurons of the edge trace. If this filling-in were due to diffusion, the filled-in neuron's activation would simply be lower than that of the neurons from which it receives its activation. In addition, in the case of diffusion, the high-curvature portion in the trace would simply be smoothed over in D2. This would also be the case with the gap in Fig. 12. Finally, the dynamics of filling-in is different: in the case of diffusion, the activation of the filled-in neuron would increase at the same rate as that of the neurons from which it receives its activation laterally. In the present case, only when there is a substantial entropy production buildup in D2 will the control mechanism be activated and the missing orientation information be filled in. Hence, the process of filling-in is largely attributable to the control mechanism and not to some lateral diffusion process. For the grid (first series in Fig. 14), orientation information is filled in along the two spatial dimensions without crossing the borders. This would not be the case if filling-in resulted from lateral diffusion, since then the border would simply be smoothed over. Hence, also for 2-D inference, filling-in is driven by the control mechanism. Now, since the EDANN model does not smooth over edge trace discontinuities, even if they are only one pixel wide, or over borders, we conclude that the control mechanism subtends a filling-in process that is much more selective than would be possible with diffusion processes. Note that this was achieved without using line processes (cf. [37]), albeit such processes may be needed for more complex stimuli.
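The difference with a pure diffusion process is easy to reproduce in a toy setting. In the fragment below, invented for illustration only, a 1-D trace with a one-pixel gap is subjected to repeated lateral averaging: the gap is indeed filled, but only with a value below that of its neighbours, and the ends of the trace are smoothed away, which is exactly the behaviour the control mechanism avoids.

    /* Toy lateral diffusion along a 1-D trace with a single inactive
     * neuron. Purely illustrative; not part of the EDANN model.        */
    #include <stdio.h>

    #define N 11

    int main(void)
    {
        double a[N], b[N];
        int i, step;

        for (i = 0; i < N; i++) a[i] = (i >= 2 && i <= 8) ? 1.0 : 0.0;
        a[5] = 0.0;                                  /* the one-pixel gap      */

        for (step = 0; step < 5; step++) {           /* simple diffusion steps */
            for (i = 1; i < N - 1; i++)
                b[i] = 0.5 * a[i] + 0.25 * (a[i - 1] + a[i + 1]);
            b[0] = a[0];
            b[N - 1] = a[N - 1];
            for (i = 0; i < N; i++) a[i] = b[i];
        }

        for (i = 0; i < N; i++) printf("a[%2d] = %.3f\n", i, a[i]);
        return 0;
    }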

One attraction of diffusion processes is their ability to suppress noise: diffusion processes usually yield continuous, smoothly varying activation distributions, smoothing over discontinuities and therefore suppressing the presence of noise (cf. regularization theory; for references cf. [37]). Here we distinguish between functional and structural robustness. Functional robustness is defined as the degree to which the system is sensitive to noise in its input; structural robustness is defined as the degree to which the system is sensitive to parameter choices. We start with the latter.

The EDANN model is specified by 10 parameters. In addition, in the ANN literature it has repeatedly been demonstrated that, as network size grows, it becomes increasingly difficult to specify a network's parameters [35, 15]. Hence, the selection of the EDANN's parameters may seem cumbersome at first glance but, fortunately, a clear strategy is available to select them. This is advantageous since adding diffusion in the leaflets of the D1 neurons, changing the actual values of the inhibiting connections for D1 neurons, deleting or changing the orientation diffusion process in D2, or using more or less contrast in the input images presented at the retina probably requires other parameter values for the system to work properly; but, following the same parameter strategy, one can always find appropriate values. The fact that such a strategy exists derives from the EDANN's organization. Each EDANN is a module, part of a hierarchy, and within each module neurons are organized in layers. The fan-in and fan-out of the neurons is restricted, and the inter- and intralayer connections are local and homogeneous. Due to this, parameters can be grouped and the selection for each group can proceed independently from the other groups (cf. the parameter selection for D1 and D2). Furthermore, the precise ratio between the interlayer and intralayer contributions for D1 and D2 is unimportant, since it yields similar results in D2. This can be contrasted with other network models. In Waxman's NADEL model [24], the amount of diffusion is essential, since it drives the feature detection process in a later stage. In Wörgötter and Koch's network [36], the ratio between the excitatory and inhibitory intralayer contributions is quite critical for their orientation extraction process to work properly.

With respect to functional robustness, we observe in Fig. 13 that, in the presence of noise, the network's performance is still satisfactory: missing orientation information is still filled in to form the edge traces, despite the presence of noise. Hence, the EDANN model's performance is robust to parameter selection and noise.

Using the more recent neuroscientific evidence as a guide, many research groups are extending the relatively simple ANN paradigm toward more powerful ANNs. However, to simulate these more complex ANNs, extensive computational resources are required. It is precisely here that parallel computers can be extremely useful: the network researcher is able to run many simulations much more rapidly than on a serial computer, frequently with the facility to follow the network dynamics on-line. This way, the researcher's insight into such networks is dramatically improved. This article clearly shows that ANN research


can receive valuable input not only from neuroscience but also from computer science in order to be successful. Moreover, since ANNs are a textbook example of parallel computer applications, computer scientists and ANN researchers can both benefit from an interdisciplinary collaboration.

ACKNOWLEDGMENTS

The implementation of the parallel simulator described here was funded by Grants RFO/AI/01/003 and RFO/AI/01/005 from the Belgian Ministry of Science, and done by T.T. at the Edinburgh Parallel Processing Center. The support of Professor D. J. Wallace at the University of Edinburgh is kindly acknowledged. The transputer configuration at K.U. Leuven has been funded by Grants FGWO 3.0101.90 and FGWO 9.003889 from the Belgian Medical Research Council and the National Lottery to G.A.O., as well as by Grant IT/IF/002 from the Belgian Ministry of Science to G.A.O. M.V.H. holds a postdoctoral research fellowship from the Research Council of the Katholieke Universiteit te Leuven. The authors are grateful to Professor M. Berkley, University of Tallahassee, Florida, for interesting discussions, to R. Maex and S. Raiguel for critical comments on earlier versions of this text, and to C. Huyghens, G. Meulemans, and G. Vanparrijs for technical support.


REFERENCES

1. Akl, S. The Design and Analysis of Parallel Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1989.
2. Bertsekas, D. P., and Tsitsiklis, J. P. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs, NJ, 1989.
3. Daugman, J. G. Uncertainty relation for resolution in space, frequency, and orientation optimized by two-dimensional visual cortical filters. JOSA A 2, 7 (1985), 1160-1169.
4. Ferster, D., and Koch, C. Neuronal connections underlying orientation selectivity in cat visual cortex. TINS 10 (1987), 487-492.
5. Finkel, L. H., and Edelman, G. M. Integration of distributed cortical systems by reentry: A computer simulation of interactive functionally segregated visual areas. J. Neurosci. 9 (1989), 3188-3208.
6. Forrest, B. Restoration of binary images using networks of analogue neurons. Proc. Computer Vision Workshop, University of Oxford, 1987.
7. Fukushima, K. A hierarchical neural network model for selective attention in visual pattern recognition. Biol. Cybernet. 50 (1984), 105-113.
8. Fox, G., et al. Solving Problems on Concurrent Processors, Vol. I: General Techniques and Regular Problems. Prentice-Hall, Englewood Cliffs, NJ, 1988.
9. Geman, S., and Geman, D. Stochastic relaxation, the Gibbs distribution and the Bayesian restoration of images. IEEE Trans. PAMI 6 (1984), 721-741.
10. Ghosh, J., and Hwang, K. Mapping neural networks onto message-passing multiprocessors. J. Parallel Distrib. Comput. 6 (1989), 291-330.
11. Grossberg, S. Cortical dynamics of three-dimensional form, color and brightness perception. I. Monocular theory. Percept. Psychophys. 41 (1987), 87-116.
12. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA 79 (1982), 2554-2558.
13. Hopfield, J. J., and Tank, D. W. Neural computations of decisions in optimization problems. Biol. Cybernet. 52 (1985), 141-152.
14. INMOS Ltd. Transputer Reference Manual. Prentice-Hall, Hempstead, 1988.
15. Kamgar-Parsi, B., and Kamgar-Parsi, B. On problem solving with neural networks. Biol. Cybernet. 62 (1990), 415-423.
16. Kass, M., et al. Snakes: Active contour models. Proc. First International Conference on Computer Vision, 1987, IEEE Computer Society Press, pp. 259-268.
17. Koch, C., and Poggio, T. The synaptic veto mechanism: Does it underlie direction and orientation selectivity in the visual cortex? In Rose, D., and Dobson, V. G. (Eds.). Models of the Visual Cortex. Wiley, Chichester, 1985, pp. 408-419.
18. Koikkalainen, P., and Oja, E. The CARELIA simulator: A development and specification environment for neural networks. Adv. Control Networks and Large Scale Parallel Distributed Processing, TEKES-FINSOFT-III 4196/1989.
19. Meiko Scientific. CSTools User Guide. Bristol, 1989.
20. Morrone, M. C., Burr, D. C., and Maffei, L. Functional implications of cross-orientation inhibition of cortical visual cells. I. Neurophysiological evidence. Proc. R. Soc. Lond. B 216 (1982), 335-354.
21. Palmer, L., et al. Constraints on the estimation of spatial receptive field profiles of simple cells in visual cortex. In Marmarelis, V. Z. (Ed.). Advanced Methods of Physiological System Modeling. Biomedical Simulations Resource, Los Angeles, 1987.
22. Poggio, T., Gamble, E. B., and Little, J. J. Parallel integration of vision modules. Science 242 (Oct. 1988), 436-439.
23. Rumelhart, D. E., and McClelland, J. L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I. MIT Press, Cambridge, MA, 1986.
24. Seibert, M., and Waxman, A. M. Spreading activation layers, visual saccades and invariant representations for neural pattern recognition systems. Neural Networks 2 (1989), 9-27.
25. Shamma, S. Spatial and temporal processing in central auditory networks. In Koch, C., and Segev, I. (Eds.). Methods in Neuronal Modeling: From Synapses to Networks. MIT Press, Cambridge, MA, 1989.
26. Tollenaere, T. Technical Reports TR90009, TR90012, TR90013, TR90014, TR90016, TR90018. Laboratorium voor Neuro- en Psychofysiologie, K.U. Leuven, Belgium, 1990.
27. Tollenaere, T., and Roose, D. Performance modeling of a parallel neural network simulator. To appear in Allen, A. (Ed.). WoTUG 15. IOS, Amsterdam, 1992.
28. Tollenaere, T., and Orban, G. A. Transparent problem decomposition and mapping: A CSTools based implementation. In Welsh, P., et al. (Eds.). Transputing '91. IOS, Amsterdam, 1992, pp. 107-123.
29. Tollenaere, T., and Orban, G. A. Simulating modular neural networks on message-passing multiprocessors. Parallel Comput. 17 (1991), 361-379.
30. Van Hulle, M. M., and Orban, G. A. A field approach to higher order ANNs for sensorial representation. In Personnaz, L., and Dreyfus, G. (Eds.). Neural Networks from Models to Applications. I.D.S.E.T., Paris, 1989, pp. 254-263.
31. Van Hulle, M. M., and Orban, G. A. Entropy driven networks for sensorial representation: A proposal. J. Parallel Distrib. Comput. 6 (1989), 256-290.
32. Van Hulle, M. M. Building blocks of sensory artificial neural networks: From neuron to symbolic analogon. Ph.D. thesis, Electrotechnical Engineering Dept., K.U. Leuven, 1990.
33. von Seelen, W., Mallot, H. A., and Giannakopoulos, F. Characteristics of neuronal systems in the visual cortex. Biol. Cybernet. 56 (1987), 37-49.
34. Westheimer, G. Visual hyperacuity. Prog. Sens. Physiol. 1 (1981), 1-30.
35. Wilson, G. V., and Pawley, G. S. On the stability of the traveling salesman problem algorithm of Hopfield and Tank. Biol. Cybernet. 58 (1988), 63-70.
36. Wörgötter, F., and Koch, C. A detailed model of the primary visual pathway in the cat: Comparison of afferent excitatory and intracortical inhibitory connection schemes for orientation selectivity. J. Neurosci. 11, 7 (1991), 1959-1979.
37. Yuille, A. L. Energy functions for early vision and analog networks. Biol. Cybernet. 61 (1989), 115-123.
38. Zucker, S. W. Early orientation selection: Tangent fields and the dimensionality of their support. Comput. Vision Graphics Image Process. 32 (1985), 74-103.
39. Zucker, S. W., and Iverson, L. From orientation selection to optical flow. Comput. Vision Graphics Image Process. 37 (1987), 196-200.
40. Zucker, S. W., David, C., Dobbins, A., and Iverson, L. The organization of curve detection: Coarse tangent fields and fine spline coverings. Proc. Second International Conference on Computer Vision, 1988, IEEE Computer Society Press, pp. 568-577.

Received August 19, 1991; revised September 27, 1991; accepted October 15, 1991