Neural Networks 44 (2013) 6–21
Adaptive training of cortical feature maps for a robot sensorimotor controller

Samantha V. Adams a,∗, Thomas Wennekers a, Sue Denham b, Phil F. Culverhouse a

a Centre for Robotics and Neural Systems, School of Computing and Mathematics, University of Plymouth, PL4 8AA Plymouth, United Kingdom
b School of Psychology, University of Plymouth, PL4 8AA Plymouth, United Kingdom
Article history: Received 1 April 2012; Received in revised form 2 March 2013; Accepted 3 March 2013.

Keywords: Spiking neuron; SOM; Activity-dependent learning; Robot controller
Abstract

This work investigates self-organising cortical feature maps (SOFMs) based upon the Kohonen Self-Organising Map (SOM) but implemented with spiking neural networks. In future work, the feature maps are intended as the basis for a sensorimotor controller for an autonomous humanoid robot. Traditional SOM methods require some modifications to be useful for autonomous robotic applications. Ideally the map training process should be self-regulating and should not require predefined training files or the usual SOM parameter reduction schedules. It would also be desirable if the organised map had some flexibility to accommodate new information whilst preserving previously learnt patterns. Here we describe methods which have been used to develop a cortical motor map training system that goes some way towards addressing these issues. The work is presented under the general term 'Adaptive Plasticity', and the main contribution is the development of a 'plasticity resource' (PR): a global parameter which expresses the rate of map development and is related directly to learning on the afferent (input) connections. The PR is used to control map training in place of a traditional learning rate parameter. In conjunction with the PR, random generation of inputs from a set of exemplar patterns is used rather than predefined datasets, which enables maps to be trained without deciding in advance how much data is required. An added benefit of the PR is that, unlike a traditional learning rate, it can increase as well as decrease in response to the demands of the input, and so allows the map to accommodate new information when the inputs are changed during training.
∗ Corresponding author. Tel.: +44 0 1752 586294; fax: +44 0 1752 232540. E-mail address: [email protected] (S.V. Adams).
http://dx.doi.org/10.1016/j.neunet.2013.03.004

1. Introduction

The current work is part of a larger research project which aims to transfer novel principles from the field of Computational Neuroscience to a practical robotics application. A small, autonomous humanoid robot will learn basic visuomotor coordination skills using an approach based upon the self-organising topological map as a representation of the mammalian cortex. This sort of approach to modelling the cortex is not new, and much work has been done previously on the development of preferences in the visual cortex (Goodhill, 1993; Miikkulainen, Bednar, Choe, & Sirosh, 1998, 2005; Willshaw & von der Malsburg, 1976). The method has also been applied to practical sensory-motor tasks such as visuomotor control (Alamdari, 2005; Kikuchi, Ogino, & Asada, 2004; Metta, Sandini, & Konczak, 1999; Morse & Ziemke, 2009; Ogino, Kikuchi, Ooga, Aono, & Asada, 2005; Paine & Tani, 2004; Ritter, Martinez, & Schulten, 1989; Rodemann, Joublin, & Korner, 2004; Toussaint, 2006). A variety of approaches have been used for visuomotor control in these previous works, such as learning robot arm kinematics
and dynamics directly (Ritter et al., 1989), learning the coordination between visual input and ‘motor primitives’ (Kikuchi et al., 2004; Metta et al., 1999; Ogino et al., 2005), incorporating traditional search and reinforcement learning techniques (Alamdari, 2005; Toussaint, 2006) and even using evolutionary techniques to learn mappings (Paine & Tani, 2004). Whilst all these approaches have been successful, the majority of them would not be suitable for implementation in an autonomous robot operating in real time, because of the amount of computation required for some of the techniques and, more importantly, they have all required the use of a host PC to do the computations even when real robot hardware is used. Achieving a human-like level of skill in a robot is a challenging task as the sensory pre-processing and higher level cognitive processing that is required needs significant computing power which is in conflict with the limited energy resources available on an autonomous robot. However, natural systems somehow manage to achieve speed, fault tolerance and flexibility despite having very low power requirements. Since the current capabilities of robots do not match even the simplest animal, it seems logical to explore in more depth bio-inspired approaches to robotics: in particular, where artificial neural systems are implemented using techniques inspired by a greater understanding of how real neurons, and brains work. Computational Neuroscience
has made considerable progress in recent years on spiking neuron based models of sensory and cognitive processes in the mammalian neo-cortex. Spiking Neural Networks (SNNs) are the ‘third generation’ of Neural Networks (Maass, 1997); the first generation being networks consisting of simple McCulloch–Pitts neurons (McCulloch & Pitts, 1943) with binary outputs and the second generation consisting of neurons with continuously-valued activation functions. Spiking neurons mimic how real neurons compute: with discrete pulses rather than a continuously varying output. The spiking neuron is, of course, still an abstraction from an actual neuron, but a much more biologically plausible one especially as models can incorporate spike-timing based learning which is believed to be an important mechanism in natural systems. Advances in software and hardware over the last ten years or so have made SNNs an increasingly feasible option for robotics applications. On the software side several general purpose spiking neuron simulators are freely available which means that researchers do not have to code a modelling framework from scratch, and they also benefit from a community of users using the same tool. Desktop computing hardware is now available that can perform parallel processing (e.g. GPU) at an affordable price. But this can only take us so far. The emerging field of Neuromorphic Engineering is making it possible to simulate large spiking neural networks in hardware in real time with modest power requirements. ‘Neural chips’ are massively parallel arrays of processors that can simulate thousands of spiking neurons simultaneously in a fast, energy efficient way (Jin et al., 2010; Serrano-Gotarredona et al., 2009; Silver, Boahen, Grillner, Kopell, & Olsen, 2007). To realistically be able to implement the complexity of neural networks required for human-like behaviour on-board robots in the future will require implementation in such neuromorphic hardware. Our approach of using Spiking Neural Networks has been directly motivated by the possibility of using this emerging technology to implement sensory-motor controllers directly on-board robots operating in real time and with lower power consumption than traditional computing technologies. The underpinning concept of the current work is the cortical self-organising feature map (SOFM) which is an analogue of how biological brains manage to represent complex multidimensional information from their environment as a 2D map in the cortex. The SOFM methodology is inspired by the Kohonen Self-organising Map (SOM) (Kohonen, 1995). The original Kohonen SOM is an unsupervised learning technique most commonly used for machine learning, for example, data clustering applications. It is usually a two layer network: the ‘output’ layer which forms the map and the ‘input’ layer which passes in the data to be represented. The two layers are usually fully interconnected. The input layer has as many neurons/nodes as there are dimensions of data. During the training process, the Kohonen SOM selforganises to represent the range of input data available and in the final map the data is topologically arranged (similar inputs are mapped to similar locations in the map). The weights on the connections between the two layers ‘store’ the patterns and thus the number of connections to a neuron/node determines the maximum dimensionality of the map. The process of map training can be summarised as follows: 1. Present an input vector of training data. 2. 
Select the winning node in the output layer with the highest activation. 3. Determine a spatial neighbourhood around the winning node. 4. Adjust the weights in the neighbourhood using the learning equation Δwij = k(xi − wij)yj. 5. Decrease the neighbourhood size N and the learning rate k. 6. Repeat steps 1–5 for the desired number of training cycles.
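For readers unfamiliar with the algorithm, a minimal NumPy sketch of steps 1–6 follows. This is a generic illustration, not the spiking implementation developed later in this paper; the map size, decay factors and the usual minimum-distance winner selection are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID, DIM = 10, 16                                    # assumed 10 x 10 map, 16-D inputs
W = rng.random((GRID, GRID, DIM))                     # weights 'store' the learned patterns
coords = np.indices((GRID, GRID)).transpose(1, 2, 0)  # (GRID, GRID, 2) node positions

def train(data, cycles, k=0.5, n=5.0, decay=0.995):
    for _ in range(cycles):
        x = data[rng.integers(len(data))]             # 1. present an input vector
        win = np.unravel_index(                       # 2. winner: closest weight vector
            np.linalg.norm(W - x, axis=2).argmin(), (GRID, GRID))
        hood = np.linalg.norm(coords - np.array(win), axis=2) <= n  # 3. neighbourhood
        W[hood] += k * (x - W[hood])                  # 4. dw_ij = k (x_i - w_ij) y_j
        k *= decay                                    # 5. shrink learning rate k...
        n *= decay                                    #    ...and neighbourhood size N
```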
The methodology used in the current work is based upon the traditional SOM as described above, but with several key differences. Essentially, spiking neurons are used instead of traditional artificial neurons with continuous activation, and the learning rule incorporates both spatial and temporal factors to learn the mapping of input patterns. In addition, a self-regulating process is used to adapt the learning rate in an online fashion, so that during training patterns can be selected and presented on the fly rather than drawn from predefined datasets. The structure of the paper is as follows. Section 2 describes the details of the methodology used to create a prototype cortical motor map, including details of the spiking neural network setup, learning rules and training process. Section 3 describes the development of a simple adaptive plasticity method for regulating map training and how this has been used to replace the learning rate parameter traditionally used in SOMs. Section 4 describes the results from several experiments which demonstrate the benefits of this learning method, and Section 5 includes a discussion and comments on areas for future work.

2. Methods

2.1. Overview

As outlined in the introduction, the current work has adapted the traditional SOM methodology in several key ways. Firstly, spiking neurons are used, which means that instead of the highest activation it is the temporal response of the neurons that determines the winner. Using spiking neurons introduces the concept of spike timing as a means to carry additional information. Traditional SOMs use only a spatial neighbourhood around the winner, but in the current work spatial and temporal neighbourhoods are used to develop the map organisation. We have also made some amendments to the traditional SOM to make it better suited for use in autonomous robotic applications, where the goal is to enable the robot to learn from the information available in its environment in a completely unsupervised way.

SOM network development is generally thought of in terms of two distinct phases: initial topological ordering followed by weight convergence. In computational models the phases are usually managed explicitly by the manipulation of neighbourhood size and learning rate parameters, both of which are normally reduced systematically in a non-linear fashion during the course of training according to predefined schedules. For an adaptive sensorimotor controller on a robot it is not ideal to have to predefine such schedules to control map development. Instead, the development should be self-regulating, as it is in natural systems. The issue of defining learning rate and neighbourhood parameter reduction for traditional SOMs has been noted by several previous researchers. For example, Berglund and Sitte (2006) and, more recently, Berglund (2010) describe the Parameter-Less SOM (PLSOM). These works developed a method of controlling the learning in a SOM by using the ratio of the last error between the input vector and the weight vector of the winning node to the largest previous error as a scaling factor (Berglund & Sitte, 2006); in later improvements, the ratio of the last error to the diameter of the input space is used (Berglund, 2010). Shah-Hosseini and Safabakhsh (2000, 2001) developed the TASOM or Time-Adaptive SOM.
Here, each neuron has its own learning rate and neighbourhood parameters and these are changed according to the distance measure between the current input vector and the synaptic weight vector of the neuron. More recently, Shah-Hosseini (2011) has developed a variant called the Binary Tree TASOM, which incorporates the removal and addition of neurons during training to allow adaptation in an environment
where the inputs can change. Brohan, Gurney, and Dudek (2010) tackled the problem in a slightly different way by allowing the learning rate and neighbourhood parameters to reset to their initial values based upon the novelty of the input, but otherwise using linear reduction schemes for both. None of these previous works have applied their ideas to a spiking neuron implementation of SOMs. Furthermore, the methods used in these works also assume that the SOM input is in the form of a pattern vector from which a meaningful distance to a weight vector can be calculated. In our work we take a different approach which uses a replacement for learning rate reduction which is self-regulating and entirely dependent on input activity as measured by the ratio of potentiation to depression of the weights on the input synapses. Although in the work described here the input data is presented as vectors our method does not require this to be the case as we intend to apply the same principles to the development of a visual SOFM where the inputs will not be vectors of data but instead a spatiotemporal pattern of spike events. In conjunction with our replacement for learning rate, we also use an existing technique of a fixed neighbourhood based upon a Gaussian function which is not reduced during training, thereby avoiding the need for a neighbourhood reduction schedule. Traditional SOMs are very commonly used for high-dimensional data clustering applications due to their dimension reducing properties. Usually a long training phase is used to guarantee a precisely ordered map for the purpose of accurately representing the input data. The amount of training data is also usually defined in advance, either by generating it from existing empirical datasets or by creating specific training datasets. In our work we use random generation of inputs from a set of exemplar patterns in conjunction with our adaptive learning method which enables maps to be trained without creating training datasets or deciding in advance how much training will be required. The methodology for the motor map is based primarily upon three previous works which have developed self-organising maps using spiking neural networks: Marian (2002), Pham, Packianather, and Charles (2006) and Ruf and Schmitt (1998). Essentially, the aim is to create cortical-like maps which can self-organise to develop spatial and temporal selectivity to input patterns from an initially random state. Correlated (temporal and spatial) activity between neurons should strengthen both excitatory connections (‘cooperation’) and inhibitory connections (‘competition’) forming a topological map where neurons responding similarly to inputs are located together and those responding to different inputs are spatially separated. Although here we describe the methods specifically in relation to training motor maps, the principles can be used for other modalities. 2.2. Spiking Neural Networks Traditional Artificial Neural Networks (ANNs) use a neural model where the neuron activation function (or activity) is continuous and at any point in time its value signifies the activity level of a neuron. In contrast, the Spiking Neural Network (SNN) uses a neuron model where neurons are not continuously activated but are either spiking or at rest, much like real neurons. There are several options for coding information in spiking neural models which are reviewed in Gerstner (1999). Rate encoding uses the spike rate over time of a population or single neuron. 
Spike-time encoding allows many options for encoding with firing times, ranging from the times of individual neuron spikes, through variations between the times of a group of neurons, to synchrony of firing between neurons. A point in favour of using a spike-time based method is that it has been established that spike timing is important in the mechanism of learning (Bi & Poo, 1998; Froemke & Dan, 2002; Song, Miller, & Abbott, 2000). In natural systems firing
time appears to be an important source of information to achieve fast computation. For example, the work of Thorpe et al. showed evidence for fast processing in the human visual system and for some motor behaviours, and concluded that the information was mediated by neuron firing time as the response was much too fast to have used a spike rate (Thorpe, Delorme, & Van Rullen, 2001; Thorpe, Fize, & Marlot, 1996). There have been relatively few research works using SOMs made of spiking neural networks. Of those that exist, some use a spike-rate based form of learning, for example Choe and Miikkulainen (1998), while others use various spike-timing based methods, for example Bohte, La Poutre, and Kok (2002), Marian (2002), Panchev and Wermter (2001), Pham et al. (2006), Ruf and Schmitt (1998) and Sala, Cios, and Wall (1998). When implementing an SNN there is a choice of neuron models that can be used, depending upon the type of biological properties one requires. Fig. 2 in Izhikevich (2004) gives a summary of the types of model available, their properties and an indication of the computational overhead. At one extreme is the Hodgkin–Huxley model, which can represent many biological properties but is computationally expensive; at the other is the simple Leaky Integrate and Fire (LIF) neuron model, which is computationally cheap but can only represent the most basic spiking behaviour. In our work we only require simple spiking behaviour and thus use a Leaky Integrate and Fire model. In this model, contributions from connecting neurons increase the membrane voltage of a target neuron until the voltage reaches a threshold value and the neuron fires. This is followed by a refractory period during which the neuron cannot fire. Our implementations are created using the Brian Spiking Neural Network simulator (www.briansimulator.org; Goodman & Brette, 2008), in which many standard models are available or custom ones can be defined using differential equations. We adopt a Leaky Integrate and Fire (LIF) model based upon the well-known CUBA (CUrrent BAsed) model described in Vogels and Abbott (2005) and Brette et al. (2007), which is described mathematically by Eq. (1).
τm dV/dt = (ge + gi − V)    (1)

where: V is the neuron membrane voltage; ge is the voltage contribution from excitatory synapses; gi is the voltage contribution from inhibitory synapses; τm is the membrane time constant.

The neuron receives input from both excitatory and inhibitory synapses (the terms ge and gi in Eq. (1)) using the fast AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid) receptor model, which assumes that the action potential generated by the presynaptic neuron is instantaneous and decays exponentially over time in between further action potentials (Dayan & Abbott, 2001). This behaviour is represented by Eq. (2).

τs dg/dt = −g    (2)

where: g is the effective conductance for an excitatory or inhibitory synapse; τs is the synaptic time constant.

When a presynaptic neuron fires, the effective conductance g of the postsynaptic target neuron is updated by Eq. (3).

gnew = gold ∗ w    (3)
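These equations translate almost directly into a simulator description. The sketch below uses the modern Brian 2 API rather than the Brian 1 release available when the paper was written, and the exact form of the threshold noise is our reading of Table 1:

```python
from brian2 import NeuronGroup, ms, mV

tau_m = 5*ms        # membrane time constant (Table 1)
tau_s = 5*ms        # excitatory/inhibitory synaptic time constant

eqs = '''
dV/dt  = (ge + gi - V) / tau_m : volt (unless refractory)  # Eq. (1)
dge/dt = -ge / tau_s           : volt                      # Eq. (2), excitatory input
dgi/dt = -gi / tau_s           : volt                      # Eq. (2), inhibitory input
Vt : volt (constant)                                       # per-neuron firing threshold
'''

output_layer = NeuronGroup(256, eqs, threshold='V > Vt', reset='V = 0*mV',
                           refractory=10*ms, method='exact')
output_layer.V = 0*mV
output_layer.Vt = '3.9*mV + 0.5*mV*rand()'   # 3.9 mV plus small noise (cf. Table 1)
```

Eq. (3) would then live in the synapse model's presynaptic action, e.g. on_pre='ge_post = ge_post * w' — our sketch of the multiplicative conductance update, not a quotation of the paper's code.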
where: gold is the original effective synaptic conductance; gnew is the updated effective synaptic conductance; w is the synaptic weight. See Table 1 for a summary of the neuron model parameters and their initialised values.

Table 1: Summary of neuron model, network and run parameters.

Neuron model
Vreset, reset voltage: 0 mV
VThresh, neuron threshold: randomly initialised as 3.9 mV plus noise normally distributed between 0 and 0.5 mV
τm, membrane time constant: 5 ms
τs, excitatory/inhibitory synaptic time constant: 5 ms
τda, delay on afferent synapses: 2 ms
τdl, delay on lateral synapses: set as the distance between pre- and postsynaptic neuron plus Gaussian noise (mean 0, standard deviation 0.5)
τrefrac, neuron refractory period: 10 ms

Network architecture
Nin, number of neurons in input layer: 16
Nout, number of neurons in output layer: 256
waff, afferent synaptic weights: randomly initialised between 0.4 and 0.5
wlat, lateral synaptic weights: randomly initialised between 0.3 and 0.4 (exc) and −0.3 and −0.4 (inh)
Exc_pconn, connection probability for lateral excitatory connections: exp(−dist/σ), where dist is the Euclidean distance between the neurons and σ = 3.5
Inh_pconn, connection probability for lateral inhibitory connections: exp(−σ/dist), where dist is the Euclidean distance between the neurons and σ = 8.0

Learning rules
η, the traditional SOM learning rate: initial value 0.5, decaying by 0.949 each cycle (used in experiment 1 only)
Wmax, maximum allowed lateral synaptic weight: 1.0 for excitatory, −1.0 for inhibitory (used in experiment 1 only)
σ, neighbourhood spread: 3.0
Ap, LTP rate: various values used—see the experiments in Section 4
Am, LTD rate: 1.05 ∗ Ap
τltp, LTP time constant: 10 ms
τltd, LTD time constant: 10 ms

Run time
Tint, the integration time: 9 ms
Tout, the time out parameter: 30 ms

Fig. 1. The motor map architecture.

2.3. Network architecture

The aim is to create cortical-like maps which can self-organise to represent input patterns encoding 8 directions of motion (N, NE, E, SE, S, SW, W and NW). The learned output response is in the form of spatially and temporally localised spikes which are topologically arranged (i.e. input patterns of adjacent directions should produce a response that activates adjacent groups of output neurons). Fig. 1 shows a diagram of the network architecture, which follows a typical SOM setup. It has a 16 neuron input layer and a 256 neuron output layer and is based directly upon that used in Marian (2002). Here the output map is arranged as a 2D grid, following the convention of previous experimental and modelling studies of cortical maps. In the output layer 20% of the neurons are randomly assigned as inhibitory and 80% as excitatory, as these appear to be the proportions of inhibitory to excitatory neurons in real cortex (Kandel, Schwartz, & Jessell, 2000). The input layer is fully connected to the output layer with feedforward (afferent) connections; for clarity Fig. 1 only shows the connections from the input layer to one output neuron. Afferent connection weights are set to a random value between 0.4 and 0.5. In addition the output layer is recurrently connected. The arrangement of the output layer as a 2D grid is important for the determination of these connections as they are distance based. In real cortex sparse lateral connections follow a typical 'Mexican hat' profile of short-range excitation and long-range inhibition (Kohonen, 1984; Miikkulainen et al., 2005). Fig. 1 shows an example of typical lateral connections for one excitatory and one inhibitory neuron. Lateral connection weights are set to a random value between 0.3 and 0.4. The connection structure also incorporates delays: there is a constant delay of 2.0 ms for afferent connections, and the delays for the lateral connections are calculated according to the distance between the two neurons with added Gaussian noise (refer to Table 1 for details).
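This distance-dependent lateral wiring can be generated as below. The grid layout, RNG seed, self-connection exclusion and delay clipping are our assumptions; the connection probabilities, weight ranges and delay noise follow Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)
GRID = 16                                 # 16 x 16 = 256 output neurons
coords = np.array([(x, y) for y in range(GRID) for x in range(GRID)], dtype=float)
is_inhib = rng.random(GRID * GRID) < 0.2  # ~20% of output neurons are inhibitory

def lateral_connections(pre):
    """Return (post_ids, weights, delays_ms) for one presynaptic output neuron."""
    d = np.linalg.norm(coords - coords[pre], axis=1)
    if is_inhib[pre]:
        p = np.exp(-8.0 / np.maximum(d, 1e-9))   # long-range inhibition
        w_lo, w_hi = -0.4, -0.3
    else:
        p = np.exp(-d / 3.5)                     # short-range excitation
        w_lo, w_hi = 0.3, 0.4
    p[pre] = 0.0                                 # no self-connections (assumed)
    post = np.flatnonzero(rng.random(p.size) < p)
    weights = rng.uniform(w_lo, w_hi, post.size)
    delays = d[post] + rng.normal(0.0, 0.5, post.size)  # distance plus Gaussian noise
    return post, weights, np.clip(delays, 0.1, None)    # keep delays positive (assumed)
```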
2.4. Input patterns

The form of the input patterns was guided by those described in Marian (2002). However, as a full description of how the patterns were constructed is not given in that work, the method of construction for the current work was inferred from the description given. Exemplar input patterns representing 8 compass directions (N, NE, E, SE, S, SW, W, NW) were artificially constructed as two vectors of 16 elements each: the first vector holds neuron IDs and the second neuron firing times. Of the 16 elements, 4 are 'salient' (i.e. they provide important information) and the rest are 'noise'. The firing times were chosen so that the more recent the spike time (shorter latency), the greater the saliency. Salient neurons have firing times close to a reference time Tint (between 7.0 and 9.0 ms) and noise neurons have firing times set to values between 0 and 2.5 ms. Creation of the exemplar pattern set was done using the following process:
1. 8 sets of 4 'salient' firing times in the range 7–9 ms were randomly generated.
2. 8 sets of 12 'noise' firing times in the range 0–2.5 ms were randomly generated.
3. Neuron IDs (in the range 0–15) were manually assigned to the firing times, with adjustments to ensure that adjacent directions always have two neurons in common and directions more than 45° apart have at most one neuron in common.

In the generation of the firing times in steps 1 and 2 the precision of the values is 0.1 ms, to match the timestep used in the Brian simulations. The neuron IDs and spike times for the salient neurons of all 8 exemplar patterns are given in Table 2.

Table 2: Salient neuron IDs and spike times, as (ID, time in ms), for the exemplar patterns.
North: (13, 7.1), (6, 8.1), (11, 8.6), (14, 8.7)
North–East: (13, 7.1), (4, 7.2), (11, 8.6), (0, 8.8)
East: (4, 7.2), (12, 7.4), (0, 8.8), (1, 8.9)
South–East: (12, 7.4), (5, 7.5), (2, 8.1), (1, 8.9)
South: (5, 7.5), (10, 7.7), (2, 8.1), (7, 8.8)
South–West: (10, 7.7), (8, 7.9), (7, 8.8), (9, 8.9)
West: (5, 7.4), (11, 7.8), (8, 7.9), (9, 8.9)
North–West: (5, 7.5), (8, 7.5), (14, 7.8), (11, 8.4)

For training with the initial version of the motor map, predefined datasets were required. A single dataset was created by making 20 instances of each of the 8 exemplar directional patterns, perturbed from the original values. Perturbing of the exemplar patterns was performed as follows:
1. Only spike times are perturbed.
2. Noise values are adjusted by a random value within the range (−1.0, 1.0). Negative values are clipped to 0 ms.
3. Salient values are adjusted by a random value within the range (−0.5, 0.5). Values greater than 9.0 are clipped to 8.9 ms.

The 160 patterns generated by this process were then written in a random order to a dataset file. As a final validation step a Fisher's Linear Discriminant Analysis (FDA) was performed on one training dataset using SPSS software, which confirmed that 8 well-separated clusters were present in the data.
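The construction and perturbation rules are simple to express in code. The sketch below is our reading of them: the position of salient elements within the vector and the rounding of perturbed times back to the 0.1 ms grid are assumptions, and a training dataset would then be 20 perturbed copies of each of the 8 exemplars.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SALIENT, N_NOISE = 4, 12          # salient elements assumed to come first (our layout)

def make_firing_times():
    """Steps 1-2: salient times on a 0.1 ms grid in 7-9 ms, noise times in 0-2.5 ms."""
    salient = rng.integers(70, 91, N_SALIENT) / 10.0
    noise = rng.integers(0, 26, N_NOISE) / 10.0
    return np.concatenate([salient, noise])   # step 3 assigns neuron IDs by hand

def perturb(times):
    """Jitter an exemplar: +/-0.5 ms on salient times, +/-1.0 ms on noise times."""
    out = times.copy()
    out[:N_SALIENT] += rng.uniform(-0.5, 0.5, N_SALIENT)
    out[N_SALIENT:] += rng.uniform(-1.0, 1.0, N_NOISE)
    out[out < 0.0] = 0.0            # clip negative noise times to 0 ms
    out[out > 9.0] = 8.9            # clip salient overshoot to 8.9 ms
    return np.round(out, 1)         # keep the 0.1 ms precision (our assumption)
```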
2.5. Learning

Of the previous implementations of spiking SOMs described in Section 2.2, only a few use some form of Spike Timing Dependent Plasticity (STDP), which is the currently favoured model for learning in real neurons. Experimental and modelling studies have shown that this form of Hebbian plasticity, where the relative firing times of pre and postsynaptic neurons influence the strengthening or weakening of connections, is the mechanism that real neurons use (Song et al., 2000). When firing times are causally correlated (i.e. the presynaptic spike is emitted before the postsynaptic spike) the synapse is strengthened (Long Term Potentiation or LTP). When firing times are not causally correlated (i.e. the postsynaptic spike occurs before the presynaptic one) the synapse is weakened (Long Term Depression or LTD).

In general, not many works have incorporated plasticity on both afferent and lateral connections, nor on inhibitory connections. The various LISSOM (Laterally Interconnected Synergetically Self-Organising Map) models described in Miikkulainen et al. (2005) have recreated many features of self-organising maps in the visual system and do incorporate plasticity on both afferent and lateral connections (although they do not use spike-timing based plasticity in the majority of their models). Through various experiments they have demonstrated the important role of lateral connections in initial map development and in adaptation during recovery from lesion. For example, they use the RLISSOM (Reduced LISSOM) model to demonstrate how lateral connections are responsible for reorganisation in the visual cortex after a retinal lesion. They also use the PGLISSOM (Perceptual Grouping LISSOM) model to demonstrate the role of lateral connections in perceptual grouping. The LISSOM models also allow plasticity on both excitatory and inhibitory connections. In most previous works incorporation of inhibitory plasticity was rare due to a lack of strong experimental evidence for it in the real cortex. Miikkulainen et al. (2005) justified it on the grounds of a modelling abstraction: an inhibitory connection is taken to incorporate a plastic excitatory connection onto an inhibitory interneuron. The LISSOM models demonstrate the role of lateral inhibitory connections in providing competition in the response of the map. The long-range inhibitory connections suppress the activity of neurons in remote areas when a particular area of the map is active, hence neurons which develop similar response preferences are spatially co-located. The PGLISSOM model showed how the inhibitory connections perform the role of segmentation in perceptual grouping experiments. Recently there has been more support for inhibitory plasticity in vivo in several brain areas and evidence that it operates by similar mechanisms to plasticity on excitatory connections (for example, see Gaiarsa & Ben-Ari, 2006; Vogels, Sprekeler, Zenke, Clopath, & Gerstner, 2011). Vogels et al. (2011) proposed that inhibitory plasticity could play a role in the balance between excitation and inhibition in cortical networks, and furthermore that it could enable latent storage of activity patterns below background activity. Their learning rule for inhibitory plasticity is a simple Hebbian STDP rule where correlated firing of pre and postsynaptic neurons potentiates the connection between them and presynaptic firing only leads to depression.

The current work incorporates STDP on afferent connections and also on both excitatory and inhibitory lateral connections to provide the competition and cooperation required for the development of the map. Learning on the afferent connections is based upon the method originally created by Ruf and Schmitt (1998) and subsequently modified by Marian (2002). The form of the learning rule is given in Eq. (4):

Δwij = η · ((Tout − tj)/Tout) · (εi − wij) · d(j, winner)    (4)

where:
wij is the weight on the connection between input neuron i and output neuron j.
η is the learning rate parameter.
tj is the firing time of output neuron j.
Tout is the duration of the activity integration and forms an upper bound on the firing times of the output neurons.
εi is the PSP contribution from input neuron i, calculated as exp(−(Tint − ti)/τm), where τm is the membrane time constant, Tint is the base integration time and ti is the firing time of input neuron i.
d(j, winner) = exp(−distance(j, winner)²/(2σ²)), where distance() calculates the Euclidean distance between output neuron j and the winner neuron for the current learning cycle and σ is a neighbourhood spread parameter.
See Table 1 for details of the parameter values.
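Read literally, Eq. (4) vectorises into a single update over the 16 × 256 afferent weight matrix. In this sketch the array shapes and the convention that silent output neurons receive no update are our assumptions; Tint, Tout, τm and σ follow Table 1:

```python
import numpy as np

T_INT, T_OUT = 9.0, 30.0       # integration / time-out windows, ms (Table 1)
TAU_M, SIGMA = 5.0, 3.0        # membrane time constant, neighbourhood spread

def afferent_update(W, t_in, t_out, winner_xy, coords, eta):
    """Eq. (4): dw_ij = eta * (Tout - t_j)/Tout * (eps_i - w_ij) * d(j, winner).

    W: (16, 256) afferent weights; t_in: (16,) input firing times;
    t_out: (256,) output firing times (np.nan for neurons that stayed silent);
    winner_xy: (2,) grid position of the winner; coords: (256, 2) grid positions.
    """
    eps = np.exp(-(T_INT - t_in) / TAU_M)            # PSP contribution per input neuron
    temporal = (T_OUT - t_out) / T_OUT               # temporal neighbourhood
    d = np.exp(-np.sum((coords - winner_xy) ** 2, axis=1) / (2 * SIGMA ** 2))
    dW = eta * temporal * (eps[:, None] - W) * d     # broadcast over all synapses
    fired = ~np.isnan(t_out)                         # assumed: only firing neurons learn
    W[:, fired] += dW[:, fired]
    return W
```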
As in a traditional SOM, this approach uses a spatial neighbourhood to ensure that neurons with the same responses are spatially co-located. A temporal neighbourhood, (Tout − tj)/Tout, is also used to develop organisation of the neural responses in the time domain. The Tout parameter represents the time during which activity is allowed to propagate through the network, and ensures that for neurons which fire the quickest in response to an input pattern, (Tout − tj) is large and thus their weight update is large. Conversely, neurons which fire late get a smaller update. In summary, the magnitude of the afferent weight update is determined by the following factors:
• The value of εi which is determined by the presynaptic (input) neuron firing time.
• The difference between the value εi and the current weight wij.
• A temporal neighbourhood: synapses whose postsynaptic (output) neurons fire quickest in response to the input pattern get a larger update.
• A spatial neighbourhood: synapses with firing postsynaptic (output) neurons located close to the current winner get a larger update.
• A learning rate parameter η which is reduced gradually during training.

In order to fulfil the aim of dispensing with a neighbourhood reduction schedule, the original learning rule of Marian (2002) and Ruf and Schmitt (1998) has been modified with a fixed spatial neighbourhood implemented by a Gaussian function. In Ruf and Schmitt (1998) the neighbourhood size was changed during training by manipulation of lateral connections, and in Marian (2002) a traditional SOM neighbourhood with a reduction schedule was used. There is some evidence from previous work in generic SOM theory that reducing the neighbourhood size during training may not be crucial for map formation. For example, the work of Raginsky and Anastasio (2008) demonstrated that the optimum neighbourhood size for best representation of input information is finite and small. Furthermore, a Gaussian neighbourhood has already been successfully used in the spiking SOM implementation of Alamdari (2005) for a path planning task, and also in Pham et al. (2006) to train a map for a data classification task.

It should be noted that the rule in Eq. (4) still employs a traditional learning rate parameter η which needs to be reduced gradually during training to control map development. This form of the rule was used for initial prototyping only; Section 3 describes how the learning rule was amended to dispense with it.

For learning on the lateral connections, Marian (2002) used an amended version of the Ruf and Schmitt (1998) afferent learning rule with slightly different formulations for excitatory and inhibitory connections. In their original work, Ruf and Schmitt (1998) did not use plasticity on lateral connections but manipulated the weights directly. One problem with most kinds of Hebbian plasticity (including STDP) is that they are positive feedback systems: correlated activity between a pre/post neuron pair strengthens the connection between them, which makes further correlated activity much more likely. Without compensating adjustments, the weight increments proceed unchecked. It is therefore usual to include some form of normalisation of all weights after a learning phase, or to apply hard limiting to a maximum/minimum value. For example, in Marian (2002) global normalisation was used to keep the lateral weights from getting too large. Neither global normalisation nor hard limiting is particularly biologically plausible (although there is some experimental evidence that forms of normalisation may occur in biological systems). Another problem specific to STDP is that in the general formulation (Song et al., 2000) the resulting weight distribution is bimodal, which does not match experimental results (van Rossum, Bi, & Turrigiano, 2000).
The lateral learning rule in the current work retains some elements of Ruf and Schmitt (1998) (e.g. the temporal neighbourhood) and has been modified to use STDP following the basic method of Song et al. (2000) but incorporating weight dependent learning as suggested by van Rossum et al. (2000). Here the weight updates are dependent on the existing connection weight, with LTP updates being additive and LTD multiplicative. These rules result in a similar unimodal weight distribution to the experimentally observed one. In the original formulation of their rules van Rossum et al. (2000) also incorporate a weight dependent Gaussian noise term with a zero mean and standard deviation derived from experimental data. In the current work the noise term is omitted. The update rules for Long Term Potentiation (LTP) and Long Term Depression (LTD) are given as Eqs. (5a) and (5b) respectively and are used for both excitatory and inhibitory connections.
Δwij = η · ((Tout − ti)/Tout) · (wmax − wij) · deltaSTDP · d(j, winner),   wt+1 = wt + Δwij    (5a)

Δwij = 1 − η · ((Tout − ti)/Tout) · deltaSTDP · d(j, winner),   wt+1 = wt · Δwij    (5b)

where:
wij is the weight on the connection between presynaptic neuron i and postsynaptic neuron j.
wmax is the maximum allowed lateral weight.
η is the learning rate parameter.
Tout is the duration of the activity integration.
deltaSTDP is the STDP function:

deltaSTDP = Ap · exp(−(tj − ti)/τltp)   if (tj − ti) > 0
deltaSTDP = Am · exp((tj − ti)/τltd)    if (tj − ti) < 0

ti, tj are the firing times of presynaptic neuron i and postsynaptic neuron j respectively.
Ap, Am are the STDP potentiation and depression rates respectively.
τltp, τltd are the time constants for potentiation and depression.
d(j, winner) = exp(−distance(j, winner)²/(2σ²)), where distance() calculates the Euclidean distance between output neuron j and the winner neuron and σ is a neighbourhood spread parameter.
See Table 1 for details of the parameter values.

These rules avoid the need for either global normalisation or hard limiting, and ensure that the connection weights cannot change sign. In summary, the value of the lateral weight update is determined by:
• Presynaptic and postsynaptic neuron firing times (using the exponential STDP equation of Song et al., 2000).
• The current weight (using update rules similar to van Rossum et al., 2000 but without noise).
• A temporal neighbourhood: synapses whose postsynaptic neurons fire quickest in response to the input pattern get a larger update.
• A spatial neighbourhood: synapses with firing postsynaptic neurons located close to the current winner get a larger update.
• A learning rate parameter η which is reduced gradually during training.
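For a single lateral synapse, Eqs. (5a) and (5b) reduce to a few lines. The scalar sketch below is our own; Ap = 0.1 is one of the values used in the experiments, Am = 1.05 · Ap follows Table 1, and for inhibitory synapses the same functions would apply with wmax = −1.0:

```python
import numpy as np

A_P = 0.1                      # LTP rate (value varies by experiment)
A_M = 1.05 * A_P               # LTD rate (Table 1)
TAU_LTP = TAU_LTD = 10.0       # STDP time constants, ms
T_OUT = 30.0                   # activity integration window, ms

def delta_stdp(t_pre, t_post):
    """Exponential STDP window (Song et al., 2000)."""
    dt = t_post - t_pre
    if dt > 0.0:
        return A_P * np.exp(-dt / TAU_LTP)    # causal pair: potentiation
    return A_M * np.exp(dt / TAU_LTD)         # acausal pair: depression

def lateral_update(w, t_pre, t_post, d_win, eta, w_max=1.0):
    """Weight-dependent update: additive LTP (Eq. (5a)), multiplicative LTD (Eq. (5b))."""
    s = delta_stdp(t_pre, t_post)
    temporal = (T_OUT - t_pre) / T_OUT        # temporal neighbourhood
    if t_post - t_pre > 0.0:
        return w + eta * temporal * (w_max - w) * s * d_win   # Eq. (5a)
    return w * (1.0 - eta * temporal * s * d_win)             # Eq. (5b)
```

Because LTP is additive towards wmax and LTD is multiplicative, repeated updates drive the weights towards a unimodal distribution without hard limits, mirroring van Rossum et al. (2000).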
As for the afferent rule, Eqs. (5a) and (5b) still employ a learning rate parameter η. The LTP rule given in Eq. (5a) also incorporates a wmax parameter, which ensures that as the weights increase they asymptotically approach a maximum value. Section 3 describes how an adaptive plasticity method was used to dispense with both of these.

The learning update cycle is based upon the methods used by Marian (2002) and Ruf and Schmitt (1998) and involves the following steps:
1. A direction pattern of 16 spike times is presented to the input layer of the network and causes the input layer neurons to fire over an interval from 0 ms up to Tint (the integration time).
2. From Tint up to Tout the activity is allowed to propagate through afferent and lateral connections.
3. After Tout, a 'winner' neuron is selected randomly from the group of output neurons that fired the quickest during the activity propagation period.
4. Afferent learning is applied using Eq. (4).
5. Lateral learning is applied using Eqs. (5a) and (5b).
6. The network is reset for the next pattern.
7. The learning rate η for both afferent and lateral connections is decreased.

3. Adaptive plasticity

3.1. Overview

As previously stated, a main aim of this work was to develop a self-regulating, activity dependent learning method so that learning rate reduction schedules could be dispensed with. Our inspiration came from considering the type of process that might determine when a particular map refinement phase is 'finished' in a real cortical map. In his study of the formation of the retinotectal map in goldfish, Schmidt (1985) noted that there was a correlation between the rate of synaptogenesis and the amount of disruption to map organisation: as time goes on there is less synapse refinement and thus less potential for disruption. Schmidt explained this by a change in arbour size (spread of dendritic projections) of the retinal ganglion cells: reduction in arbour size over time means that there are effectively fewer potential connections available to be made. In very general terms, regulation of plasticity in development can be viewed in terms of a 'resource' being consumed over time, where the level of the resource influences how much plasticity is allowed. van Ooyen (2001) reviewed the theory behind competitive synaptogenesis and the various models that existed at the time. The majority of the reviewed models are based upon the concept of 'consumptive' competition, i.e. there is a limited resource (a Neurotrophic Factor or NTF) which is consumed when connections are made, thus limiting the potential for further connections. Such processes involve both electrical and chemical regulation. The majority of works exploring self-organising map formation do not incorporate any kind of activity dependent regulation.

3.2. Development of a plasticity resource

Our 'adaptive' plasticity methods are based upon some of the above concepts, and in the current paper we apply them to models incorporating functional plasticity (weight changes) only. However, there is experimental evidence that in real systems both functional plasticity and structural plasticity (synapse creation and pruning) work together, both in development and in the adult brain (Butz, Wörgötter, & van Ooyen, 2009; Chklovskii, Mel, & Svoboda, 2004). A future aim of our work is to explore activity-dependent synapse formation and pruning in conjunction with
weight changes using the adaptive plasticity methods described here. Network development is monitored and controlled by a global ‘plasticity resource’ (PR) which can be viewed as an abstraction of an NTF. We propose a simple model where network activity (via the LTP/LTD processes stimulated by input patterns) affects the level of the plasticity resource which in turn will affect the capacity for plasticity by regulating weight changes. The inspiration for using the levels of LTP/LTD to monitor training comes from the work of Harris, Ermentrout, and Small (1997) which implemented a consumptive scheme to specifically model Ocular Dominance map formation and included Hebbian LTP/LTD processes controlled by the consumption of an NTF-like resource. In the Harris model, synaptic weights change as a consequence of Hebbian LTP and LTD processes. Each individual neuron has an allocation of NTF available to be distributed amongst its synapses. When LTP strengthens an afferent synapse, there is a corresponding increase in the ability of the synapse to uptake NTF. This causes competition between the synapses as the stronger they become, the more NTF they can consume thus decreasing the NTF available to other synapses connected to the same neuron. The rate of LTP is also increased as the amount of NTF acquired increases thus reinforcing the process. In the current work a simpler scheme is used whereby there is one global NTF pool or plasticity resource (PR). Also the PR can increase and decrease by the direct action of both LTP and LTD on the afferent connections. Following on from initial benchmark experiments, we were aware of what the organisation and response of a trained network looked like and approximately how many training cycles it required for that particular data (see Section 4.1—benchmark experiment). We hypothesised that the amount of weight change in response to input patterns should level off once a sufficient number of the input patterns had been presented. Early on in the map development process the network activity is high (the map is learning novel features in the input). When the network has learned the range of patterns in the input, the activity should level off. To verify this we ran an experiment for 10 training cycles (1600 patterns) using the methods described in Section 2 (i.e. using a traditional learning rate reduction schedule and predefined datasets) and collected data on the values of the LTP/LTD changes on all connections. Fig. 2 shows total cumulative LTP and LTD weight updates per cycle for afferent connections only displayed with the traditional learning rate, eta. These data have been normalised so that all values lie between 0.0 and 1.0 to aid comparison. Afferent weight changes are the primary driver for learning as they directly involve the input patterns: the Ruf and Schmitt afferent learning rule given in Eq. (4) amends the afferent weights by directly comparing the input vector to the current weight vector. In contrast, lateral learning (Eqs. (5a)–(5b)) is an indirect consequence of the spread of activity after the initial firing. Fig. 2 shows that early on in training, the value of weight updates in terms of LTP (increases) and LTD (reductions) starts off relatively high, decreases and eventually levels off which supports our hypothesis. In contrast, the traditional learning rate parameter eta decays according to its predefined schedule regardless of what is actually happening. 
To better express the stabilisation of afferent learning in terms of the relationship between LTP and LTD processes, Fig. 3 shows the proportion of LTP to LTD weight updates over the 10 training cycles, which exhibits a distinct pattern of exponential increase and levelling off. Although these specific results are a feature of the particular afferent learning rule we use in our model, the principles should carry over to other self-organising map learning rules in which the afferent synaptic weights undergo potentiation and depression in order to represent input patterns. Eq. (6) captures the essence of the information in Fig. 3 so that it can be used as a single global 'plasticity resource' (PR) to control learning:

PR = 1.0 − min(|LTP|, |LTD|) / max(|LTP|, |LTD|)    (6)

where:
PR is the plasticity resource value.
LTP is the sum of afferent LTP updates.
LTD is the sum of afferent LTD updates.

LTP and LTD are global totals maintained throughout a training run. They are updated using Eq. (4) during the afferent learning phase: for each learning update the value of Δwij is calculated for each synapse; if Δwij is positive the value is added to LTP, and if it is negative the value is added to LTD. At the end of the learning phase the ratio of the absolute values of LTP and LTD is taken, with the minimum of the two as the numerator to ensure the ratio is positive and less than 1.0. The PR value is then calculated as 1.0 minus this ratio, and is recomputed after the presentation of each individual pattern once the weight updates have been completed. This results in the PR value decreasing over time, and thus its action is to reduce the amount of learning possible as training progresses.

Fig. 2. Comparison between afferent weight updates from LTP and LTD and learning rate.

Fig. 3. Proportion of LTP to LTD on afferent connections per learning cycle.

Table 3: Parameters for experiment 1.
Ap, LTP rate: 0.1
Am, LTD rate: 0.105 ∗ Ap
N, number of training cycles: 10
η, initial learning rate: 0.5
ηr, learning rate reduction: 0.949

3.3. Adjustments to the learning rules

In order to dispense with the learning rate parameter (η) used in both the afferent and lateral weight update rules, and also the wmax parameter used to keep the lateral weight updates in range, new learning rules were created incorporating the PR value. These are given as Eq. (7) (afferent updates) and Eqs. (8a) and (8b) (lateral updates):

Δwij = PR · ((Tout − tj)/Tout) · (εi − wij) · d(j, winner)    (7)

Δwij = PR · ((Tout − ti)/Tout) · deltaSTDP · d(j, winner),   wt+1 = wt + Δwij    (8a)

Δwij = 1 − PR · ((Tout − ti)/Tout) · deltaSTDP · d(j, winner),   wt+1 = wt · Δwij    (8b)

where PR is the plasticity resource value as calculated by Eq. (6), and the other parameters are as described for Eqs. (4), (5a) and (5b).

An experiment using these new rules showed qualitatively the same training result as the benchmark experiment (see Sections 4.2 and 4.3), with the added benefit that the rate of plasticity is now controlled solely by the PR value.
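The bookkeeping behind Eq. (6) is small enough to show in full. The sketch below is a minimal illustration whose class name and interface are our own: accumulate() would be called with every afferent Δwij, and update() after each pattern presentation, with the returned PR used in place of η in Eqs. (7)–(8b).

```python
class PlasticityResource:
    """Global plasticity resource, Eq. (6): PR = 1 - min(|LTP|,|LTD|)/max(|LTP|,|LTD|)."""

    def __init__(self):
        self.ltp = 0.0          # running total of positive afferent weight updates
        self.ltd = 0.0          # running total of negative afferent weight updates
        self.pr = 1.0           # plasticity is maximal before any learning

    def accumulate(self, dw):
        """Fold one afferent update into the global totals."""
        if dw > 0.0:
            self.ltp += dw
        else:
            self.ltd += dw

    def update(self):
        """Recompute PR after the weight updates for one pattern are complete."""
        lo, hi = sorted((abs(self.ltp), abs(self.ltd)))
        if hi > 0.0:
            self.pr = 1.0 - lo / hi
        return self.pr
```

Because the totals are cumulative, a change in the input statistics that re-imbalances LTP against LTD pulls the ratio away from 1.0 again, which is why PR can rise as well as fall during training.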
4. Results

This section details experimental results which demonstrate various aspects of the work described previously. Firstly, experiment 1 is a 'benchmark' experiment, which used the traditional learning rate parameter schedule and predefined datasets. The benchmark also included the fixed Gaussian neighbourhood (as described in Section 2.5); as this is a proven technique used elsewhere (Alamdari, 2005; Pham et al., 2006), we saw no need for extra experiments comparing its performance with a traditional SOM neighbourhood reduction. The results from the benchmark are used as a baseline against which to compare the results of subsequent experiments. Experiment 2 demonstrates that qualitatively the same results as the benchmark were achieved when the learning rate parameter was replaced with the new plasticity resource. Experiment 3 replaces the original predefined datasets with input data presented randomly 'on the fly' and shows that the PR value can be used as an indicator of the progress of training; this experiment also analyses in detail the results of training over multiple runs. Experiment 4 demonstrates how the PR value adjusts and the map adapts when the training inputs are changed during the course of training. The remainder of this section describes each experiment in turn and the main results.

4.1. Experiment 1—benchmark

The setup as described in Section 2 was used. The learning rules therefore incorporated a traditional learning rate parameter reduction and used predefined input datasets. Table 3 gives a summary of the relevant experimental parameters.
Fig. 4. Afferent weights before and after training.
A training dataset consisted of a file containing 20 examples of each of the 8 directional patterns as described in Section 2.4. These were generated by randomly perturbing the exemplar patterns. The order of the patterns within the file was also randomised. A different training dataset was created for each training cycle. Fig. 4 compares patches from different locations of the afferent weight matrix before and after training and shows that the majority of the afferent weights have learned to represent specific input patterns. In the left of this figure both patches show no discernible pattern, with all weights falling between their initial values of 0.4 and 0.5. In contrast, the corresponding patches of the weight matrix after training show how selected connections have been strengthened and weakened in order to represent specific input patterns. The table on the far right of the figure shows the actual patterns represented. Only ‘salient’ portions of the input pattern are shown for clarity. During training, updates to the lateral weights also occur in response to the spreading of activation amongst neurons in the output layer. The lateral connections are a mixture of short-range excitatory and long-range inhibitory synapses with delays set according to the distance between neurons. Strengthening of excitatory lateral connections results in clusters of neurons located together firing for the same or similar input. Strengthening of inhibitory lateral connections results in clusters of neurons with similar preference inhibiting the action of clusters with a different preference. Fig. 5 shows a plot of the spatiotemporal response of the output layer following the presentation of two dissimilar patterns before and after training. In the plots the x, y axes are spatial location of the neuron and the z axis is firing time. The data for these plots were produced by disabling learning and presenting exemplar patterns to the saved network and recording which neurons fire and their firing time. Initially, the network is fairly equally responsive to both patterns with a large proportion of the neurons firing at various times over almost the whole output period (which is up to 30 ms). After training the majority of neurons are silent (due to lateral inhibition from the winning area) but a spatially distinct patch of neurons
respond to each of the patterns, and the range of firing times of the responding populations is also different to the initial state. Fig. 6 shows composite motor maps from before and after training, showing topological ordering and indicating the direction preference of neurons in the output layer. This 2D grid representation, based upon the spatial location of neurons responding to each pattern, is a visualisation in keeping with the convention in previous modelling works of representing direction preferences in the same fashion that experimental cortical maps are usually portrayed (for an example of a real direction preference map see Fig. 2(a) in Weliky, Bosking, & Fitzpatrick, 1996). Fig. 6(a) (the map before training) shows no evidence of spatial organisation: the majority of neurons fire for all patterns. In Fig. 6(b), over 90% of the cortical neurons have developed a preference for at least one of the 8 directions, and neurons with the same preference are mainly grouped in the same spatial location. In the main, neighbouring patterns are also located near to each other; for instance patterns W and SW are next to each other, as are patterns E and SE. The final map shows a less clear-cut separation of clusters than would be seen in a traditional SOM because the response of real cortical maps to a stimulus is more complicated (due to the effects of the lateral connectivity and the temporal component of activation). Typically, they exhibit a distributed response of a population of neurons, and individual neurons may have a preference that is finely or quite broadly tuned, so there is considerable overlap. There are limitations to this sort of visualisation, as it only shows the spatial arrangement of neurons and does not take into account the firing times of neurons when responding to patterns; Fig. 5 showed that firing time is an important factor in the response. More detailed analysis and visualisation of individual neuron preferences and whole cortical map responses are explored in Section 4.3, experiment 3.
Fig. 5. Experiment 1 network response to two different patterns before and after training.
Fig. 6. The initial (a) and final (b) motor maps from experiment 1.
4.2. Experiment 2

This experiment was performed incorporating the plasticity resource (PR), with modifications to the learning rules as described in Section 3, but with the same predefined datasets as used in experiment 1. Table 4 gives a summary of the relevant parameters.

Table 4: Parameters for experiment 2.
Ap, LTP rate: 0.02
Am, LTD rate: 0.105 ∗ Ap
N, number of training cycles: 20

In this experiment the rate of map training is controlled solely by the plasticity resource. In the original learning rules the wmax parameter was used to keep the maximum synapse weights to 1/−1, and removing it results in the maximum weights settling at much higher values. Although this does not affect the performance of the map learning, we chose to use a lower STDP learning rate to keep the final maximum excitatory and inhibitory weights
at around 1/−1. Training was stopped when the PR value was changing by only 0.01 units between training cycles. The results showed that the self-regulating learning regime can qualitatively reproduce the results of experiment 1. The output response for the final network for patterns North and South is shown in Fig. 7. Comparing this to the same response from the network in experiment 1 (Fig. 5) we see similar behaviour in that the response after training is much more spatiotemporally distinct from that of the initial network and the response to the two dissimilar patterns is quite different. Fig. 8 shows a graph of the plasticity resource (PR) value as measured at the end of each of the training cycles. Here we see that the PR trace behaves as we expect: an initial high value which decreases gradually as the network learns the range of input patterns. In the final few cycles the trace levels off and changes remain around 0.01. Due to the action of the constantly decreasing plasticity resource twice as many training cycles were required compared to the benchmark.
Fig. 7. Experiment 2 network response to two different patterns before and after training.
4.3. Experiment 3

Up until this point, training patterns had been presented in the form of predefined datasets as described in Section 2.4. In order to see the real benefit of using the PR trace to control learning, the training setup was changed to select a training pattern randomly from the eight exemplars and perturb the values on the fly before presenting it to the network. We ran the training in blocks of 160 patterns to capture the state of the network during the course of training, but did not plan to run a specific number of patterns. Therefore, in contrast to the setup in experiment 2 which used fixed datasets, the composition of the training data was not guaranteed (i.e. there is not necessarily an equal number of each pattern presented, and the random perturbation of each pattern will add different noise each time). Apart from these changes to the input, the other experimental parameters were the same as for experiment 2. To check the stability of the training performance across different initial network instantiations, 5 separate runs were performed. In each case training was stopped when the PR trace value was changing by only 0.01 units. Fig. 9 shows the network response from one of the runs to the 'North' and 'South' patterns before and after training, for comparison with Fig. 7. The results are qualitatively the same as experiment 2: before training, both patterns elicit a response from the majority of the neurons across the area of the map around the integration time of 9 ms; after training, both the spatial range and the firing times of the responding neurons have become distinctive for each pattern. Fig. 10 shows a plot of the PR traces for all five runs and shows that the behaviour is consistent and qualitatively the same as in previous experiments. Even though initial conditions vary between the five runs, in all cases the PR value settles into the same pattern of exponential decrease and, finally, levelling off.
Fig. 8. Plasticity resource trace for experiment 2.
The results of this experiment show that the plasticity resource can successfully be used to control and monitor the training of a map where randomly generated patterns are presented and it is not known in advance how much training will be required. Furthermore, the results are stable across different network initialisations.

Up until this point in the discussion, the development of neuron preferences for the input patterns has only been verified visually, i.e. by inspection of plots such as those shown in Fig. 6. As previously mentioned, this visualisation shows only the spatial component of the responses and, as in real cortical maps, the responses overlap in many cases. The cortical responses have both a spatial and a temporal component, which is difficult to represent visually in a composite map of all 8 directions.
Fig. 9. Experiment 3 network response to two different patterns before and after training.

Table 5
Calculation of preference strengths after training for neuron 247, run 1.

Direction     Raw firing time (ms)   Adjusted by Eq. (9) (to 3 d.p.)   Normalised over total (to 3 d.p.)
South–East    17.6                   0.413                             0.228
South          9.1                   0.697                             0.385
South–West     9.1                   0.697                             0.385
Therefore, to verify that individual neurons do develop preferences during training, we collected data on how many of the 8 directions each neuron responds to, and on the strength of the preference, before and after training. The preference strength is calculated from how many patterns a neuron responds to and from its firing times, with earlier firing times indicating a stronger preference. For each pattern responded to, the firing time is converted using the rule in Eq. (9):

$t_{\mathrm{new}} = (T_{\mathrm{out}} - t_{\mathrm{old}})/T_{\mathrm{out}}$    (9)

where $t_{\mathrm{new}}$ is the converted firing time, $t_{\mathrm{old}}$ is the original firing time, and $T_{\mathrm{out}}$ is the time-out parameter described in Section 2.5 (30 ms). This conversion effectively reweights the times so that the values lie between 0 and 1.0, with larger values indicating earlier firing. These values are then normalised by dividing by the total of the values for that neuron across all patterns it responded to, in order to allocate a 'share' of preference. An example calculation for one neuron after training is given in Table 5 to illustrate the method; the last column gives the final preference strength.
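As a concrete check on the method, the following short Python version reproduces the Table 5 calculation (small differences in the final decimal place are due to rounding):

    T_OUT = 30.0  # time-out parameter from Section 2.5, in ms

    def preference_strengths(firing_times):
        # Convert raw firing times (ms) to preference strengths: Eq. (9)
        # maps earlier spikes to values nearer 1.0, then each value is
        # divided by the neuron's total to allocate a 'share' of preference.
        adjusted = {d: (T_OUT - t) / T_OUT for d, t in firing_times.items()}
        total = sum(adjusted.values())
        return {d: a / total for d, a in adjusted.items()}

    # Neuron 247, run 1 (Table 5): SE fired at 17.6 ms, S and SW at 9.1 ms.
    print(preference_strengths({"SE": 17.6, "S": 9.1, "SW": 9.1}))
    # -> approximately {'SE': 0.229, 'S': 0.386, 'SW': 0.386},
    #    matching Table 5 to within rounding.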
Fig. 10. Plasticity resource traces for five runs for experiment 3.
Given that neurons should develop selectivity to an input during training, we expected that the number of patterns responded to by neurons would decrease on average and that the preference strength would increase. We averaged the number of patterns responded to and the preference strengths over all neurons from all 5 runs; the summary statistics (to 2 decimal places) are presented in Table 6. A t-test performed on the raw data showed that the differences between the before and after means were statistically significant. As expected, the number of patterns responded to decreases and the preference strength increases. Example visualisations of preference for typical individual neurons are shown in Fig. 11. In these plots the directions are arranged as on a compass, the preference strength for each pattern is plotted at the relevant direction, and the area formed by all the preferences is shaded.
Fig. 11. Four typical individual neuron preferences before and after training (radial scale is 0–1.0).

Table 6
Experiment 3: average number of patterns responded to and preference strength before and after training.

                  Number of patterns   Preference strength
Before training   7.97                 0.13
After training    2.45                 0.44
The radial scale from the centre of each plot outwards is 0–1.0. The lighter shaded areas are the preferences before training: in all cases shown the initial preferences are low, as the neurons respond to all patterns with identical firing times. The cases after training (darker shaded areas) are quite different. At most, neurons respond to 2 or 3 patterns, and these are adjacent directions. In Fig. 11, the plot for neuron 247 (bottom right) shows a situation where the directions S and SW are responded to equally (the firing times are the same) and the direction SE less so (the firing time is later than for the other two directions). The plot for neuron 223 (bottom left) shows that only one pattern (NE) elicits a response.

As the intended future use of this work is an autonomous robotic application, it was also important to be able to identify robustly, from the whole cortical response, which directional pattern had caused the response. In the current work we have compared whole cortical map responses using a measure based upon the van Rossum metric. This metric is usually used in in-vivo neuroscientific work to compare two spike trains recorded from the same neuron in different experimental runs and to determine whether they represent the same response within a certain tolerance (van Rossum, 2001). (A 'spike train' is a series of spikes generated over a specific time period.) The van Rossum metric is a type of similarity score, with low values indicating a greater match between two spike trains.
Table 7
Van Rossum metric analysis of spatiotemporal responses (averages over 5 runs).

      N     NE    E     SE    S     SW    W     NW
N     0.0   0.86  0.99  0.99  0.99  1.0   0.94  0.55
NE    –     0.0   0.94  0.99  0.99  0.99  0.99  0.95
E     –     –     0.0   0.33  0.99  0.99  1.0   0.99
SE    –     –     –     0.0   0.99  1.0   1.0   0.99
S     –     –     –     –     0.0   0.65  0.95  0.99
SW    –     –     –     –     –     0.0   0.77  0.95
W     –     –     –     –     –     –     0.0   0.84
NW    –     –     –     –     –     –     –     0.0
In our work we use the measure in a slightly different way, to compare different map responses (sets of neuron IDs and firing times). Full details of the van Rossum metric and of how we have adapted it are given in the Appendix; here we concentrate purely on the results. For each of the trained networks from the 5 independent runs we presented 10 perturbed instances of each exemplar and collected the spatiotemporal responses (neuron IDs and firing times). The van Rossum scores were calculated for each possible pair of responses and averaged over the runs. The scores (to 2 decimal places) are shown in Table 7. In each row of the table the lowest score (0.0) occurs where there is a perfect match (within the time tolerance of 1.5 ms used in calculating the van Rossum metric; see Appendix). In some cases adjacent patterns have relatively low scores (e.g. E–SE, N–NW), indicating some similarity, but otherwise the scores between different patterns are close to the maximum of 1.0, indicating dissimilarity. Given that the presented patterns are perturbed and that the scores are averaged over 5 runs, this indicates robustness in distinctly identifying the patterns by cortical response across different initial network instantiations.
Table 8
Experiment 4: average number of patterns responded to (Num), average preference strength (Pref) and percentage of neurons firing for each pattern (Patt 1–8) at various stages of training.

           Num    Pref   Patt 1   Patt 2   Patt 3   Patt 4   Patt 5   Patt 6   Patt 7   Patt 8
Start      7.97   0.13   99.61    99.61    99.61    99.61    99.61    100.00   100.00   98.83
Step 640   2.25   0.48   27.34    60.94    72.66    37.89    0.00     0.00     0.00     28.83
End        2.08   0.53   3.91     35.94    72.27    29.30    21.88    22.67    11.33    3.52
4.4. Experiment 4

In Section 3.2 we stated that in our model the PR trace is allowed to increase or decrease depending upon the input activity, and is thus more flexible than the learning rate used in a traditional SOM. For autonomous robotics applications it is crucial that the learning process is both unsupervised and adaptive, so that it can respond to changes in the environment without intervention to change parameters. To demonstrate that the PR does behave in the expected way and can accommodate a change in inputs, an experiment was run in two parts using the setup described in experiment 3 (i.e. random presentation of patterns). In the first part only patterns in the range North to South-East were presented and the training was run to conclusion (i.e. until the PR trace was changing by only 0.01 units). The experiment was then restarted from the network saved after pattern 640 (i.e. about halfway through); only patterns in the range South to North-West were presented, and the experiment was again run to conclusion. Fig. 12 shows a composite PR trace, with the first part of the experiment shown as a solid line and the second part as a dashed line. The behaviour during the first part, using the first half of the training patterns, is as expected: the steady exponential decrease and levelling off seen in previous experiments. In the second part, when the inputs are changed after pattern 640, there is a clear bump in the PR trace, and after pattern 800 the usual pattern of decrease resumes. At the end of training for this half of the experiment the PR trace has settled at a higher value.

Fig. 12. Plasticity resource traces for experiment 4.

To confirm whether changing the inputs midway through a training session affected the neuron preferences, we collected data during the second part of the experiment on the average number of patterns responded to and the average preference strength (using the method described for experiment 3) for the network before training, at step 640 (when only patterns 1–4 had been presented), and at the end of training (all patterns presented). In addition, we calculated the percentage of neurons firing for each pattern. These data are shown in Table 8. The situation for the number of patterns responded to and the preference strength is much the same as seen previously. Neurons in the untrained network (top row) respond to every pattern and thus have a weak preference. At step 640 (middle row), neurons respond to fewer patterns and have a stronger preference. In terms of which patterns are responded to, Table 8 shows that the neurons are responding mainly to patterns 1–4 as expected, but there is also a response to pattern 8, which was not in the training set. The most likely explanation is that, as patterns 1 and 8 are adjacent, the perturbation process creates inputs close enough to both to activate the same neurons. At the end of training (bottom row), when the second half of the patterns has been presented, there is a further reduction in the average number of patterns responded to (and the average preference strength increases correspondingly). The adaptation of the training to the changed inputs can be seen in the responses to the eight patterns: there is now a response to all patterns, and clearly some neurons have changed their preference to respond to the new inputs.
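In terms of the hypothetical loop sketched for experiment 3, the two-part protocol amounts to the following (again purely illustrative; make_initial_network is an assumed constructor, not part of our implementation):

    # exemplars[0:4] = North ... South-East; exemplars[4:8] = South ... North-West
    network = make_initial_network()              # hypothetical constructor
    train_until_stable(network, exemplars[:4])    # part 1: PR decays, levels off
    # (In the experiment, part 2 actually resumes from the network saved
    # after pattern 640 of part 1, i.e. about halfway through.)
    train_until_stable(network, exemplars[4:])    # part 2: PR bumps upward for
                                                  # the new inputs, then decays
                                                  # again, as in Fig. 12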
5. Conclusions

The main achievement of this work has been the development of a method of self-regulation of learning in an SOFM controlled solely by input activity. With respect to autonomous robotics this is an improvement over a traditional SOM, as it is not necessary to make any assumptions about the amount of training data that will be required or to define a learning rate schedule in advance. Furthermore, because the learning rate is linked directly to input activity, it can increase as well as decrease during training, allowing some flexibility to accommodate changing input. The method used here also does not require the input data to be in any particular format (for example as vectors of values) and is not specific to any learning rule; the only requirement is that Long Term Potentiation and Long Term Depression occur on the connections.

The experimental results have shown that neurons become more selective during training and develop stronger preferences for particular directions. Where they have more than one preference, it is generally for adjacent directions (cf. Fig. 11). The global PR trace shows the same qualitative behaviour over different instantiations of the same experiment and with both fixed and randomly presented training data. More experiments are needed to verify that the technique works for different types of cortical map and training inputs. Work applying the PR to visual maps with spiking input is currently in progress and shows the same behaviour and benefits as seen in the current work (results as yet unpublished).

We have not thoroughly addressed the question of exactly what value of the PR trace should terminate training. In the experiments presented here learning was terminated when the PR value was changing by less than 0.01; a different value may be needed for different types of input data. This would need to be resolved before incorporation into a fully autonomous system handling a variety of input data. We have also not addressed the possibility of the LTP changes exactly equalling the LTD changes at some point early in training, where the ratio would be 1.0, the PR would be 0.0, and no learning would occur. In the experiments performed so far with different motor datasets, and also with visual map learning (as yet unpublished), we have never seen this happen, but it should be noted as a potential problem if the method is used with very different input data from other modalities.

The results of experiment 4 show that the current methods do adapt to changing inputs. The PR trace (controlling the learning) adapts when the inputs change, enabling the neurons
to reallocate their preferences to include new directions. Future work will explore structural plasticity (rewiring) in conjunction with functional plasticity (weight changes) using the PR method described here. Previous work combining both forms of plasticity is scarce, and we believe it warrants further investigation as a means of producing more adaptive maps. A final point is that the map described in this work is fairly small, and more work needs to be done to show that the methods scale to larger maps so that they can be implemented in neuromorphic hardware on real robots.

Acknowledgements

The authors would like to thank the anonymous reviewers, whose comments and suggestions have helped to improve this paper enormously.

Appendix

A.1. Analysis of spatiotemporal responses using the van Rossum metric

According to van Rossum (2001), to calculate the metric the delta function associated with each spike in a spike train is first replaced with an exponential function, as shown in Eq. (10):

$f(t) = \sum_{i=1}^{M} H(t - t_i)\, e^{-(t - t_i)/t_c}$    (10)
where $f(t)$ is the modified spike train, $M$ is the number of spikes in the original spike train, $t_i$ are their spike times, $t_c$ is a time constant, and $H$ is the Heaviside step function ($H(x) = 0$ if $x < 0$ and $H(x) = 1$ if $x \geq 0$). Following this operation, the integral of the squared difference between two modified spike trains gives a distance measure, as shown in Eq. (11):

$D^2(f, g)_{t_c} = \frac{1}{t_c} \int_0^\infty [f(t) - g(t)]^2\, dt$    (11)
where $D^2$ is the van Rossum distance metric, $f(t)$ and $g(t)$ are two modified spike trains, and $t_c$ is a time constant in milliseconds which controls how much widely separated spikes of $f$ and $g$ contribute to the integral. The distance metric as given above requires normalisation by a factor of 2/M to give scores between 0 and 1.0. Low scores indicate that the distance between spike trains is short and there is a close match, while high scores indicate a larger difference.

In the current work the van Rossum metric is used to compare the sets of spikes generated by the cortical map in response to different patterns, taking into account both temporal and spatial aspects. The procedure is summarised as follows:

1. Input patterns are presented to the trained motor map to generate the responses (each a set of neurons firing at particular times).
2. The responses for pairs of input patterns are compared.
3. For every cortical neuron, check whether it has fired in either or both responses:
a. If it has fired in either or both of the responses, the van Rossum metric is calculated using M = 1 and $t_c$ = 1.25 ms.
b. If it has not fired in either response, it is ignored.
4. The final score is calculated as in Eq. (12).
$VR_{px\_py} = \frac{1}{N} \sum_{n=1}^{N} D^2(f_n, g_n)$    (12)

where $VR_{px\_py}$ is the van Rossum score calculated from the responses to two different input patterns $px$ and $py$, $N$ is the total number of output neurons, and $f_n$ and $g_n$ are the spike responses of output neuron $n$ to patterns $px$ and $py$. This method implicitly takes spatial as well as temporal aspects of the responses into account: in step 3a the smallest score contributions come from matching neurons (i.e. the same spatial location, with firing times within 1.25 ms), whereas a neuron that fires in only one of the responses generates a high score.
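A compact Python sketch of this adapted comparison is given below. The response format (a dict from neuron ID to firing time) and the numerical discretisation step are our assumptions for illustration, not details taken from the implementation:

    import numpy as np

    def vr_single(t_f, t_g, tc=1.25, t_max=30.0, dt=0.01):
        # Eqs. (10)-(11) for single-spike trains (M = 1). A neuron that did
        # not fire is passed as None and contributes an all-zero trace.
        t = np.arange(0.0, t_max, dt)
        def trace(s):
            if s is None:
                return np.zeros_like(t)
            return np.where(t >= s, np.exp(-(t - s) / tc), 0.0)
        f, g = trace(t_f), trace(t_g)
        return np.trapz((f - g) ** 2, t) / tc

    def vr_map_score(resp_a, resp_b, n_neurons):
        # Eq. (12): average the single-neuron distances over all N output
        # neurons; neurons silent in both responses are skipped (step 3b).
        total = 0.0
        for n in range(n_neurons):
            ta, tb = resp_a.get(n), resp_b.get(n)
            if ta is None and tb is None:
                continue
            total += vr_single(ta, tb)
        return total / n_neurons

Applying vr_map_score to the responses of a trained map for every pair of the eight exemplars gives a matrix of the form shown in Table 7.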
References

Alamdari, A. (2005). Unknown environment representation for mobile robot using spiking neural networks. In Proc. of WEC, Transactions on Engineering, Computing and Technology.
Berglund, E. (2010). Improved PLSOM algorithm. Applied Intelligence, 32, 122–130.
Berglund, E., & Sitte, J. (2006). The parameterless self-organizing map algorithm. IEEE Transactions on Neural Networks, 17, 305–316.
Bi, G., & Poo, M. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18, 10464–10472.
Bohte, S., La Poutre, H., & Kok, J. (2002). Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks. IEEE Transactions on Neural Networks, 13, 426–435.
Brette, R., Rudolph, M., Carnevale, T., Hines, M., Beeman, D., Bower, J. M., Diesmann, M., Morrison, A., Goodman, P. H., Harris, F. C., Zirpe, M., Natschläger, T., Pecevski, D., Ermentrout, B., Djurfeldt, M., Lansner, A., Rochel, O., Vieville, T., Muller, E., Davison, A. P., El Boustani, S., & Destexhe, A. (2007). Simulation of networks of spiking neurons: A review of tools and strategies. Journal of Computational Neuroscience, 23(3), 349–398.
Brohan, K., Gurney, K., & Dudek, P. (2010). Using reinforcement learning to guide the development of self-organised feature maps for visual orienting. In The Proceedings of the International Conference on Artificial Neural Networks, ICANN 2010.
Butz, M., Wörgötter, F., & van Ooyen, A. (2009). Activity-dependent structural plasticity. Brain Research Reviews, 60, 287–305.
Chklovskii, D., Mel, B., & Svoboda, K. (2004). Cortical rewiring and information storage. Nature, 431, 782–788.
Choe, Y., & Miikkulainen, R. (1998). Self-organization and segmentation in a laterally connected orientation map of spiking neurons. Neurocomputing, 21, 51–60.
Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge, Massachusetts: MIT Press.
Froemke, R., & Dan, Y. (2002). Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 416, 433–438.
Gaiarsa, J., & Ben-Ari, Y. (2006). Long-term plasticity at inhibitory synapses: a phenomenon that has been overlooked. In The dynamic synapse: molecular methods in ionotropic receptor biology. CRC Press.
Gerstner, W. (1999). Spiking neurons. In Pulsed neural networks (pp. 1–53). MIT Press.
Goodhill, G. (1993). Topography and ocular dominance: a model exploring positive correlations. Biological Cybernetics, 69, 109–118.
Goodman, D. F., & Brette, R. (2008). Brian: a simulator for spiking neural networks in Python. Frontiers in Neuroinformatics, 2.
Harris, A., Ermentrout, G., & Small, S. (1997). A model of ocular dominance column development by competition for trophic factor. PNAS, 94, 9944–9949.
Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15, 1063–1070.
Jin, X., Lujan, M., Plana, L., Davies, S., Temple, S., & Furber, S. (2010). Modeling spiking neural networks on SpiNNaker. Computing in Science and Engineering, 21(5), 91–97.
Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science (4th ed.). McGraw Hill.
Kikuchi, M., Ogino, M., & Asada, M. (2004). Visuo-motor learning for behavior generation of humanoids. In The Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 521–526).
Kohonen, T. (1984). Self-organisation and associative memory. Berlin: Springer-Verlag.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer-Verlag.
Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10, 1659–1671.
Marian, I. (2002). A biologically inspired model of motor control of direction. Dissertation, University College Dublin.
McCulloch, W., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 7, 115–133.
Metta, G., Sandini, G., & Konczak, J. (1999). A developmental approach to visually-guided reaching in artificial systems. Neural Networks, 12, 1413–1427.
Miikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (1998). A self-organizing neural network model of the primary visual cortex. In Proceedings of the Fifth International Conference on Neural Information Processing (pp. 815–818).
Miikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (2005). Computational maps in the visual cortex. New York: Springer.
Morse, A., & Ziemke, T. (2009). Action, detection, and perception: a computational model of the relation between movement and orientation selectivity in the cerebral cortex. In The Proceedings of the 31st Annual Conference of the Cognitive Science Society.
Ogino, M., Kikuchi, M., Ooga, J., Aono, M., & Asada, M. (2005). Optic flow based skill learning for a humanoid to trap, approach to, and pass a ball. In LNCS: vol. 3276. RoboCup 2004: Robot Soccer World Cup VIII (pp. 323–334). Springer-Verlag.
Paine, R., & Tani, J. (2004). Motor primitive and sequence self-organization in a hierarchical recurrent neural network. Neural Networks, 17, 1291–1309.
Panchev, C., & Wermter, S. (2001). Hebbian spike-timing dependent self-organization in pulsed neural networks. In Proceedings of the World Congress on Neuroinformatics (pp. 378–385).
Pham, D., Packianather, M., & Charles, E. (2006). A novel self-organised learning model with temporal coding for spiking neural networks. In Proceedings of Intelligent Production Machines and Systems.
Raginsky, M., & Anastasio, T. (2008). Cooperation in self-organizing map networks enhances information transmission in the presence of input background activity. Biological Cybernetics, 98, 195–211.
Ritter, H., Martinez, T., & Schulten, K. (1989). Topology-conserving maps for learning visuo-motor coordination. Neural Networks, 2, 159–168.
Rodemann, T., Joublin, F., & Korner, E. (2004). Saccade adaptation on a 2 DOF camera head. In Proceedings of the 3rd Workshop on Self-organization of Adaptive Behaviour, SOAVE 2004.
Ruf, B., & Schmitt, M. (1998). Self-organization of spiking neurons using action potential timing. IEEE Transactions on Neural Networks, 9, 575–578.
Sala, D., Cios, K., & Wall, J. (1998). Self-organization in networks of spiking neurons. Australian Journal of Intelligent Information Processing Systems, 5, 161–170.
Schmidt, J. (1985). Formation of retinotopic connections: selective stabilization by an activity dependent mechanism. Cellular and Molecular Neurobiology, 15, 65–84.
Serrano-Gotarredona, R., et al. (2009). CAVIAR: a 45k-neuron, 5M-synapse, 12G-connects/sec AER hardware sensory-processing-learning-actuating system for high speed visual object recognition and tracking. IEEE Transactions on Neural Networks, 20, 1417–1438.
Shah-Hosseini, H. (2011). Binary tree time adaptive self-organizing map. Neurocomputing, 74, 1823–1839.
Shah-Hosseini, H., & Safabakhsh, R. (2000). TASOM: the time adaptive self-organizing map. In The Proceedings of the IEEE International Conference on Information Technology: Coding and Computing (pp. 422–427). Las Vegas, Nevada.
Shah-Hosseini, H., & Safabakhsh, R. (2001). Automatic adjustment of learning rates of the self-organising feature map. Scientia Iranica, 8(4), 277–286.
Silver, R., Boahen, K., Grillner, S., Kopell, N., & Olsen, K. L. (2007). Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools. Journal of Neuroscience, 27, 11807–11819.
Song, S., Miller, K. D., & Abbott, L. F. (2000). Competitive Hebbian learning through spike-timing dependent synaptic plasticity. Nature Neuroscience, 3, 919–926.
Thorpe, S., Delorme, A., & Van Rullen, R. (2001). Spike based strategies for rapid processing. Neural Networks, 14, 715–726.
Thorpe, S. J., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Toussaint, M. (2006). A sensorimotor map: modulating lateral interactions for anticipation and planning. Neural Computation, 18, 1132–1155.
van Ooyen, A. (2001). Competition in the development of nerve connections: a review of models. Network: Computation in Neural Systems, 12, 1–47.
van Rossum, M. (2001). A novel spike distance. Neural Computation, 13, 751–763.
van Rossum, M., Bi, G., & Turrigiano, G. (2000). Stable Hebbian learning from spike timing-dependent plasticity. Journal of Neuroscience, 20, 8812–8821.
Vogels, T. P., & Abbott, L. F. (2005). Signal propagation and logic gating in networks of integrate-and-fire neurons. The Journal of Neuroscience, 25(46), 10786–10795.
Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., & Gerstner, W. (2011). Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science, 334, 1569–1573.
Weliky, M., Bosking, W. H., & Fitzpatrick, D. (1996). A systematic map of direction preference in primary visual cortex. Nature, 379, 725–728.
Willshaw, D. J., & von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organisation. Proceedings of the Royal Society B: Biological Sciences, 194, 431–445.