ARTICLE IN PRESS
Engineering Applications of Artificial Intelligence 19 (2006) 741–752 www.elsevier.com/locate/engappai
Imitation learning with spiking neural networks and real-world devices

Harald Burgsteiner
Department of Information Engineering, InfoMed/Health Care Engineering, Graz University of Applied Sciences, Eggenberger Allee 11, A-8020 Graz, Austria

Received 18 May 2006; accepted 21 May 2006. Available online 18 July 2006.
Abstract

This article presents a new approach to robotic learning systems. It provides a method to use a real-world device that operates in real-time, controlled through a simulated recurrent spiking neural network, for robotic experiments. A randomly generated network is used as the main computational unit; only the weights of the output units of this network are changed during training. It will be shown that this simple type of biologically realistic spiking neural network, also known as a neural microcircuit, is able to imitate robot controllers such as those incorporated in Braitenberg vehicles. A more non-linear type of controller is imitated in a further experiment. In a different series of experiments involving temporal memory, reported in Burgsteiner et al. [2005. In: Proceedings of the 18th International Conference IEA/AIE. Lecture Notes in Artificial Intelligence. Springer, Berlin, pp. 121–130], this approach also provided a basis for a movement prediction task. The results suggest that a neural microcircuit with a simple learning rule can be used as a sustainable robot controller for experiments in computational motor control.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Robotics; Learning; Spiking neural networks
Corresponding author. Tel.: +43 316 5453 6515; fax: +43 316 5453 96515.
E-mail address: [email protected]. URL: http://www.igi.tugraz.at/harry/.
0952-1976/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2006.05.007

1. Introduction

Living and moving creatures exhibit, often to near perfection, abilities that are the focus of many research areas. Some of these abilities will be discussed in this article: information processing of real-world sensors in real-time, learning of complex tasks on a variety of time scales, and controlling actuators to move around in the habitat. The idea of creating artificial creatures that can move autonomously and exhibit a certain learning behaviour is not new, and many successful approaches have been found to accomplish tasks that involve the areas mentioned above. These include, e.g., reinforcement learning techniques that reward or punish a subject for good or bad actions, respectively; genetic algorithms that ‘‘evolve’’ robots over many generations;
echo-state networks that incorporate recurrent artificial neural networks (ANNs); and, finally, approaches based on biologically realistic neural networks (i.e. ‘‘spiking neural networks’’, SNNs). Often combinations of these techniques are used; some of them will be briefly discussed in this introduction. One of the most fascinating approaches seems to be the last one, since advances in this research could enable us to understand parts of the biological processes that occur in every one of us. Furthermore, SNNs are proven to be computationally more powerful than conventional ANNs (Maass, 1997). This can be interpreted to mean that a neural network consisting of spiking neurons needs fewer nodes to solve a problem than an ANN does. Hence, e.g. an implementation of a robot controller in silicon would need less space and also less electrical power. SNNs also exhibit other interesting features that recommend them for use in real-world robot control, like noise robustness and the need for only simple communication mechanisms, as spikes can be modelled as binary events that occur at certain points in time. In contrast, ANNs in most cases require
high-precision analog values that have to be communicated between the single nodes of the network. Hence, controllers based on SNNs would be ideal candidates for autonomous robots. Still, known approaches for learning architectures and algorithms for SNNs rely mostly on time-consuming genetic algorithms. This will be explained in more detail in Section 1.2. Although an extensive literature search has been done, no other usable implementations (apart from genetic algorithms) of complex spike-based controllers for robots that operate in real-time and employ learning mechanisms have been found. Hence, this article mainly focuses on two areas: (a) learning in biologically realistic neural networks and (b) robotics using neural network controllers.
1.1. Robotics based on ANNs

The term ANN commonly refers to a net of neurons that does not try to be biologically realistic, in the sense that it does not imitate the behaviour of biological neurons (e.g. action potential generation, post-synaptic potential shaping, refractory periods, dynamical synapses, etc.). Instead, one tries to model neurons that exhibit a computationally similar behaviour to a single biological neuron or even to a pool of biological neurons. Typical examples are multi-layer perceptrons, radial basis function networks or self-organizing maps. The common view of a neural network is that of a set of neurons plus a set of weighted connections (synapses in the biological context) between the neurons. Each neuron comes with a transfer function computing an output from its set of inputs. In multi-layer networks these outputs can again be used as input to the next layer of neurons, weighted by the relevant synaptic ‘‘strength’’. Feed-forward networks have only connections starting from external input nodes, possibly via one or more intermediate hidden processing layers, to output nodes. Recurrent networks may have connections feeding back to earlier layers or may have lateral connections (i.e. to neighbouring neurons on the same layer). See Fig. 1 for a comparison of the direction of computation between a feed-forward and a recurrent neural network. With this recurrency, activity can be retained by the network over time. This provides a sort of memory within the network, enabling it to compute functions that are more complex than simple reactive input–output mappings. This is a very important feature for networks that are used to generate adaptive behaviour in robotics, because in most cases the current behaviour of the robot is not solely a function of the current sensory input, but of the current and previous sensory inputs and also of the current and previous internal network states. This allows a system to incorporate a much richer range of dynamical behaviours. Many approaches have been elaborated on recurrent ANNs. Some of them are dynamic recurrent neural networks (with backpropagation through time, see Pearlmutter, 1995), radial basis function networks (when one views lateral connections as a recurrency as well) from Bishop (1995), Elman networks (recurrency through a feedback from the hidden layer onto itself, see Elman, 1990), self-organizing maps (Kohonen, 2001), Hopfield nets (Hopfield, 1982) and the ‘‘echo-state’’ approach from Jäger (2001), which will be covered in more detail in Section 1.3.
(Fig. 1 here: layers of k input neurons, n hidden neurons and m output neurons.)
Fig. 1. Comparison of the architecture of a feed-forward (left-hand side) with a recurrent artificial neural network (right-hand side); the grey arrows sketch the direction of computation.
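The difference sketched in Fig. 1 can be made concrete in a few lines of code: a feed-forward pass maps the current input straight to an output, while a recurrent step also feeds the previous hidden state back in, so past inputs influence the current output. This is an illustrative sketch only; the tanh transfer function and the tiny weight matrices are arbitrary choices, not taken from the article.

```python
import math

def feedforward(u, W_hid, W_out):
    """One pass: input -> hidden -> output. No state survives between calls."""
    h = [math.tanh(sum(w * x for w, x in zip(row, u))) for row in W_hid]
    return [sum(w * s for w, s in zip(row, h)) for row in W_out]

def recurrent_step(u, h, W_in, W_rec, W_out):
    """One time step of a recurrent net: the hidden state h is fed back,
    so the output depends on current and previous inputs."""
    h_new = [math.tanh(sum(wi * x for wi, x in zip(row_in, u)) +
                       sum(wr * s for wr, s in zip(row_rec, h)))
             for row_in, row_rec in zip(W_in, W_rec)]
    out = [sum(w * s for w, s in zip(row, h_new)) for row in W_out]
    return out, h_new

# Identical inputs: the feed-forward net always answers the same, while the
# recurrent net's answer changes as its internal state evolves.
W_in, W_rec = [[0.5, 0.0], [0.0, 0.5]], [[0.3, 0.1], [0.2, 0.4]]
W_out = [[1.0, 1.0]]
h = [0.0, 0.0]
out1, h = recurrent_step([1.0, 0.0], h, W_in, W_rec, W_out)
out2, h = recurrent_step([1.0, 0.0], h, W_in, W_rec, W_out)
print(out1, out2)   # different, although the input was the same
```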
In the case of autonomous agents it is rather difficult to employ strictly supervised learning algorithms for recurrent ANNs, such as backpropagation, Boltzmann machines or learning vector quantization (LVQ), because the correct output is not always available or computable. It is also very difficult to set the weights of a recurrent ANN directly for a given non-trivial task. Hence, other learning techniques have to be developed for ANNs that simplify the learning process of complex tasks for autonomous robots. Two approaches, genetic algorithms and echo-state networks, will be reviewed in the next two subsections.

1.2. Genetic algorithms

Genetic algorithms (Goldberg, 1989) are one approach that has been widely used to employ ANNs in robot control. The idea behind genetic algorithms in robotics is that one does not have to specify a controller or a target function directly (which would be required by e.g. a learning algorithm like backpropagation). Instead, one tries to let autonomous agents evolve ‘‘automatically’’ over many generations. In most cases, simple genetic algorithms are applied. Floreano and Mondada (1996) give a good description of experiments with genetic algorithms in robotics (with a set-up similar to the one used in this article). In principle, the parameters of the system that should evolve are coded in e.g. a bitstring (its ‘‘chromosome’’). An initial population with variations in these parameters is formed. Each individual has a given ‘‘lifetime’’ in which it can perform according to the settings in its chromosome, after which the performance is evaluated
(either in the form of a fitness function, which in turn poses a similar task for an experimenter as a classical supervised learning algorithm, or simple success/failure measures like task fulfilled/not fulfilled). After all individuals of a population are tested, genetic operators are applied (selective reproduction, crossover, mutation) to produce the next generation. A major drawback of evolution as a method for learning in robotics is the time that is needed until a fit controller for a non-trivial task evolves (Floreano and Mondada, 1996). This gets even worse when one considers that the more complex the structure of a controller gets, the more bits have to be encoded in the chromosome, and evolution will take even longer.

1.3. Echo-state networks

The term ‘‘echo-state’’ refers to a new method of analysing and learning recurrent neural networks. Echo-state networks from Jäger (2001) are in principle recurrent ANNs whose network architecture is shown on the right-hand side of Fig. 1: k input units project arbitrarily onto a recurrent network of n units. These units form the ‘‘dynamic reservoir’’ from which the desired m outputs are combined. Recurrent neural networks represent a non-linear dynamical system with a high-dimensional internal state, which is driven by the input. Due to the recurrency, the current state is ‘‘echoed’’ back into the network again. The complete internal state is determined by the current input and all past inputs that the network has seen so far. Hence, a history of (recent) inputs is preserved in such a network and can be used for the computation of the current output. This was first noticed by Gers et al. (2000), who developed a network with a specialized architecture to provide ‘‘long short-term memory’’. More common approaches without specialized network architectures widely use gradient descent-based learning algorithms to set the weights of the connections of the recurrent ANN.
The idea behind echo-state networks (and the liquid state machine, LSM) is that one does not try to set the weights of the connections within the pool of neurons, but instead reduces learning to setting the weights of the readout neurons. This simplifies learning dramatically, and much simpler supervised learning algorithms, which e.g. minimize the mean square error with respect to a desired output, can be applied. No specialized gradient descent-based algorithms are necessary, and one of the many available linear regression algorithms can be used. The weights of the connections within the dynamic reservoir can be chosen from a random distribution at the beginning of an experiment.
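This readout-only scheme is easy to sketch: a fixed random recurrent network is driven by the input, its states are collected, and ordinary least squares yields the output weights. The following is a minimal illustration with an analog (tanh) reservoir rather than a spiking one; the sizes, scaling constants and target function are arbitrary example choices.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, T = 2, 50, 500                      # inputs, reservoir units, time steps

W_in = rng.uniform(-0.5, 0.5, (n, k))     # fixed random input weights
W = rng.uniform(-0.5, 0.5, (n, n))        # fixed random recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W))) # keep the dynamics stable

u = rng.uniform(-1, 1, (T, k))            # input sequence driving the reservoir
x = np.zeros(n)
states = np.empty((T, n))
for t in range(T):
    x = np.tanh(W_in @ u[t] + W @ x)      # recurrent state update
    states[t] = x

# Target: a function of *past* inputs, solvable only because the reservoir
# retains a fading memory of its input history.
y = np.concatenate(([0.0], u[:-1, 0] + u[:-1, 1]))

# "Learning" = one linear regression for the readout weights.
W_out, *_ = np.linalg.lstsq(states, y, rcond=None)
mse = float(np.mean((states @ W_out - y) ** 2))
print(mse)
```

Because only `W_out` is fitted, no gradient descent through the recurrent connections is ever needed; `W_in` and `W` stay at their random initial values.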
1.4. Spike-based networks

Computations with networks of neurons that are based on pulses are relatively new to applications in robotics. Early experiments for real-time robotics were based on special-purpose hardware. The disadvantage is that in most cases only very restricted or simplified models of spiking neurons could be used (i.e. no dynamic synapses, limited connection modelling, etc.), although it is possible to simulate quite large networks, as in Hartmann et al. (1997). Further examples can be found in e.g. Eckmiller (1988, 1991). Hopfield and Herz (1995) provide empirical evidence that SNNs are powerful pattern recognition tools. Maass and Bishop (1999) prove that SNNs are significantly more powerful for problems in temporal coincidence detection and classification than similarly sized ANNs consisting of e.g. multi-layer perceptrons. A disadvantage of SNNs was that, due to the non-continuous output function computed by such neuron models, standard learning algorithms based on gradient descent did not apply. Attempts to modify backpropagation and other methods have yielded little success. Hence, learning rules have so far concentrated mainly on unsupervised adaptation rules like Hebbian learning and other non-associative dynamic synaptic effects, as in Maass and Bishop (1999). So far, SNNs have not been widely used as controllers for autonomous agents. However, recent evolutionary robotic experiments carried out by Floreano and Mattiussi (2001) have shown the potential power of SNNs for generating adaptive behaviour. A further new framework for computing and learning in SNNs was introduced in Maass et al. (2002) and will be covered in more detail in the following section.

2. LSM with real-time extensions

The LSM from Maass et al. (2002) is a new framework for computations in neural microcircuits. The term ‘‘liquid state’’ refers to the idea of viewing the result of a computation of a neural microcircuit not as a stable state, like an attractor that is reached. Instead, a neural microcircuit is used as an online computation tool that receives a continuous input which drives the state of the neural microcircuit. The result of a computation is again a continuous output generated by readout neurons given the current state of the neural microcircuit. Echo-state networks are very similar in idea to the LSM. The main difference is that the LSM is based on a pool of biologically realistic spiking neurons instead of the recurrent ANN used by echo-state networks. An example network architecture of an LSM (the model of a neural microcircuit together with six input neurons on the left-hand side and two output neurons on the far right-hand side), as used in the robot controller experiments in this article, is shown in Fig. 5 of Section 4.

2.1. Neural microcircuit

The model of a neural microcircuit as it is used in the LSM is based on evidence found in Gupta et al. (2000) and Thomson et al. (2002). Still, it gives only a rough
approximation to a real neural microcircuit, since many parameters are still unknown. The neural microcircuit is the biggest computational element within the LSM, although multiple neural microcircuits could be placed within a virtual model. In a model of a neural microcircuit, N = nx · ny · nz neurons are placed on a regular grid in 3D space. This can be seen e.g. in Fig. 5, where the model of a neural microcircuit represents the central element. The number of neurons along the x, y and z axes (nx, ny and nz, respectively) can be chosen freely. One also specifies a factor that determines how many of the N neurons should be inhibitory. Another important parameter in the definition of a neural microcircuit is λ: the number and range of the connections between the N neurons within the LSM are determined by this parameter. The probability of a connection between two neurons i and j is given by

p(i, j) = C · exp(−D(i, j)^2 / λ^2),   (1)
where D(i, j) is the Euclidean distance between those two neurons and C is a parameter depending on the type (excitatory or inhibitory) of each of the two connecting neurons. There are four possible values of C for each connection within a neural microcircuit: C_EE, C_EI, C_IE and C_II may be used, depending on whether the neurons i and j are excitatory (E) or inhibitory (I). In our experiments we used spiking neurons according to the standard leaky integrate-and-fire (LIF) neuron model, connected via dynamic synapses. The time course of a post-synaptic current is approximated by

v(t) = A_k · exp(−t/τ_syn),

where A_k is a synaptic weight and τ_syn is the synaptic time constant. The ‘‘weight’’ A_k of a dynamic synapse depends on the history of the spikes it has seen so far. According to the model from Markram et al. (1998), A_k (the absolute synaptic efficacy for the k-th spike) is given by

A_k = A · u_k · R_k,
u_k = U + u_{k−1} · (1 − U) · exp(−Δ_{k−1}/F),
R_k = 1 + (R_{k−1} − R_{k−1} · u_{k−1} − 1) · exp(−Δ_{k−1}/D).

In the equations above, A is the absolute synaptic efficacy, U is the utilization of the synaptic efficacy, D is the time constant that models the recovery from depression, F is the time constant for the recovery from facilitation, and Δ_{k−1} is the time between the (k−1)-th and k-th spike. For synapses transmitting analog values (such as those of the output neurons in our experimental set-up), synapses are simply defined by their strength w (weight). Additionally, synapses for analog values can have delay lines, modelling the time a potential needs to propagate along an axon.
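Both Eq. (1) and the dynamic-synapse update translate directly into code. In the sketch below, the grid dimensions, λ, the C values and the synapse parameters A, U, D, F are example numbers only, not those used in the experiments.

```python
import itertools
import math
import random

random.seed(1)
nx, ny, nz = 3, 3, 2                 # grid: N = nx * ny * nz = 18 neurons
lam = 2.0                            # lambda: sets the connection range
C = {("E", "E"): 0.3, ("E", "I"): 0.2,   # C_EE, C_EI, C_IE, C_II
     ("I", "E"): 0.4, ("I", "I"): 0.1}

grid = list(itertools.product(range(nx), range(ny), range(nz)))
kind = ["I" if random.random() < 0.2 else "E" for _ in grid]  # ~20% inhibitory

def p_connect(i, j):
    """Eq. (1): p(i, j) = C * exp(-D(i, j)^2 / lambda^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(grid[i], grid[j]))
    return C[(kind[i], kind[j])] * math.exp(-d2 / lam ** 2)

edges = [(i, j) for i in range(len(grid)) for j in range(len(grid))
         if i != j and random.random() < p_connect(i, j)]

def dynamic_weights(spike_times, A=1.0, U=0.5, D=1.1, F=0.05):
    """A_k = A * u_k * R_k for each spike (Markram et al., 1998)."""
    u, R = U, 1.0                    # u_1 = U, R_1 = 1
    weights = [A * u * R]
    for k in range(1, len(spike_times)):
        dt = spike_times[k] - spike_times[k - 1]
        u_prev, R_prev = u, R        # R_k uses u_{k-1}, so keep the old u
        u = U + u_prev * (1 - U) * math.exp(-dt / F)
        R = 1 + (R_prev - R_prev * u_prev - 1) * math.exp(-dt / D)
        weights.append(A * u * R)
    return weights

# A rapid spike train depresses the synapse: successive A_k shrink.
print(dynamic_weights([0.0, 0.01, 0.02, 0.03]))
```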
2.2. Real-time extensions of the LSM

CSIM, the simulator for the LSM (Natschläger et al., 2002), is a tool that was mainly designed for fast offline experiments, i.e. pre-recorded or synthetic data is used as input to the network and results are written to files that can be analysed afterwards. This work expands the possibilities of the simulator to allow real-time experiments, a feature that is needed when one wants to use more than synthetic or offline input data with an SNN. The extension makes it possible to use data from arbitrary real-world sensors in real-time as inputs to an SNN, while the specialized core of the LSM software computes the network's dynamics. Any output of the network can then again be applied to real-world actuators, like the motors that drive a robot in an experimental environment. During standard simulations of neural microcircuits one is usually interested in running the simulation as fast as possible; a connection between the time passing in the simulation and the time passing in the real world is not necessary. This assumption no longer holds for robot experiments that incorporate real-world devices. Since the training of the readout neurons of an LSM is based on supervised learning algorithms, one requires not only input data but also an appropriate target function. In e.g. Natschläger et al. (2003) and Maass et al. (2002) the target vectors have been calculated in advance, and the input data at a time t2 does not depend on any output values at a time t1 < t2, i.e. the input is passive in relation to the output of the LSM and no feedback is in use. Hence, the state vector of the LSM is x(t) := f(u(·)) and the output is defined as f1(x(t)). This mode of operation is often called an ‘‘open loop’’ control system, with applications in e.g. prediction tasks and speech recognition (see Natschläger et al., 2003).
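The real-time mode added by these extensions boils down to pacing the simulation loop against the wall clock. A minimal sketch of such a wait-state loop follows; it illustrates the principle only and is not the actual CSIM code, and the step size is an arbitrary example value.

```python
import time

DT_SIM = 0.025   # simulated time per step, in seconds (example value)

def run_realtime(n_steps, simulate_step):
    """Run n_steps, inserting wait states so simulated time tracks real time.
    If a step overruns its deadline, re-synchronize with the end of that step,
    so a few overruns only delay the total running time slightly."""
    deadline = time.monotonic() + DT_SIM
    for _ in range(n_steps):
        simulate_step()                          # takes a variable amount of time
        now = time.monotonic()
        if now < deadline:
            while time.monotonic() < deadline:   # simple polling wait loop
                pass
            deadline += DT_SIM
        else:
            deadline = now + DT_SIM              # overrun: delayed synchronization

start = time.monotonic()
run_realtime(4, lambda: None)                    # 4 steps of an (empty) simulation
print(time.monotonic() - start)                  # close to 4 * DT_SIM = 0.1 s
```

A polling loop like this trades CPU time for precision; it avoids kernel-level real-time scheduling at the cost of busy-waiting.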
Additionally, it is often not possible to calculate a target function before an experiment is run in the real world, because a simulation of the whole environment would be a computationally too complex task. Hence, it would enrich the possible input functions if one could use real-world sensory data as input to simulated robot controllers. Since the controller influences these sensory readings, the timing of the real world has to be considered in the simulations. This can be done efficiently with the real-time extensions of the LSM. In most cases, real-time applications on standard personal computers require special-purpose operating systems. The simulator for the LSM was developed under the free operating system Linux. Although so-called ‘‘hard real-time’’ and ‘‘soft real-time’’ solutions are available for Linux, we decided to take a less restrictive (but also less strictly real-time) approach, as all common real-time extensions to Linux require a modification of the kernel, which alone can guarantee ‘‘hard real-time’’ process scheduling. Our basic approach is illustrated in Fig. 2. Without the real-time extensions, a simulation of a neural microcircuit is processed as fast as possible, as shown in panel A of Fig. 2: every simulation time step Δt1 (e.g. equivalent to the
numerical integration time constant) is processed in a time Δt2 of real-world time. Note that we neglect here that Δt2 varies with the current spiking activity in the network: since we incorporated a partially event-driven simulation, more spiking activity implies more processor time necessary to complete a simulation time step. In this figure we assume that Δt2 < Δt1; without this restriction, real-time simulations would not be possible. As long as this assumption holds, one can insert wait states in the main loop of the simulator to adjust the time that has passed in the simulation domain to the real-world time, as shown in Fig. 2B. The waiting in this case is performed with a simple loop in which the current system time in ms is polled. The real-time extensions were designed in such a way that, if a simulation time step cannot be finished in the corresponding real-world time, the synchronization time-point is delayed (i.e. it is synchronized with the ending time of the last simulation time step). Hence, a limited number of failures has only little impact on the overall simulation performance, in the form of a short delay in total running time.

Fig. 2. (A) Normal simulation: as long as the network size is small enough, the computer processes each simulation time step Δt1 in a time Δt2 < Δt1. (B) Real-time extension of the LSM: the simulation time is synchronized with real-time by inserting wait states after each simulation step.

2.3. Limitations of this approach

Our real-time approach implicitly relies on a computer system that can be used exclusively by our simulation software and the corresponding drivers. Additionally, sufficient resources concerning CPU power and memory have to be available. The first problem can be addressed on two levels. First, by preventing other users and processes from consuming CPU power or memory; this can easily be done by dedicating the hardware to a single simulation. Still, the operating system itself can potentially consume CPU time for management purposes, so dedicating hardware does not completely solve this problem. The second level, where also the time dedicated to the operating system itself is restricted, can only be reached through the use of the hard real-time kernel extensions. The second problem, concerning CPU power and memory, can in theory easily be addressed by employing fast and memory-rich workstations or by incorporating special-purpose hardware that has been especially designed for the simulation of such neural microcircuits. Both possibilities impose an enormous cost factor; thus, in our experiments, we restricted the number of neurons in the neural microcircuit.

3. Experimental set-up
A framework consisting of the LSM and a standard miniature robot called Khepera was used to implement ‘‘imitation learning’’. The robot Khepera is well known in the robotics community and has been used in a variety of experiments, see e.g. Löffler et al. (1999). In this set-up, a Khepera is first steered by a programmed controller in a precisely defined environment. All infrared (IR) sensor and motor speed data are recorded. These data are then presented to an SNN simulated by the LSM. The goal is to imitate and generalize the behaviour with a controller now consisting of a trained SNN (more exactly, of the trained readouts of the network). The difficulty is that the original pre-defined behaviour is only available in the form of the previously recorded data and not in the form of any rules that could be extracted from the controller that was used before. In this section some experimental results with a controller for a real robot, based on the foundations explained in the previous sections, are reported. The controller itself consists of a single column of an LSM. The framework of the simulator for the LSM is the version of CSIM extended by the real-time elements. Khepera, the miniature robot, is used as the real-world device in all experiments. The weights connecting some spiking neurons from the neural microcircuit to the linear output units (which are directly connected to the motor units of the miniature robot Khepera) are the only parameters that are trained during the learning phase. The input to this LSM consists solely of the values that are read from the frontal sensors of the Khepera. Hence, the continuous input to the neural microcircuit is determined by the movement of the robot, which in turn is controlled by the LSM. The term ‘‘imitation learning’’ is often used to describe training procedures in which the behaviour that should be learnt is ‘‘shown’’ to the controller architecture in some form in order to imitate it afterwards.
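Stripped of the spiking network, this set-up is a two-phase supervised pipeline: record (sensor, motor) pairs while a programmed master drives the robot, then fit only the readout on the recorded data. The sketch below illustrates this structure; the stand-in master controller, the simulated sensor readings and the use of plain least squares are all illustrative assumptions, not the article's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def master_controller(ir):
    """Stand-in for the programmed obstacle-avoidance master (hypothetical)."""
    w = np.array([0.3, 0.6, 1.0, -1.0, -0.6, -0.3])
    bias = 2.5
    return np.array([np.tanh(ir @ w) * 5 + bias,
                     np.tanh(-(ir @ w)) * 5 + bias])

# Phase 1: the master drives the robot; sensor and motor data are recorded.
ir_log = rng.uniform(0.0, 1.0, (300, 6))            # stand-in IR readings
motor_log = np.array([master_controller(s) for s in ir_log])

# Phase 2: supervised imitation. Only the readout weights are fitted; the
# recorded sensor values are the inputs, the recorded speeds the targets.
X = np.hstack([ir_log, np.ones((300, 1))])          # inputs plus bias column
W_out, *_ = np.linalg.lstsq(X, motor_log, rcond=None)

pred = X @ W_out                                    # the imitated behaviour
mse = float(np.mean((pred - motor_log) ** 2))
print(mse)
```

In the article's experiments, the role of `X` is played by the states of the neural microcircuit driven by the IR sensors, rather than the raw sensor values.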
This principle is also used in the following experiments. It is usually called imitation learning to emphasize the difference from inverse modelling, where one constructs a certain model of a controller and tries to compute the corresponding
parameters with some algorithm. In imitation learning, most commonly the model of the controller itself is also unknown and subject to ‘‘learning by viewing’’. For a comprehensive introduction to imitation learning see e.g. Billard (2000). In our case, the robot was first controlled by two different algorithms that were programmed to display an obstacle avoidance behaviour. During these initial runs, sensor and motor values were recorded. These values were later used in the training phase as the input and target vectors, respectively.

3.1. Braitenberg type controller

For the first experiments in robot control with a neural microcircuit, the master models were restricted to simple reactive controllers. One of the controllers used in the experiments was a Braitenberg type controller with obstacle avoidance behaviour. Such controllers typically consist of two perceptrons with sigmoidal output functions, fully connected with Khepera's six frontal IR proximity sensors. A robot with this type of controller can only move in a forward direction; hence, the robot cannot collide with an obstacle at its back, and the rear sensors are not taken into account during the experiments. The outputs of the two perceptrons are directly ‘‘wired’’ to the two motors. The 12 synaptic weights determining the share of each of the six frontal sensors in the two output neurons are set in such a way that the robot exhibits an obstacle avoidance behaviour. An overview of the locations of the IR proximity sensors and the motors of a Khepera can be seen on the far right-hand side of Fig. 4. The values for the pairs of weights of the six input neurons from IR1 to IR6 are set to {0.3, 0.3}, {0.6, 0.6}, {1, 1}, {1, 1}, {0.6, 0.6} and {0.3, 0.3}, respectively. Each weight vector contains the synaptic strength for the connection to the left and to the right motor neuron. With the weights set to these values, the robot exhibits a sharp turn when an obstacle appears directly in front of it.
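A controller of this kind can be written down in a few lines. In the sketch below, the weight magnitudes follow the values listed above, but the sign pattern, the logistic squashing function, its output scaling and the bias value are assumptions for illustration, chosen so the robot turns away from an obstacle and creeps forward otherwise.

```python
import math

# Weight pairs (left motor, right motor) for sensors IR1..IR6. Magnitudes
# follow the text; the sign pattern is an assumed mirror-symmetric choice.
WEIGHTS = [(0.3, -0.3), (0.6, -0.6), (1.0, -1.0),
           (-1.0, 1.0), (-0.6, 0.6), (-0.3, 0.3)]
BIAS = 0.5          # small positive bias -> constant slow forward movement

def squash(x):
    """Sigmoidal output, scaled to a symmetric motor range (assumed form)."""
    return 10.0 / (1.0 + math.exp(-x)) - 5.0

def braitenberg(ir):
    """ir: six frontal proximity values, normalized to [0, 1]."""
    left = sum(wl * s for (wl, _), s in zip(WEIGHTS, ir))
    right = sum(wr * s for (_, wr), s in zip(WEIGHTS, ir))
    return squash(left + BIAS), squash(right + BIAS)

print(braitenberg([0.0] * 6))                        # no obstacle: straight ahead
print(braitenberg([0.9, 0.9, 0.9, 0.0, 0.0, 0.0]))   # obstacle on the left: turn
```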
The turn becomes smoother when the obstacle is detected only with one of the sensors located at the side of the Khepera. The speed of the left and right motor is computed by a linear sum over all IR proximity sensor values times the corresponding synaptic weight. This sum is used as the input to the sigmoidal squashing function. Additionally, the two neurons both get a small positive bias to provide the robot with a constant forward movement of about 5 cm per second in the case that no obstacles can be detected within the range of the proximity sensors. The values delivered by the six frontal proximity sensors are normalized to the range [0, 1], where a value of 0 indicates no obstacle in range and 1 denotes a very close obstacle.

3.2. Non-linear obstacle avoidance controller

In a second set of experiments, the Braitenberg controller was replaced with a non-linear reactive controller. In this case, the six input neurons are not directly connected to the
motor output neurons. Instead, the controller chooses between three types of movements: (i) straight forward movement when no collision can be detected on any of the sensors; in this case, the motor speeds are set to values of {5, 5} (the first value is the speed of the left motor, the second one of the right motor), producing a speed of about 5 cm per second; (ii) a turn to the right on the spot when an obstacle is detected with one of the left three sensors, resulting in a speed vector of {5, −5}; and (iii) a turn to the left when an obstacle is detected with one of the sensors on the right-hand side; here, the motor speeds are set to {−5, 5}. A collision is defined as a sensor value crossing a threshold of about 90% of its maximum value; initial experiments suggested this threshold to be reasonable. In case of a collision, the turns are carried out until all sensors report no further collision. In the case of a full frontal collision, where an obstacle is detected on both frontal sensors, a turn to the left was arbitrarily chosen to have the higher priority. An example of an obstacle avoidance behaviour with this controller is shown in Fig. 3, which will be explained in more detail in the next section.

3.3. Creation of training and test data

All experiments are carried out in the same way. In a first phase, sensor and motor data are collected while the robot is controlled by one of the previously described controllers. The collected data is then partitioned into segments with a single occurrence of an obstacle. These episodes are stored individually in a database. The beginning of an episode is defined to be about 100 ms before the occurrence of an obstacle at any one of the six sensors, after no obstacle could be detected for at least 100 ms. The end of an episode is marked when all sensors signal that the obstacle is no longer in their range (i.e.
when the values of all six frontal IR proximity sensors decrease beneath a threshold of about 2% of their maximum value, which is beyond the usual noise level). Fig. 3 shows a typical example of the sensor readings of a Khepera when it detects an object in its way. In this case, the non-linear obstacle avoidance controller was used. The first six rows show the courses of the six frontal IR proximity sensors from top to bottom. The bottom two rows show the motor speeds during the obstacle avoidance task. According to the plots, an obstacle is detected with the frontal two sensors. After a collision (more than 90% of the maximum value) is first detected with sensor number 3 on the left-hand side of the robot, a turn to the right is initiated, which can be seen as the right motor speed is set from +5 to −5. From the point of view of the robot, the obstacle now starts to move from the front to the left-hand side until no more collision is detected. At this time, the
Fig. 3. Typical sensor readings of Khepera's six frontal IR proximity sensors when detecting and avoiding an obstacle in its way. In this case the Khepera is controlled by the non-linear obstacle avoidance controller. The plots labelled IR1 to IR6 show the courses of the six IR proximity sensors from left to right. The bottom two plots marked "Right" and "Left" show the motor commands for the right and left motor, respectively. The time axis is scaled at 25 ms per unit.
right motor speed is set to +5 again and the robot resumes its forward movement. During the training phase, arbitrary episodes are selected from the database containing all episodes. Additionally, a silent phase of random length is inserted between every two episodes. This way, a huge number of different training data streams can be generated without having to construct different environments after each training run. After the training phase, the fitness of the network is tested with several more episodes, again with random pauses in between. Up to this point the simulation can run as fast as possible to keep the training and test phases short (in practice about 1 min). After training and testing, the CSIM simulator is switched to real-time mode as explained in Section 2.2. The Khepera is now controlled by the SNN in real-time mode and exhibits a behaviour identical to the behavioural template used during the generation of the test data. This will be shown in more detail in Section 5. 3.4. Process communication and data flow Fig. 4 shows the typical processes that are involved in a real-time experiment with CSIM (the simulator for the
neural microcircuit) and a Khepera. The rectangles with rounded corners labelled CSIM, Matlab and khepCom each display one of the processes involved. Note that CSIM is actually called from within Matlab. khepCom is the communication interface between CSIM and the Khepera that transmits motor commands from CSIM to the Khepera and IR proximity data from the Khepera to CSIM. CSIM communicates with khepCom via a shared memory interface. The boxes labelled IR1 to IR6, Left and Right in Fig. 4 each illustrate 4 bytes (long integer data type) of shared memory that hold the six frontal sensor values and the left and right motor speeds, respectively. khepCom is the process that operates a standard RS-232C serial port to exchange the IR proximity and motor data with the Khepera. The communication with the Khepera is relatively slow compared to the data flow between the other processes. The maximum transfer rate is limited by the Khepera hardware to 38,400 bits per second. Since the communication of commands and values with the Khepera is done using ASCII coding in plain text, a full communication cycle including setting a motor speed and retrieving the current IR proximity values takes approximately 18 ms. Initial experiments suggested that a
Fig. 4. Involved processes and the information flow in a typical real-time experiment with Khepera and CSIM.
time window of 25 ms yielded a stable and reliable periodic communication with Khepera. Thus, in all experiments the communication with Khepera was synchronized within khepCom to a period with a duration of Δt = 25 ms. The task of khepCom is to provide the communication of sensor and motor speed values between Khepera and the simulation process. This is accomplished using POSIX-conformant shared memory. The real-time extension of CSIM uses the same part of the physical memory. Both processes can read and write at the same time. In this case no precautions to avoid write conflicts are necessary, because the memory slots reserved for the IR proximity values are exclusively written by khepCom and read by CSIM. In contrast, the memory slots for the motor values are only written by CSIM and read by khepCom. This flow of information is also illustrated with the grey arrows in Fig. 4. 4. Parameter settings of the neural microcircuit 4.1. Parameter values of the intra-column neurons Fig. 5 shows a typical LSM architecture as it was used in the imitation learning experiments. For these experiments, neural microcircuits consisting of 54 LIF neurons arranged on the regular grid points of a 3 × 3 × 6 cube in 3D were considered. About 20% of these neurons were randomly chosen to be inhibitory (shown as the darker neurons in Fig. 5). The rest consisted of excitatory neurons. The probability of a synaptic connection between any two intra-column neurons is modelled as in Eq. (1). This way, the probability distribution is chosen to prefer local connections. In the imitation learning experiments we set λ = 2. The remaining parameter C of Eq. (1) depends on the types (excitatory, E, or inhibitory, I) of the connected neurons: C_EE = 0.3, C_EI = 0.4, C_IE = 0.2 and C_II = 0.1. These values are chosen according to Gupta et al. (2000).
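Since Eq. (1) is not reproduced in this section, the sketch below assumes the connection probability takes the common LSM form P(a, b) = C · exp(−D(a, b)²/λ²) from Maass et al. (2002), where D(a, b) is the Euclidean distance between the grid positions of neurons a and b; all function and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Grid positions of the 54 neurons on a 3 x 3 x 6 lattice.
positions = np.array([(x, y, z) for x in range(3)
                      for y in range(3) for z in range(6)])
n = len(positions)  # 54

# Roughly 20% of the neurons are randomly chosen to be inhibitory.
inhibitory = rng.random(n) < 0.2

# Base connection probabilities by (pre, post) type, from the text.
C = {("E", "E"): 0.3, ("E", "I"): 0.4, ("I", "E"): 0.2, ("I", "I"): 0.1}
lam = 2.0

def conn_prob(a, b):
    """Distance-dependent connection probability (assumed Eq. (1) form)."""
    d2 = float(np.sum((positions[a] - positions[b]) ** 2))
    ta = "I" if inhibitory[a] else "E"
    tb = "I" if inhibitory[b] else "E"
    return C[(ta, tb)] * np.exp(-d2 / lam**2)

# Sample the random recurrent connectivity of the microcircuit.
adj = np.zeros((n, n), dtype=bool)
for a in range(n):
    for b in range(n):
        if a != b and rng.random() < conn_prob(a, b):
            adj[a, b] = True
```

Because the probability decays with the squared distance, most sampled connections are local, as stated in the text.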
The parameters determining the dynamical behaviour of the intra-column neurons and synapses were chosen from data given in Gupta et al. (2000) and Markram et al. (1998). These data are based on neural microcircuits in the rat somatosensory cortex. The membrane time constant of all neurons is 30 ms (τ_m = R_m · C_m, with the input resistance R_m = 1 MΩ and the input capacitance C_m = 30 nF), the absolute refractory period is set to 3 ms for excitatory neurons and 2 ms for all inhibitory neurons. Presuming a membrane resting potential of 0 V, the threshold of each neuron is set to 15 mV and the reset voltage for each neuron is drawn from a uniform distribution over the interval [13.8 mV, 14.5 mV]. Additionally, a non-specific background current I_b is again drawn from a uniform distribution over an interval [13.5 nA, 14.5 nA] for each individual neuron.
Fig. 5. An LSM as it is used in the imitation learning experiments: the 3 × 3 × 6 spiking neurons of the LSM are placed on a regular 3D grid. Additionally, six linear input neurons are used for sensor input (left-hand side) and two linear output neurons are used for motor output. All randomly generated connections for λ = 2 are shown.
4.2. Parameter values of the intra-column synapses The synapses connecting the neurons inside the column are modelled with short-term dynamics according to Markram et al. (1998). In this model, the parameters U, D and F determine the utilization of the synaptic efficacy, the time constant of depression (in s) and the time constant of facilitation (in s), respectively. The actual parameters are set to values chosen from a Gaussian distribution with a standard deviation of 50% of its mean.
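The short-term dynamics can be sketched with the discrete per-spike update commonly associated with the Markram/Tsodyks model; the exact update convention used below (facilitation u and recovered fraction R advanced before each spike, response amplitude w · u · R) is one of several variants found in the literature and is an assumption here, as are the function and variable names:

```python
import math

def synaptic_amplitudes(spike_times, U, D, F, w=1.0):
    """Per-spike efficacies of a dynamic synapse.
    U: utilization of synaptic efficacy,
    D: depression time constant [s], F: facilitation time constant [s]."""
    u, R = U, 1.0            # facilitation variable and recovered fraction
    amps = [w * u * R]       # response to the first spike
    for k in range(1, len(spike_times)):
        dt = spike_times[k] - spike_times[k - 1]
        u_new = U + u * (1.0 - U) * math.exp(-dt / F)        # facilitation
        R_new = 1.0 + (R - u * R - 1.0) * math.exp(-dt / D)  # recovery
        u, R = u_new, R_new
        amps.append(w * u * R)
    return amps

# A regular 20 Hz spike train through an EE-type synapse
# (mean values U = 0.5, D = 1.1 s, F = 0.05 s) depresses,
train = [0.05 * k for k in range(5)]
ee = synaptic_amplitudes(train, U=0.5, D=1.1, F=0.05)
# while an IE-type synapse (U = 0.05, D = 0.125 s, F = 1.2 s) facilitates.
ie = synaptic_amplitudes(train, U=0.05, D=0.125, F=1.2)
```

The EE and IE mean values above are those used in the experiments; under this update they reproduce the depressing and facilitating behaviour reported for those connection types.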
The mean values again depend on the types of the neurons which a particular synapse connects. For EE-type connections the mean values for U, D and F are 0.5, 1.1 and 0.05. IE-type connections have mean values of 0.05, 0.125 and 1.2. EI-type connections are modelled with mean values of 0.25, 0.7 and 0.02. Finally, the dynamical parameters U, D and F for II-type connections have mean values of 0.32, 0.144 and 0.06. These mean values were again selected according to Gupta et al. (2000). The post-synaptic currents of all synapses are modelled with a simple exponential decay e^(−t/τ_s), with τ_s = 3 ms for excitatory and τ_s = 6 ms for inhibitory synapses. The transmission delays between every two liquid neurons were uniformly set to 1.5 ms for EE-type connections and 0.8 ms for all other connections. 4.3. Feeding sensory input into the neural microcircuit For the experiments the values of the six frontal IR proximity sensors have to be fed continuously into the liquid. Therefore, six linear input neurons with the ability to receive external input in real-time were used. These types of neurons can inject a current directly into post-synaptic neurons. In our experiments, static analog synapses were used to connect the excitatory input neurons to specific neurons in the neural microcircuit. We used a simple form of spatial coding to connect the input neurons to the column. As illustrated in Fig. 5, each input neuron projects its input current onto three unique neurons placed in the most frontal layer of the column. Each input neuron corresponds to a certain Khepera sensor. Hence, the visual fields of the six sensors from left to right project their information onto the corresponding input neurons from the bottom up. 5. Results The basic task in the experiments was to teach the generic neural microcircuit described in the previous chapters to imitate a behaviour that was demonstrated and recorded by the controller architectures introduced in the previous section.
During the phase of collecting real-world data, when Khepera was controlled by the Braitenberg type vehicle, n = 1859 encounters with obstacles were recorded and fed into the database. For each run of the experiment, 500 of those episodes are randomly chosen as training examples and 125 more are randomly set aside for later use as test data. The training episodes are then connected into a single long data stream with arbitrary pauses between each two obstacle encounters. The maximum pause that can occur was set to 25 s. The same holds for the test data stream. Using this technique to generate different training and test examples, the resulting training and test data streams consisted of about 80,000 and 20,000 sensor and motor recordings, respectively. With the Khepera communication time step of Δt = 25 ms, this training data stream is equivalent to a real-time experiment with a duration of about 2000 s and a test run of about 500 s. For a neural microcircuit of this size and connection density the simulation of 2000 s takes about 1 min. This makes clear why randomly generated simulated runs from a database were used for training instead of real-time experiments. Each single experiment was carried out identically: first, the training data stream was fed into the simulated neural microcircuit via the six input neurons. The spike times of all intra-column neurons were recorded. This (in this case 54-dimensional) vector of spike times, convolved with the model of the post-synaptic current e^(−t/τ_s), is referred to as the state vector x(t) of the LSM. The two readouts of the neural microcircuit model are trained by linear regression to output the target wheel speeds of the left and right motors as imposed by the Braitenberg type controller. After training, the calculated weights w_L and w_R are set for each synapse and the test data stream is connected to the input neurons. We used linear optimization to compute these weights because it is a simple and fast training algorithm that is guaranteed to find a globally optimal solution and cannot get stuck in local minima. For these types of learning problems, linear optimization yielded good results; hence, it is not necessary to employ a non-linear training algorithm. During the test phase (and when switched to real-time mode) both readout neurons are simply modelled as linear gates, with their weight vectors w_L and w_R applied to the liquid state x(t) of the neural microcircuit. The weight vectors w_L and w_R are fixed after training and remain constant during the whole test run and later on.
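The state construction and readout training step can be sketched as follows. The filtering of spike trains with e^(−t/τ_s) and the least-squares fit follow the text; the helper names and the use of numpy's least-squares solver are illustrative choices, not the original Matlab/CSIM implementation:

```python
import numpy as np

def liquid_states(spikes_per_neuron, sample_times, tau_s=0.003):
    """State matrix X: one row per sample time (every 25 ms), one
    column per neuron; each spike train is convolved with
    exp(-t/tau_s) and sampled at the given times."""
    X = np.zeros((len(sample_times), len(spikes_per_neuron)))
    for j, spikes in enumerate(spikes_per_neuron):
        s = np.asarray(spikes)
        for i, t in enumerate(sample_times):
            past = s[s <= t]
            X[i, j] = np.exp(-(t - past) / tau_s).sum()
    return X

def train_readout(X, y):
    """Least-squares fit of a linear readout (with bias weight):
    the globally optimal solution mentioned in the text."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def apply_readout(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w
```

Each of the two readouts would be trained separately, with y set to the recorded left or right target wheel speed, yielding the weight vectors w_L and w_R that stay fixed after training.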
Finally, when running in real-time mode, the outputs of both readout neurons are directly communicated as the motor commands via the khepCom process to the Khepera. The motor commands generated during the test phase are recorded and compared to the target values of the Braitenberg type controller. For a session of 50 experiments, the average values of the correlation coefficient between the recorded and the target vectors are 0.9334 for the left motor and 0.9338 for the right motor. A snapshot of 7.5 s of one of the experiments is shown in Fig. 6. The dotted line displays the target values of the motor control as imposed by the Braitenberg type controller. The solid line represents the output of the readout neurons during the test run after training. The upper plot illustrates the results for the left motor control, while the lower plot depicts the same time slot for the right motor control. Since the motor of the Khepera acts as a low-pass filter, the actual robot movements during the real-time runs become smoother than the plots in Fig. 6 might suggest. Experiments with the non-linear controller yielded similar results. In this case the correlation coefficients for the left and right motor are 0.8130 and 0.8263, respectively.
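The reported scores are plain Pearson correlation coefficients between the recorded readout outputs and the target motor command traces, which can be computed directly (function name illustrative):

```python
import numpy as np

def motor_correlation(recorded, target):
    """Pearson correlation between the readout output and the
    behavioural template over one test run."""
    return float(np.corrcoef(recorded, target)[0, 1])
```

A value close to 1 (such as the 0.93 reported for the Braitenberg type controller) indicates that the readout closely tracks the template; the lower values for the non-linear controller reflect its harder-to-imitate step functions.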
Fig. 6. A snapshot of 7.5 s of one of the Braitenberg type experiments, illustrating the motor control values of the behavioural template (dotted lines) compared to the output of the readout neurons (solid lines) during the testing phase after training.
Fig. 7. A snapshot of 7.5 s of one of the non-linear type controller experiments, illustrating the motor control values of the behavioural template (dotted lines) compared to the output of the readout neurons (solid lines) during the testing phase after training.
Fig. 7 again shows a snapshot of the test run with a length of 7.5 s. It illustrates that it is "harder" for the LSM to imitate a non-linear behaviour like the step functions that are incorporated in this type of controller. Still, due to the low-pass filter characteristics of a DC motor, the desired behaviour of the Khepera is recognizable.
6. Discussion We have presented a new approach for robotic learning systems that operates not only in simulation but with real-world devices. The experiments revealed that a single neural microcircuit suffices as the main computational unit for many robot control tasks. Joshi and Maass (2004) show further results of experiments based on this controller type for a simulated two-joint robot arm. They use a closed-loop controller to control the behaviour of the arm. In contrast to our open-loop set-up, this approach yields a more stable controller for many robotic tasks, but it is more complicated to set up, since the feedback has to be precisely controlled to satisfy the well-known BIBO criterion (bounded input, bounded output). Also, this set-up has yet to prove stable in real-world experiments with noise and synchronization problems. Jordan and Wolpert (1999) give an excellent introduction to computational motor control. They describe different types of controllers for robotic systems, including open- and closed-loop controllers, that could also be incorporated in further experiments with a neural microcircuit in robot control tasks. One of the goals of our experiments was not only to give results for simulations of robotic activity but to use real-world devices. Thus, many problems involving noisy sensor readings, synchronization of the neural network simulation, and real-time full-duplex communication between the simulator and a real-world device had to be solved. Therefore the CSIM simulator available at http://www.lsm.tugraz.at/ had to be enhanced with real-time extensions to allow these kinds of experiments. The results of this article now also provide a common platform for experiments involving SNNs and real-world devices. The generic communication and synchronization approach does not limit this platform to robot control. Instead, one can use arbitrary real-world sensors and actuators as input to and output from an SNN.
A compromise still exists in the current set-up. Currently a simple waiting loop is used in CSIM to achieve synchronization with real-time. This approach does not guarantee communication at fixed time points, for two reasons: first, other users can start computationally intensive processes, leaving our simulation less processor time. This will slow down the whole simulation, and the same computer will only be able to provide real-time simulation for much smaller circuits with less connectivity. Here, exclusive usage of a computer provides a simple solution. The second problem is more difficult to solve. The underlying operating system has a higher priority in scheduling tasks, e.g. for important system processes. Therefore it can occur that the simulator has to wait much longer than intended in some situations, even in single-user mode, and eventually some synchronization points could be missed. For this problem only the use of a hard real-time operating system (a modification for the Linux kernel is available) would provide a solution. In the current implementation, it was decided that the actual system time at which the simulation task regains control from the operating system is taken as the new synchronization point. Hence, further synchronization time points will simply be moved into the future. An alternative solution would be to shorten the next waiting loops to catch up on the previous delays. This method would provide more constant synchronization, with some jitter that depends on the current system activity. The disadvantage is that this only works for short periods of higher system load. When too many waiting loops have to be shortened, the time to simulate the SNN would no longer be sufficient and the simulation would have to stop. A drawback of the current implementation of the imitation learning experiments is that learning has to be broken up into several separate parts: a phase where the robot is controlled by e.g. a programmed controller and all sensor and motor values are recorded; then the weights of the readout neurons are calculated with linear optimization; next, these weights are set in the simulated neural microcircuit; eventually, the robot can be controlled by the simulated SNN. A more intuitive approach to imitation learning would be to control a robot with a programmed controller, or even with a joystick, while learning is done in parallel. Since linear optimization suffices at least for simple types of robot controllers, available online versions of this algorithm could be incorporated to achieve this kind of imitation learning. Online versions of the least mean squares (LMS) and the recursive least squares (RLS) algorithms offer promising simplicity at reasonable computational cost. The behaviours used as templates for the learning experiments in this article were restricted to simple reactive controllers. These controllers also do not fully expose the inherent temporal integration capabilities of the LSM.
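An online imitation-learning loop of the kind suggested above could be built around the LMS rule. The class below is a hypothetical sketch (class name, learning rate and state dimension are illustrative): the readout weights would be nudged once per 25 ms communication step while a teacher controller or joystick drives the robot, instead of solving one batch regression afterwards:

```python
import numpy as np

class LMSReadout:
    """Online least-mean-squares linear readout: the weights are
    updated after every step, so learning can run in parallel with
    a teacher (programmed controller or joystick)."""

    def __init__(self, n_state, lr=0.02):
        self.w = np.zeros(n_state + 1)  # +1 for a bias weight
        self.lr = lr

    def predict(self, x):
        """Current motor command estimate for liquid state x."""
        return float(self.w @ np.append(x, 1.0))

    def update(self, x, target):
        """One LMS step: w <- w + lr * error * input."""
        xb = np.append(x, 1.0)
        err = target - self.w @ xb
        self.w += self.lr * err * xb
        return float(err)
```

One such readout per motor would replace the batch-trained w_L and w_R; RLS would converge faster at a higher per-step cost, which is the trade-off hinted at in the text.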
An LSM with a single neural microcircuit is able to generate more complex behaviours, e.g. with delays between sensor readings and motor control output, as shown in Joshi and Maass (2004). Further experiments that also require temporal memory and partially use the real-time framework from this article have been conducted in Burgsteiner et al. (2005) and Burgsteiner (2005). Theoretically, the idea of using neural microcircuits is even more extensible. A single neural column can be used to provide a liquid state for many different readouts that can be trained to expose different controls. Examples of several different readout functions from a single neural column are given in Maass et al. (2002). Furthermore, one could use many similar neural microcircuits, each trained to output certain functions in time. The outputs can then be used again as input to further neural microcircuits. The organization of mammalian brains suggests that such hierarchies of neural microcircuits are used to generate complex control behaviours. The framework developed during the work for this article provides a generic basis for future experiments testing such hypotheses.
References
Billard, A., 2000. Learning motor skills by imitation: a biologically inspired robotic model. Cybernetics and Systems 32 (1–2), 155–193.
Bishop, C., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK.
Burgsteiner, H., 2005. In: Proceedings of the Ninth International Conference on Engineering Applications of Neural Networks, Lille, France, pp. 129–136.
Burgsteiner, H., Kröll, M., Leopold, A., Steinbauer, G., 2005. In: Proceedings of the 18th International Conference IEA/AIE. Lecture Notes in Artificial Intelligence. Springer, Berlin, pp. 121–130.
Eckmiller, R., 1988. In: Eckmiller, R., v.d. Malsburg, C. (Eds.), Neural Computers. Springer, Berlin, Heidelberg, New York, pp. 359–370.
Eckmiller, R., 1991. In: Kohonen, T., Mäkisara, K., Simula, O., Kangas, J. (Eds.), Proceedings of the International Conference on Artificial Neural Networks (ICANN'91), vol. 1. Elsevier, Amsterdam, pp. 345–350.
Elman, J., 1990. Finding structure in time. Cognitive Science 14, 179–211.
Floreano, D., Mattiussi, C., 2001. In: Proceedings of the International Symposium on Evolutionary Robotics From Intelligent Robotics to Artificial Life. Lecture Notes in Computer Science. Springer, Berlin, pp. 38–61.
Floreano, D., Mondada, F., 1996. Evolution of homing navigation in a real mobile robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B 26, 396–407.
Gers, F., Schmidhuber, J., Cummins, F., 2000. Learning to forget: continual prediction with LSTM. Neural Computation 12 (10), 2451–2471.
Goldberg, D., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.
Gupta, A., Wang, Y., Markram, H., 2000. Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science 287, 273–278.
Hartmann, G., Frank, G., Schäfer, G., Wolff, M., 1997. In: Proceedings of MicroNeuro 97, Dresden, pp. 130–139.
Hopfield, J., 1982. In: Proceedings of the National Academy of Science, vol. 79, pp. 2554–2558.
Hopfield, J., Herz, A., 1995. In: Proceedings of the National Academy of Science, vol. 92, pp. 6655–6662.
Jäger, H., 2001. Technical Report GMD Report 148, German National Research Center for Information Technology.
Jordan, M., Wolpert, D., 1999. In: Gazzaniga, M. (Ed.), The Cognitive Neurosciences. MIT Press, Cambridge, MA.
Joshi, P., Maass, W., 2004. In: Proceedings of BIO-ADIT, pp. 258–273.
Kohonen, T., 2001. Self-Organizing Maps, third ed. Springer, Berlin.
Löffler, A., Mondada, F., Rückert, U. (Eds.), 1999. Experiments with the Mini-Robot Khepera: Proceedings of the First International Khepera Workshop. HNI-Verlagsschriftenreihe, vol. 64, Heinz-Nixdorf-Institut.
Maass, W., 1997. Networks of spiking neurons: the third generation of neural network models. Neural Networks 10, 1659–1671.
Maass, W., Bishop, C. (Eds.), 1999. Pulsed Neural Networks. MIT Press, Cambridge, MA.
Maass, W., Natschläger, T., Markram, H., 2002. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Computation 14 (11), 2531–2560.
Markram, H., Wang, Y., Tsodyks, M., 1998. Differential signaling via the same axon of neocortical pyramidal neurons. Proceedings of the National Academy of Science 95 (9), 5323–5328.
Natschläger, T., Näger, C., Burgsteiner, H., 2002. http://www.lsm.tugraz.at/.
Natschläger, T., Markram, H., Maass, W., 2003. In: Kötter, R. (Ed.), Neuroscience Databases. A Practical Guide. Kluwer Academic Publishers, Boston, pp. 123–138.
Pearlmutter, B., 1995. Gradient calculation for dynamic recurrent neural networks: a survey. IEEE Transactions on Neural Networks 6 (5), 1212–1228.
Thomson, A., West, D., Wang, Y., Bannister, A., 2002.
Synaptic connections and small circuits involving excitatory and inhibitory neurons in layers 2–5 of adult rat and cat neocortex: triple intracellular recordings and biocytin labelling in vitro. Cerebral Cortex 12 (9), 936–953.