Principles and networks for self-organization in space–time




Neural Networks 15 (2002) 1069–1083 www.elsevier.com/locate/neunet

2002 Special Issue

Jose Principe a,*, Neil Euliano b, Shayan Garani a

a Computational NeuroEngineering Laboratory, Department of Electrical Engineering, University of Florida, Gainesville, FL 32611, USA
b NeuroDimension, Inc., 1800 N. Main Street, Gainesville, FL 32609, USA

* Corresponding author.

Abstract

In this paper, we develop a spatio-temporal memory that blends properties from long- and short-term memory and is motivated by reaction-diffusion mechanisms. The winning processing element of a self-organizing network creates traveling waves on the output space that gradually attenuate over time and space to diffuse temporal information and create localized spatio-temporal neighborhoods for clustering. The novelty of the model is in the creation of time-varying Voronoi tessellations that anticipate the learned input signal dynamics even when the cluster centers are fixed. We test the method in a robot navigation task and in vector quantization of speech. The method performs better than conventional static vector quantizers on the same data set and under similar training conditions. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Self-organisation; Space–time memory; Reaction diffusion

1. Introduction

1.1. Signal time structure and memories

The simplicity with which biological systems are able to navigate and gather information from a time-varying, unknown environment is totally misleading when one attempts to develop the computational principles for learning and self-organization in space–time. It is now clear that interpretation of information carried in the time structure of signals requires at least two types of memory: long- and short-term memory. Long-term memory represents accumulated evidence translated into system parameters, even when the stimulus is long gone. The long-term memory models studied in neurocomputing have been mostly static, i.e. the response of the system depends only on the present input. The most common classes of long-term memory models are content addressable (associative) memories (CAMs) (Anderson, 1983), vector quantizers (VQ) (Garani & Principe, 2001a), and self-organizing maps (SOMs) (Kohonen, 1997), which work as distributed table lookups. These models encode input patterns in weight matrices with very simple Hebbian principles, using either a global inner product rule (Anderson, 1983) or a local competitive rule (Kohonen, 1997). They are easily implemented in digital computers, but are also biologically plausible when implemented as coupled first-order dynamical systems, as

pointed out by Grossberg (1976). A powerful view of long-term memory as point attractors of dynamical systems was advanced and studied by Hopfield (1982). However, its extension to the recognition of sequences was not successful (Kleinfeld & Sompolinsky, 1998). Once the system falls into a stable state, it collapses the rich information of all the recent past into that state, so context is lost. Hence, another type of memory, short-term memory, is indispensable to extract information embedded in the structure of time series (Elman, 1990). Short-term memory serves to disambiguate or reinforce the current stimulus using information from the recent past of the time series (also called contextual information). It became clear through these studies that short-term memory is really an intrinsic property of dynamical systems, associated with the reverberation of past input information within the system. Hence, static networks do not possess short-term memory, and dynamical systems with first-order singularities cannot independently handle both long- and short-term memory. Dynamical systems with non-convergent dynamics created by higher-order singularities (chaotic attractors) have recently been proposed as biologically realistic models for associative memories (Freeman, 1992), and potentially they can implement both short- and long-term memory. However, understanding these systems as information processing models is a current research topic outside the scope of this paper (see Freeman (1992)). The neural models for short-term memory can be divided into two basic groups (Principe, Euliano, & Lefebvre, 2000): time-to-space mappings, which originated in the




signal processing community, and the concept of local feedback. The cascade of ideal delay elements of the time delay neural network (TDNN) is one example of the first type (Lang & Hinton, 1988), while the context processing element (PE) (Elman, 1990; Jordan, 1986) is a good example of the latter. The gamma memory is a clever combination of both (Principe, deVries, & Oliveira, 1993), since it is a cascade of first-order context PEs. These models have been extensively utilized in supervised training applications because, when combined with MLPs or RBFs, they become non-linear extensions of Wiener filters (Principe, 2001). It is also well known in neurocomputing and system theory that global feedback represents past information implicitly, but it has been very difficult to utilize fully recurrent neural networks with global feedback for sequence recognition, although many interesting architectures and examples have been given for grammar recognition (Giles, Sun, Chen, & Chen, 1990), time series prediction and system identification (Narendra & Parthasarathy, 1990). The problem of vanishing gradients that plagues these models has not been fully resolved, although the NARX (Lin & Giles, 2001) and the LSTM (Hochreiter & Schmidhuber, 1997) are promising architectures. Since most real-life applications (from robotics to sensory processing) benefit from paradigms that are capable of processing spatio-temporal signals (Christodolou, Clarkson, & Taylor, 1995; Euliano, 1998; Garani & Principe, 2001a; Kangas, 1990), biology must have found a productive way to self-organize in space–time. However, as we stated above, the present neurocomputing models for self-organization (associative memories and SOMs) are inherently static, with time taking a secondary role in the architectures. In fact, most spatio-temporal models for self-organization are a simple integration of the above-mentioned short-term memory mechanisms with SOMs. The key concept is that the temporal order within the input patterns is made available in a preprocessor to the SOM, and the system is trained using unsupervised Hebbian learning rules to form self-organized memories. In one of the simplest architectures to represent temporal information, multiple copies of the input signal at various time delays are created (Kangas, 1990; Principe, Wang, & Motter, 1998). Subsequently, these vectors can be presented to a static clustering algorithm for processing. These models convert the temporal dynamic information into static spatial information outside the network and produce an increase in network size (linear in the memory depth). A biologically plausible alternative is the use of leaky integrators with SOMs (Christodolou et al., 1995), which provide low memory resolution but long memory depth. The SARDNET architecture adds exponential decays to each PE such that the sequences of PE firings can be reconstructed (James & Miikkulainen, 1995). Chappel and Taylor (1993) added the activity of leaky integrators that store the PE firings to the usual spatial distance to determine the winner, but the training was too

brittle. This problem was improved in the recurrent SOM (Cristley, 1994), where the direction of the error vector was preserved. Other combinations of feedback and PE activations were attempted (Barreto & Araujo, 1999; Kemke & Wichert, 1993), but did not bring new understanding to the role of time and space in self-organization. Hence, we believe it is critical to rethink the principles of self-organization in space–time.

1.2. Self-organization with reaction diffusion

We submit that a more principled approach for self-organization in space–time is to exploit reaction–diffusion (R–D) mechanisms. Reaction diffusion couples temporal gradients with spatial Laplacians, and it explains how local interactions can give rise to global spatial structure. Reaction diffusion is a powerful concept that has been utilized extensively to explain pattern formation in biology (Murray, 1989) since its introduction by Turing (1952), and may be a basis for local and mesoscopic information processing in the central nervous system. According to the FitzHugh–Nagumo neuron model, R–D explains the propagation of the action potential down the axon. The gas nitric oxide (NO) may be involved in many central nervous system processes, such as the modification of synaptic strength, the primary mechanism for learning (and the one most commonly used in ANNs). Neurons produce NO postsynaptically after depolarization. The NO diffuses rapidly (3.3 × 10⁻⁵ cm²/s) and has a long half-life (~4–6 s), creating an effective range of at least 150 μm. Large quantities of NO at an active synapse strengthen the synapse (long-term potentiation, or LTP). If the NO level is low, the synaptic strength is decreased (long-term depression, or LTD), even if the site is strongly depolarized. NO is thus commonly called a diffusing messenger, as it has the ability to carry information through diffusion, without any direct electrical contact (synapses), over much larger distances than normally considered (non-local). The NO diffusion and non-linear synaptic change mechanism has been shown to be capable of supporting the development of topographical maps without the need for a Mexican-hat lateral interaction. This seems to be a more biologically plausible explanation of short-range excitation and long-range inhibition than the preprogrammed weights of synaptic connections that are typically assumed to implement the same effect. In addition to the possibility of lateral diffusive messenger effects, the long life of NO can produce interesting temporal effects. Krekelberg has shown that NO can act as a memory trace and allow conversion of input temporal correlations into spatial connection strengths (Krekelberg & Taylor, 1996). NO may also be linked with the sequential firing of 'place cells' in the hippocampus when the animal moves through a familiar environment. Our group has been very interested in the application of R–D to spatio-temporal memories (Euliano, 1998; Euliano & Principe, 1996; Garani & Principe, 2001a; Garani & Principe, 2001b).


Fig. 1. A 2D SOMTAD architecture.

See also Kargupta and Ray (1994) for the use of R–D in a sequence processor, and other applications of R–D to information processing (Cunningham & Waxman, 1994; Reiss & Taylor, 1991; Ruwisch, Bode, & Purwins, 1993). Much theoretical development is still needed, because R–D as a model for spatio-temporal memories differs both from the other short-term memory models and from the common study of the conditions that ensure the creation of traveling waves. In Section 2, we present a description of the SOM with temporal activity diffusion (SOMTAD) model architecture, followed by the neural gas with temporal activity diffusion (GASTAD) in Section 3. In Section 4, we include the experimental results validating the theoretical results.
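To make the R–D mechanism referred to above concrete, the following minimal sketch integrates a generic two-species 1D reaction-diffusion system with an explicit Euler step: the temporal change of each cell is a local reaction term plus a diffusion term built from a discrete spatial Laplacian. This sketch is not taken from the paper; the reaction terms, diffusion constants and grid size are illustrative choices in the FitzHugh-Nagumo spirit.

```python
import numpy as np

def rd_step(u, v, du=1.0, dv=0.5, dt=0.01, dx=1.0):
    """One explicit-Euler step of a generic 1D two-species reaction-diffusion system.

    u, v are 1D concentration arrays on a ring of cells; all constants and the
    reaction terms below are illustrative, not the paper's.
    """
    lap = lambda z: (np.roll(z, 1) - 2.0 * z + np.roll(z, -1)) / dx ** 2  # discrete Laplacian
    u_new = u + dt * (du * lap(u) + (u - u ** 3 - v))     # reaction + diffusion for u
    v_new = v + dt * (dv * lap(v) + 0.05 * (u - v))       # slower recovery variable v
    return u_new, v_new

# A localized perturbation reacts locally while spreading to its neighbors,
# the same mechanism the SOMTAD borrows to form traveling activity waves.
u, v = np.zeros(100), np.zeros(100)
u[50] = 1.0
for _ in range(500):
    u, v = rd_step(u, v)
```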

2. The SOMTAD model

2.1. General

The SOMTAD architecture is based on two main elements: the self-organization of similar PEs into clusters, and the diffusion of information over time and space. Most temporal neural networks use short-term memory to transform time into space; the neural network then operates as if the temporal pattern were simply a much larger spatial pattern. This is clearly inefficient. The SOMTAD instead uses diffusion to create self-organization in time and space. The approach is to leave the fundamentals of self-organization the same, but to create temporally correlated neighborhoods in the field of PEs. The basic functionality of the network becomes more organized and temporally sensitive, without drastically changing its underlying operation. Although the SOMTAD architecture is unsupervised and thus should not be used as a general pattern recognition device, it preprocesses the input such that patterns commonly found in the training data will be easily detectable from the output of the SOMTAD. The SOMTAD is implemented by an array of LIN PEs attached to the output of the SOM, or of similar VQ networks (neural gas). Spatial and temporal couplings are introduced in the output space of the VQ to organize the neighborhoods


in space–time. An activity wave originates from the winning PE and diffuses information through the output space of PEs, simultaneously decaying over time at the point where it was applied. The winner is determined by combining the spatial information with the activity weights, i.e. competition is extended to space–time. Thus, the model stores the past history of the input signal in the spatial pattern of activity waves (long-term memory). During operation, the activity waves remain strong only when spatially neighboring PEs are triggered by the temporally ordered input sequences used in training (short-term memory). The output activity of our 2D SOMTAD can be thought of as waves in a pond into which pebbles are thrown. The spatial ripples in the water 'remember' the location and the timing of the pebbles that were thrown in the pond. Effectively, a pond is a spatially distributed dynamical system with short-term memory, but no long-term memory, since it always decays to a single resting state. The inverse problem (from the ripples to the excitation) is extremely hard to solve, and the model is neither adaptive nor readily usable for information processing (although intriguing attempts have been made in this direction (Natschläger, Maass, & Zador, 2001)). In our analogy to the SOMTAD, the pebbles are the firings of the winning PEs that inject perturbations at certain locations in the output space, and the liquid is the discrete mesh of PEs with its local dynamics. Unlike a liquid, the SOMTAD possesses adaptive long-term memory, and it is anticipatory; that is, it not only remembers the location and timing of the events, but also anticipates the likely location where the next pebble is going to be thrown, given the information in a training set. An amazing property of the SOMTAD is that the Voronoi cells created by this VQ are time varying, even when the weights of the system are kept constant, because the input signal dynamics are coupled to the output (Euliano, 1998).

2.2. The model

Similar to SOMs, the network uses competitive learning with neighborhood functions (Kohonen, 1997). In the SOM, the input is simultaneously compared to the weights of each PE, and the PE with the closest match between the input and its stored weights is the winner. The winner and its neighbors are then trained in a Hebbian manner, which brings their weights closer to the current input. The key concept in the SOMTAD architecture is the activity diffusion through the output space. Therefore, unlike in the SOM, the output PEs in the SOMTAD are linked by weights to all their nearest neighbors (Fig. 1). The firing of a PE in the network causes activity to diffuse through the network and affects both the training and the recognition behavior of the network. In the SOMTAD, the activity diffusion moves through the SOM lattice and is modeled after the R–D equation. If the system is an 'excitable medium' (a multi-stable dynamical system), then the diffusion



of activity can create traveling pulses or wavefronts in the system. When the activity diffusion spreads to neighboring PEs, the thresholds of these neighboring PEs are lowered proportionally, creating a situation where the neighboring PEs are more likely to fire next. We define enhancement as the amount by which a PE's threshold is lowered. In the SOMTAD model, the local enhancement acts like a traveling wave. This significantly reduces the computation of the diffusion equations and provides a mechanism whereby temporally ordered inputs trigger spatially ordered outputs. This is the key aspect of this network architecture. The traveling wave decays over time because of competition for limited resources with other traveling waves. It can only remain strong if spatially neighboring PEs are triggered by temporally ordered inputs, in which case the traveling waves are reinforced. For a simple 1D case, Fig. 2 shows the enhancement for a sequence of spatially ordered winners (winners in order were PE1, PE2, PE3, PE4) and for a sequence of random winners (winners in order were PE4, PE2, PE1, PE5), which would be the case if the input was noise or unknown. In the ordered case, the enhancement lowers the threshold for PE5 dramatically more than for the other PEs, making PE5 likely to win the next competition. In the unordered case, the enhancement becomes weak and affects all PEs roughly evenly.

The second temporal functionality added to the SOM is the decay of output activation over time (the feedback connections in the output space of Fig. 1). This is also biologically realistic (Chappel & Taylor, 1993). When a PE fires or becomes active, it maintains a portion (exponentially decaying) of its activity after it fires. Because the PE activity gradually decays, the wavefront it creates is more spread out over time, rather than being a simple traveling impulse. This spreading creates a more robust architecture that can gracefully handle both time-warping and missing or noisy data. The decay of the activity also creates another biological possibility for explaining the movement of the enhancement through the network. If we define a neighborhood around a PE as one in which it has strong excitatory connections with its neighbors, then the decay of activity from a PE that fired in the past will help to fire (or lower the threshold of) its neighboring PEs.

Fig. 2. Temporal activity in the SOMTAD network. (a) Activity created by temporally ordered input; (b) activity created by unordered input.

2.3. Algorithm description

The SOMTAD activity model (Euliano & Principe, 1996) is inspired by the gamma memory structure (Principe et al., 1993) used in a variety of temporal signal processing applications. The activity wave can be modeled by the following space–time equation

a(x, t) = (1 - \mu)\, a(x, t-1) + \mu\, \| X - W_x \| \qquad (1)

where μ is the feedback parameter of the context PE, and a(x, t) denotes the activity of a PE located at x with weight vector W_x at time t. In our formulation, both x and t are discrete variables, and they will be used throughout to indicate the space (x) and time (t) components. The winning PE is chosen by coupling the spatial distance with the temporal enhancement of the activity wave being propagated to the neighboring PEs. For the Euclidean metric, the winner k* is given by

k^{*} = \arg\min_{k} \left( \| X - W_k \| - \beta\, a(x, t) \right) \qquad (2)

where β denotes the spatio-temporal coupling parameter. This parameter determines the amount by which a temporal wavefront affects the threshold used to determine the winner. Once the winner is selected, it is trained according to the Kohonen learning rule (Kohonen, 1997), i.e.

w_{k^{*}}(n+1) = w_{k^{*}}(n) + \eta\, \mathrm{neigh}(X)\, \left( X - W_{k^{*}} \right) \qquad (3)

where neigh(X) is the neighborhood function, and h is the learning rate. The spatio-temporal parameter b brings interesting functionality. By increasing b, the threshold of the neighboring PEs are lowered such that the next winner is almost guaranteed to be a neighbor of the current winner, which forces neighboring output PEs to fire sequentially. It is interesting to note that when b ! 0; the choice of the winner is similar to the SOM. In the limit when b ! 1; the system operates like an avalanche network. The SOMTAD architecture creates a spatially distributed memory, where the reaction part of the R –D equation is implemented by the static SOM activation (effectively a Gaussian bump in the output space) and the diffusion is implemented by the temporal enhancement. There are several ways to implement the temporal enhancement in the SOMTAD (Euliano, 1998). If only the activity of each node is decayed, the output of the SOMTAD can be described by the



Fig. 3. Enhancement in the network with μ = 0.50 (left) and μ = 0.75 (right).

If only the activity of each node is decayed, the output of the SOMTAD can be described by the following equations:

e(x, t) = \mu\, e(x-1, t-1) + a(x, t), \qquad a(x, t) = (1 - \mu)\, a(x, t-1) + d(x, t) \qquad (4)

where we separated the enhancement e(x, t) from the activity a(x, t), and d(x, t) is the contribution from the SOM matching. Expanding yields (Euliano, 1998)

e(x, t) = \sum_{k=0}^{n} \sum_{\tau=0}^{n} d(x-k,\, t-k-\tau)\, \mu^{k} (1-\mu)^{\tau} \qquad (5)

This equation shows how the results from the matching activity contribute to the enhancement, and it is depicted in Fig. 3. The traveling waves create two decaying exponentials, one which moves through space (μ^k) and one which moves through time ((1 − μ)^τ). The past history of the PE is added to the enhancement via the recursive self-loop in (1 − μ). The wavefront motion is added to the enhancement via the diagonal movement through the left-to-right channel scaled by μ. The further the PE is off the diagonal and the further back in time, the less influence it has on the enhancement. As we can observe, the parameter μ has opposite effects on the spread over time and over space, and it is one more parameter that needs to be set in the SOMTAD. Although β could in principle be adapted, it was preset in all our work as follows. During the initial stages of learning, the coupling parameter is very small, so that the network learns patterns like the static Kohonen model. However, as training progresses, the magnitude of the coupling parameter is increased to enable the network to learn temporal relationships. Later on, the coupling decays again, similar to the annealing process in the SOM, for fine-tuning. We selected a raised sine function to schedule β. In the SOMTAD, the learning rate and neighborhood kernel functions are exponentially annealed for better convergence and to avoid local minima. The governing equations for the learning rate η and the annealing rate λ at any time instant t are given by

\eta(t) = \eta_0\, e^{-t/T}, \qquad \lambda(t) = \lambda_0\, e^{-t/T}, \qquad \beta(t) = \frac{\beta_0}{2} \left( 1 + \sin\frac{t\pi}{T_{\mathrm{Max}}} \right) \qquad (6)

where T is the total number of training epochs. Examples of how this 1D-SOM works on simple synthetic data are presented in Euliano and Principe (1996).
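A minimal sketch of the SOMTAD competition and update of Eqs. (1)-(3) for a 1D map is given below. It is an assumption-laden simplification: the enhancement propagation of Eqs. (4)-(5) is collapsed into the activity recursion of Eq. (1) as reconstructed above, the Gaussian neighborhood and all parameter values are illustrative, and the annealing schedules of Eq. (6) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pe, dim = 10, 2                      # illustrative sizes
W = rng.uniform(size=(n_pe, dim))      # codebook (long-term memory)
a = np.zeros(n_pe)                     # activity per PE (short-term memory)
mu, beta, eta = 0.5, 0.5, 0.1          # illustrative parameter values
sigma = 1.0                            # neighborhood width (assumption)

def somtad_step(x, W, a):
    """One SOMTAD competition/update following Eqs. (1)-(3) as reconstructed above."""
    d = np.linalg.norm(x - W, axis=1)          # spatial match of every PE
    a = (1 - mu) * a + mu * d                  # Eq. (1): activity recursion
    k = int(np.argmin(d - beta * a))           # Eq. (2): space-time competition
    pos = np.arange(len(W))
    neigh = np.exp(-((pos - k) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood (assumption)
    W = W + eta * neigh[:, None] * (x - W)     # Eq. (3): Kohonen-style update
    return k, W, a

for x in rng.uniform(size=(200, dim)):         # toy input stream
    k, W, a = somtad_step(x, W, a)
```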

3. Neural Gas algorithm with temporal activity diffusion

The Neural Gas (NGAS) algorithm is similar to the SOM algorithm but without the imposition of a predefined (spatial) neighborhood structure on the output PEs. The NGAS algorithm has been shown to converge more quickly to low distortion errors than k-means, maximum entropy clustering or the SOM algorithm (Martinetz, Berkovich, & Schulten, 1993). The NGAS PEs are trained with a softmax rule, but the softmax is applied based on the ranking of the distances to the reference vectors, not on the distance to the winning PE in the lattice. Since the NGAS algorithm has no predefined structure, each PE acts relatively independently, just as a molecule of gas diffuses to evenly cover the desired space. The NGAS has no predefined structure for the activity diffusion to move through, which is an asset. The GASTAD creates an auxiliary connection matrix, trained with temporal Hebbian learning, to best fit the temporal correlation in the input data. These secondary weights are similar to transition probabilities in hidden Markov models (HMMs) and are the pathways used to diffuse the temporal information (Rabiner & Juang, 1993). Using this trainable neighborhood structure, the GASTAD still models the activity diffusion as traveling wavefronts through the system. This flexible structure decouples much of the spatial component from the temporal component in the network. In the SOMTAD, two nodes that are neighbors in time also need to be relatively close in space for the system to train properly (since time and space are coupled). The GASTAD is still a space–time



mapping, but now the coupling between space and time is directly controllable. The most interesting concept that falls out of the GASTAD structure is the ability of the network to focus on temporal correlations. Temporal correlation can be thought of as the simple concept of anticipation. The human brain uses information from the past to enhance the recognition of 'expected' patterns. For instance, during a conversation a speaker uses the context from the past to determine what they expect to hear in the future. This methodology can greatly improve the recognition of noisy input signals, such as slurred or mispronounced speech.

3.1. GASTAD algorithm

The GASTAD algorithm works as follows: first, one calculates the distance (d_i) from the input to all the PEs. The temporal activity in the network is similar to the SOMTAD wavefronts, except that the wavefronts are scaled by the connection strengths between PEs. Thus, the temporal activity diffuses through the list defined by the connection matrix as follows:

a_i(t+1) = \alpha\, a_i(t) + \frac{\sum_{k} \left[ \left( \mu f(d, k) + (1-\mu)\, a_k(t) \right) p_{k,i} \right]}{\max(p)} \qquad (7)

where a_i(t) is the activity at PE i at time t, α is a decay constant less than 1, p_{i,j} is the connection strength from PE i to PE j, d is the vector of distances from the input to each PE, μ is the parameter which smooths the activity, giving more or less importance to the past activity in the network, and max(p) normalizes the connection strengths. The function f(d, k) determines how the current match (distance) of the network contributes to the activity. At the present time, we simplify f(d, k) to a δ function and remove the summation from Eq. (7), such that only the activity from the past winner is considered. Therefore, a previous winner that has followed a 'known' path through the network will have higher activity and thus will have more influence on the next selection. In the general case for the activity (Eq. (7)), the temporal activity at each PE is affected by contributions from all other PEs. In this case, the function f(d, k) is typically an enhanced/sharpened version of the output and the summation is over all PEs. This allows all the activity in the network to influence the current selection. It makes the network more robust, since the wavefronts will continue to propagate (but will decay rapidly) even if the selected winner temporarily transitions to an unlikely path. The next step of the GASTAD algorithm is to modify the output (competition selection criterion) of each PE by the temporal activity in the network via the following equation:

\mathrm{out}_i = d_i - \beta\, a_i \qquad (8)

where β is the spatio-temporal parameter that determines how much the temporal information affects the selection of the winner. This parameter should be set based upon the expected magnitude of the noise present in the system. For example, if the data is normalized to [0,1], then a setting of β = 0.1 allows the network to select a winner that is at most a distance of 0.1 further away than the PE closest to the spatial input. To adjust the spatial weights, we use the standard Neural Gas algorithm with competitive learning and a neighborhood function based on an ordering of the temporally modified distances to the input:

\Delta w_i = \eta\, h_{\lambda}\!\left( k_i(\mathrm{out}) \right) \left( \mathrm{in} - w_i \right) \qquad (9)

where η is the learning rate (step size), h_λ(·) is an exponential neighborhood with the parameter λ defining the width of the exponential, and k_i(out) is the ranking of PE i based on its modified distance from the input. The connection strengths are trained using temporal Hebbian learning with normalization. Temporal Hebbian learning is correlation learning applied over time, where PEs that fire sequentially enhance their connection strength. The rationale for this rule is that PEs remain active for a period of time after they fire, so both the current and the previous winners are active at the same time. In the current implementation, the connection strengths are updated similarly to the conscience algorithm for competitive learning:

\Delta p_{\arg\min(\mathrm{out}(t-1)),\, \arg\min(\mathrm{out}(t))} = b, \qquad p_{i,j} = p_{i,j}\, \frac{N}{N+b} \qquad (10)

The strength of the connection between the last winner and the present winner is increased by a small constant b, and all connections are decreased by a fraction that maintains constant energy across the set of connections. Another possibility for normalization would be to normalize all connections leaving each PE. This method gives poorer performance if a PE is shared between two portions of a trajectory, since the connection strength would have to be shared between the two outbound PEs. It does, however, give an interpretation of the connection strengths as probabilities and points out the similarity between the GASTAD and the HMM (Rabiner & Juang, 1993). The parameters η and λ are annealed exponentially as in the Neural Gas algorithm, while β takes the form of a raised sine wave.

During operation, the trained weights and information from the past create temporal wavefronts in the network that allow plasticity during recognition. This temporal activity is mixed with the standard spatial activity (distance from input to the weights) via β, the spatio-temporal parameter. Two identical inputs may fire different PEs depending on the temporal past of the signal. Fig. 4 shows two Voronoi diagrams for the same GASTAD network (i.e. with the same weights) trained with random patterns that contain a sequence of samples that starts in the bottom-left corner and ends in the upper-right corner.



Fig. 4. Voronoi diagrams without and with enhancement.

A Voronoi diagram graphically describes the region in the input space that fires each PE. In these particular diagrams, the number in each Voronoi region is the PE number for that region and is located at the center of the static Voronoi region (the center coincides with the weights of the PE). Although the GASTAD has all its parameters fixed, notice that the winner selection is still dependent upon the input signal structure through the activity diffusion term (Eq. (8)). Hence, the Voronoi tessellation becomes time varying. The left side of Fig. 4 shows the Voronoi diagram during a presentation of random noise to the trained network. Since this input pattern was not encountered during training, the temporal wavefronts were not created and the Voronoi diagram is very similar to the static Voronoi diagram. The right side of Fig. 4 shows the Voronoi diagram during the presentation of the bottom-left to top-right sequence. The temporal wavefront grew to an amplitude of 0.5 by the time PE18 fired. Also, from the training of the network, the connection strength between PE18 and PE27 was large compared with the other PEs. Thus, the temporal wavefront flowed preferentially to PE27, enhancing its chances of winning the next competition. Notice how large region 27 is on the right side of Fig. 4, since it is the next expected winner. This plasticity seems similar to the way humans recognize temporal patterns (e.g. speech). Notice that the network uses temporal information and its previous training to 'anticipate' the next input. The anticipated result is much more likely to be detected, since the network is expecting to see it. It is important to point out how different the static and dynamic conditions are. In the dynamic GASTAD, the centroids (reference vectors) are not as important; the temporal information changes the entire character of the vector quantization, creating data-dependent Voronoi regions. An animation can demonstrate the operation of the GASTAD Voronoi regions much better than static figures (see http://www.cnel.ufl.edu/somtad).
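Before turning to the experiments, here is a minimal sketch of the simplified GASTAD iteration described above, with f(d, k) collapsed to a unit impulse at the previous winner, following Eqs. (7)-(10). The initialization, fixed parameter values and constant learning rate are illustrative assumptions; in the paper η and λ are annealed and β follows a raised sine schedule.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pe, dim = 30, 2                       # illustrative sizes
W = rng.uniform(size=(n_pe, dim))       # reference vectors (spatial weights)
P = np.full((n_pe, n_pe), 1.0 / n_pe)   # temporal connection strengths (illustrative init)
a = np.zeros(n_pe)                      # temporal activity per PE
alpha, mu, beta, eta, lam = 0.5, 0.5, 0.1, 0.05, 2.0
b = 1.0 / n_pe                          # Hebbian increment ~ 1/(number of PEs), per Section 4.2
prev = None

def gastad_step(x, W, P, a, prev):
    d = np.linalg.norm(x - W, axis=1)                     # spatial distances
    if prev is not None:
        # Simplified Eq. (7): only the previous winner injects activity, routed
        # along its outgoing connection strengths (f(d, k) taken as a unit impulse).
        a = alpha * a + (mu + (1.0 - mu) * a[prev]) * P[prev] / P.max()
    out = d - beta * a                                    # Eq. (8): temporally biased distance
    k = int(np.argmin(out))                               # winner
    rank = np.argsort(np.argsort(out))                    # neural-gas ranking of all PEs
    W = W + eta * np.exp(-rank / lam)[:, None] * (x - W)  # Eq. (9): soft competitive update
    if prev is not None:
        P[prev, k] += b                                   # Eq. (10): temporal Hebbian increment
        P *= n_pe / (n_pe + b)                            # renormalize the connection "energy"
    return k, W, P, a

for x in rng.uniform(size=(500, dim)):                    # toy input stream
    prev, W, P, a = gastad_step(x, W, P, a, prev)
```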

4. Experimental results

4.1. Landmark recognition for robot navigation

The first application of the SOMTAD is landmark discrimination in a robotics application. This application was motivated by the work of Kulzer (1996). A very simple robot with limited sensing (ultrasound proximity sensors and a compass) wanders about its 'space' until it finds an object in front of it. The robot may approach this object from any possible direction and orientation. After successful docking, the robot follows the walls of the landmark in a clockwise direction. At preselected intervals the robot records its turn angle and compass setting. This is the simplest algorithm for circumventing an object, but it creates difficulties for the processing of the data. One of the difficulties with the wall-following approach is that the data collected by this simple robot are imprecise. The wheels and gears have a tendency to slip, and the turning data and segment distances are often incorrect. Additionally, the wall-following control algorithm can overshoot or undershoot the turns, just like any other control algorithm. Fig. 5 shows measurements obtained from a robot docked to a sofa and to a sofa/chair combination. Thirteen runs are shown for the sofa and six for the combination, to illustrate the variability and also the difference in scale (warping) of the measurements. We can visualize the difficulty of the task if straight trajectory matching is applied. To obtain more accurate information, the speed at which the object is circumvented must be much lower. The goal of our approach is to allow the robot to operate at higher speeds by creating an algorithm that accepts less precise local turn data. In order to fully understand the appropriateness of the SOMTAD for this problem, we decided to create simulated trajectories around an L-shaped landmark and to add noise to them to approximate the variability of the real-world measurements. Later we present results on the real trajectories. Our SOMTAD implementation uses turn angles and a compass to map each landmark.



Fig. 5. Robot wall-following for two obstacles (Kulzer, 1996).

The relative-coordinate turn angle is the derivative of the compass setting, and thus this creates a dynamic state-space description of the robot's motion. The sampled data can be assembled into a trajectory through turn-angle/compass space. The SOMTAD can be used to map such trajectories. Fig. 6 shows the trajectory the robot must follow to traverse the L-shaped landmark. The trajectory begins when the robot is heading north (0°) after making a right turn at the bottom-left portion of the landmark. The interpolated turn slowly moves back to zero and then back to −π/2 as the robot turns right again and begins to head east. The trajectory is very difficult to map,

because there is an overlap in the input space where the trajectory doubles back on itself (corresponding to the concave corner in the figure, and shown as the line moving from approximately (3π/2, −π/2) to (π, π/2) and vice versa). The trajectory passes through the same points in state space, and the only difference is the direction of travel. Without temporal information, the standard SOM cannot properly map this trajectory. A SOMTAD network with six strings and 10 PEs was created to map the target trajectory.
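A small illustration, not from the paper, of how the sampled compass readings can be turned into the (compass heading, turn angle) state-space points described above; the unwrapping convention and the degree units are assumptions.

```python
import numpy as np

def heading_trajectory(compass_deg):
    """Build (heading, turn-angle) state-space points from sampled compass readings.

    The turn angle is the discrete derivative of the heading, wrapped to
    [-180, 180) degrees so that left/right turns keep their sign.
    """
    h = np.asarray(compass_deg, dtype=float)
    turn = np.diff(h, prepend=h[0])
    turn = (turn + 180.0) % 360.0 - 180.0     # wrap the difference
    return np.column_stack([h, turn])         # one state-space point per sample

# Example: a robot heading north (0 degrees) that turns right towards east.
traj = heading_trajectory([0, 0, 15, 45, 80, 90, 90])
```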

Fig. 6. Robot trajectory around the L-shaped landmark.



Fig. 7. SOMTAD mapping of L-shaped landmark trajectory.

Since this is a complicated target trajectory (40 samples long), training a single long SOMTAD network for this problem is very difficult. Multiple strings of PEs provide improved flexibility in the mapping of complex trajectories by allowing different clusters of PEs to move independently. Each string of PEs is like a single 1D SOMTAD network. The only mechanism linking the six strings of PEs is the wavefront of enhancement that travels from the last PE of one string to the first PE of the next. The neighborhood training of the strings is local and does not cross string borders. The SOMTAD was trained with noisy versions of the target trajectory (the turns and compass headings) intermixed with noise that represents the random searching motions of the robot looking for landmarks. There was no supervision or indication as to which part of the input was noise and which was signal. Fig. 7 shows the trained PE weights of each string mapped back to the input space and represented as X's (O's represent the first PE in each string). Four of the strings (labels 1–4) mapped to different locations on the trajectory (as labeled on the right side of the figure) and the two others (labels 5 and 6) account for most of the inter-signal noise. As the robot moves along the periphery of the object, the nodes of the SOMTAD fire sequentially. The SOMTAD dynamics use temporal information to smooth noisy signals and can also gracefully handle time warping. If the input data is noisy, the temporal wavefronts

that move through the network will influence the selection of the winning PEs and are able to ignore a certain amount of noise. The spatio-temporal parameter β determines how much noise the system will accept. If β is set to 0.5, then the temporal information in the network can influence the network to choose a PE that is up to 0.5 units further away from the spatial input than the closest spatial PE. Thus, β should be set based upon the dynamic range of the input and the amount of expected noise in the signal. Of course, increasing β also decreases the ability of the network to discriminate between two similar but different temporal patterns.

4.1.1. Robustness to noise

Fig. 8 shows two trajectories with 0.1-amplitude and 0.3-amplitude zero-mean additive noise. The trajectories are significantly different from the training signal, but the dominant temporal pattern is still clearly visible. Fig. 9 shows the sequence of winning PEs and the SOMTAD enhancement in the network for both trajectories, with random noise interspersed. Remember that a diagonal line in the plot of winning PEs means that the PEs fired in sequential order, as desired. β was set to 0.5, and thus the network should be able to use the temporal information in the signal to remove much of the variation in the signal.

Fig. 8. Noisy trajectories with 0.1 and 0.3 amplitude noise.



Fig. 9. Enhancement and winning PEs for noisy trajectories with ±0.1 and ±0.3 amplitude noise.

The trajectory with 0.1-amplitude noise is shown in the first 40 points of the figures. Notice the straight diagonal line showing that the PEs fired in perfect sequential order. The wavefront is perfectly synchronized with the signal and moves along just ahead of the previous winning PE, helping the network choose the correct PE. The second trajectory is shown between points 60 and 100. The SOMTAD was still able to correctly map the majority of the signal and misclassified a point on the trajectory only three times. These locations are clearly visible on the plot of winning PEs and also show up as a 'dimming' of the enhancement wavefront in the enhancement chart.

4.1.2. Robustness to time-warping

If the input data is time-warped, the memory kernel in the SOMTAD allows the network either to skip PEs for shorter patterns or to fire PEs more than once for longer patterns, without greatly reducing the wavefront strength of the network. Maintaining the wavefront strength allows the network to continuously smooth spatial noise with temporal information. Two sequences were created that warped the 40-point trajectory to 56 points and 30 points by upsampling and downsampling the signal. Fig. 10 shows the enhancement of each PE over time and the sequence of winning PEs. The longer trajectory is shown in the first 56 samples, and the enhancement plot shows that the wavefronts periodically die out and restart at the same PE one time period later. The

plot of the winning PEs also shows that certain PEs are fired twice so that the SOMTAD can warp the signal back onto its output map. Samples 72–102 show that the network periodically skips a PE to adjust itself to the faster sampling of the shorter trajectory. The enhancement shows disconnected wavefronts that continue in the network just ahead of where the previous one ended. After this robust performance on our noisy data, we attempted to discriminate between the real trajectories collected from the robot. Table 1 shows the results obtained from two SOMTADs with 1D neighborhoods. Each linear SOMTAD represents one of the objects, and the endpoints wrap around because the point of docking is unknown (Kulzer, 1996). The size of the SOM is determined when the robot encounters the object for the first time, and the training of the SOMTAD is done in one shot at the same time (Kulzer, 1996). Table 1 reports the ratio of the two network outputs, so a value larger than one means discrimination of the true obstacle. We repeated the tests for eight different trajectories. Only three mistakes were tallied in this very demanding task, far fewer than with the simple-logic method used in Kulzer (1996).

Fig. 10. SOMTAD enhancement for time-warped trajectories.



Table 1
Sofa versus sofa/chair landmark discrimination ratios (Kulzer, 1996)

              Test1   Test2   Test3   Test4   Test5   Test6   Test7   Test8
Sofa          1.51    1.65    0.98    1.13    0.90    1.23    1.61    1.99
Sofa/chair    1.34    1.37    1.40    1.32    1.22    0.90    1.35    1.20
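The decision rule behind Table 1 can be checked directly: each run is assigned to the landmark whose dedicated network gives an output ratio above one, so ratios at or below one count as mistakes. The values below are copied from Table 1.

```python
# Output ratios from Table 1; a ratio > 1 means the network tuned to the
# true landmark responded more strongly than the competing one.
sofa       = [1.51, 1.65, 0.98, 1.13, 0.90, 1.23, 1.61, 1.99]
sofa_chair = [1.34, 1.37, 1.40, 1.32, 1.22, 0.90, 1.35, 1.20]

mistakes = sum(r <= 1.0 for r in sofa + sofa_chair)
print(mistakes)   # 3 misclassified runs out of 16, as reported in the text
```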

4.2. Speech vector quantization

Speech classification is a challenging task. The goal of this experiment is to employ the neural network model that we developed as a dynamic VQ, serving as a preprocessing engine for the classification machine. The basic building blocks of the system are shown in Fig. 11. The experiment is speaker-independent isolated spoken English digit recognition. The data is the TI-46 corpus (TI-46 Data Base for Isolated Words, 1991), which is a standard database for isolated word recognition problems and is published by the National Institute of Standards and Technology. The CD-ROM contains a corpus of speech that was designed and collected at Texas Instruments in 1980. The dataset is divided into a training set and a testing set. In total there are 16 speakers, eight male and eight female. In the training set, each speaker speaks the words zero through nine in 10 different sessions, so there are a total of 1600 words in the training set. In the testing set, however, there are 16 sessions, for a total of 2560 words. The speakers represent a wide variety of accents and genders, which makes this task a difficult one. The sampling rate of the digitized data is 12.5 kHz. Speech is a quasi-stationary signal; hence, we need to reliably estimate the short-time characteristics of the signal within an analysis window. The preprocessing comprised extracting the first 12 PARCOR coefficients. The data window over which these feature vectors were extracted was 256 samples long, and an overlap length of 100 samples was chosen. The feature vectors for each window were then vector quantized by both a Neural Gas network and a GASTAD. A SOM (and SOMTAD) would be more difficult to train, because the spatial and temporal relationships are complex. We trained 10 VQs, one for each isolated digit, without using segmentation. This was done by feeding each network with input vectors comprising the target spoken digit, uttered by the speakers, interspersed with feature vectors from random digits. The number of noisy feature vectors was randomly chosen at each step and was roughly equal in length to the number of feature vectors for the digit being trained. Notice that no classification was attempted, since we did not create a desired response. Our goal was to see whether we could self-organize a system with temporal patterns that would respond preferentially to each of the spoken digits. If this is possible, then the classification becomes much easier. The training parameters are listed in Table 2.
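A sketch of the front end described above: 256-sample frames with 100 samples of overlap, each reduced to 12 PARCOR (reflection) coefficients with the Levinson-Durbin recursion. The Hamming window and the details of the recursion are standard signal-processing choices assumed here, not specified in the paper.

```python
import numpy as np

def parcor(frame, order=12):
    """First `order` reflection (PARCOR) coefficients of one frame (Levinson-Durbin)."""
    x = frame * np.hamming(len(frame))                     # analysis window (assumption)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a, e, ks = [1.0], r[0] + 1e-12, []
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(1, i))
        k = -(r[i] + acc) / e                              # reflection coefficient of order i
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        e *= 1.0 - k * k
        ks.append(k)
    return np.array(ks)

def features(signal, frame_len=256, overlap=100):
    """Slide a 256-sample window with 100 samples of overlap and extract PARCOR features."""
    hop = frame_len - overlap                              # 156-sample hop
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([parcor(signal[s:s + frame_len]) for s in starts])

# feats = features(utterance)   # `utterance`: 12.5 kHz speech samples (hypothetical variable)
```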

Fig. 11. Block diagram of an isolated word speech recognition system.

The learning rate and the neighborhood radius were annealed slowly towards convergence. The spatio-temporal coupling parameter was chosen as a raised sinusoid with a peak amplitude of 0.2. The temporal enhancement in the network reaches a peak and then decays slowly for fine-tuning at the end of learning. Such coupling functions allow the PEs in the network to span the output space freely at the beginning, without any temporal interference. By the mid-phase of the training, when the temporal training is at its peak, the PEs are fairly uniformly distributed, and finally the coupling decays to zero for fine-tuning. Another important step in the temporal Hebbian learning is the update of the temporal weights in the secondary connectivity matrix. If the Hebbian increment is too small, the network does not change quickly enough to affect the training. However, if the increment is very large, the network places too much emphasis on the recent past. A balanced tradeoff for the increment is roughly the reciprocal of the number of PEs in the network. An interesting observation with the GASTAD model is the ease with which the networks can spot words embedded in noise. Fig. 12 shows the activity of the network over time for the digit eight. The input data (utterances of the digit 'eight') is interspersed with vectors from other digits. It is clear that the activity of the network is much higher when the word eight is presented to the network than during the interspersed noise. A globally integrated activity is also plotted, which sets the threshold for digit detection and recognition. It is interesting to note that the GASTAD activity results from unsupervised, unsegmented speech and is created using self-organizing principles in space–time. Another aspect of the model is the quantifying function, which serves as the discriminant function for recognition. In this spatio-temporal approach, we choose as the recognition index the index of the class that provides the maximum integrated temporal enhancement for a given speech pattern. This is, however, equivalent to a spatio-temporal distance metric (Garani & Principe, 2001a). Unlike static quantizers, where the L2 norm based on the distance of a vector to the weights is chosen as the distortion criterion, this model considers the enhancement along with the distance to the weights as the distortion criterion for classification. The recognition results are tabulated in the form of confusion matrices in Tables 3 and 4.
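A minimal sketch of the recognition criterion just described: one GASTAD per digit, with the utterance assigned to the class whose network accumulates the largest integrated temporal enhancement. The per-frame interface (`step_fn`) and the model container are hypothetical; only the argmax-over-integrated-enhancement rule comes from the text.

```python
import numpy as np

def classify(frames, digit_models, step_fn):
    """Return the digit whose network shows the largest integrated temporal enhancement.

    digit_models : one model state per digit (e.g. the GASTAD state sketched earlier).
    step_fn      : hypothetical helper that runs one feature frame through a model and
                   returns the enhancement of that frame's winning PE.
    """
    scores = []
    for model in digit_models:
        enhancement = [step_fn(model, x) for x in frames]   # per-frame enhancement
        scores.append(np.sum(enhancement))                   # integrate over the utterance
    return int(np.argmax(scores))
```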



Fig. 12. Output of GASTAD for the digit eight.

The confusion matrix is a two-dimensional matrix used to represent the classification of the digits. The columns represent the classification produced by the network, while the rows show the correct classification. In an ideal classification, without any errors, the confusion matrix would be diagonal. All misclassifications appear as off-diagonal terms, which show which digits are confused by the system. For instance, we notice that among the 160 samples of the digit 'one', 153 utterances were identified as 'one', one utterance was identified as 'seven', and six utterances were identified as 'nine'. The resulting recognition accuracy on the training set was around 96%. The recognition accuracy on the testing set was 93%.

4.2.1. Implementation results with a static NGAS architecture

The experiments for vector quantization of speech were carried out with the Neural Gas architecture in order to evaluate the performance of the static quantizer model. The training conditions were similar to those of the activity diffusion model. For recognition, since we did not have the temporal enhancement, we used the integrated distance between the input and the NGAS winner, \mathrm{index} = \arg\min_i \sum_n \| X_n - W_{i,k} \|, where W_{i,k} denotes the winning reference vector of network i for frame X_n. The recognition rate on the training

set was 91%. The test set recognition results for the Neural Gas algorithm are tabulated as a confusion matrix in Table 5. The recognition rate on the test set was 86%. Hence, we conclude that the activity diffusion has an advantage for vector quantization of speech, because performance increases from 86 to 93%, which is significantly better (p = 0.01) for this task.

4.2.2. Noisy speech classification

In a second experiment, we consider a speaker-dependent recognition problem in the presence of babble noise (ICRA, 1997). The babble noise signals have the same power spectra as normal speech, but the phase of the FFT is randomized before an IFFT is taken, which destroys the temporal connectivity of the signal. We chose the babble noise file 'babble4.wav' for our experiment. To study the effect of babble noise on speech recognition, we focused on one of the speakers in the database; thus, this was a speaker-dependent recognition experiment. Ten different words spoken in eight different sessions formed the training set, and a similar number was randomly chosen for the testing set. The variance of the babble noise was varied to obtain different signal-to-noise ratios (SNRs). Table 6 shows the recognition accuracy as a function of SNR for this speaker.
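A small sketch, not from the paper, of the phase-randomization idea described above: keep the magnitude spectrum of a speech segment, replace the phase with random values, and invert the FFT. The result has the same power spectrum but no temporal structure, and can then be scaled to a target SNR.

```python
import numpy as np

def phase_randomize(x, rng=None):
    """Return a babble-like signal with the same magnitude spectrum as x."""
    rng = rng or np.random.default_rng(0)
    spec = np.fft.rfft(x)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    randomized = np.abs(spec) * np.exp(1j * phase)
    randomized[0] = np.abs(spec[0])            # keep DC real
    if len(x) % 2 == 0:
        randomized[-1] = np.abs(spec[-1])      # keep Nyquist real for even lengths
    return np.fft.irfft(randomized, n=len(x))

# Since the noise has the same power as the clean segment, scaling it by
# 10 ** (-snr_db / 20) before adding it gives approximately the desired SNR:
# noisy = clean + phase_randomize(clean) * 10 ** (-snr_db / 20)
```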

Table 2
Training parameters for the activity diffusion model

Training parameter                      Value
Epochs (T)                              800
PEs (N)                                 20
Annealing parameter                     Exponentially annealed from 10 to 0.01
Learning rate                           Exponential decay from 0.1 to 0.001
Initial activity                        Uniform with a value of 0.05 for all nodes
Spatio-temporal parameter (β)           Raised sine: increases from 0 to a peak of 0.2 and decays back to 0 for fine-tuning
Connection strength update (Δ)          0.05
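For reference, the schedules of Table 2 and Eq. (6) can be written out as below. The exponential endpoints are taken from Table 2; the exact raised-sine form is an assumption chosen to match the description (rising from 0 to a 0.2 peak and back to 0), since the reconstruction of Eq. (6) is not fully unambiguous.

```python
import numpy as np

T = 800                                        # training epochs (Table 2)
t = np.arange(T)

eta  = 0.1 * (0.001 / 0.1) ** (t / (T - 1))    # learning rate: 0.1 -> 0.001 (exponential)
lam  = 10.0 * (0.01 / 10.0) ** (t / (T - 1))   # annealing/neighborhood: 10 -> 0.01
beta = 0.2 * np.sin(np.pi * t / (T - 1))       # coupling: 0 -> 0.2 -> 0 (assumed raised-sine form)
```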



Table 3
Confusion matrix for the training set

         Zero   One   Two   Three   Four   Five   Six   Seven   Eight   Nine   False positives
Zero      157     0     0       3      0      0     0       0       0      0                 3
One         0   153     0       0      0      0     0       1       0      6                 7
Two         0     0   154       0      0      0     6       0       0      0                 6
Three       0     0     0     160      0      0     0       0       0      0                 0
Four        0     0     1       0    159      0     0       0       0      0                 1
Five        0     3     0       0      0    156     0       1       0      0                 4
Six         0     0     3       1      0      0   152       0       3      1                 8
Seven       0     0     0       0      0      0     2     156       0      2                 4
Eight       1     0     0       9      0      0     1       0     149      0                11
Nine        0     4     0       0      0      2     0       1       0    153                 7

It is interesting to note that the recognition accuracy is fairly good even at 0 dB SNR, where recognition starts to be difficult even for humans. This shows that the temporal enhancement is able to handle one of the worst forms of noisy speech.

5. Conclusions

We have presented the development and application of a novel neural network model that self-organizes in space–time, motivated by the R–D paradigm. Two architectures (SOMTAD and GASTAD) are highlighted, and the performance of the models is studied and validated through simulations. The key enhancement of the technique is the introduction of activity diffusion in the competitive selection of the winning PE. Activity diffusion appears to be a very efficient way to create short-term memories and avoids the conventional time-to-space brute-force mapping implemented by delay lines. When coupled with a self-organizing network, activity diffusion constructs a local structure that stores both long- and short-term memories and that is unique in neurocomputing. Since these two types of memories are essential to process information carried in the time structure of signals, the engineering toolbox consisting of linear or non-linear systems based on delay lines is enhanced with the development of the SOMTAD.

From the computational neuroscience perspective, the SOMTAD is also interesting because it is based on principles that are biologically plausible and can be implemented by physical mechanisms that have been shown to exist in the central nervous system. In particular, this model can be an alternative to the attractor neural network paradigm that has been so popular in computational neuroscience since Hopfield's work. Although there are clear links to the R–D paradigm (the generation of traveling waves), we were unable to derive a cost function and learning rule from first principles (in this case, Turing's famous reaction-diffusion equation). Further work along this line is required to fully capitalize on diffusion to train other neural models, such as MLPs and RBFs. A timid attempt to apply traveling waves to help train recurrent neural networks with backpropagation through time is reported in Euliano and Principe (2000), but the results were not conclusive. We also think that not all the information contained in the traveling waves is being utilized in the training, so more theoretical work is needed.

The SOMTAD and GASTAD models have been successfully applied in a robot navigation problem and in speech recognition as a dynamic VQ.

Table 4
Confusion matrix for the test set (GASTAD)

         Zero   One   Two   Three   Four   Five   Six   Seven   Eight   Nine   False positives
Zero      248     1     1       3      0      0     0       3       0      0                 8
One         0   235     0       0      2      0     0       4       0     15                21
Two         2     0   243       2      1      0     5       2       1      0                13
Three       3     0     2     251      0      0     0       0       0      0                 5
Four        0     1     1       1    253      0     0       0       0      0                 3
Five        1     1     0       0      0    244     0       2       0      8                12
Six         4     0     3       0      0      5   224       4       7      9                32
Seven       0     0     4       0      0      2     8     230       0     12                26
Eight       2     0     0      21      0      0     1       0     232      0                24
Nine        0    15     0       0      0     17     0       1       0    223                33



Table 5
Confusion matrix for the test set (NGAS)

         Zero   One   Two   Three   Four   Five   Six   Seven   Eight   Nine   False positives
Zero      242     1     2       2      0      0     0       9       0      0                14
One         2   217     0       0      2      0     0       4       0     31                39
Two         5     0   237       1      0      0     5       3       5      0                19
Three       7     0     3     239      0      0     0       0       7      0                17
Four        2     1     2       0    251      0     0       0       0      0                 5
Five        4     2     0       0      0    192     0       9       0     49                64
Six         2     0     7       4      0      6   200       9      19      9                56
Seven       1     8     1       0      0      9     7     213       1     16                43
Eight       2     0     0      37      0      0     4       0     213      0                43
Nine        1    13     0       0      0     18     0      10       0    214                42

We do not want to imply that self-organization in space–time by itself will be able to provide solutions to these difficult problems. We simply showed that the SOMTAD and GASTAD are subsystems with unique representational features that can be successfully utilized in the processing of temporal information. In the robot example, we showed that the map handles a fair degree of variability and warping in the trajectories, and that it is robust to noise. However, the size of the SOM still has to be tuned to the trajectory length. In the speech example, we could be more quantitative due to the availability of a speech corpus. We showed that a dynamic VQ built from a GASTAD was able to significantly improve upon conventional vector quantization. The improvement derives from the temporal information encoded within the spatio-temporal network, which anticipates the next phoneme and therefore enlarges the corresponding Voronoi cell. We also showed that the GASTAD was able to attenuate the effect of additive babble noise (correlated noise with the same power spectrum as speech), which is considered the worst case of interference. These examples validate the hypothesis that the improvement comes from the use of the time structure of the signal. In the speech example, the SOMTAD did not produce the same improvements as the GASTAD, which leads us to think that in complex, high-dimensional problems the low-dimensional output space of the SOM (1D or 2D) is not large enough to harmonize the constraints of the space and time neighborhoods (which in general may be different). Therefore, the GASTAD seems the better choice in these cases.

Table 6
Overall speech recognition rates for noisy speech recognition

SNR (dB)    Static VQ based on NGAS (%)    Dynamic VQ based on GASTAD (%)
18          88.50                          95.5
12          80.75                          85.5
6           70.50                          77.25
0           56.25                          68.75

Another aspect worth mentioning is that the GASTAD (and SOMTAD) exhibit a higher quantization error than their conventional counterparts. This can be understood by noting that the Voronoi diagrams are time varying, so the centers of the clusters may not be optimally placed. This corroborates the argument that the role of these models is mostly one of helping to make better decisions in noisy or time-warped conditions. We foresee other important engineering applications for the SOMTAD, such as missing-sample reconstruction, an important problem in time series analysis. Even when one or more samples are missing, the temporal enhancement may provide the information required to select the winner and to approximately replace the missing data.

Acknowledgments

This work was partially supported by NSF grant ECS-9900394, ONR N00014-01-1-0405, and NSF EIA-0135946; the content represents solely the views of the authors.

References

Anderson, J. (1983). Cognitive and psychological computation with neural networks. IEEE Transactions on Systems, Man and Cybernetics, SMC-13, 799–815.
Barreto, G., & Araujo, A. (1999). Unsupervised context based learning of multiple temporal sequences. Proceedings of the International Joint Conference on Neural Networks, pp. 1102–1106.
Chappel, G., & Taylor, J. (1993). The temporal Kohonen map. Neural Networks, 6, 441–445.
Christodolou, C., Clarkson, T., & Taylor, J. (1995). Temporal pattern detection and recognition using the temporal noisy leaky integrator PE model with postsynaptic delays trained using Hebbian learning. Proceedings of the World Congress on Neural Networks.
Cristley, D. (1994). Extending the Kohonen SOM by use of adaptive parameters and temporal neurons. PhD thesis, University College London.
Cunningham, R., & Waxman, A. (1994). Diffusion enhancement bilayer: Realizing long range apparent motion and spatiotemporal grouping in neural architectures. Neural Networks, 7(6/7), 895–924.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Euliano, N. (1998). Temporal self-organization for neural networks. PhD dissertation, University of Florida, Electrical and Computer Engineering, Gainesville, FL, USA.
Euliano, N., & Principe, J. (1996). Spatio-temporal self-organizing feature maps. Proceedings of ICNN '96, Washington, DC, pp. 1900–1905.
Euliano, N., & Principe, J. (2000). Dynamic subgrouping in RTRL provides a faster O(N2) algorithm (Vol. 6). Proceedings of ICASSP 2000, pp. 3418–3421.
Freeman, W. (1992). Tutorial in neurobiology: From single neurons to brain chaos. International Journal of Bifurcation and Chaos, 2, 451–482.
Garani, S., & Principe, J. (2001a). Dynamic vector quantization of speech. WSOM'01, London: Springer.
Garani, S., & Principe, J. (2001b). A spatiotemporal vector quantizer for missing sample reconstruction. Proceedings of IJCNN'01, Washington, DC.
Giles, C., Sun, G., Chen, H., & Chen, D. (1990). Higher order recurrent networks and grammatical inference. Advances in Neural Information Processing Systems (NIPS), 2, 380–387.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding. I. Parallel development of neural detectors. Biological Cybernetics, 23, 121–134.
Hochreiter, S., & Schmidhuber, J. (1997). LSTM can solve hard long term lag problems. In M. Mozer, M. Jordan & T. Petche (Eds.), Neural Information Processing Systems, NIPS 9, 473–479.
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 79, 2554–2558.
ICRA (1997). ICRA Noise Signals ver. 0.3 CD-ROM, International Collegium of Rehabilitative Audiology, February 1997.
James, D., & Miikkulainen, R. (1995). SARDNET: A self-organizing feature map for sequences. In T. Leen, G. Tesauro & D. Touretzky (Eds.), Advances in Neural Information Processing Systems (NIPS 7), 577–584.
Jordan, M. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. Eighth Annual Conference of the Cognitive Science Society, Amherst, MA, pp. 531–546.
Kangas, J. (1990). Time-delayed self-organizing maps. Proceedings of the International Joint Conference on Neural Networks, part 2 of 3, pp. 331–336.
Kargupta, H., & Ray, S. (1994). Temporal sequence processing based on the reaction–diffusion process (Vol. 4). Proceedings of ICNN'94, pp. 2315–2320.
Kemke, C., & Wichert, A. (1993). Hierarchical SOM for speech recognition (Vol. 3). Proceedings of the World Congress on Neural Networks, pp. 45–47.
Kleinfeld, D., & Sompolinsky, H. (1998). Associative neural network model for the generation of temporal patterns. Biophysical Journal, 54, 1039–1051.
Kohonen, T. (1997). Self-organizing maps (2nd ed.). Berlin: Springer.
Krekelberg, B., & Taylor, J. (1996). Nitric oxide in cortical map formation. International Conference on Artificial Neural Networks.
Kulzer, P. (1996). NAVBOT: Autonomous robotic agent with neural learning. Master's thesis, University of Aveiro, Portugal.
Lang, K., & Hinton, G. (1988). The development of the time delay neural network architecture for speech recognition. Technical Report CMU-CS-88-152, Carnegie Mellon University.
Lin, T.-N., & Giles, C. (2001). Delay networks: Buffers to the rescue. In J. Kolen & S. Kramer (Eds.), A field guide to dynamical recurrent networks (pp. 27–38).
Martinetz, T., Berkovich, S., & Schulten, K. (1993). Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558–569.
Murray, J. (1989). Mathematical biology. New York: Springer.
Narendra, K., & Parthasarathy, K. (1990). Identification and control of dynamical systems with neural networks. IEEE Transactions on Neural Networks, 1, 4–27.
Natschläger, T., Maass, W., & Zador, A. (2001). Efficient temporal processing with biologically realistic dynamic synapses. Network: Computation in Neural Systems, 12, 75–87.
Principe, J. (2001). Dynamic neural networks and optimal signal processing. In Y. Hu & J. Hwang (Eds.), Neural networks for signal processing (Vol. 6-1) (pp. 6–28). Boca Raton: CRC Press.
Principe, J., Euliano, N., & Lefebvre, C. (2000). Neural systems: Fundamentals through simulation. New York: Wiley.
Principe, J., Wang, L., & Motter, M. (1998). Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control. Proceedings of the IEEE, 86(11).
Principe, J., deVries, B., & Oliveira, P. (1993). The gamma filter: A new class of adaptive IIR filters with restricted feedback. IEEE Transactions on Signal Processing, 41(2), 649–656.
Rabiner, L., & Juang, F. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice-Hall.
Reiss, M., & Taylor, J. (1991). Storing temporal sequences. Neural Networks, 4, 773–787.
Ruwisch, D., Bode, M., & Purwins, H. (1993). Parallel hardware implementation of Kohonen's algorithm with an active medium. Neural Networks, 6, 1147–1157.
TI-46 Data Base for Isolated Words (1991). National Institute of Standards.
Turing, A. (1952). The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society of London, Series B, 237, 37–72.