Neural Networks 19 (2006) 785–798 www.elsevier.com/locate/neunet
2006 Special Issue
Adaptive filtering with the self-organizing map: A performance comparison
Guilherme A. Barreto ∗, Luís Gustavo M. Souza
Department of Teleinformatics Engineering, Federal University of Ceará, Av. Mister Hull, S/N - C.P. 6005, CEP 60455-760, Center of Technology, Campus do Pici, Fortaleza, Ceará, Brazil
Abstract
In this paper we provide an in-depth evaluation of the SOM as a feasible tool for nonlinear adaptive filtering. A comprehensive survey of existing SOM-based and related architectures for learning input–output mappings is carried out and the application of these architectures to nonlinear adaptive filtering is formulated. Then, we introduce two simple procedures for building RBF-based nonlinear filters using the Vector-Quantized Temporal Associative Memory (VQTAM), a recently proposed method for learning dynamical input–output mappings using the SOM. The aforementioned SOM-based adaptive filters are compared with standard FIR/LMS and FIR/LMS–Newton linear transversal filters, as well as with powerful MLP-based filters in nonlinear channel equalization and inverse modeling tasks. The obtained results in both tasks indicate that SOM-based filters can consistently outperform powerful MLP-based ones.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Self-organizing map; Local linear mappings; Vector-quantized temporal associative memory; RBF models; Nonlinear equalization
1. Introduction and problem formulation
Throughout the years, several linear and nonlinear learning structures have been proposed to tackle complex adaptive filtering tasks, such as identification and equalization of communication channels (Haykin, 1996; Principe, Euliano, & Lefebvre, 2000), and major commercial applications of adaptive filters are now available, such as equalizers in high-speed modems and echo-cancelers for long distance telephone and satellite circuits (Widrow & Lehr, 1990). Simply put, the equalization task consists in recovering at the receiver the information transmitted through a communication channel subject to several adverse effects, such as noise, intersymbol interference (ISI), co-channel and adjacent channel interference, nonlinear distortions, fading, and time-varying characteristics, among others (Proakis, 2001; Quereshi, 1985). The equalizer is an adaptive filter available at the receiver that learns how to recover the transmitted signal sequence,1 $\{s(t)\}_{t=1}^{N}$, $s(t) \in \mathbb{R}$. Thus, loosely speaking, the equalizer tries to learn the inverse transfer function of the channel (Johnson, 1995).
∗ Corresponding author. Tel.: +55 85 4008 9467; fax: +55 85 4008 9468. E-mail addresses: [email protected] (G.A. Barreto), [email protected] (L.G.M. Souza). URL: http://www.deti.ufc.br/∼guilherme (G.A. Barreto).
1 The signal amplitude sampled at discrete time t can be a discrete or continuous random variable. As a discrete random variable, the signal sequence is usually called a finite alphabet sequence; otherwise, it is called an infinite alphabet sequence.
0893-6080/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2006.05.005
A linear combiner, also called a Finite Impulse Response (FIR) transversal filter in the current context, is the basic building block of almost all neural networks and adaptive filters used as equalizers. The adjustable parameters are coefficients (or weights), $a_j(t)$, $j = 0, \ldots, p-1$, usually organized into a coefficient vector $\mathbf{a}(t) \in \mathbb{R}^p$. Information obtained from a distorted signal sequence of length N, $\{y(t)\}_{t=1}^{N}$, observed at the output of the channel is used by an adaptive algorithm to gradually adjust the coefficients in order to have at the output of the filter, $z(t) = \hat{s}(t-\tau)$, a good estimate of the signal sample $s(t-\tau)$ transmitted $\tau > 0$ time steps earlier:
$$z(t) = \hat{s}(t-\tau) = \mathbf{a}^T(t)\,\mathbf{x}(t) \tag{1}$$
where $\mathbf{x}(t) = [y(t)\; y(t-1)\; \cdots\; y(t-p+1)]^T$ is the vector containing the last p samples of the distorted signal sequence, and the superscript T denotes the transpose. Without loss of generality, in this paper we assume τ = 0. An instantaneous measure of the quality of the equalizer is given by the error signal, $e(t) = d(t) - z(t) = s(t) - \hat{s}(t)$, defined as the difference between the desired response d(t) and the actual output of the equalizer z(t). To achieve a good solution with minimum a priori knowledge of the statistics of the data, the distorted signal samples in x(t) and the error signal e(t) can be used to adapt the coefficient vector a(t) on an iterative basis. The simplest and most widely used coefficient adjustment process is the Widrow–Hoff Least Mean Squares (LMS) algorithm (Widrow & Hoff, 1960), given by the following recursive equation:
$$\mathbf{a}(t+1) = \mathbf{a}(t) + \alpha'\, e(t)\,\mathbf{x}(t) \tag{2}$$
where $0 < \alpha' \ll 1$ is a fixed adaptation step size. The Error Backpropagation algorithm (Haykin, 1994; Principe et al., 2000; Widrow & Lehr, 1990), which is widely used for the adaptation of nonlinear multilayered neural networks, is a remarkable generalization of the LMS algorithm. Another variant of the LMS algorithm, the well-known LMS–Newton algorithm, uses second-order statistics, obtained from the autocorrelation matrix R of the input signal, for updating the coefficients. Assuming that R is non-singular, the LMS–Newton algorithm can be written as follows:
$$\mathbf{a}(t+1) = \mathbf{a}(t) + \alpha'\, \mathbf{R}^{-1}(t)\, e(t)\,\mathbf{x}(t) \tag{3}$$
where, since the matrix R is usually not known a priori, it can be estimated by the following recursive equation (Widrow & Stearns, 1985):
$$\mathbf{R}(t+1) = (1-\beta)\,\mathbf{R}(t) + \beta\,\mathbf{x}(t)\,\mathbf{x}^T(t), \tag{4}$$
where 0 < β < 1 is a forgetting parameter. At the beginning of the estimation process, we set $\mathbf{R}(0) = \mathbf{I}_p$, with $\mathbf{I}_p$ being the identity matrix of dimension p. The LMS and the LMS–Newton algorithms are both gradient descent adaptive algorithms. LMS is simple and practical, and is used in many applications worldwide. LMS–Newton is based on both Newton's method and the LMS algorithm, being optimal in the least squares sense (Haykin, 1996). When the performances of LMS and LMS–Newton are compared, it is found that under many circumstances both algorithms provide equal performance. For example, when both algorithms are tested on linear but nonstationary signals, their average performances are statistically equivalent (Widrow & Kamenetsky, 2003). However, linear transversal FIR filters trained with the LMS and LMS–Newton algorithms do not perform very well when applied to nonlinear problems. For this purpose, supervised neural network architectures, such as MLP and RBF, motivated by their universal approximation property, have been successfully used as nonlinear tools for channel identification and equalization (Ibnkahla, 2000). It has been demonstrated that, as nonlinear adaptive FIR filters, they usually outperform traditional linear techniques in complex channel equalization tasks (Adali, Liu, & Sönmez, 1997; Chang & Wang, 1995; Chen, Mulgrew, & Grant, 1993; Feng, Tse, & Lau, 2003; Jianping, Sundararajan, & Saratchandran, 2002; Kechriotis, Zervas, & Manolakos, 1994; Parisi, Claudio, Orlandi, & Rao, 1997; Patra, Pal, Baliarsingh, & Panda, 1999; Peng, Nikias, & Proakis, 1992; Zerguine, Shafi, & Bettayeb, 2001).
The Self-Organizing Map (SOM) (Kohonen, 1990, 1997, 1998) is an important unsupervised neural architecture which, in contrast to the supervised ones, has been less applied to
adaptive filtering. Being a kind of clustering algorithm, its main field of application is vector quantization of signals (Hirose & Nagashima, 2003; Hofmann & Buhmann, 1998; Yair, Zeger, & Gersho, 1992), and hence it is typically not used as a stand-alone function approximator, but rather in conjunction with standard linear (Raivio, Henriksson, & Simula, 1998; Wang, Lin, Lu, & Yahagi, 2001) or nonlinear (Bouchired, Ibnkahla, Roviras, & Castanie, 1998) models. A few previous applications of the plain SOM architecture as a stand-alone equalizer are reported by Paquier and Ibnkahla (1998) and Peng, Nikias, and Proakis (1991). SOM-based architectures have also been successfully applied to a variety of signal processing tasks, such as time series prediction (Barreto, Mota, Souza, & Frota, 2004; Lendasse, Lee, Wertz, & Verleysen, 2002; Koskela, Varsta, Heikkonen, & Kaski, 1998; Principe, Wang, & Motter, 1998; Simon, Lendasse, Cottrell, Fort, & Verleysen, 2004; Vesanto, 1997; Walter, Ritter, & Schulten, 1990), but usually in an offline fashion. By offline, we mean those applications in which the entire training dataset $\{\mathbf{x}(t), s(t)\}_{t=1}^{N}$ can be presented several times to the neural network, possibly in a random presentation order at each training epoch, so that cross-validation tests can be performed to get an optimal network architecture. In the type of online learning we are interested in, such as adaptive channel equalization, no data reuse through epochs is allowed. The coefficients of a given equalizer are adjusted continuously during a single pass of a training sequence, as usually done in real-world applications of adaptive filters. From a global perspective, our main goal is to evaluate the SOM as a feasible tool for nonlinear adaptive filtering.
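The single-pass adaptation just described, using the LMS and LMS–Newton recursions of Eqs. (2)–(4), can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the toy linear channel, the noise level, the binary (±1) alphabet, the filter order p = 5 and the values of the step size and forgetting parameter are all hypothetical choices made so that the example runs.

```python
import numpy as np

def regressor(y, t, p):
    """Build x(t) = [y(t), y(t-1), ..., y(t-p+1)]^T, zero-padded for t < p-1."""
    x = np.zeros(p)
    n = min(p, t + 1)
    x[:n] = y[t::-1][:n]
    return x

def lms(y, s, p, step):
    """Single-pass LMS, Eq. (2); returns final coefficients and error history."""
    a = np.zeros(p)
    err = np.zeros(len(y))
    for t in range(len(y)):              # each sample is used exactly once
        x = regressor(y, t, p)
        err[t] = s[t] - a @ x            # e(t) = d(t) - z(t), with d(t) = s(t)
        a = a + step * err[t] * x        # Eq. (2)
    return a, err

def lms_newton(y, s, p, step, beta):
    """Single-pass LMS-Newton, Eqs. (3)-(4), starting from R(0) = I_p."""
    a = np.zeros(p)
    R = np.eye(p)
    err = np.zeros(len(y))
    for t in range(len(y)):
        x = regressor(y, t, p)
        err[t] = s[t] - a @ x
        a = a + step * err[t] * np.linalg.solve(R, x)   # Eq. (3): R^{-1}(t) x(t)
        R = (1 - beta) * R + beta * np.outer(x, x)      # Eq. (4)
    return a, err

# Toy data (assumed): binary symbols through a linear channel with mild ISI and noise.
rng = np.random.default_rng(0)
N = 5000
s = rng.choice([-1.0, 1.0], size=N)
y = s + 0.5 * np.concatenate(([0.0], s[:-1])) + 0.05 * rng.standard_normal(N)

a_lms, e_lms = lms(y, s, p=5, step=0.01)
a_newton, e_newton = lms_newton(y, s, p=5, step=0.01, beta=0.01)

mse_lms = np.mean(e_lms[-500:] ** 2)
mse_newton = np.mean(e_newton[-500:] ** 2)
print(f"final MSE  LMS: {mse_lms:.4f}  LMS-Newton: {mse_newton:.4f}")
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse of R explicitly is a numerical convenience of this sketch; it evaluates the same product $\mathbf{R}^{-1}(t)\mathbf{x}(t)$ that appears in Eq. (3).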
The specific contributions of the paper are manifold, namely: (i) to evaluate how existing SOM-based architectures, which are normally applied to offline learning of input–output mappings, perform in an online channel equalization task; (ii) to introduce simple procedures for building RBF-based nonlinear equalizers using the Vector-Quantized Temporal Associative Memory (VQTAM) (Barreto & Araújo, 2004), a recently proposed method for learning dynamical input–output mappings with the SOM; (iii) to provide a comprehensive literature survey of architectures related to the ones presented in (i) and (ii); and finally, (iv) to provide an in-depth performance comparison of SOM-based adaptive filters with standard linear and nonlinear equalizers. The obtained results indicate that the proposed SOM-based adaptive equalizers perform better than the standard linear ones and compare favorably with MLP-based equalizers.
The remainder of the paper is organized as follows. In Section 2, three existing SOM-based architectures and related architectures, which have been commonly used for offline learning of input–output mappings, are described with the goal of also applying them to online channel equalization. In Section 3 we introduce two procedures for building RBF-based adaptive filters based on the VQTAM approach and review related architectures. In this section, we also describe all the previous SOM-based adaptive filters under the framework of modular networks. In Section 4, detailed computer simulations compare the performance of SOM-based adaptive filters with standard linear and nonlinear ones in a channel equalization task. The paper is concluded in Section 5.
2. SOM-based adaptive filtering
The Self-Organizing Map (SOM) is a well-known competitive learning algorithm. The SOM learns from examples a mapping (projection) from a high-dimensional continuous input space X onto a low-dimensional discrete space (lattice) A of q neurons which are arranged in fixed topological forms, e.g., as a rectangular two-dimensional array. The map $i^*(\mathbf{x}) : X \to A$, defined by the weight matrix $\mathbf{W} = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_q)$, $\mathbf{w}_i \in \mathbb{R}^p \subset X$, assigns to each input vector $\mathbf{x}(t) \in \mathbb{R}^p \subset X$ a neuron $i^*(t) = \arg\min_{\forall i} \|\mathbf{x}(t) - \mathbf{w}_i(t)\|$, $i^*(t) \in A$, where $\|\cdot\|$ denotes the Euclidean distance and t symbolizes a discrete time step associated with the iterations of the algorithm. The weight vector of the current winning neuron as well as the weight vectors of its neighboring neurons are simultaneously adjusted according to the following learning rule:
$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \alpha(t)\, h(i^*, i; t)\,[\mathbf{x}(t) - \mathbf{w}_i(t)] \tag{5}$$
where 0 < α(t) < 1 is the learning rate and $h(i^*, i; t)$ is a weighting function which limits the neighborhood of the winning neuron. A usual choice for $h(i^*, i; t)$ is given by the Gaussian function:
$$h(i^*, i; t) = \exp\left(-\frac{\|\mathbf{r}_i(t) - \mathbf{r}_{i^*}(t)\|^2}{2\sigma^2(t)}\right) \tag{6}$$
where $\mathbf{r}_i(t)$ and $\mathbf{r}_{i^*}(t)$ are, respectively, the coordinates of the neurons i and $i^*$ in the output array, and σ(t) > 0 defines the radius of the neighborhood function at time t. The variables α(t) and σ(t) should both decay with time to guarantee convergence of the weight vectors to stable steady states. In this paper, we adopt for both an exponential decay, given by:
$$\alpha(t) = \alpha_0 \left(\frac{\alpha_T}{\alpha_0}\right)^{t/T} \quad \text{and} \quad \sigma(t) = \sigma_0 \left(\frac{\sigma_T}{\sigma_0}\right)^{t/T} \tag{7}$$
where $\alpha_0$ ($\sigma_0$) and $\alpha_T$ ($\sigma_T$) are the initial and final values of α(t) (σ(t)), respectively. Weight adjustment is performed until a steady state of global ordering of the weight vectors has been achieved. In this case, we say that the map has converged. The resulting map also preserves the topology of the input samples in the sense that adjacent patterns are mapped into adjacent regions on the map. Due to this topology-preserving property, the SOM is able to cluster input information and spatial relationships of the data on the map. Despite its simplicity, the SOM algorithm has been applied to a variety of complex problems (Flexer, 2001; Kohonen, Oja, Simula, Visa, & Kangas, 1996; Oja, Kaski, & Kohonen, 2003) and has become one of the most important ANN architectures.
However, as pointed out in the introduction, just a few applications of the SOM in online adaptive filtering tasks, such as channel equalization, are reported in the literature. We believe that this occurs because adaptive filtering is commonly viewed as a function approximation problem, while the SOM is commonly viewed as a neural vector quantizer, not as a function approximator. The use of the SOM for function approximation has become more common in recent years, especially in the field of time series prediction, even though it has been applied to learning forward and inverse input–output mappings in the field of robotics since 1989 (Ritter, Martinetz, & Schulten, 1989).
In the following subsections we describe existing SOM-based architectures that were originally proposed for approximating (nonlinear) input–output mappings in an offline fashion. In this work, these architectures will be applied without any changes to an online adaptive filtering task. The goal is to evaluate how they perform in a more realistic scenario, in which learning (i.e. weight updating) takes place continuously. The chosen task for this purpose is equalization of a nonlinear communication channel.
2.1. Local linear mapping
The first architecture to be described is called Local Linear Mapping (LLM) (Walter et al., 1990), which was successfully applied to nonlinear time series prediction. The basic idea of the LLM is to associate each neuron in the SOM with a conventional FIR/LMS linear filter. The SOM array is used to quantize the input space in a reduced number of prototype vectors (and hence, Voronoi regions), while the filter associated with the winning neuron provides a local linear estimate of the output of the mapping being approximated. As defined previously, the input vector x(t) belongs to a p-dimensional continuous space X, over which the SOM performs online vector quantization. Thus, each sample vector $\mathbf{x}(t) = [y(t)\; y(t-1)\; \cdots\; y(t-p+1)]^T$ is built from a time window of width p, sliding over the input signal sequence.
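The SOM update of Eqs. (5)–(7) and the LLM idea of attaching one linear filter to each neuron can be sketched together as follows. This is a rough illustration under assumptions that are not taken from the paper: a one-dimensional lattice of q = 10 neurons, a toy nonlinear channel built with `tanh`, the decay constants, and a coefficient step size of 0.1 chosen so that the short demo converges. The coefficient update uses the neighborhood-weighted, normalized LMS form that appears in Eq. (10), with a small constant added to the denominator as a numerical safeguard.

```python
import numpy as np

def decay(v0, vT, t, T):
    """Exponential schedule of Eq. (7): v(t) = v0 * (vT / v0) ** (t / T)."""
    return v0 * (vT / v0) ** (t / T)

class LLM:
    """SOM with one FIR filter per neuron, on a 1-D lattice (lattice shape assumed)."""

    def __init__(self, q, p, rng):
        self.W = 0.1 * rng.standard_normal((q, p))   # prototype vectors w_i
        self.A = np.zeros((q, p))                    # coefficient vectors a_i
        self.r = np.arange(q, dtype=float)           # lattice coordinates r_i

    def step(self, x, d, alpha, sigma, step_size):
        win = int(np.argmin(np.linalg.norm(x - self.W, axis=1)))       # winner i*(t)
        z = self.A[win] @ x                                             # output of the winner's filter
        h = np.exp(-(self.r - self.r[win]) ** 2 / (2.0 * sigma ** 2))   # Eq. (6)
        self.W += alpha * h[:, None] * (x - self.W)                     # Eq. (5)
        e = d - self.A @ x                                              # d(t) - a_i^T(t) x(t), all i
        # Normalized, neighborhood-weighted LMS step; 1e-8 guards against x = 0.
        self.A += step_size * (h * e)[:, None] * x / (x @ x + 1e-8)
        return z

# Toy nonlinear channel (assumed): ISI followed by a saturating nonlinearity.
rng = np.random.default_rng(1)
N, p, q = 5000, 3, 10
s = rng.choice([-1.0, 1.0], size=N)
u = s + 0.5 * np.concatenate(([0.0], s[:-1]))
y = np.tanh(u) + 0.05 * rng.standard_normal(N)

model = LLM(q, p, rng)
sq_err = np.zeros(N)
for t in range(p - 1, N):                       # single pass over the sequence
    x = y[t - p + 1:t + 1][::-1]                # sliding window [y(t), ..., y(t-p+1)]
    alpha = decay(0.5, 0.01, t, N)
    sigma = decay(q / 2.0, 0.1, t, N)
    z = model.step(x, s[t], alpha, sigma, step_size=0.1)
    sq_err[t] = (s[t] - z) ** 2

mse_start = sq_err[p - 1:p + 19].mean()         # first 20 adapted samples
mse_end = sq_err[-500:].mean()                  # last 500 samples
print(f"MSE at start: {mse_start:.3f}, at end: {mse_end:.3f}")
```

The output z(t) is computed before the weights are updated, so the squared error recorded at each step is a genuine online prediction error.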
Clustering (or vector quantization) of the input space X is performed by the LLM as in the usual SOM algorithm, with each neuron i owning a prototype vector $\mathbf{w}_i$, i = 1, ..., q. In addition, associated with each weight vector $\mathbf{w}_i$, there is a coefficient vector $\mathbf{a}_i \in \mathbb{R}^p$, which corresponds to the coefficients of a single FIR linear filter:
$$\mathbf{a}_i(t) = [a_{i,0}(t)\; a_{i,1}(t)\; \cdots\; a_{i,p-1}(t)]^T \tag{8}$$
where p also denotes the order of the equalizer. The output value of the LLM architecture is then computed by means of the following equation:
$$z(t) = \hat{s}(t) = \sum_{j=0}^{p-1} a_{i^*,j}(t)\, y(t-j) = \mathbf{a}_{i^*}^T(t)\,\mathbf{x}(t) \tag{9}$$
where $\mathbf{a}_{i^*}(t)$ is the coefficient vector of the FIR filter associated with the winning neuron $i^*(t)$. From Eq. (9), one can easily note that the coefficient vector $\mathbf{a}_{i^*}(t)$ is used to build a local linear approximation of the output of the desired nonlinear mapping (see Fig. 1). Since the adjustable parameters of the LLM equalizer are the set of prototype vectors $\mathbf{w}_i(t)$ and their associated coefficient
vectors $\mathbf{a}_i(t)$, i = 1, 2, ..., q, we need two learning rules. The rule for updating the prototype vectors follows exactly the one given in Eq. (5). The learning rule of the coefficient vectors $\mathbf{a}_i(t)$ is an extension of the plain LMS algorithm shown in Eq. (2) that also takes into account the influence of the neighborhood function $h(i^*, i; t)$:
$$\mathbf{a}_i(t+1) = \mathbf{a}_i(t) + \alpha' h(i^*, i; t)\, e(t)\,\mathbf{x}(t) = \mathbf{a}_i(t) + \frac{\alpha' h(i^*, i; t)\,[d(t) - \mathbf{a}_i^T(t)\,\mathbf{x}(t)]}{\|\mathbf{x}(t)\|^2}\,\mathbf{x}(t) \tag{10}$$
where $0 < \alpha' \ll 1$ denotes the adaptation step size of the coefficient vectors and d(t) is the actual output of the nonlinear mapping being approximated. It is worth emphasizing that, unlike the time-varying learning rate α(t), the adaptation step size $\alpha'$ is usually held constant at a relatively small value ($\alpha' = 0.001$). This is also valid for the next SOM-based equalizers to be described.
Fig. 1. Sketch of the local linear mapping implemented by the LLM architecture.
Related architectures: Stokbro, Umberger, and Hertz (1990), in much the same way as the LLM architecture, associated a linear autoregressive filter with each hidden unit of an RBF network, applying it to the prediction of chaotic time series. A similar model was recently proposed by Zaknich (2003) for adaptive filtering purposes. The main difference between these RBF-based approaches and the LLM architecture is that the former allows the activation of more than one hidden unit, while the latter allows only the winning neuron to be activated. In principle, by combining the output of several local linear models associated with the hidden units, instead of a single one, we can improve generalization performance. Finally, Martinetz, Berkovich, and Schulten (1993) developed an approach quite similar to the LLM architecture, but using the Neural Gas algorithm (Martinetz & Schulten, 1991) instead.
2.2. The VQTAM model
Simply put, the VQTAM method is a generalization to the temporal domain of a SOM-based associative memory technique that has been used by many authors to learn static (memoryless) input–output mappings, especially within the domain of robotics (see Barreto, Araújo, and Ritter (2003) and references therein). In both scenarios, the input vector to the SOM at time step t, x(t), is composed of two parts. The first part, denoted $\mathbf{x}^{in}(t) \in \mathbb{R}^p$, carries data about the input of the dynamic mapping to be learned. The second part, denoted $\mathbf{x}^{out}(t) \in \mathbb{R}^q$, contains data concerning the desired output of this mapping. The weight vector of neuron i, $\mathbf{w}_i(t)$, has its dimension increased accordingly. These changes are formulated as follows:
$$\mathbf{x}(t) = \begin{pmatrix} \mathbf{x}^{in}(t) \\ \mathbf{x}^{out}(t) \end{pmatrix} \quad \text{and} \quad \mathbf{w}_i(t) = \begin{pmatrix} \mathbf{w}_i^{in}(t) \\ \mathbf{w}_i^{out}(t) \end{pmatrix} \tag{11}$$
where $\mathbf{w}_i^{in}(t) \in \mathbb{R}^p$ and $\mathbf{w}_i^{out}(t) \in \mathbb{R}^q$ are, respectively, the portions of the weight (prototype) vector which store information about the inputs and the outputs of the desired mapping. Depending on the variables chosen to build the vectors $\mathbf{x}^{in}(t)$ and $\mathbf{x}^{out}(t)$ one can use the SOM to learn forward or inverse mappings. It is worth emphasizing that these vectors do not necessarily have the same dimensionality. Indeed, we have p ≥ q in general. For the channel equalization task we are interested in, we have p > 1 and q = 1, so that the following definitions apply:
$$\mathbf{x}^{in}(t) = [y(t)\; y(t-1)\; \cdots\; y(t-p+1)]^T \tag{12}$$
$$x^{out}(t) = s(t) \tag{13}$$
where s(t) is the signal sample transmitted at the time step t, y(t) is the corresponding channel output, p is the order of the equalizer, and the superscript T denotes the transpose. During learning, the winning neuron at time step t is determined based only on $\mathbf{x}^{in}(t)$:
$$i^*(t) = \arg\min_{i \in A} \{\|\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t)\|\}. \tag{14}$$
For updating the weights, both $\mathbf{x}^{in}(t)$ and $\mathbf{x}^{out}(t)$ are used:
$$\mathbf{w}_i^{in}(t+1) = \mathbf{w}_i^{in}(t) + \alpha(t)\, h(i^*, i; t)\,[\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t)] \tag{15}$$
$$\mathbf{w}_i^{out}(t+1) = \mathbf{w}_i^{out}(t) + \alpha(t)\, h(i^*, i; t)\,[\mathbf{x}^{out}(t) - \mathbf{w}_i^{out}(t)] \tag{16}$$
where 0 < α(t) < 1 is the learning rate and $h(i^*, i; t)$ is a time-varying Gaussian neighborhood function as defined in Eq. (6). In words, the learning rule in Eq. (15) performs topology-preserving vector quantization on the input space and the rule in Eq. (16) acts similarly on the output space of the mapping being learned. As training proceeds, the SOM learns to associate the input prototype vectors $\mathbf{w}_i^{in}$ with the corresponding output prototype vectors $\mathbf{w}_i^{out}$ (see Fig. 2).
The SOM-based associative memory procedure implemented by the VQTAM can then be used for function approximation purposes. More specifically, once the SOM has been trained, its output z(t) for a new input vector is estimated from the learned codebook vectors, $\mathbf{w}_{i^*}^{out}(t)$, as follows:
$$\mathbf{z}(t) \equiv \mathbf{w}_{i^*}^{out}(t) \tag{17}$$
where $\mathbf{w}_{i^*}^{out} = [w_{1,i^*}^{out}\; w_{2,i^*}^{out}\; \cdots\; w_{q,i^*}^{out}]^T$ is the weight vector of the current winning neuron $i^*(t)$.
Fig. 2. Associative mapping between input and output Voronoi cells of a trained VQTAM model. The symbols ‘•’ denote the prototype vectors of each input/output cell. The symbol ‘×’ denotes an input vector.
For the channel equalization task we are interested in, we have q = 1. Thus, the output of the VQTAM equalizer is a scalar version of Eq. (17), given by:
$$z(t) = \hat{s}(t) = w_{1,i^*}^{out}(t) \tag{18}$$
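A minimal sketch of the VQTAM rules, Eqs. (14)–(18), is given below for the scalar-output case. It is meant only as an illustration: the one-dimensional lattice, the number of neurons, the decay constants and the toy `tanh` channel are assumptions of this example, not settings taken from the paper.

```python
import numpy as np

class VQTAM:
    """SOM whose prototypes have an input part W_in and a scalar output part w_out."""

    def __init__(self, n_neurons, p, rng):
        self.W_in = 0.1 * rng.standard_normal((n_neurons, p))  # w_i^in
        self.w_out = np.zeros(n_neurons)                        # w_i^out (scalar output)
        self.r = np.arange(n_neurons, dtype=float)              # 1-D lattice (assumed)

    def winner(self, x_in):
        # Eq. (14): the winner is found from the input part only.
        return int(np.argmin(np.linalg.norm(x_in - self.W_in, axis=1)))

    def predict(self, x_in):
        # Eq. (18): the estimate is the output weight of the winner.
        return self.w_out[self.winner(x_in)]

    def update(self, x_in, x_out, alpha, sigma):
        win = self.winner(x_in)
        h = np.exp(-(self.r - self.r[win]) ** 2 / (2.0 * sigma ** 2))  # Eq. (6)
        self.W_in += alpha * h[:, None] * (x_in - self.W_in)            # Eq. (15)
        self.w_out += alpha * h * (x_out - self.w_out)                  # Eq. (16)

# Toy nonlinear channel (assumed) and a single training pass.
rng = np.random.default_rng(2)
N, p, n_neurons = 5000, 2, 16
s = rng.choice([-1.0, 1.0], size=N)
y = np.tanh(s + 0.5 * np.concatenate(([0.0], s[:-1]))) + 0.05 * rng.standard_normal(N)

model = VQTAM(n_neurons, p, rng)
sq_err = np.zeros(N)
for t in range(p - 1, N):
    x_in = y[t - p + 1:t + 1][::-1]                   # [y(t), ..., y(t-p+1)]
    sq_err[t] = (s[t] - model.predict(x_in)) ** 2     # predict first, then learn
    alpha = 0.5 * (0.01 / 0.5) ** (t / N)             # Eq. (7), assumed constants
    sigma = (n_neurons / 2.0) * (0.1 / (n_neurons / 2.0)) ** (t / N)
    model.update(x_in, s[t], alpha, sigma)

mse_start = sq_err[p - 1:p + 49].mean()
mse_end = sq_err[-500:].mean()
print(f"MSE at start: {mse_start:.3f}, at end: {mse_end:.3f}")
```

Note that no error signal enters the update: the desired output s(t) is simply treated as one more component to be quantized, which is the point of the associative-memory formulation.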
Here, $\hat{s}(t)$ is an estimate of the signal sample transmitted at time t. It is worth noting that this kind of associative memory is different from the usual supervised approach. In MLP and RBF networks, the vector $\mathbf{x}^{in}(t)$ is presented to the network input, while $\mathbf{x}^{out}(t)$ is used at the network output to compute explicitly an error signal that guides learning. The VQTAM method instead allows competitive neural networks, such as the SOM, to correlate the inputs and outputs of the mapping without computing an error signal explicitly.2
Due to the inherent continuous-to-discrete transformation implemented by the VQTAM method as a clustering algorithm, it may require too many neurons to provide small estimation errors, $e(t) = s(t) - z(t) = s(t) - w_{1,i^*}^{out}(t)$, when approximating continuous mappings. This limitation can be somewhat alleviated through the use of suitable interpolation methods. For example, let $i_1^*(t)$ and $i_2^*(t)$ be the closest and second-closest neurons to the current input vector $\mathbf{x}^{in}(t)$. Geometric interpolation (Göppert & Rosenstiel, 1993) can be used to compute the orthogonal projection of the vector formed by $\mathbf{w}_{i_2^*}^{out}(t)$ onto the vector $\mathbf{w}_{i_1^*}^{out}(t)$. Topological interpolation (Göppert & Rosenstiel, 1995), instead, is based on a selection of the topological neighborhoods of the winning neuron $i_1^*(t)$, which is an advantage over the geometric method, unless there are topological defects on the chosen map. A good but computationally more expensive alternative is to use the Parametrized Self-Organizing Map (PSOM) (Walter & Ritter, 1996), a continuous version of the SOM, as a nonlinear interpolator.
Related architectures: Yamakawa and Horio (1999) and Lendasse et al. (2002) developed independently SOM-based architectures equivalent to the VQTAM and successfully applied them to the design of a power system controller and time series prediction, respectively. As concerns the simultaneous associative clustering of the input and output
spaces of a given mapping, the VQTAM method is quite similar to the Counterpropagation (Hecht-Nielsen, 1988) and ARTMAP (Carpenter, Grossberg, & Reynolds, 1991) networks. The main difference is that the VQTAM inherits the SOM's topology-preservation property, which has been shown to be quite useful for data visualization purposes (Flexer, 2001) or to speed up vector quantization (DeBodt, Cottrell, Letremy, & Verleysen, 2004). Topology preservation is not an observable property of the Counterpropagation and ARTMAP networks. Finally, Pham and Sukkar (1995) developed an associative approach similar to the VQTAM method, but using the ART2 architecture (Carpenter & Grossberg, 2002) instead. The resulting architecture, called SMART2, despite presenting a good approximation ability for dynamic nonlinear mappings, also does not possess the topology-preserving property.
2.3. Prototype-based local least-squares regression model
Barreto et al. (2004) proposed a VQTAM-based local linear regression method, called the KSOM model, which was successfully applied to nonstationary time series prediction. In this paper we describe this architecture in the context of the equalization task. The idea behind the KSOM is to train the VQTAM as described above using just a few neurons (usually fewer than 100 units) in order to have a “compact” representation of the time series encoded in the weight vectors of the VQTAM. Then, for each new input vector, the coefficients of a linear FIR filter are computed by the standard least-squares (LS) technique using the weight vectors of the K first winning neurons (i.e. those most similar to the current input vector), denoted by $\{i_1^*, i_2^*, \ldots, i_K^*\}$:
$$\begin{aligned} i_1^*(t) &= \arg\min_{\forall i} \{\|\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t)\|\} \\ i_2^*(t) &= \arg\min_{\forall i \neq i_1^*} \{\|\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t)\|\} \\ &\;\;\vdots \\ i_K^*(t) &= \arg\min_{\forall i \notin \{i_1^*, \ldots, i_{K-1}^*\}} \{\|\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}(t)\|\}. \end{aligned} \tag{19}$$
2 In reality, an error signal is computed implicitly through Eq. (16).
Then, the KSOM uses K pairs $\{\mathbf{w}_{i_k^*}^{in}(t), w_{i_k^*}^{out}(t)\}_{k=1}^{K}$, extracted from the K winning prototypes, with the aim of building a local linear function approximator at time t:
$$w_{i_k^*}^{out}(t) = \mathbf{a}^T(t)\,\mathbf{w}_{i_k^*}^{in}(t), \quad k = 1, \ldots, K \tag{20}$$
where $\mathbf{a}(t) = [a_1(t)\; a_2(t)\; \cdots\; a_p(t)]^T$ is a time-varying coefficient vector. Eq. (20) can be written in a matrix form as follows:
$$\mathbf{z}(t) = \mathbf{R}(t)\,\mathbf{a}(t), \tag{21}$$
where the output vector z(t) and the regression matrix R at time t are defined by the following equations:
$$\mathbf{z}(t) = [w_{i_1^*,1}^{out}(t)\; w_{i_2^*,1}^{out}(t)\; \cdots\; w_{i_K^*,1}^{out}(t)]^T \tag{22}$$
$$\mathbf{R}(t) = \begin{pmatrix} w_{i_1^*,1}^{in}(t) & w_{i_1^*,2}^{in}(t) & \cdots & w_{i_1^*,p}^{in}(t) \\ w_{i_2^*,1}^{in}(t) & w_{i_2^*,2}^{in}(t) & \cdots & w_{i_2^*,p}^{in}(t) \\ \vdots & \vdots & & \vdots \\ w_{i_K^*,1}^{in}(t) & w_{i_K^*,2}^{in}(t) & \cdots & w_{i_K^*,p}^{in}(t) \end{pmatrix} \tag{23}$$
In practice, we usually have p > K, i.e. R is a non-square matrix. In this case, we resort to the Pseudoinverse method (Principe et al., 2000; Haykin, 1994), which is equivalent to the Least-Squares Estimation (LSE) technique when additive white Gaussian noise is assumed. Thus, the coefficient vector a(t) is given by:
$$\mathbf{a}(t) = (\mathbf{R}^T(t)\,\mathbf{R}(t) + \lambda \mathbf{I})^{-1}\,\mathbf{R}^T(t)\,\mathbf{z}(t) \tag{24}$$
where I is an identity matrix of order p (the matrix $\mathbf{R}^T(t)\mathbf{R}(t)$ is p × p, since R is K × p) and λ > 0 (e.g. λ = 0.01) is a small constant added to the diagonal of $\mathbf{R}^T(t)\mathbf{R}(t)$ to make sure that this matrix is full rank. Once a(t) has been computed, we can estimate the output of the nonlinear mapping being approximated by the output of a pth order linear FIR filter:
$$z(t) = \hat{s}(t) = \mathbf{a}^T(t)\,\mathbf{x}^{in}(t) = \sum_{j=1}^{p} a_j(t)\, y(t-j+1). \tag{25}$$
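The KSOM estimate of Eqs. (19)–(25) can be sketched as a single function. The codebook used in the demo is synthetic and merely stands in for a trained VQTAM: its output weights are generated from a hypothetical linear map `c`, so that the recovered coefficient vector can be checked. The values K = 20 and λ = 1e-8 in the demo are also assumptions of this example; with p > K, as in the text, a larger λ such as the suggested 0.01 is what keeps the system well posed.

```python
import numpy as np

def ksom_predict(W_in, w_out, x_in, K, lam=0.01):
    """Local least-squares estimate built from the K prototypes nearest to x_in.

    W_in  : (q, p) input parts of the prototypes
    w_out : (q,)   scalar output parts of the prototypes
    Returns the estimate z(t) and the coefficient vector a(t).
    """
    dist = np.linalg.norm(W_in - x_in, axis=1)
    idx = np.argsort(dist)[:K]                 # K first winners, Eq. (19)
    R = W_in[idx]                              # K x p regression matrix, Eq. (23)
    z = w_out[idx]                             # K outputs, Eq. (22)
    p = W_in.shape[1]
    # Regularized least squares, Eq. (24); solve() avoids an explicit inverse.
    a = np.linalg.solve(R.T @ R + lam * np.eye(p), R.T @ z)
    return float(a @ x_in), a                  # Eq. (25)

# Synthetic codebook (assumed) whose outputs follow a known linear map c.
rng = np.random.default_rng(3)
q, p = 50, 3
c = np.array([0.8, -0.3, 0.1])                 # hypothetical ground-truth coefficients
W_in = rng.standard_normal((q, p))
w_out = W_in @ c
x_in = rng.standard_normal(p)

z_hat, a = ksom_predict(W_in, w_out, x_in, K=20, lam=1e-8)
print("recovered coefficients:", np.round(a, 4))
print("estimate:", round(z_hat, 4), " target:", round(float(c @ x_in), 4))
```

Because the coefficients are recomputed for every input vector, the cost per sample is dominated by sorting q distances and solving one p × p system.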
In a nutshell, for each input vector we compute the coefficient vector a(t) through Eq. (24), and then use it to build the linear estimator shown in Eq. (25).
Related architectures: Principe et al. (1998) proposed a neural architecture for nonlinear system identification and control which is equivalent to KSOM in the sense that the coefficient vector a(t) is computed from K prototype vectors of a trained SOM using the LSE technique. However, the required prototype vectors are not selected as the K nearest prototypes to the current input vector, but rather automatically selected as the winning prototype at time t and its K − 1 topological neighbors. If a perfect topology preservation is achieved during SOM training, the neurons in the topological neighborhood of the winner are also the closest ones to the current input vector. However, if topological defects are present, as usually occurs for multidimensional data, the KSOM provides more accurate results. Chen and Xi (1998) proposed a local linear regression model whose coefficient vectors are computed using the prototypes of a competitive learning network through the Recursive Least-Squares (RLS) algorithm. An immediate advantage of Chen and Xi's proposal over the just mentioned local architectures is that it does not require matrix inversions. Unlike the KSOM model, which computes a single local linear model for each input vector, Vesanto (1997) trained the SOM architecture to build several local linear autoregressive (AR) models, one for each neuron of the network. The local AR model of neuron i is built based solely on the subset of the input vectors for which that neuron was the winner during the training stage. This subset of input data, not the prototype vectors of the SOM, is then used to compute the coefficients of the AR model through the LSE technique. The same approach was adopted by Koskela et al. (1998), who trained instead a temporal variant of the SOM to perform clustering of the time series.
3. Building RBF-like filters from VQTAM
The VQTAM method itself can be used for function approximation purposes. However, as pointed out previously, it is essentially a vector quantization method, and so it may require a large number of neurons to achieve accurate generalization. To improve VQTAM's performance on interpolation, we introduce two simple RBF models designed from a trained VQTAM architecture.
3.1. A global RBF model
Assuming that the trained VQTAM has q neurons, a general Radial Basis Function (RBF) network with q Gaussian basis functions and M output neurons can be built over the learned input and output codebook vectors, $\mathbf{w}_i^{in}$ and $\mathbf{w}_i^{out}$, as follows:
$$\mathbf{z}(t) = \frac{\sum_{i=1}^{q} \mathbf{w}_i^{out}\, G_i(\mathbf{x}^{in}(t))}{\sum_{i=1}^{q} G_i(\mathbf{x}^{in}(t))} \tag{26}$$
where $\mathbf{z}(t) = [z_1(t)\; z_2(t)\; \cdots\; z_M(t)]^T$ is the output vector, $\mathbf{w}_i^{out} = [w_{1,i}^{out}\; w_{2,i}^{out}\; \cdots\; w_{M,i}^{out}]^T$ is the weight vector connecting the ith basis function to the M output units, and $G_i(\mathbf{x}^{in}(t))$ is the response of this basis function to the current input vector $\mathbf{x}^{in}(t)$ (see Fig. 3):
$$G_i(\mathbf{x}^{in}(t)) = \exp\left(-\frac{\|\mathbf{x}^{in}(t) - \mathbf{w}_i^{in}\|^2}{2\gamma^2}\right) \tag{27}$$
where $\mathbf{w}_i^{in}$ plays the role of the center of the ith basis function and γ > 0 defines its radius (or spread). Note that in Eq. (26), all the q codebook vectors are used to estimate the corresponding output. In this sense, we refer to the RBF model just described as the Global RBF (GRBF) model, despite the localized nature of each Gaussian basis.
Fig. 3. Sketch of the GRBF architecture.
Since we are interested in the equalization of a single communication channel, we set M = 1. In this case, the output vector of the GRBF network in Eq. (26), which is used to estimate (recover) the current transmitted sample, reduces to a scalar output, z(t), defined as:
$$z(t) = \hat{s}(t) = \frac{\sum_{i=1}^{q} w_{1,i}^{out}\, G_i(\mathbf{x}^{in}(t))}{\sum_{i=1}^{q} G_i(\mathbf{x}^{in}(t))}. \tag{28}$$
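The scalar GRBF output of Eqs. (27)–(28) amounts to a normalized, Gaussian-weighted average of the output codebook values. A minimal sketch follows; the three-prototype codebook is a made-up example, and subtracting the smallest squared distance before exponentiating is a numerical-stability choice of this sketch (it cancels in the ratio and leaves Eq. (28) unchanged).

```python
import numpy as np

def grbf_predict(W_in, w_out, x_in, gamma):
    """Global RBF estimate, Eq. (28), using all q prototypes as Gaussian centers."""
    d2 = np.sum((W_in - x_in) ** 2, axis=1)             # squared distances to centers
    G = np.exp(-(d2 - d2.min()) / (2.0 * gamma ** 2))   # Eq. (27), rescaled for stability
    return float(G @ w_out / G.sum())

# Hypothetical codebook with three prototypes in a 2-D input space.
W_in = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w_out = np.array([-1.0, 1.0, 0.5])

z_sharp = grbf_predict(W_in, w_out, np.array([1.0, 0.0]), gamma=1e-3)  # tiny spread
z_flat = grbf_predict(W_in, w_out, np.array([1.0, 0.0]), gamma=1e6)    # huge spread
z_mid = grbf_predict(W_in, w_out, np.array([0.4, 0.3]), gamma=0.5)
print(z_sharp, z_flat, z_mid)
```

The two extreme calls illustrate the role of the spread γ: a very small γ reproduces the output weight of the nearest prototype, whereas a very large γ approaches the plain average of all output weights.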
In the usual two-phase RBF training, the centers of the basis functions are firstly determined by clustering the xin vectors through e.g. the K -means algorithm (MacQueen, 1967). Then the hidden-to-output layer weights are computed through the LMS rule (or the pseudoinverse method) (Principe et al., 2000; Haykin, 1994). In the GRBF model just described one single learning phase is necessary, in which SOM-based clustering is performed simultaneously on the input–output pairs {xin (t), xout (t)}. Related architectures: It is worthwhile to contrast the GRBF with two well-known RBF design strategies, the Generalized Regression Neural Network (GRNN) (Specht, 1991) and the Modified Probabilistic Neural Network (MPNN) (Zaknich, 1998). In the GRNN, there is a basis function centered at every training input data vector xin , and the hidden-to-output weights are just the target values xout . If the number of input–output pairs {xin (t), xout (t)} available is large, as is commonly the case in adaptive filtering problems, the resulting GRNN architecture will also be large, limiting its application in real time. The MPNN and the GRNN share the same theoretical background and basic network structure. The difference is that MPNN uses the K -means clustering method for the computation of its centers, thus alleviating the computational cost of the original GRNN and improving its generalization ability. For both the GRNN/MPNN models, the output is simply a weighted average of the target values of training vectors close to the given input vector. For the GRBF instead, the output is the weighted average of the output codebook vectors wiout associated with the input codebook vectors wiout close to the given input vector xin , as stated in Eqs. (26) and (28). 3.2. A local RBF model As pointed out previously, many neural architectures have been proposed with the aim of approximating globally
nonlinear input–output mappings by locally linear models. The idea behind this approach is that just a small portion of the input space is used to estimate the output of the mapping for each new input vector. In the context of the VQTAM architecture, local modeling means that we need only 1 < K < q prototypes to set up the centers of the basis functions and the hidden-to-output weights of an RBF architecture. For this purpose, we suggest using the prototype vectors of the first K winning neurons $\{i_1^*(t), i_2^*(t), \ldots, i_K^*(t)\}$, as determined in Eq. (19). Thus, the estimated output is now given by:

\[ z(t) = \frac{\sum_{k=1}^{K} w^{out}_{1,i_k^*}\, G_{i_k^*}(\mathbf{x}^{in}(t))}{\sum_{k=1}^{K} G_{i_k^*}(\mathbf{x}^{in}(t))}. \tag{29} \]
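The local model of Eq. (29) can be sketched as follows. This is only an illustration under stated assumptions: the K winners are ranked by squared Euclidean distance between the input and the input prototypes (Eq. (19) is not reproduced in this excerpt), the Gaussian form is the usual exp(-d^2 / (2 gamma^2)), and the name `krbf_output` is hypothetical.

```python
import math

def krbf_output(x_in, w_in, w_out, gamma, k):
    """Local RBF estimate in the spirit of Eq. (29), using only the k nearest centers.

    x_in  : current input regressor (list of floats)
    w_in  : list of q input prototypes (centers)
    w_out : list of q scalar output prototypes
    gamma : spread of the (assumed) Gaussian basis functions
    k     : number of winning prototypes to use, 1 <= k <= q
    """
    q = len(w_in)
    if not 1 <= k <= q:
        raise ValueError("k must satisfy 1 <= k <= q")
    # Squared Euclidean distance from the input to every center.
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x_in, c)) for c in w_in]
    # Indices of the k closest centers (first, second, ..., k-th winner).
    winners = sorted(range(q), key=lambda i: sq_dists[i])[:k]
    activations = [math.exp(-sq_dists[i] / (2.0 * gamma ** 2)) for i in winners]
    total = sum(activations)
    if total == 0.0:
        raise ValueError("input too far from the selected centers for this spread")
    return sum(g * w_out[i] for g, i in zip(activations, winners)) / total
```

With k = 1 the estimate collapses to the output prototype of the single winner, and with k = q every center contributes, which mirrors the two limiting cases of the local model.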
We refer to the local RBF architecture thus built as the KRBF architecture. It is worth noting that the VQTAM and GRBF architectures become particular instances of the KRBF if we set K = 1 and K = q, respectively.

Related architectures: A local RBF model was proposed earlier in Chng, Yang, and Skarbek (1996). First, a GRNN model is built and then only those centers within a certain distance ε > 0 from the current input vector are used to estimate the output. This idea is basically the same as that used to build the KRBF, but it suffers from the same drawbacks as the GRNN model regarding its high computational cost. Furthermore, depending on the value of ε, the number of selected centers may vary considerably (in a random way) at each time step. If ε is too small, it may happen that no centers are selected at all. This never occurs for the KRBF model, since the same number K of centers is selected at each time step.

Dablemont et al. (2003) proposed a local RBF model based on the Double Vector Quantization (DVQ) method (Simon et al., 2004). The DVQ requires two SOM networks: one to cluster the regressors $\mathbf{x}^{in}(t)$, and a second one to cluster the deformations $\Delta\mathbf{x}^{out}(t) = \mathbf{x}^{out}(t+1) - \mathbf{x}^{out}(t)$. By definition, each deformation $\Delta\mathbf{x}^{out}(t)$ is associated with a single regressor $\mathbf{x}^{in}(t)$. Let $\mathbf{w}_i^1$, $i = 1, \ldots, q_1$, be the ith prototype of the first SOM. Similarly, let $\mathbf{w}_j^2$, $j = 1, \ldots, q_2$, be the jth prototype of the second SOM. A frequency table, $[f_{ij}]_{q_1 \times q_2}$,
is then computed, in which entry $f_{ij}$ corresponds to the ratio of the number of regressors and associated deformations jointly mapped to $\mathbf{w}_i^1$ and $\mathbf{w}_j^2$, respectively, to the number of regressors mapped to $\mathbf{w}_i^1$. Using the values of $f_{ij}$, Dablemont et al. (2003) built $q_1 \times q_2$ local RBF models, which were successfully applied to time series prediction.

3.3. SOM-based adaptive filters as networks of local experts

All the SOM-based adaptive filters presented so far can be described in architectural terms under the framework of Networks of Local Experts. Recall that neurons in standard MLP networks are trained in a fully cooperative manner; that is, all the neurons are adjusted at the same time for the common goal of solving the problem. In contrast, due to the competitive nature of neurons in the SOM, only part of the network is responsible for mapping a portion of the input data space (see footnote 3). The competitive nature of SOM-based filters gives rise, at the architectural level, to structures that have been called modular networks (Haykin, 1994; Principe et al., 2000). Modular networks are built from expert networks (local expert filters, in our case), all of which work with the same input and are coordinated by a gating network. The output of a competitive network of local expert filters is given by:

\[ z(t) = \sum_{i=1}^{q} g_i(t)\, z_i(t) \tag{30} \]
where $g_i(t)$ is the gate output, which weighs the contribution of the output of the ith expert filter, $z_i(t)$, to the overall output of the network. In the following, we use Eq. (30) to illustrate how the gate output $g_i(t)$ and the output $z_i(t)$ of the ith local filter are computed by each of the SOM-based equalizers described in this paper.

Footnote 3: There is indeed a certain degree of cooperation in SOM-based filters during the training phase, but it is rather localized around the vicinity of the winning neurons.

LLM filter: This architecture can be thought of as an expert network that uses a winner-take-all (WTA) mechanism to select the output of the local filters; that is, $g_i(t) = 1$ if $i = i^*(t)$, where $i^*(t)$ denotes the winning neuron, and $g_i(t) = 0$ otherwise. This is equivalent to saying that the gate network selects the expert filter associated with the winning neuron. Then, the output is given by $z(t) = z_{i^*}(t) = \mathbf{a}_{i^*}^T(t)\,\mathbf{x}(t)$.

VQTAM filter: This architecture also works according to a WTA selection mechanism. In this case, the values of the gate network are computed as in the LLM filter, but the winning neuron is found through Eq. (14) instead. Then, the output is given by $z(t) = z_{i^*}(t) = w^{out}_{1,i^*}(t)$ for the single-output case, or $\mathbf{z}(t) = \mathbf{z}_{i^*}(t) = \mathbf{w}^{out}_{i^*}(t)$ for the case of multiple outputs.

KSOM filter: For each input vector, this architecture selects K expert filters among all neurons of the VQTAM based on their distances to the current input vector; that is, $g_i(t) = 1$ if $i \in \{i_1^*(t), \ldots, i_K^*(t)\}$, where $i_k^*(t)$ is the kth winning neuron associated with the input vector presented at time t, and $g_i(t) = 0$ otherwise. The output of the modular network is then computed by $z(t) = \mathbf{a}^T(t)\,\mathbf{x}^{in}(t)$, where the coefficient vector $\mathbf{a}(t)$ is computed as in Eq. (24).

GRBF filter: This architecture enables all the expert filters to contribute to the final output by means of the following equation for the gate network:

\[ g_i(t) = \frac{G_i(\mathbf{x}^{in}(t))}{\sum_{l=1}^{q} G_l(\mathbf{x}^{in}(t))}. \tag{31} \]
Note, however, that the localized nature of the Gaussian basis functions implicitly constrains the degree of contribution of each expert filter to the final output of the network in Eq. (30).

KRBF filter: The degree of contribution of each expert filter to the final output of the network in this architecture is even more constrained by an explicit mechanism of selecting the K most active basis functions, i.e. those whose centers are the closest ones to the current input vector. Thus, the values of the gate network are given by the following equation:

\[ g_i(t) = \begin{cases} \dfrac{G_i(\mathbf{x}^{in}(t))}{\sum_{l=1}^{q} G_l(\mathbf{x}^{in}(t))}, & \text{if } i \in \{i_1^*(t), \ldots, i_K^*(t)\} \\[2ex] 0, & \text{otherwise} \end{cases} \tag{32} \]

where q is the number of hidden units (local experts). The output of the network is then given by:

\[ z(t) = \hat{s}(t) \equiv \sum_{i=1}^{q} g_i(t)\, w^{out}_{1,i}, \quad i = 1, \ldots, K. \tag{33} \]
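To make the gating view of Eq. (30) concrete, the following minimal sketch implements two of the gates discussed above: the winner-take-all gate used by the LLM and VQTAM filters, and the normalized gate of Eq. (31) used by the GRBF filter. The function names are hypothetical and the expert outputs are passed in as plain numbers, so this shows the combination rule only, not a full filter.

```python
def wta_gate(winner, q):
    """Winner-take-all gate: 1 for the winning expert, 0 for all others."""
    return [1.0 if i == winner else 0.0 for i in range(q)]

def soft_gate(activations):
    """Normalized gate of Eq. (31): each basis activation divided by their sum."""
    total = sum(activations)
    return [g / total for g in activations]

def modular_output(gates, expert_outputs):
    """Eq. (30): gate-weighted sum of the outputs of the local experts."""
    return sum(g * z for g, z in zip(gates, expert_outputs))
```

For example, `modular_output(wta_gate(1, 3), [5.0, 7.0, 9.0])` passes through the output of the second expert unchanged, whereas the soft gate blends all experts with weights that sum to one.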
As final comments, it is worthwhile to contrast the use of competitive learning to create networks of expert filters with committees of networks and the mixture of experts architecture. In committees, several networks work cooperatively to solve a problem, while in expert networks competition among the local filters forces them to model different regions of the input space. In mixture of experts networks (Jacobs, Jordan, Nowlan, & Hinton, 1991), the outputs are computed in a manner that resembles that of the GRBF filter. However, the two are derived from very different first principles: the GRBF filter is derived from a trained VQTAM architecture, which is basically a vector quantizer, while the mixture of experts architecture relies on statistical reasoning.

4. Computer simulations and discussion

Nonlinear channel model: In this section we compare the performances of neural equalizers implemented via the LLM, VQTAM, KSOM, GRBF, KRBF and MLP neural architectures. In addition, the standard FIR/LMS and FIR/LMS–Newton equalizers are also evaluated as baselines of performance. The signals used to train these adaptive filters originate from a simulated nonlinear noisy channel with memory. Such channels
are met frequently in practice, and the simulation tries to capture many of the important features observed in real-world data sequences, such as the temporal correlation among signal samples, distortions due to noise, and nonlinearities due to saturation phenomena of the power amplifiers in the transmitter (Saleh, 1981; Ibnkahla, 2000).

First, the signal sequence $\{s(t)\}_{t=1}^{N}$ is realized as a first-order Gauss–Markov process:

\[ s(t) = a\,s(t-1) + b\,\varepsilon(t) \tag{34} \]

where $\varepsilon(t) \sim N(0, \sigma_\varepsilon^2)$ is the white Gaussian driving noise, and $-1 < a < 1$ is a constant. Therefore, the signal samples are correlated zero-mean Gaussian samples, i.e. $s(t) \sim N(0, \sigma_s^2)$, with power $\sigma_s^2$ given by $\sigma_s^2 = \frac{b^2}{1-a^2}\,\sigma_\varepsilon^2$. For a = 0, the source signal becomes a white Gaussian noise sequence. A noisy signal sample at time t is defined as

\[ v(t) = s(t) + w(t) \tag{35} \]
where $s(t) \in \mathbb{R}$ is the noise-free transmitted signal sample, and $w(t) \sim N(0, \sigma_w^2)$ is a white Gaussian noise sequence. Then, a linear channel is realized by the following equation:

\[ u(t) = \frac{\mathbf{h}^T \mathbf{v}(t)}{\|\mathbf{h}\|}, \quad t = 1, 2, \ldots, N \tag{36} \]
where $u(t) \in \mathbb{R}$ is the linear channel output, $\mathbf{v}(t) = [v(t)\ v(t-1)\ \cdots\ v(t-n+1)]^T$ is the tapped-delay vector containing the n most recent noisy samples, $\mathbf{h} \in \mathbb{R}^n$ is the linear channel impulse response, and N is the length of the signal sequence. We assume that the data sequence $\{s(t)\}_{t=1}^{N}$ and the noise sequence $\{w(t)\}_{t=1}^{N}$ are jointly independent. Finally, the output y(t) of the nonlinear channel is obtained by applying a static nonlinearity to the signal u(t):

\[ y(t) = G_1(u(t)) + G_2(u(t)) \tag{37} \]

where

\[ G_1(u(t)) = 0.2\,u(t) - 0.2\,u^2(t) + 0.04\,u^3(t), \tag{38} \]

and

\[ G_2(u(t)) = 0.9 \tanh\!\left(u(t) - 0.1\,u^2(t) + 0.5\,u^3(t) - 0.1\,u^4(t) + 0.5\right). \tag{39} \]
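The channel model of Eqs. (34)–(39) is simple enough to simulate directly. The sketch below is one possible reading of it, not the authors' code: the default parameter values are those reported in the simulation setup (a = 0.95, b = 0.1, unit driving-noise variance, measurement-noise variance 0.03, h = [1 0.8 0.5]^T), while the zero initial state s(0) = 0, the zero-filled delay line and the function name `simulate_channel` are assumptions.

```python
import math
import random

def simulate_channel(n_samples, a=0.95, b=0.1, var_eps=1.0, var_w=0.03,
                     h=(1.0, 0.8, 0.5), seed=0):
    """Generate (s, y): the source sequence and the nonlinear channel output."""
    rng = random.Random(seed)
    h_norm = math.sqrt(sum(c * c for c in h))
    s_prev = 0.0                      # assumed initial condition s(0) = 0
    delay_line = [0.0] * len(h)       # [v(t), v(t-1), ..., v(t-n+1)], zero-filled
    s_seq, y_seq = [], []
    for _ in range(n_samples):
        # Eq. (34): first-order Gauss-Markov source.
        s = a * s_prev + b * rng.gauss(0.0, math.sqrt(var_eps))
        # Eq. (35): additive white Gaussian noise.
        v = s + rng.gauss(0.0, math.sqrt(var_w))
        delay_line = [v] + delay_line[:-1]
        # Eq. (36): linear channel with normalized impulse response.
        u = sum(hc * vc for hc, vc in zip(h, delay_line)) / h_norm
        # Eqs. (37)-(39): static polynomial plus saturating nonlinearity.
        g1 = 0.2 * u - 0.2 * u ** 2 + 0.04 * u ** 3
        g2 = 0.9 * math.tanh(u - 0.1 * u ** 2 + 0.5 * u ** 3 - 0.1 * u ** 4 + 0.5)
        s_seq.append(s)
        y_seq.append(g1 + g2)
        s_prev = s
    return s_seq, y_seq
```

With the default values, the stationary source power predicted by the expression following Eq. (34) is 0.01 / (1 - 0.9025), roughly 0.1026. Fixing `seed` makes a run reproducible; different seeds give the independent realizations needed for repeated training/testing runs.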
Typical realizations of the signal sequences u(t) and y(t) are shown in Fig. 4(b). It is worth observing in this figure the saturation phenomena caused by the sigmoidal nonlinearity $G_2(u(t))$.

Simulation setup: All the equalizers are trained online, i.e. in a single pass over the first half of the sequences $\{y(t), s(t)\}_{t=1}^{N}$, while the second half is used to test their generalization (prediction) performances. A total of 500 training/testing runs are performed for a given filter in order to assign statistical confidence to the computation of the normalized mean squared error (NMSE):

\[ \mathrm{NMSE} = \frac{\hat{\sigma}_e^2}{\hat{\sigma}_y^2} \tag{40} \]
where $\hat{\sigma}_e^2$ is the estimated variance of the residuals e(t) = s(t) − z(t) and $\hat{\sigma}_y^2$ is the estimated variance of the sequence of distorted samples (equalizer's input sequence). For each training/testing run, different noise sequences {w(t), ε(t)} are used to generate new realizations of the signal sequences {y(t), s(t)}.

Fig. 4. Typical realizations of signals u(t) and y(t).

The following parameters are used for the simulations:

Nonlinear channel: a = 0.95, b = 0.1, $\sigma_\varepsilon^2 = 1$, $\sigma_w^2 = 0.03$, $\mathbf{h} = [1\ 0.8\ 0.5]^T$, n = 3 and N = 6000. Without loss of generality, we assume that the period between two transmitted samples is larger than the processing time of the proposed adaptive equalizers. Furthermore, no encoding/decoding technique is used for the transmitted signal sequence.

Equalizers: After some experimentation, the adaptation step size α′ of the FIR/LMS and FIR/LMS–Newton equalizers is set to $10^{-4}$. The same value of α′ is used by the LLM architecture to update the parameters of its local LMS equalizers. For larger values of α′, the FIR/LMS, FIR/LMS–Newton and LLM equalizers did not converge. For all VQTAM-based equalizers (KSOM, GRBF and KRBF) we set α = 0.1. The parameters of the neighborhood function are set to $\sigma_0 = 0.5q$ and $\sigma_N = 10^{-2}$, where the number of neurons q depends on the goal of the simulation being carried out. The spread γ of the Gaussian basis functions for the GRBF and KRBF equalizers is computed as the average value of the distances between the center of the ith basis function and its K nearest centers. A nonlinear MLP-based equalizer trained with the backpropagation algorithm with momentum term is also evaluated (learning rate α = 0.1 and momentum constant η = 0.9). This equalizer has one hidden layer of neurons with hyperbolic tangent activation functions and a linear output neuron. Due to the linear output neuron, this MLP equalizer can be thought of as a standard LMS equalizer preceded by a feature extraction stage implemented by the hidden layer of nonlinear neurons. The following simulations evaluate the speed of convergence of a given equalizer and its generalization ability.
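The figure of merit in Eq. (40) can be computed with a few lines of code. The sketch below is illustrative only: it assumes the population (divide-by-N) variance estimator, which the text does not specify, and the names `variance` and `nmse` are hypothetical.

```python
def variance(seq):
    """Population variance (divide by N); the choice of estimator is an assumption."""
    mean = sum(seq) / len(seq)
    return sum((v - mean) ** 2 for v in seq) / len(seq)

def nmse(targets, estimates, inputs):
    """Eq. (40): variance of the residuals e(t) = s(t) - z(t) over the variance
    of the equalizer's input sequence y(t)."""
    residuals = [s - z for s, z in zip(targets, estimates)]
    return variance(residuals) / variance(inputs)
```

A perfect equalizer gives an NMSE of zero. Note that the normalization uses the variance of the distorted input sequence rather than of the target sequence, as stated in the definition.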
Fig. 5. Learning curves: (a) LLM, FIR/LMS and FIR/LMS–Newton equalizers. (b) LLM, VQTAM, MLP neural equalizers (q = 10).

For a neural equalizer with q neurons, an empirical notion of the speed of convergence is given by its learning curve, a plot of the average NMSE value obtained at iteration t throughout all training realizations. A fast equalizer should approach a low steady NMSE value as fast as it can, i.e. within the
minimum number of iterations possible. An overall evaluation of the generalization ability of a neural equalizer is given by plotting the NMSE values generated during testing for different numbers of neurons.

Learning curves: The first set of simulations evaluates the learning curves of the LLM, VQTAM, MLP, FIR/LMS and FIR/LMS–Newton equalizers. The results are shown in Fig. 5. For this simulation, the number of neurons was set to q = 10. As expected, the neural equalizers converged much faster than the standard linear ones. In particular, the LLM equalizer and the FIR/LMS equalizer converged to approximately the same steady-state values, but the former converged much faster than the latter. The FIR/LMS–Newton equalizer had a particularly poor performance, explained in part by the fact that the autocorrelation matrix captures only linear dependencies between successive samples. The VQTAM equalizer converged as fast as the MLP equalizer, but the former produced slightly higher steady-state NMSE values. There are no learning curves for the KSOM, GRBF and KRBF equalizers because they are used only during the testing phase, since they are built from the VQTAM architecture.

Fig. 6. Generalization error (NMSE) versus the number of hidden neurons (q): (a) LLM, MLP and VQTAM equalizers, and (b) LLM, MLP and GRBF equalizers.

Generalization curves: A learning curve gives only a partial view of the performance of a given equalizer. The second set of simulations attempts to evaluate the generalization or prediction
ability of the equalizers after their convergence, when the weights are frozen and the equalizers are tested on an unseen input sequence. In order to evaluate the influence of the number of hidden neurons on the generalization performance of the LLM, VQTAM, GRBF and MLP equalizers, we compute the resulting NMSE value for each value of q ∈ [4, 20]. The results are shown in Figs. 6(a) and 6(b). Two issues are worth pointing out here: (1) despite having the slowest convergence rate among the neural equalizers, the LLM equalizer performed better than the MLP equalizer during generalization, and (2) the LLM and MLP equalizers performed better than the VQTAM and GRBF equalizers when few neurons are used. However, unlike the VQTAM and GRBF equalizers, the performances of the LLM and MLP equalizers do not improve as q increases. Their NMSE values maintain a slight tendency to increase, mostly due to overfitting. For the VQTAM and GRBF equalizers, the NMSE always decreases since we are using more prototype vectors to quantize the input–output spaces, thus reducing the associated quantization errors. For q ≥ 15, the VQTAM equalizer performs better than both the LLM and MLP equalizers.
For q ≥ 12, the GRBF equalizer performs better than the MLP and LLM equalizers, demonstrating that the nonlinear interpolation implemented by the Gaussian basis functions improves the generalization power of the VQTAM equalizer.

In the third set of simulations, we evaluate the generalization performances of the KSOM and KRBF equalizers as a function of the number of local prototype vectors used to build them. For this simulation, we used q = 20 and varied K from 10 to 20. The results are shown in Fig. 7. One can easily note that the KRBF performed better than the KSOM over the whole range of variation of K. For this simulation, the minimum NMSE for the KRBF was obtained for K = 8, while for the KSOM the minimum was obtained for K = 15.

Fig. 7. Generalization error (NMSE) versus the number of nearest prototypes K used by the KRBF and KSOM equalizers.

In the fourth set of simulations we show in Table 1 the best generalization results obtained for the neural equalizers simulated in this paper, assuming q = 20 for all of them. In order to provide a fair comparison, we also evaluated the performance of a one-hidden-layer MLP equalizer trained with the Levenberg–Marquardt (LM) method (MLP-LM) (Principe et al., 2000). Also, a two-hidden-layer MLP-based (MLP-2h) adaptive filter, previously applied to nonlinear channel identification in Bershad, Ibnkahla, and Castanié (1997), is simulated here as an equalizer trained with standard backpropagation with momentum. The number of neurons in the second hidden layer of the MLP-2h equalizer was set to 10 (i.e. half the number of neurons in the first hidden layer). As can be noted, the LM training method has considerably improved the generalization ability of the MLP equalizer. However, all the SOM-based equalizers, except the KSOM equalizer (K = 15), performed better than or as well as the other two MLP-based (MLP-1h and MLP-2h) equalizers. This is quite an amazing result, since the sigmoidal nonlinearity used by the channel model should in principle favor the performance of all the MLP-based equalizers (this is the type of nonlinearity used by the hidden neurons). Another advantage of SOM-based equalizers over MLP-based ones, useful for real-world implementations, is that they can be easily implemented in hardware (Card, Rosendahl, McNeill, & McLeod, 1998). The best SOM-based equalizer was the KRBF equalizer using K = 8 nearest prototypes, followed closely by the GRBF equalizer. It should be pointed out, however, that the KRBF and GRBF algorithms are computationally more expensive than the VQTAM and the LLM equalizers. In this sense, the VQTAM and LLM equalizers seem to offer the best trade-off between generalization error and computational cost among the SOM-based equalizers.

Table 1
Generalization performance of the several equalizers for q = 20

Neural equalizer   NMSE mean   NMSE min   NMSE max   NMSE variance
MLP-LM             0.0143      0.0057     0.0681     5.58 × 10^-5
KRBF-8             0.0487      0.0190     0.1723     4.67 × 10^-5
GRBF               0.0500      0.0177     0.2198     6.79 × 10^-4
VQTAM              0.0583      0.0265     0.1428     3.33 × 10^-4
MLP-2h             0.0696      0.0269     0.4242     2.20 × 10^-3
LLM                0.0722      0.0319     0.1596     5.72 × 10^-4
MLP-1h             0.0785      0.0185     1.3479     0.0065
KSOM-15            0.0991      0.0251     0.7721     0.0076

Evaluation using real-world data: For the sake of completeness, the final set of simulations aims to evaluate the function approximation abilities of the previous neural-based adaptive filters using real-world signal sequences. The task chosen for this purpose is the recursive identification of the inverse model of a hydraulic actuator (also called inverse modeling), which is closely related to channel equalization. Fig. 8 shows measured values of the valve position (input signal sequence, {u(t)}) and the oil pressure (output signal sequence, {y(t)}). The oil pressure signal sequence shows an oscillatory behavior caused by mechanical resonances (Sjöberg et al., 1995). If one is interested in estimating the current output y(t) based on previous values of the input and output variables, then the neural filters should approximate the following nonlinear feedforward mapping:

\[ y(t) = f[y(t-1), \ldots, y(t-q);\ u(t), u(t-1), \ldots, u(t-p+1)] \tag{41} \]
where p and q, q > p, are the orders of the input and output regressors, respectively. On the other hand, for the inverse modeling task we are interested in, the neural filters should implement the inverse mapping $f^{-1}(\cdot)$, given by:

\[ u(t) = f^{-1}[u(t-1), \ldots, u(t-p+1);\ y(t-1), \ldots, y(t-q)] \tag{42} \]
whose goal is to estimate the input of a given system based on previous values of the input and output variables. This kind of nonlinear inverse model of a system and the corresponding online identification of its parameters are useful, for example, for real-time control purposes (Norgaard, Ravn, Poulsen, & Hansen, 2000).

For simplicity, we use the same number of neurons as in the previous simulations, and the input and output signals are rescaled to the range [−1, +1]. The filters are trained using the first 512 samples of the input/output signal sequences and tested with the remaining 512 samples. For all SOM-based filters, the learning rate now decays exponentially in time (with parameters $\alpha_0 = 0.5$ and $\alpha_T = 0.01$, where T = 512). The learning rates of all MLP-based filters are set to α = 0.1. For the LLM filter, we set α′ = 0.1. The input and output memory orders were set to p = 4 and q = 5, respectively. The obtained NMSE values were averaged over 100 training/testing runs, in which the weights of the neural filters were randomly initialized at each run. The results are shown in Table 2.

Fig. 8. Measured values of valve position (left) and oil pressure (right).

Fig. 9. Typical sequence of estimated values of the valve position provided by the KSOM filter. The open circles '◦' denote actual sample values, while the solid line indicates the estimated sequence.

Table 2
Performance of the neural-based filters using real-world data

Neural filter   NMSE mean   NMSE min   NMSE max   NMSE variance
KSOM-15         0.0019      0.0002     0.0247     1.15 × 10^-5
LLM             0.0347      0.0181     0.0651     1.58 × 10^-4
MLP-1h          0.0453      0.0358     0.0543     1.39 × 10^-5
MLP-LM          0.0722      0.0048     0.3079     0.0041
VQTAM           0.1219      0.1160     0.1256     4.72 × 10^-6
KRBF-8          0.1583      0.1374     0.1680     2.48 × 10^-5
GRBF            0.1740      0.1444     0.1949     1.84 × 10^-4
MLP-2h          0.3516      0.0980     2.6986     0.0963
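For the inverse modeling task, each training pair couples a regressor of past inputs and outputs with the current input u(t), following Eq. (42). The sketch below shows one way to assemble such pairs from the measured sequences. The function name and the ordering of the regressor entries are assumptions made for illustration, and the first few samples are skipped because their history is incomplete.

```python
def inverse_model_regressors(u, y, p, q):
    """Build (regressor, target) pairs for the inverse mapping of Eq. (42).

    u, y : input and output sequences of equal length
    p, q : input and output memory orders
    Each regressor is [u(t-1), ..., u(t-p+1), y(t-1), ..., y(t-q)]
    and the corresponding target is u(t).
    """
    start = max(p - 1, q)             # first t with a complete history
    regressors, targets = [], []
    for t in range(start, len(u)):
        past_u = [u[t - k] for k in range(1, p)]        # u(t-1) ... u(t-p+1)
        past_y = [y[t - k] for k in range(1, q + 1)]    # y(t-1) ... y(t-q)
        regressors.append(past_u + past_y)
        targets.append(u[t])
    return regressors, targets
```

With p = 4 and q = 5, as used in the experiments, each regressor has 3 + 5 = 8 entries, and the first usable sample is the sixth one of the sequence.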
In this table, the ranking of the neural filters in terms of NMSE was practically inverted in comparison to the table of results of the equalization task. In particular, the performances of the KSOM and LLM adaptive filters on this real-world application are better than those of the MLP-based filters. This can be explained in part by the fact that the nonlinearities of the hydraulic actuator are not necessarily of sigmoidal type, and thus the LLM and KSOM filters fairly demonstrated their better performances with respect to the MLP-based ones (see footnote 4). A typical sequence of estimated values of the valve position provided by the KSOM filter is shown in Fig. 9.

A final issue worth discussing is the poor performance of the VQTAM-based adaptive filters (VQTAM, GRBF and KRBF), caused by the small number of neurons used. Since these filters are being used basically as vector quantizers, they require many neurons to approximate a mapping that has a complex dynamic behavior. This is in accordance with the results presented by Barreto and Araújo (2004), who used hundreds of neurons to obtain a good approximation performance for the VQTAM using the same input–output data.

5. Conclusion and further work

In this paper we evaluated SOM-based architectures, which have been normally applied to offline learning of input–output mappings, in two online adaptive filtering tasks, namely nonlinear channel equalization and inverse mapping identification. We also introduced two simple procedures for building RBF-based nonlinear equalizers using the VQTAM model, a recently proposed method for learning dynamical input–output mappings with the SOM. A comprehensive literature survey of architectures related to the ones discussed in this paper is provided.
Footnote 4: Recall that a sigmoidal nonlinearity was present in the simulated channel model, which has favored the MLP-based filters considerably in the nonlinear equalization task; but even in that case, the performances of the SOM-based filters were rather impressive.

An in-depth performance comparison
has shown that the SOM-based adaptive filters can consistently outperform powerful MLP-based filters in simulated and real-world tasks. We are currently developing adaptive strategies to determine the number of neurons (q) that yields optimal performance for SOM-based adaptive filters. Some of these strategies are based on the growing self-organizing map (GSOM) (Villmann & Bauer, 1998) and Growing Cell Structures (GCS) (Fritzke, 1994) to add neurons to the network until a given performance criterion has been reached.

Acknowledgments

The authors would like to thank CNPq (grants #506979/2004-0 and #305275/2002-0) for their financial support. We also thank the reviewers for their valuable suggestions for improving the paper.

References

Adali, T., Liu, X., & Sönmez, M. K. (1997). Conditional distribution learning with neural networks and its application to channel equalization. IEEE Transactions on Signal Processing, 45(4), 1051–1064.
Barreto, G., Mota, J., Souza, L., & Frota, R. (2004). Nonstationary time series prediction using local models based on competitive neural networks. Lecture Notes in Computer Science, 3029, 1146–1155.
Barreto, G. A., & Araújo, A. F. R. (2004). Identification and control of dynamical systems using the self-organizing map. IEEE Transactions on Neural Networks, 15(5), 1244–1259.
Barreto, G. A., Araújo, A. F. R., & Ritter, H. J. (2003). Self-organizing feature maps for modeling and control of robotic manipulators. Journal of Intelligent and Robotic Systems, 36(4), 407–450.
Bershad, N. J., Ibnkahla, M., & Castanié, F. (1997). Statistical analysis of a two-layer backpropagation algorithm used for modeling nonlinear memoryless channels: The single neuron case. IEEE Transactions on Signal Processing, 45(3), 747–756.
Bouchired, S., Ibnkahla, M., Roviras, D., & Castanié, F. (1998). Equalization of satellite mobile communication channels using combined self-organizing maps and RBF networks. In Proceedings of the IEEE international conference on acoustics, speech, and signal processing (vol. 6) (pp. 3377–3379).
Card, H. C., Rosendahl, G. K., McNeill, D. K., & McLeod, R. D. (1998). Competitive learning algorithms and neurocomputer architecture. IEEE Transactions on Computers, 47(8), 847–858.
Carpenter, G., Grossberg, S., & Reynolds, J. H. (1991). ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 4, 565–588.
Carpenter, G. A., & Grossberg, S. (2002). Adaptive resonance theory. In M. Arbib (Ed.), The handbook of brain theory and neural networks. Cambridge, MA: MIT Press.
Chang, P.-H., & Wang, B.-C. (1995). Adaptive decision feedback equalization for digital satellite channels using multilayer neural networks. IEEE Journal on Selected Areas in Communications, 13(2), 316–324.
Chen, J.-Q., & Xi, Y.-G. (1998). Nonlinear system modeling by competitive learning and adaptive fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics-Part C, 28(2), 231–238.
Chen, S., Mulgrew, B., & Grant, P. (1993). A clustering technique for digital communications channel equalization using radial basis function. IEEE Transactions on Neural Networks, 4(4), 570–579.
Chng, E.-S., Yang, H. H., & Skarbek, W. (1996). Reduced complexity implementation of the Bayesian equaliser using local RBF network for channel equalisation problem. Electronics Letters, 32(1), 17–19.
Dablemont, S., Simon, G., Lendasse, A., Ruttiens, A., Blayo, F., & Verleysen, M. (2003). Time series forecasting with SOM and local non-linear models - Application to the DAX30 index prediction. In Proceedings of the 4th workshop on self-organizing maps (pp. 340–345).
DeBodt, E., Cottrell, M., Letremy, P., & Verleysen, M. (2004). On the use of self-organizing maps to accelerate vector quantization. Neurocomputing, 56, 187–203.
Feng, J.-C., Tse, C. K., & Lau, F. C. M. (2003). A neural-network-based channel equalization strategy for chaos-based communication systems. IEEE Transactions on Circuits and Systems - I, 50(7), 954–957.
Flexer, A. (2001). On the use of self-organizing maps for clustering and visualization. Intelligent Data Analysis, 5(5), 373–384.
Fritzke, B. (1994). Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441–1460.
Göppert, J., & Rosenstiel, W. (1993). Topology preserving interpolation in self-organizing maps. In Proceedings of the Neuro-Nîmes'93 (pp. 425–434).
Göppert, J., & Rosenstiel, W. (1995). Topological interpolation in SOM by affine transformations. In Proceedings of the European symposium on artificial neural networks.
Haykin, S. (1994). Neural networks: A comprehensive foundation. Englewood Cliffs, NJ: Macmillan Publishing Company.
Haykin, S. (1996). Adaptive filter theory (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Hecht-Nielsen, R. (1988). Applications of counterpropagation networks. Neural Networks, 1, 131–139.
Hirose, A., & Nagashima, T. (2003). Predictive self-organizing map for vector quantization of migratory signals and its application to mobile communications. IEEE Transactions on Neural Networks, 14(6), 1532–1540.
Hofmann, T., & Buhmann, J. (1998). Competitive learning algorithms for robust vector quantization. IEEE Transactions on Signal Processing, 46(6), 1665–1675.
Ibnkahla, M. (2000). Applications of neural networks to digital communications - a survey. Signal Processing, 80(7), 1185–1215.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3, 79–87.
Jianping, D., Sundararajan, N., & Saratchandran, P. (2002). Communication channel equalization using complex-valued minimal radial basis function neural networks. IEEE Transactions on Neural Networks, 13(3), 687–696.
Johnson, C. R. (1995). Yet still more on the interaction of adaptive filtering, identification and control. IEEE Signal Processing Magazine, 12(2), 22–37.
Kechriotis, G., Zervas, E., & Manolakos, E. S. (1994). Using recurrent neural networks for adaptive communication channel equalization. IEEE Transactions on Neural Networks, 5(2), 267–278.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.
Kohonen, T. K. (1997). Self-organizing maps (2nd extended ed.). Berlin, Heidelberg: Springer-Verlag.
Kohonen, T. K. (1998). The self-organizing map. Neurocomputing, 21, 1–6.
Kohonen, T. K., Oja, E., Simula, O., Visa, A., & Kangas, J. (1996). Engineering applications of the self-organizing map. Proceedings of the IEEE, 84(10), 1358–1384.
Koskela, T., Varsta, M., Heikkonen, J., & Kaski, S. (1998). Time series prediction using recurrent SOM with local linear models. International Journal of Knowledge-based Intelligent Engineering Systems, 2(1), 60–68.
Lendasse, A., Lee, J., Wertz, V., & Verleysen, M. (2002). Forecasting electricity consumption using nonlinear projection and self-organizing maps. Neurocomputing, 48, 299–311.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. L. Cam and J. Neyman (Eds.), Proceedings of the 5th Berkeley symposium on mathematical statistics and probability (vol. 1) (pp. 281–297).
Martinetz, T. M., Berkovich, S. G., & Schulten, K. J. (1993). Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558–569.
Martinetz, T. M., & Schulten, K. J. (1991). A "neural-gas" network learns topologies. In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Artificial neural networks (pp. 397–402). Amsterdam: North-Holland.
Norgaard, M., Ravn, O., Poulsen, N. K., & Hansen, L. K. (2000). Neural networks for modelling and control of dynamic systems. Springer-Verlag.
Oja, M., Kaski, S., & Kohonen, T. (2003). Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Computing Surveys, 3, 1–156.
Paquier, W., & Ibnkahla, M. (1998) Self-organizing maps for rapidly fading nonlinear channel equalization. In Proceedings of the IEEE world congress on computational intelligence (pp. 865–869). Parisi, R., Claudio, E. D. D., Orlandi, G., & Rao, B. D. (1997). Fast adaptive digital equalization by recurrent neural networks. IEEE Transactions on Signals Processing, 45(11), 2731–2739. Patra, J. C., Pal, R. N., Baliarsingh, R., & Panda, G. (1999). Nonlinear channel equalization for QAM signal constellation using artificial neural networks. IEEE Transactions on Systems, Man, and Cybernetics–Part B: Cybernetics, 29(2), 262–271. Peng, M., Nikias, C.L., & Proakis, J.G. (1991) Adaptive equalization for PAM and QAM signals with neural networks. In Proceedings of the IEEE asilomar conference on signals, systems and computers (vol. 1) (pp. 496–500). Peng, M., Nikias, C.L., & Proakis, J.G. (1992) Adaptive equalization with neural networks: New multilayer perceptron structures and their evaluation. In Proceedings of the IEEE international conference on acoustics, speech and signal processing (vol. 2) (pp. 301–304). Pham, D.T., & Sukkar, M.F. (1995) Supervised adaptive resonance theory neural network for modelling dynamic systems. In Proceedings of the IEEE international joint conference on neural networks (pp. 2500–2505). Principe, J. C., Euliano, N. R., & Lefebvre, W. C. (2000). Neural adaptive systems: Fundamentals through simulations. New York, NY: John Wiley & Sons. Principe, J. C., Wang, L., & Motter, M. A. (1998). Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control. Proceedings of the IEEE, 86(11), 2240–2258. Proakis, J. G. (2001). Digital communications. New York: McGraw-Hill. Quereshi, S. U. H. (1985). Adaptive equalization. Proceedings of IEEE, 73, 1349–1387. Raivio, K., Henriksson, J., & Simula, O. (1998). Neural detection of QAM signal with strongly nonlinear receiver. Neurocomputing, 21(1–3), 159–171. 
Ritter, H., Martinetz, T., & Schulten, K. (1989). Topology-conserving maps for learning visuo-motor-coordination. Neural Networks, 2(3), 159–168.
Saleh, A. (1981). Frequency-independent and frequency-dependent nonlinear models of TWT amplifiers. IEEE Transactions on Communications, 29(11), 1715–1720.
Simon, G., Lendasse, A., Cottrell, M., Fort, J.-C., & Verleysen, M. (2004). Double quantization of the regressor space for long-term time series prediction: Method and proof of stability. Neural Networks, 17, 1169–1181.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., et al. (1995). Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31(12), 1691–1724.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6), 568–576.
Stokbro, K., Umberger, D. K., & Hertz, J. A. (1990). Exploiting neurons with localized receptive fields to learn chaos. Complex Systems, 4, 603–622.
Vesanto, J. (1997). Using the SOM and local models in time series prediction. In Proceedings of the 1997 workshop on the self-organizing map (pp. 209–214).
Villmann, T., & Bauer, H.-U. (1998). Applications of the growing self-organizing map. Neurocomputing, 21, 91–100.
Walter, J., & Ritter, H. (1996). Rapid learning with parametrized self-organizing maps. Neurocomputing, 12, 131–153.
Walter, J., Ritter, H., & Schulten, K. (1990). Non-linear prediction with self-organizing map. In Proceedings of the IEEE international joint conference on neural networks (vol. 1) (pp. 587–592).
Wang, X., Lin, H., Lu, J., & Yahagi, T. (2001). Detection of nonlinearly distorted M-ary QAM signals using self-organizing map. IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Sciences, E84-A(8), 1969–1976.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON convention record, part 4, 96–104.
Widrow, B., & Kamenetsky, M. (2003). Statistical efficiency of adaptive algorithms. Neural Networks, 16(5–6), 735–744.
Widrow, B., & Lehr, M. (1990). 30 years of adaptive neural networks: Perceptron, madaline and backpropagation. Proceedings of the IEEE, 78, 1415–1442.
Widrow, B., & Stearns, S. D. (1985). Adaptive signal processing. Upper Saddle River, NJ: Prentice-Hall.
Yair, E., Zeger, K., & Gersho, A. (1992). Competitive learning and soft competition for vector quantizer design. IEEE Transactions on Signal Processing, 40(2), 294–309.
Yamakawa, T., & Horio, K. (1999). Self-organizing relationship (SOR) network. IEICE Transactions on Fundamentals, E82-A(8), 1674–1677.
Zaknich, A. (1998). Introduction to the modified probabilistic neural network for general signal processing applications. IEEE Transactions on Signal Processing, 46(7), 1980–1990.
Zaknich, A. (2003). A practical sub-space adaptive filter. Neural Networks, 16(5–6), 833–839.
Zerguine, A., Shafi, A., & Bettayeb, M. (2001). Multilayer perceptron-based DFE with lattice structure. IEEE Transactions on Neural Networks, 12(3), 532–545.