Mathematics and Computers in Simulation 57 (2001) 355–365
Complex functional networks

Chunguang Li a,∗, Xiaofeng Liao a,b, Zhongfu Wu b, Juebang Yu a

a Department of Opto-Electronics Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, PR China
b Institute of Computer Science, Chongqing University, Chongqing 400044, PR China

Received 1 January 2001; accepted 9 February 2001
Abstract

Functional networks are a recently introduced extension of neural networks, which deal with general functional models instead of sigmoidal-like ones. In this paper, we propose complex functional networks, whose inputs, outputs, neural functions and arguments are all complex-valued. The general learning algorithm for this kind of complex functional network is derived, and the performance of the proposed complex functional networks is demonstrated by applying them to the identification of complex-valued communication channels. © 2001 IMACS. Published by Elsevier Science B.V. All rights reserved.

Keywords: Communication channel; Functional networks; Real time recurrent learning
1. Introduction

Castillo [1] introduced functional networks as a powerful generalization of artificial neural networks. Unlike neural networks, these networks have no weights associated with the links connecting neurons, and the neural functions are unknown functions from given families, to be estimated during the learning process. We can select an appropriate family for each specific problem (such as polynomials, Fourier expansions and trigonometric functions). Functional networks have been applied successfully in many experiments. In [2], the authors use functional networks for nonlinear time series modeling and prediction, and for extracting information masked by chaos; in [3], the authors use functional networks to approximate solutions of differential, functional and difference equations and to obtain the differential equation associated with a set of data. Functional networks have shown excellent performance in the above-mentioned experiments. All of these works, however, are based on real-valued functional networks; in many applications the inputs and outputs of a system are best described as complex-valued signals. In such cases, one needs a complex learning algorithm to train complex networks.

∗ Corresponding author. E-mail address: [email protected] (C. Li).
Recently, results have appeared in the literature that generalize the well-known back-propagation (BP) algorithm to train feed-forward neural networks with complex weights [4–6]; the complex BP algorithm has been shown to be a straightforward extension of the real-valued one. In [7], the authors extended the real time recurrent learning (RTRL) algorithm to the complex RTRL (CRTRL) and applied it to complex communication channel equalization. Complex-valued RBF neural networks were also proposed in [8,9]. As a natural extension of neural networks, functional networks should have their own complex form. In this paper, we extend functional networks to the complex domain and derive the general complex learning algorithm. In Section 2, we briefly introduce functional networks and separable functional networks. In Section 3, we propose complex functional networks and derive the general complex learning algorithm. In Section 4, we evaluate the performance of the proposed complex functional networks by applying them to the identification of complex communication channels. Finally, in Section 5, we summarize our conclusions.
2. Functional networks

Functional networks have been recently introduced as a powerful generalization of neural networks [1–3]. Fig. 1 shows a typical architecture of a functional network illustrating its main components. Besides the input and output layers (the sets {x1, x2, x3} and {x6}, respectively), a functional network consists of one or several layers of intermediate storing units (in Fig. 1 the only intermediate layer is {x4, x5}), which store information produced by neuron units, and one or several layers of neuron units (the layers {f1, f2} and {f3}). A neuron unit evaluates a set of input values coming from the previous layer and delivers a set of output values to the next layer. There is a set of directed links in functional networks, which connect the input layer to the first layer of neurons, neurons of one layer to the next layer, and the last layer of neurons to the output units. Finally, each neuron has an associated neural function, which can be multivariate and can have as many arguments as inputs.

An interesting family of functional network architectures with many applications is the so-called separable functional networks (see Fig. 2), which have an associated functional expression that combines the separate effects of the input variables. For the case of two inputs, x and y, and one output, z, we have

z = F(x, y) = \sum_{i=1}^{n} f_i(x) g_i(y)    (1)
We will discuss the complex form of the separable functional networks in the next section; the complex forms of other functional network models can be obtained through similar analysis.
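To make the separable form concrete, the following minimal Python sketch (ours, not from the original paper) evaluates a real-valued separable functional network of the form (1), with each neuron function represented over a hypothetical monomial basis; the coefficient values are purely illustrative.

```python
import numpy as np

def neuron(coeffs, q):
    """Neuron function as a linear combination of monomials 1, q, q^2, ..."""
    return sum(c * q ** k for k, c in enumerate(coeffs))

def separable_network(a, b, x, y):
    """z = F(x, y) = sum_i f_i(x) * g_i(y), Eq. (1); a[i], b[i] hold the
    coefficients of f_i and g_i over the (hypothetical) monomial basis."""
    return sum(neuron(ai, x) * neuron(bi, y) for ai, bi in zip(a, b))

# n = 2 terms with purely illustrative coefficients:
# f_1(x) = 0.5 + x, f_2(x) = 1, g_1(y) = 1, g_2(y) = -0.2 + 0.3*y
a = [np.array([0.5, 1.0]), np.array([1.0])]
b = [np.array([1.0]), np.array([-0.2, 0.3])]
print(separable_network(a, b, x=0.7, y=-1.2))   # F(0.7, -1.2)
```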
Fig. 1. A functional network.
Fig. 2. Separable functional network.
An important problem associated with functional networks is that of analyzing the uniqueness of representation, i.e. obtaining the most general sets of functions satisfying the functional constraints imposed by the network topology. We shall analyze this problem using the complex separable functional network architecture in the next section.

3. Complex functional networks

In this section, we consider the complex form of the separable functional networks shown in Fig. 2. The inputs x, y, the output z and the neural functions f_i, g_i are all complex. We use the subscripts R and I to denote the real and imaginary parts, and 1 and j (j = \sqrt{-1}) to denote the real and imaginary units, respectively. In complex functional networks, one of the main problems is the selection of the complex-domain neural functions. One possible choice consists of the superimposition of a real and an imaginary neural function:

f(x) = f_R(x_R) + j f_I(x_I)    (2)
where the functions f_R(·) and f_I(·) are real-valued.

3.1. Uniqueness of representation

Lemma 1. All solutions of the equation \sum_{i=1}^{n} f_i(x) g_i(y) = 0 (f^T(x) g(y) = 0 in matrix form) can be written in the form f(x) = A\phi(x), g(y) = B\psi(y), where A and B are constant matrices (of dimensions n × r and n × (n − r), respectively) with A^T B = 0, \phi(x) = (\phi_1(x), \phi_2(x), ..., \phi_r(x)) and \psi(y) = (\psi_{r+1}(y), ..., \psi_n(y)) are two arbitrary systems of mutually linearly independent functions, and r is an integer between 0 and n (first theorem of [2]).

Theorem 1. All solutions of the equation \sum_{i=1}^{n} f_i(x) g_i(y) = 0, where the f_i and g_i are all complex functions (f^T(x) g(y) = 0 in matrix form), can be written in the form f(x) = A\phi(x), g(y) = B\psi(y), where A and B are constant complex matrices (of dimensions n × r and n × (n − r), respectively) with A^T B = 0, \phi(x) = (\phi_1(x), \phi_2(x), ..., \phi_r(x)) and \psi(y) = (\psi_{r+1}(y), ..., \psi_n(y)) are two arbitrary systems of mutually linearly independent complex functions, and r is an integer between 0 and n.

Proof. Assume that r is the number of linearly independent functions in f(x); then the number of linearly independent functions in g(y) must be less than or equal to n − r (otherwise, f^T(x) g(y) = 0 could not hold).
It is easy to see that all solutions of the equation \sum_{i=1}^{n} f_i(x) g_i(y) = 0 can be written in the form f(x) = A\phi(x), g(y) = B\psi(y), where \phi(x) = (\phi_1(x), \phi_2(x), ..., \phi_r(x)) and \psi(y) = (\psi_{r+1}(y), ..., \psi_n(y)) are two arbitrary systems of mutually linearly independent complex functions. We will prove A^T B = 0.

\sum_{i=1}^{n} f_i(x) g_i(y) = f^T(x) g(y) = (f_R^T(x_R) + j f_I^T(x_I))(g_R(y_R) + j g_I(y_I)) = (f_R^T(x_R) g_R(y_R) - f_I^T(x_I) g_I(y_I)) + j (f_R^T(x_R) g_I(y_I) + f_I^T(x_I) g_R(y_R)) = 0    (3)

Eq. (3) can be rewritten as the following two equations:

(f_R^T(x_R) \; f_I^T(x_I)) \begin{pmatrix} g_R(y_R) \\ -g_I(y_I) \end{pmatrix} = 0    (4)

(f_R^T(x_R) \; f_I^T(x_I)) \begin{pmatrix} g_I(y_I) \\ g_R(y_R) \end{pmatrix} = 0    (5)
and

f(x) = f_R(x) + j f_I(x) = (A_R + j A_I)(\phi_R(x_R) + j \phi_I(x_I)) = (A_R \phi_R(x_R) - A_I \phi_I(x_I)) + j (A_R \phi_I(x_I) + A_I \phi_R(x_R))

so

\begin{pmatrix} f_R(x_R) \\ f_I(x_I) \end{pmatrix} = \begin{pmatrix} A_R & -A_I \\ A_I & A_R \end{pmatrix} \begin{pmatrix} \phi_R(x_R) \\ \phi_I(x_I) \end{pmatrix}    (6)

Similarly,

\begin{pmatrix} g_R(y_R) \\ -g_I(y_I) \end{pmatrix} = \begin{pmatrix} B_R & -B_I \\ -B_I & -B_R \end{pmatrix} \begin{pmatrix} \psi_R(y_R) \\ \psi_I(y_I) \end{pmatrix}    (7)

By Lemma 1 and Eq. (2), we can obtain

\begin{pmatrix} A_R & -A_I \\ A_I & A_R \end{pmatrix}^T \begin{pmatrix} B_R & -B_I \\ -B_I & -B_R \end{pmatrix} = 0    (8)

that is,

\begin{pmatrix} A_R^T & A_I^T \\ -A_I^T & A_R^T \end{pmatrix} \begin{pmatrix} B_R & -B_I \\ -B_I & -B_R \end{pmatrix} = 0    (9)
so

A_R^T B_R - A_I^T B_I = 0, \quad A_R^T B_I + A_I^T B_R = 0    (10)

and so

A^T B = (A_R^T + j A_I^T)(B_R + j B_I) = (A_R^T B_R - A_I^T B_I) + j (A_R^T B_I + A_I^T B_R) = 0    (11)
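As an informal numerical sanity check of the block identities used in this proof (our addition, not part of the original derivation), one can build random complex matrices A and B with A^T B = 0 and verify that the real-block relations (10) and the complex identity (11) hold simultaneously:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2

# Random complex A of size n x r.
A = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))

# Build B (n x (n - r)) spanning the null space of A^T, so that A^T B = 0.
_, _, Vh = np.linalg.svd(A.T)
B = Vh.conj().T[:, r:]            # right singular vectors for zero singular values

AR, AI = A.real, A.imag
BR, BI = B.real, B.imag

print(np.allclose(A.T @ B, 0))                    # complex identity (11)
print(np.allclose(AR.T @ BR - AI.T @ BI, 0))      # real part of (10)
print(np.allclose(AR.T @ BI + AI.T @ BR, 0))      # imaginary part of (10)
```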
3.2. Learning

The problem of learning the functional network is to estimate the neuron functions f_i, g_i from the available data. We can approximate the functions f_i and g_i in Eq. (1) by considering a linear combination of known functions from a given family:
f_i(x) = f_{iR}(x_R) + j f_{iI}(x_I) = \sum_{j=1}^{m_{1i}} a_{ij} \phi_{ij}(x) = a_i^T \phi_i(x) = (a_{iR} + j a_{iI})^T (\phi_{iR}(x_R) + j \phi_{iI}(x_I)) = (a_{iR}^T \phi_{iR}(x_R) - a_{iI}^T \phi_{iI}(x_I)) + j (a_{iR}^T \phi_{iI}(x_I) + a_{iI}^T \phi_{iR}(x_R))    (12)
g_i(y) = g_{iR}(y_R) + j g_{iI}(y_I) = \sum_{k=1}^{m_{2i}} b_{ik} \varphi_{ik}(y) = b_i^T \varphi_i(y) = (b_{iR} + j b_{iI})^T (\varphi_{iR}(y_R) + j \varphi_{iI}(y_I)) = (b_{iR}^T \varphi_{iR}(y_R) - b_{iI}^T \varphi_{iI}(y_I)) + j (b_{iR}^T \varphi_{iI}(y_I) + b_{iI}^T \varphi_{iR}(y_R))    (13)
where {\phi_{ij}(x), \varphi_{ik}(y), i = 1, 2, ..., n, j = 1, 2, ..., m_{1i}, k = 1, 2, ..., m_{2i}} are given sets of linearly independent functions capable of approximating f_i and g_i to the desired accuracy, \phi_i(x) = (\phi_{i1}(x), \phi_{i2}(x), ..., \phi_{i m_{1i}}(x))^T, \varphi_i(y) = (\varphi_{i1}(y), \varphi_{i2}(y), ..., \varphi_{i m_{2i}}(y))^T, and the coefficients a_{ij}, b_{ik} are the parameters of the functional network; they are all complex-valued. Then the error can be measured by

e_P = e_{PR} + j e_{PI} = z_P - \sum_{i=1}^{n} f_i(x_P) g_i(y_P) = (z_{PR} + j z_{PI}) - [(f_R^T(x_{PR}) g_R(y_{PR}) - f_I^T(x_{PI}) g_I(y_{PI})) + j (f_R^T(x_{PR}) g_I(y_{PI}) + f_I^T(x_{PI}) g_R(y_{PR}))] = [z_{PR} - (f_R^T(x_{PR}) g_R(y_{PR}) - f_I^T(x_{PI}) g_I(y_{PI}))] + j [z_{PI} - (f_R^T(x_{PR}) g_I(y_{PI}) + f_I^T(x_{PI}) g_R(y_{PR}))]    (14)

where P denotes the current training pattern. Thus, to find the optimum coefficients, we minimize the sum of squared errors
E_0 = \sum_{p=1}^{P} |e_p|^2 = \sum_{p=1}^{P} e_p e_p^* = \sum_{p=1}^{P} (e_{pR}^2 + e_{pI}^2)    (15)
where the superscript ∗ denotes the complex conjugate. In order to have a unique representation of the functional network, some initial conditions have to be given. In this case, we consider the initial functional conditions

f_i(x_{i0}) = u_{i0}, \quad g_i(y_{i0}) = v_{i0}    (16)
where u_{i0} and v_{i0} are given constants. Sometimes it is not necessary to give all 2n initial conditions, but in order to derive the general learning algorithm we assume that all these initial conditions are given. By adding penalty terms to E_0 in Eq. (15), we define the energy function as

E = E_0 + c_1 \sum_{i=1}^{n} |f_i(x_{i0}) - u_{i0}|^2 + c_2 \sum_{i=1}^{n} |g_i(y_{i0}) - v_{i0}|^2 = E_0 + c_1 \sum_{i=1}^{n} [(f_{iR}(x_{i0R}) - u_{i0R})^2 + (f_{iI}(x_{i0I}) - u_{i0I})^2] + c_2 \sum_{i=1}^{n} [(g_{iR}(y_{i0R}) - v_{i0R})^2 + (g_{iI}(y_{i0I}) - v_{i0I})^2]    (17)
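For illustration, a minimal Python sketch of Eqs. (12)–(17) is given below. It reflects our reading of the formulation and is not the authors' implementation; as a simplifying assumption, a hypothetical two-element polynomial family {1, q} is used for both the real and imaginary basis parts of every neuron function, so that \phi_i(x) = [1 + j, x].

```python
import numpy as np

def phi(q):
    """Complex basis vector phi(q) = phi_R(q_R) + j*phi_I(q_I), cf. Eq. (2),
    with the hypothetical family {1, q} for both real and imaginary parts."""
    return np.array([1.0, q.real]) + 1j * np.array([1.0, q.imag])

def f_i(a_i, x):
    return a_i @ phi(x)                    # Eq. (12): f_i(x) = a_i^T phi_i(x)

def g_i(b_i, y):
    return b_i @ phi(y)                    # Eq. (13): g_i(y) = b_i^T varphi_i(y)

def error(a, b, x_p, y_p, z_p):
    """e_p = z_p - sum_i f_i(x_p) g_i(y_p), Eq. (14)."""
    return z_p - sum(f_i(ai, x_p) * g_i(bi, y_p) for ai, bi in zip(a, b))

def cost(a, b, data):
    """E_0 = sum_p |e_p|^2, Eq. (15); data is a list of (x_p, y_p, z_p) triples."""
    return sum(abs(error(a, b, x, y, z)) ** 2 for (x, y, z) in data)

def energy(a, b, data, x0, u0, y0, v0, c1=1.0, c2=1.0):
    """Penalized energy of Eq. (17), enforcing the initial conditions (16)."""
    E = cost(a, b, data)
    E += c1 * sum(abs(f_i(ai, xi) - ui) ** 2 for ai, xi, ui in zip(a, x0, u0))
    E += c2 * sum(abs(g_i(bi, yi) - vi) ** 2 for bi, yi, vi in zip(b, y0, v0))
    return E
```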
We define the gradient vector ∇_{a_i}E as the derivative of the energy function E with respect to the real and imaginary parts of a_i, as shown by

\nabla_{a_i} E = \frac{\partial E}{\partial a_{iR}} + j \frac{\partial E}{\partial a_{iI}}    (18)
and we have

\frac{\partial E}{\partial a_{iR}} = 2 \sum_{p=1}^{P} [e_{pR}(-g_{iR}(y_{pR}) \phi_{iR}(x_{pR}) + g_{iI}(y_{pI}) \phi_{iI}(x_{pI})) + e_{pI}(-g_{iI}(y_{pI}) \phi_{iR}(x_{pR}) - g_{iR}(y_{pR}) \phi_{iI}(x_{pI}))] + 2 c_1 [(f_{iR}(x_{i0R}) - u_{i0R}) \phi_{iR}(x_{i0R}) + (f_{iI}(x_{i0I}) - u_{i0I}) \phi_{iI}(x_{i0I})]    (19)

and

\frac{\partial E}{\partial a_{iI}} = 2 \sum_{p=1}^{P} [e_{pR}(g_{iR}(y_{pR}) \phi_{iI}(x_{pI}) + g_{iI}(y_{pI}) \phi_{iR}(x_{pR})) + e_{pI}(g_{iI}(y_{pI}) \phi_{iI}(x_{pI}) - g_{iR}(y_{pR}) \phi_{iR}(x_{pR}))] + 2 c_1 [(f_{iR}(x_{i0R}) - u_{i0R})(-\phi_{iI}(x_{i0I})) + (f_{iI}(x_{i0I}) - u_{i0I}) \phi_{iR}(x_{i0R})]    (20)
so

\nabla_{a_i} E = \frac{\partial E}{\partial a_{iR}} + j \frac{\partial E}{\partial a_{iI}} = 2 \sum_{p=1}^{P} e_{pR} [(-g_{iR}(y_{pR}) \phi_{iR}(x_{pR}) + g_{iI}(y_{pI}) \phi_{iI}(x_{pI})) + j (g_{iR}(y_{pR}) \phi_{iI}(x_{pI}) + g_{iI}(y_{pI}) \phi_{iR}(x_{pR}))] + 2 \sum_{p=1}^{P} e_{pI} [(-g_{iI}(y_{pI}) \phi_{iR}(x_{pR}) - g_{iR}(y_{pR}) \phi_{iI}(x_{pI})) + j (g_{iI}(y_{pI}) \phi_{iI}(x_{pI}) - g_{iR}(y_{pR}) \phi_{iR}(x_{pR}))] + 2 c_1 [(f_{iR}(x_{i0R}) - u_{i0R})(\phi_{iR}(x_{i0R}) - j \phi_{iI}(x_{i0I})) + (f_{iI}(x_{i0I}) - u_{i0I})(\phi_{iI}(x_{i0I}) + j \phi_{iR}(x_{i0R}))]    (21)
Eq. (21) is equivalent to

\nabla_{a_i} E = -2 \sum_{p=1}^{P} e_p (g_i(y_p) \phi_i(x_p))^* + 2 c_1 (f_i(x_{i0}) - u_{i0}) \phi_i(x_{i0})^*    (22)
Similarly, we have

\nabla_{b_i} E = -2 \sum_{p=1}^{P} e_p (f_i(x_p) \varphi_i(y_p))^* + 2 c_2 (g_i(y_{i0}) - v_{i0}) \varphi_i(y_{i0})^*    (23)
and we can adjust the parameters adaptively:

a_i(n+1) = a_i(n) + \tfrac{1}{2} \mu_1 (-\nabla_{a_i} E)    (24)

b_i(n+1) = b_i(n) + \tfrac{1}{2} \mu_2 (-\nabla_{b_i} E)    (25)

where 0 < \mu_1, \mu_2 < 1 are the learning rates.
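A corresponding training-step sketch based on Eqs. (22)–(25) follows. It is our own sketch, not the authors' code; it reuses the hypothetical phi, f_i, g_i and error helpers from the previous sketch, accumulates full-batch complex gradients, and applies the update with step sizes \mu_1/2 and \mu_2/2.

```python
def train_step(a, b, data, x0, u0, y0, v0, c1=1.0, c2=1.0, mu1=0.1, mu2=0.1):
    """One iteration of Eqs. (24)-(25) using the complex gradients (22)-(23)."""
    errs = [error(a, b, x, y, z) for (x, y, z) in data]
    new_a, new_b = [], []
    for i, (ai, bi) in enumerate(zip(a, b)):
        # Eq. (22): grad_{a_i} E = -2 sum_p e_p (g_i(y_p) phi_i(x_p))^*
        #                          + 2 c1 (f_i(x_i0) - u_i0) phi_i(x_i0)^*
        grad_a = -2 * sum(e * np.conj(g_i(bi, y) * phi(x))
                          for e, (x, y, _) in zip(errs, data))
        grad_a += 2 * c1 * (f_i(ai, x0[i]) - u0[i]) * np.conj(phi(x0[i]))
        # Eq. (23): grad_{b_i} E = -2 sum_p e_p (f_i(x_p) varphi_i(y_p))^*
        #                          + 2 c2 (g_i(y_i0) - v_i0) varphi_i(y_i0)^*
        grad_b = -2 * sum(e * np.conj(f_i(ai, x) * phi(y))
                          for e, (x, y, _) in zip(errs, data))
        grad_b += 2 * c2 * (g_i(bi, y0[i]) - v0[i]) * np.conj(phi(y0[i]))
        # Eqs. (24)-(25): move against the gradient with step mu/2.
        new_a.append(ai - 0.5 * mu1 * grad_a)
        new_b.append(bi - 0.5 * mu2 * grad_b)
    return new_a, new_b
```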
4. Examples of application

The approximation capabilities of complex functional networks and the efficiency of the complex learning algorithm are illustrated with examples of modeling complex communication channels. First, a simple linear communication channel is considered, whose transfer function is given in [9] as

H(z) = (1.0119 - j0.7589) + (-0.3796 + j0.5059) z^{-1}    (26)
H(z) has a zero z_0 = 0.4801 - j0.1399 in the Z-plane. The transmitted digital signals x(t) = x_R(t) + j x_I(t) are 4-QAM symbols, that is, x_R(t) and x_I(t) can only take values from the symbol set {±1}. The channel output is corrupted by additive complex noise, and the SNR is 15 dB. We use the simplest separable functional network architecture to approximate this transfer function. The functional network is shown in Fig. 3, and its functional expression is

F(x, y) = f(x) + g(y)    (27)
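To make the simulation setup concrete, the sketch below (ours; the sample count, random seed and variable names are arbitrary choices) generates 4-QAM symbols, filters them with the channel of Eq. (26), and adds complex Gaussian noise at roughly 15 dB SNR; training patterns for the network of Eq. (27) are formed from (x(n), x(n − 1)) with the noisy channel output as target.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# 4-QAM symbols: real and imaginary parts drawn from {+1, -1}.
x = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)

# Linear channel of Eq. (26): u(n) = h0*x(n) + h1*x(n-1).
h0 = 1.0119 - 0.7589j
h1 = -0.3796 + 0.5059j
u = h0 * x
u[1:] += h1 * x[:-1]

# Additive complex Gaussian noise at about 15 dB SNR.
signal_power = np.mean(np.abs(u) ** 2)
noise_power = signal_power / 10 ** (15 / 10)
noise = np.sqrt(noise_power / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
z = u + noise

# Training patterns for the network of Eq. (27): inputs x(n), x(n-1), target z(n).
data = [(x[n], x[n - 1], z[n]) for n in range(1, N)]
```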
Note that this functional form corresponds to Eq. (1) with n = 2 and f_2 = g_1 = 1. In this case, the uniqueness of representation problem reduces to finding the relationships among the functions of two different representations of Eq. (27), say

F(x, y) = f(x) + g(y) = f'(x) + g'(y)    (28)

Eq. (28) is equivalent to

(f(x) - f'(x)) + (g(y) - g'(y)) = 0    (29)

Theorem 1 gives the general solution of this problem; we have

\begin{pmatrix} f(x) - f'(x) \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 \\ 1 \end{pmatrix} (1), \quad \begin{pmatrix} 1 \\ g(y) - g'(y) \end{pmatrix} = \begin{pmatrix} 1 \\ c_2 \end{pmatrix} (1)    (30)

where

(c_1 \; 1) \begin{pmatrix} 1 \\ c_2 \end{pmatrix} = 0 \iff c_1 = -c_2 = c    (31)

From Eqs. (30) and (31) we get the constraints

f'(x) = f(x) - c, \quad g'(y) = g(y) + c    (32)

where c is an arbitrary complex constant. We choose a polynomial functional family, say \phi_R(q) = \phi_I(q) = {1, q} (q real-valued), for all neuron functions and use the learning algorithm considering the functional initial condition
Fig. 3. Functional network architecture for example 1.
Fig. 4. Learning curve for example 1.
Fig. 5. State constellation for example 1: O, channel; +, functional network.
f(0.5 + 0.5j) = 1 + j. After learning, we obtain the model f(x(n)) = (0.4921 + 0.3841j) + (1.0190 − 0.7566j) x(n), g(x(n−1)) = (−0.4939 − 0.3819j) + (−0.3782 + 0.5033j) x(n−1), and F(x(n), x(n−1)) = (−0.0018 + 0.0022j) + (1.0190 − 0.7566j) x(n) + (−0.3782 + 0.5033j) x(n−1). The learning curve is shown in Fig. 4, and Fig. 5 compares the constellation of the channel states with that of the model states produced by the complex functional network (the test data consist of 100 points).

Next, we consider a nonlinear case: the transmitted signals are first passed through a linear filter with transfer function H(z) as defined in Eq. (26), and then through a nonlinear element defined by

v(t) = \frac{2 u(t)}{1 + |u(t)|^2} \exp\left( j \frac{\pi |u(t)|^2}{3 (1 + |u(t)|^2)} \right)    (33)

The schematic of this nonlinear channel is shown in Fig. 6.

Fig. 6. A nonlinear channel model.

We use the functional network architecture shown in Fig. 7; this is also a functional network belonging to family (1), with

z = F(x, y) = f_1(x) g_1(y) + f_2(x) g_2(y)    (34)
Fig. 7. Functional network architecture for example 2.
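A direct transcription of the nonlinear element, as reconstructed in Eq. (33), is given below (variable names are ours); following Fig. 6, it is applied to the output u(t) of the linear filter of Eq. (26).

```python
import numpy as np

def nonlinear_element(u):
    """v(t) = 2u/(1+|u|^2) * exp(j*pi*|u|^2 / (3*(1+|u|^2))), Eq. (33)."""
    mag2 = np.abs(u) ** 2
    return 2 * u / (1 + mag2) * np.exp(1j * np.pi * mag2 / (3 * (1 + mag2)))

# Nonlinear channel of Fig. 6: linear filter of Eq. (26) followed by the nonlinearity,
# e.g. v = nonlinear_element(u) with u the linear-filter output generated earlier.
```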
In this case, the uniqueness of representation problem reduces to finding the relationships among the functions of two different representations of Eq. (34), say

F(x, y) = f_1(x) g_1(y) + f_2(x) g_2(y) = f_1'(x) g_1'(y) + f_2'(x) g_2'(y)    (35)
Eq. (35) is equivalent to

f_1(x) g_1(y) + f_2(x) g_2(y) - f_1'(x) g_1'(y) - f_2'(x) g_2'(y) = 0    (36)

Theorem 1 gives the general solution of this problem; we have

\begin{pmatrix} f_1(x) \\ f_2(x) \\ f_1'(x) \\ f_2'(x) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ c_1 & c_2 \\ c_3 & c_4 \end{pmatrix} \begin{pmatrix} f_1(x) \\ f_2(x) \end{pmatrix}, \quad \begin{pmatrix} g_1(y) \\ g_2(y) \\ -g_1'(y) \\ -g_2'(y) \end{pmatrix} = \begin{pmatrix} c_5 & c_6 \\ c_7 & c_8 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} g_1'(y) \\ g_2'(y) \end{pmatrix}    (37)

where

\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ c_1 & c_2 \\ c_3 & c_4 \end{pmatrix}^T \begin{pmatrix} c_5 & c_6 \\ c_7 & c_8 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} = 0 \iff c_5 = c_1, \quad c_7 = c_2, \quad c_6 = c_3, \quad c_8 = c_4    (38)
Then, we can obtain the relationships among the functions of different representations from Eqs. (37) and (38):

f_1'(x) = c_1 f_1(x) + c_2 f_2(x), \quad f_2'(x) = c_3 f_1(x) + c_4 f_2(x), \quad g_1'(y) = \frac{c_4 g_1(y) - c_3 g_2(y)}{c_1 c_4 - c_2 c_3}, \quad g_2'(y) = \frac{c_2 g_1(y) - c_1 g_2(y)}{c_2 c_3 - c_1 c_4}    (39)
where c_i (i = 1, 2, 3, 4) are arbitrary complex constants. We choose the functional families \phi_R(q) = \phi_I(q) = {1, q, q^2} (q real-valued) for the f_i neuron functions, and \psi_R(q) = {1, cos(q), cos(2q), ..., cos(6q)}, \psi_I(q) = {1, sin(q), sin(2q), ..., sin(6q)} for the g_i neuron functions. The simulation conditions are the same as in the first example. The learning curve is shown in Fig. 8, and Fig. 9 compares the constellation of the channel states with that of the model states produced by the complex functional network (the test data consist of 100 points).

From the above examples, we can see that complex functional networks have fast learning speed and powerful approximation ability. We have also run the same examples with some other neural network
Fig. 8. Learning curve for example 2.

Fig. 9. State constellation for example 2: O, channel; +, functional network.
methods. The simulation results demonstrate that complex functional networks outperform those other general neural network methods in both learning speed and performance. Owing to space limitations, we do not present those simulation results here.

5. Conclusions

In this paper, we have proposed complex functional networks, whose inputs, outputs, neural functions and arguments are all complex. The general learning algorithm for this kind of complex functional network was derived, and the performance of the proposed complex functional networks was demonstrated through application to the identification of complex-valued communication channels; the simulation results showed that complex functional networks have fast learning speed and powerful approximation ability. In signal processing and communications, where the inputs, outputs and transfer functions of a system are modeled in the complex domain, the proposed complex functional networks provide a useful tool.

References

[1] E. Castillo, Functional networks, Neural Proc. Lett. 7 (1998) 151–159.
[2] E. Castillo, J.M. Gutierrez, Nonlinear time series modeling and prediction using functional networks. Extracting information masked by chaos, Phys. Lett. A 244 (1998) 71–84.
[3] E. Castillo, A. Cobo, J.M. Gutierrez, E. Pruneda, Working with differential, functional and difference equations using functional networks, Appl. Math. Model. 23 (1999) 89–107.
[4] N. Benvenuto, F. Piazza, On the complex backpropagation algorithm, IEEE Trans. Signal Proc. 40 (1992) 967–969.
[5] G.R. Little, S.C. Gustafson, R.A. Senn, Generalization of the backpropagation neural network learning algorithms to permit complex weights, Appl. Optics 29 (11) (1990) 1591–1592.
[6] H. Leung, S. Haykin, The complex backpropagation algorithm, IEEE Trans. Signal Proc. 39 (1991) 2101–2104.
[7] G. Kechriotis, E.S. Manolakos, Training fully recurrent neural networks with complex weights, IEEE Trans. CAS-II 41 (3) (1994) 235–238.
[8] S. Chen, S. McLaughlin, B. Mulgrew, Complex-valued radial basis function network. Part I. Network architecture and learning algorithms, Signal Proc. 35 (1994) 19–31.
[9] S. Chen, S. McLaughlin, B. Mulgrew, Complex-valued radial basis function network. Part II. Application to digital communications channel equalization, Signal Proc. 36 (1994) 175–188.