Effects of connectivity structure of complex echo state network on its prediction performance for nonlinear time series


Qingsong Song, Zuren Feng

Systems Engineering Institute, Xi'an Jiaotong University, Xi'an 710049, China

Article history: Received 18 July 2009; received in revised form 26 January 2010; accepted 30 January 2010; available online 25 February 2010. Communicated by W.L. Dunin-Barkowski.

Abstract

Reservoir computing methods have become popular; however, the nature of the dynamical reservoir (DR) is not yet thoroughly understood. We propose the complex echo state network (CESN), whose DR construction process is determined by five growth factors. The relationships between the CESN connectivity structure and its performance are investigated for nonlinear time series prediction. We also introduce a quantifiable characteristic of the connectivity structure, the connectivity index, and a tool to measure the richness of the reservoir states, the omega-complexity index. The experimental results demonstrate that the connectivity structure of the reservoirs has a significant effect on their prediction performance, that the omega-complexity index can be used as a performance predictor, and that a particular configuration of the growth factors and the corresponding connectivity index can yield optimal performance.

Keywords: Echo state network; Complex network; Connectivity structure; Complexity; Chaotic time series prediction

1. Introduction

Recently, reservoir computing (RC) has been proposed as an approach for designing and training artificial recurrent neural networks (RNNs) [27]. RC facilitates the practical application of RNNs and outperforms classical fully trained RNNs in many tasks [11,22,23]. One of the three typical representatives of RC is the echo state network (ESN) [11]; the other two are the liquid state machine (LSM) [17] and backpropagation-decorrelation (BPDC) [25]. The original ESN consists of two main components: a large-scale RNN (called the 'reservoir') and a linear readout [11]. The most distinctive characteristic of the ESN is that only the weights of the connections from the reservoir neurons to the readout neurons need to be adapted; the reservoir is randomly created and remains unchanged. In addition, because the echo states are rich enough, the adaptation can be treated as a simple linear regression problem. In fact, some architectures that fall under the same RC framework had been described earlier [1,4]. The troubling issues of RNN weight adaptation, such as local minima, slow convergence, and even non-convergence [7], are almost entirely evaded. The randomness of reservoir generation brings much convenience for ESN applications; on the other hand, however, it often results in unsatisfactory generated reservoirs.



For example, random ESN realizations obtain remarkably different prediction performances on the same task, although each of them has the same connectivity density and spectral radius [20,21]. It therefore becomes an urgent requirement to correctly understand the effects of reservoir characteristics on task performance, and then to design suitable reservoirs. A basic principle of system science is that topology can strongly affect performance [29]. Many reservoirs have been generated by methods different from the original random one, such as the decoupled reservoirs [31], the critical reservoirs [6], and the reservoirs resulting from next ascent local search [2]; the corresponding ESNs perform better on their tasks than the original randomly and sparsely connected ones. In this paper we study the relationships between the connectivity structure and the modeling capacity of reservoirs from the viewpoint of complex network theory. We adopt a cortex-like network generation method to construct the reservoir; the method was originally proposed by Kaiser to simulate the cortical networks of the mammalian brain [14]. We call the ESNs with the obtained cortex-like reservoirs complex echo state networks (CESNs). An improved design strategy for the reservoir is derived from our simulation results. The outline of this paper is as follows. Section 2 reviews the classical ESN method and describes in detail the cortex-like reservoir construction process. CESNs with different connectivity structures are applied to predict three benchmark nonlinear time series in Section 3. In Section 4, the reservoirs' connectivity characteristics are quantitatively evaluated. Section 5 is the conclusion and discussion.


2. Construction process of CESN

2.1. Review of the classical ESN (JESN)

The ESN architecture is shown in Fig. 1. Since the ESNs are used as trajectory generators in this paper, they are not equipped with any input neurons. They have N reservoir neurons and L readout neurons. The weights of the reservoir connections are collected in an N × N weight matrix W. The weights of the connections from the reservoir neurons to the readout neurons are given in an L × N matrix W^out, and the weights of the connections projected back from the readout neurons to the reservoir neurons are given in an N × L matrix W^back. Let D and ρ stand for the connectivity density and the spectral radius of the reservoir, respectively. Usually D lies in [0.01, 0.2], and ρ should be smaller than unity to ensure the echo state property. Usually, the following four-step training-and-exploitation algorithm is used to train and exploit an ESN [10].

Step 1. An untrained ESN (W, W^back) is randomly constructed, consistent with the assignments (D, ρ, L, N).

Step 2. The dynamics of the untrained ESN is sampled. Given a training sequence Y_d(k), k = 1, ..., T_t, the reservoir is updated according to

X(k+1) = f^PE(W X(k) + W^back Y_d(k) + v(k))                     (1)

Ŷ(k+1) = f^O(W^out X(k+1))                                       (2)

where X(k) = [x_1(k), x_2(k), ..., x_N(k)]^T are the activations of the reservoir neurons at time step k, X(1) = 0; f^PE and f^O are the hyperbolic tangent function applied component-wise; v(k) is uniformly distributed noise over [−a, +a], with a a constant. Here the trick 'stabilization through wobbling' is adopted, i.e., during training some noise is inserted into the reservoir states [9]. This trick is actually an empirical regularization method and is commonly used in the literature to improve robustness and generalization. From time step T_0 + 1 to T_t, the reservoir state X(k) and the sigmoid-inverted teacher output (f^O)^{-1}(Y_d(k)) are collected row-wise into matrix M and matrix T, respectively. In order to eliminate the effects brought by the initial reservoir state, the dynamics are not sampled before T_0.

Step 3. W^out is computed. The Wiener–Hopf solution is one of the most common methods:

W^out = ((M^T M)^{-1} (M^T T))^T                                  (3)

where the superscript T denotes the matrix transpose and −1 denotes the matrix inverse.

Step 4. The resulting ESN (W, W^back, W^out) is ready for exploitation. Given an exploitation sequence Y(k), k = 1, ..., T_e, with T_e ≫ T_0. The exploitation sequence and the training sequence originate from the same system, but they may have different initial values, i.e., Y(1) ≠ Y_d(1). Y(k) is fed into the reservoir via the feedback connections. The reservoir state is updated according to Eq. (1) while k ≤ T_0; while k > T_0 the resulting ESN is updated as follows, where Ŷ(k) is the prediction output:

X(k+1) = f^PE(W X(k) + W^back Ŷ(k))                              (4)

Ŷ(k+1) = f^O(W^out X(k+1))                                       (5)

Fig. 1. System architecture of ESN (the reservoir and the readout, connected by the weight matrices W, W^out and W^back).
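For concreteness, the following minimal NumPy sketch illustrates the four-step procedure of Eqs. (1)-(5) under the assumptions stated above (tanh reservoir and output nonlinearities, teacher forcing up to T_0, Wiener–Hopf readout). All function and variable names are ours, not from the paper, and a pseudo-inverse is used in place of the plain inverse of Eq. (3) for numerical safety.

import numpy as np

def train_and_run_esn(W, W_back, y_train, T0, a=1e-5, y_run=None, rng=None):
    # W       : (N, N) reservoir matrix, already scaled to the desired spectral radius
    # W_back  : (N, L) feedback matrix; y_train : (Tt, L) teacher sequence Y_d(k)
    # T0      : washout length; a : noise amplitude of v(k)
    # y_run   : (Te, L) exploitation sequence, teacher-forced up to T0, free run afterwards
    rng = np.random.default_rng() if rng is None else rng
    N = W.shape[0]
    Tt, L = y_train.shape

    # Step 2: sample the teacher-forced dynamics, Eq. (1), collecting (state, teacher) pairs.
    x = np.zeros(N)
    M, T = [], []
    for k in range(Tt - 1):
        v = rng.uniform(-a, a, size=N)
        x = np.tanh(W @ x + W_back @ y_train[k] + v)
        if k + 1 >= T0:                               # discard the washout period
            M.append(x.copy())
            T.append(np.arctanh(y_train[k + 1]))      # sigmoid-inverted teacher output
    M, T = np.asarray(M), np.asarray(T)

    # Step 3: Wiener-Hopf readout, Eq. (3) (pseudo-inverse for numerical safety).
    W_out = (np.linalg.pinv(M.T @ M) @ (M.T @ T)).T

    # Step 4: exploitation, Eqs. (4)-(5); teacher-forced warm-up, then closed loop.
    if y_run is None:
        return W_out
    Te = y_run.shape[0]
    x = np.zeros(N)
    y_hat = np.zeros((Te, L))
    y_fb = y_run[0]
    for k in range(Te - 1):
        x = np.tanh(W @ x + W_back @ y_fb)
        y_hat[k + 1] = np.tanh(W_out @ x)
        y_fb = y_run[k + 1] if k + 1 < T0 else y_hat[k + 1]
    return W_out, y_hat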

2.2. Construction process of the cortex-like reservoir

It has been shown that LSMs with a lamina-specific cortical structure perform significantly better than circuits without the laminar structure on a wide variety of tasks [5]. We therefore seek a cortex-like reservoir to improve the modeling capacity of the corresponding ESN. Kaiser et al. proposed a multi-cluster cortical network generation method [13,14]. The generated networks simulate certain characteristics of the mammalian cerebral cortex well. Two growth mechanisms govern the generating process. One depends on the spatial distance between neurons and involves two parameters: the index of sensitivity to distance and the index of sensitivity to density. The other is determined by specific time-windows, described by another two parameters: the number of seed-neurons and the width of the time-windows. The probability that a connection between two neurons is made depends on the spatial distance between them as well as on their time-windows. Let N, n, ω, α and β stand for the size of the reservoir, the number of seed-neurons, the width of the time-windows, the index of sensitivity to distance and the index of sensitivity to density, respectively. Since these five parameters determine the final connectivity structure of the reservoirs, they are called growth factors. The growth mechanism depending on the time-windows is implemented by [19]

P_time^{μ_i}(t) = P(t; μ_i, φ(μ_i, ω)) = [16 (t^λ (t^λ − 1))^2]^{1/φ(μ_i, ω)}          (6)

where t = j/N, j = 0, ..., N; μ_i = i/(n + 1), i = 1, ..., n; λ = −log(2)/log(μ_i); and φ(μ_i, ω) is a numerically determined scaling factor used to obtain the desired value of ω. For simplicity, all seed-neurons are assumed to have the same ω value. The shapes of the time-windows are illustrated in Fig. 2: the upper panel corresponds to ω = 0.2, the lower to ω = 0.5. The growth mechanism depending on the spatial distance is implemented by

P_dist^{u,v}(d(u, v)) = β e^{−α d(u, v)}                                                 (7)

where d(u, v) is the Euclidean distance between neuron u and neuron v. The probability P(u, v, t) that a connection between neuron u and neuron v is made at time step t is computed as

P(u, v, t) = P_dist^{u,v}(d(u, v)) · P_time^{μ_u}(t) · P_time^{μ_v}(t)                   (8)

where μ_u and μ_v stand for the seed-neurons of u and v, respectively.
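A small sketch of the two growth probabilities follows, assuming the reconstruction of Eqs. (6)-(8) given above and treating the numerically determined scaling factor φ(μ_i, ω) as a known constant passed in by the caller; the function names are illustrative only.

import numpy as np

def p_time(t, mu_i, phi):
    # Time-window probability of Eq. (6); mu_i = i/(n + 1) lies in (0, 1) and
    # phi = phi(mu_i, omega) is the numerically determined scaling factor (assumed given).
    lam = -np.log(2.0) / np.log(mu_i)                 # window peaks at t = mu_i
    core = 16.0 * (t**lam * (t**lam - 1.0)) ** 2      # equals 1 where t**lam = 1/2
    return core ** (1.0 / phi)

def p_dist(d, alpha, beta):
    # Distance-dependent probability of Eq. (7).
    return beta * np.exp(-alpha * d)

def p_connect(d, t, mu_u, mu_v, alpha, beta, phi_u, phi_v):
    # Joint connection probability of Eq. (8).
    return p_dist(d, alpha, beta) * p_time(t, mu_u, phi_u) * p_time(t, mu_v, phi_v)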


Fig. 2. Shape of the time windows: P_time(t) versus t for ω = 0.2 (upper panel) and ω = 0.5 (lower panel).

Table 1. Parameter settings in the experiments.

Parameters       MG                 Lorenz             MSO
α                40                 40                 40
β                0.5                0.5                0.5
ω                0.5                0.5                0.5
DR neuron        Sig                Sig                Sig
Output neuron    Sig                Sig                Sig
const            0.5                0.5                0.5
W^back           rand(N, N) − 0.5   rand(N, N) − 0.5   rand(N, N) − 0.5
ρ                0.85               0.95               0.9
T0               1000               1000               1000
Tt               3000               6000               2000
Te               1084               1084               1100
Tp               84                 84                 100

'Sig' is the abbreviation of Sigmoid.

There are no spatially geometric constraints in our applications, as in the brain simulation [8]; every neuron here is deployed at random in a two-dimensional plane. Besides that, some other modifications are made to accelerate the construction process. The construction process is outlined in Appendix A. The obtained neural network is expressed by its adjacency matrix W_adj.

2.3. Complex echo state network (CESN)

Values of the growth factors (N, n, ω, α and β) are set in advance. The construction process outlined in Appendix A is invoked to produce a W_adj. Then W is computed with Eq. (9), where '.*' is the array (element-wise) multiplication operator, 'rand()' is the uniformly distributed random number function in Matlab, and 'const' is a constant real number. Further, W is scaled such that its spectral radius equals a specified value.

W = W_adj .* rand(N, N) · const                                   (9)

The feedback connection weight matrix W^back is randomly generated (as listed in Table 1). We now have the untrained CESN (W, W^back). Obviously, compared with the JESN reservoir, the CESN reservoir is generated in a different manner. Apart from Step 1, the training-and-exploitation algorithm for the CESN is the same as the four-step algorithm for the JESN.
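The reservoir weight matrix can be sketched as follows. This assumes the reading of Eq. (9) given above (element-wise product of W_adj with uniform random weights, scaled by 'const'); note that the subsequent spectral-radius rescaling makes the exact value of 'const' immaterial. The ρ values of Table 1 (0.85, 0.95 and 0.9 for the three tasks) would be passed as the target spectral radius.

import numpy as np

def build_cesn_reservoir(W_adj, const=0.5, rho=0.85, rng=None):
    # W_adj : 0-1 adjacency matrix produced by the Appendix A growth process.
    rng = np.random.default_rng() if rng is None else rng
    N = W_adj.shape[0]
    W = W_adj * rng.uniform(0.0, 1.0, size=(N, N)) * const   # Eq. (9), as read above
    radius = np.max(np.abs(np.linalg.eigvals(W)))            # current spectral radius
    return W * (rho / radius)                                # rescale to the target rho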

3. Experiments and results

To understand the effects of reservoir characteristics on performance, we first study the performance of CESNs with different assignments to (N, n, ω, α and β) on three benchmark prediction tasks: the Mackey–Glass (MG) chaotic time series prediction task, the Lorenz chaotic time series prediction task, and the classic multiple superimposed oscillator (MSO) problem. In order to conveniently elucidate the relationships between the CESN's prediction performance and the growth factors, we fix the values of ω, α and β to 0.5, 40 and 0.5, respectively. In fact, all five growth factors (N, n, ω, α and β) were tested; however, similar to the conclusions made in [15], no significant relation between the parameters α and β and the prediction performance measure is observed in our experimental results. The free parameters are N and n. As the value of N becomes larger, the ESN's modeling capability increases, but so does the computational cost. As a compromise, we set the values of N to 100, 300 and 500. Let γ stand for a one-dimensional array: γ = [0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]. Let φ stand for the element-wise product of N and γ, i.e., φ = N .* γ. For each value of N, n is assigned each element value of φ in turn. For example, for N = 100, φ = [1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]; then n equals 1, 5, 10, and so on, until n equals 80. Some parameters pertaining to the CESNs and the modified training-and-exploitation algorithm are presented in Table 1. Other related parameters are described in the text before the simulation results are presented.


The four-step training-and-exploitation algorithm described in Section 2.1 is used for the CESNs, except that in Step 1 the CESN reservoirs are constructed in the way proposed in Section 2.3. After the training process is completed, all connection weights are fixed. During the exploitations, the normalized root mean square error at the Tp-th time step (NRMSE_Tp) is computed, with T_0 < T_p ≤ T_e:

NRMSE_Tp = ( Σ_{i=1}^{Nr} (Ŷ_i(Tp) − Y_i(Tp))^2 / (Nr · σ^2) )^{1/2}                      (10)

where Ŷ_i(Tp) and Y_i(Tp) are the predicted output and the desired output at time step Tp in the i-th exploitation of the CESN, i = 1, 2, ..., Nr; that is, there are Nr independent exploitations of the same CESN, each with an exploitation sequence starting from another initial value. σ is the standard deviation of the desired outputs. The modified training-and-exploitation algorithm is repeated independently Nt times with the same assignments to (N, n, ω, α and β). The mean (δ) and the standard deviation (ε) over all of these NRMSE_Tp values are taken as the prediction performance evaluation of the CESNs.
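A one-line evaluation of Eq. (10) might look as follows; σ (passed in as the standard deviation of the desired outputs) follows the description above, and the names are ours.

import numpy as np

def nrmse_at_tp(y_hat_tp, y_tp, sigma):
    # y_hat_tp, y_tp : length-Nr arrays of predicted and desired outputs at step Tp
    # sigma          : standard deviation of the desired outputs
    return np.sqrt(np.sum((y_hat_tp - y_tp) ** 2) / (len(y_tp) * sigma ** 2))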

3.1. Prediction of the MG chaotic time series

The CESNs are first tested on the 84-step-ahead prediction task of the MG time series, i.e., Tp = 84. The MG dynamic system is as follows, where z stands for the state:

dz/dt = 0.2 z(t − τ) / (1 + z(t − τ)^10) − 0.1 z(t)                                       (11)

We use τ = 17. The Matlab solver dde23, with its default parameter settings, is used to solve Eq. (11). The obtained z(k) are scaled and shifted as 0.3 tanh[z(k) − 1.0] + 0.2. The data for training (z_d(k), k = 1, ..., T_t) and the data for exploitation (z(k), k = 1, ..., T_e) are generated in the same manner except for the initial values. We test 2 × 17 groups of CESN on the MG task. The first 17 groups correspond to the assignments N = 300, n = 3, 15, 30, ..., 240; the second 17 groups correspond to N = 500, n = 5, 25, 50, ..., 400.

The length of the training sequence is 3000, and the first 1000 steps are discarded to wash out the initial transient. The DR states X(k) and the sigmoid-inverted teacher outputs (f^O)^{-1}(z_d(k)) are sampled during the remaining 2000 steps. The noise v(k) is uniformly distributed over [−1E−5, 1E−5]. Fifty independent sequences (Nr = 50) are exploited on the trained CESN; the length of each exploitation sequence is 1084. The modified training-and-exploitation algorithm runs independently 20 times (Nt = 20). The obtained prediction performance evaluations of the CESNs are shown in Fig. 3, which consists of three subgraphs. The upper one shows the relationship between δ and the growth factors (N and n); here δ is scaled with the natural logarithm. The middle one illustrates the time-course output amplitude curves of a CESN (N = 500, n = 50, ρ = 0.85) and a JESN (N = 500, ρ = 0.85) while they run freely. For more clarity, the corresponding output error amplitude curves are plotted in the lower panel. The concrete values are presented in Table 2. For this task, the CESNs (N = 500, n = 50) improve the prediction performance (δ) of the JESNs by about 12.5% ((0.0032 − 0.0028)/0.0032); ε is improved as well. Though the performances are not comparable with the record-breaking performance of [11], they are significantly superior to the results obtained by other previous methods [3].

3.2. Prediction of the Lorenz chaotic time series

The Lorenz time series prediction is another benchmark problem for ESN. The Matlab ode45 solver is used to solve Eq. (12), where a = 10, b = 28, c = 8/3. The step size of the solver is set to 0.02. The obtained x-coordinate time series is rescaled by a factor of 0.01 and then used for training and exploitation. The training sequence and the exploitation sequences have different initial values.

dx/dt = a(y − x),  dy/dt = −xz + bx − y,  dz/dt = xy − cz                                 (12)

Fig. 3. Prediction results for the MG task. Upper panel: ln(δ) versus the ratio of seed-neuron number to DR size for N = 300 and N = 500. Middle panel: signal amplitude (teacher, CESN output, JESN output) versus time steps. Lower panel: error amplitude (CESN error, JESN error; scale ×10^−4) versus time steps.


Table 2. Prediction performances of the CESNs and the JESNs.

Task     N    n    ρ      CESNs: δ ± ε         CESNs: Ω̄             JESNs: δ ± ε         JESNs: Ω̄
MG       300  30   0.85   0.0049 ± 0.0015      1.1863 ± 0.0370      0.0051 ± 0.0026      1.1646 ± 0.0522
MG       500  50   0.85   0.0028 ± 0.0013      1.2362 ± 0.0338      0.0032 ± 0.0016      1.2145 ± 0.0392
Lorenz   300  30   0.95   0.0053 ± 0.0022      1.7686 ± 0.1109      0.0061 ± 0.0033      1.4947 ± 0.1128
Lorenz   500  50   0.95   0.0043 ± 0.0018      1.9568 ± 0.1737      0.0049 ± 0.0027      1.8607 ± 0.1648
MSO      100  10   0.9    1.7E−5 ± 7.0E−6      1.2967 ± 0.1459      0.0014 ± 0.0006      1.0471 ± 0.0159
MSO      300  30   0.9    1.2E−5 ± 2.5E−6      1.3003 ± 0.1358      0.0008 ± 0.0002      1.0581 ± 0.0107

Fig. 4. Prediction results for the Lorenz task. Upper panel: ln(δ) versus the ratio of seed-neuron number to DR size for N = 300 and N = 500. Middle panel: signal amplitude (teacher, CESN output, JESN output) versus time steps. Lower panel: error amplitude (CESN error, JESN error; scale ×10^−4) versus time steps.

We test the same 2 × 17 groups of CESN on the Lorenz task; that is, the first 17 groups correspond to the assignments N = 300, n = 3, 15, 30, ..., 240, and the second 17 groups to N = 500, n = 5, 25, 50, ..., 400. The length of the training sequence is 6000, and the first 1000 steps are discarded to wash out the initial transient. The DR states X(k) and the sigmoid-inverted teacher outputs (f^O)^{-1}(z_d(k)) are sampled during the remaining 5000 steps. The noise v(k) is uniformly distributed over [−1E−4, 1E−4]. During the exploitations, Nr and Nt are assigned to 50 and 20, respectively. The results are illustrated in Fig. 4, which also consists of three subgraphs. The relationship between δ and the growth factors (N and n) is described in the upper panel; here δ is also scaled with the natural logarithm. In addition, a JESN (N = 500, ρ = 0.95) and a CESN (N = 500, n = 50, ρ = 0.95) are verified on the same Lorenz task for comparison. The output curves and the output error curves of the CESN and the JESN are plotted in the middle and the lower panels, respectively. The concrete performance evaluation values are presented in Table 2. For this task, the CESNs improve on the JESNs by about 12% in δ. As presented in Table 2, (δ ± ε) of the CESNs (N = 300, n = 30) is 0.0053 ± 0.0022, whereas (δ ± ε) of the JESNs (N = 300) is 0.0061 ± 0.0033; and (δ ± ε) of the CESNs (N = 500, n = 50) is 0.0043 ± 0.0018, whereas (δ ± ε) of the JESNs (N = 500) is 0.0049 ± 0.0027.

3.3. The MSO problem

The MSO problem is also a benchmark task for ESN. The desired outputs are z(k) = sin(0.2k) + sin(0.311k), k = 1, 2, .... Another 2 × 17 groups of CESN are tested on the MSO problem: the first 17 groups correspond to the assignments N = 100, n = 1, 5, 10, ..., 80; the second 17 groups correspond to N = 300, n = 3, 15, 30, ..., 240. The obtained z(k), k = 1, 2, ..., 2000, are rescaled by a factor of 0.01 and then used for training and exploitation. The training sequence and the exploitation sequences again have different initial values. The parameters of the modified training-and-exploitation algorithm are assigned Nr = 50 and Nt = 20. The noise v(k) is uniformly distributed over [−1E−6, 1E−6]. The results are presented in Fig. 5 and Table 2 in the same manner as for the MG task and the Lorenz task. The superiority of the CESNs over the JESNs is even more significant for this problem: the CESNs perform better than the corresponding JESNs by about one order of magnitude.


3.4. Results

Figs. 3–5 and Table 2 demonstrate significant phenomena. As shown in all the upper subgraphs: (i) when n is fixed, the value of δ decreases as N increases; (ii) there is a non-monotone relation between δ and n when N is fixed; and (iii) in particular, when the value of n is set to 10% of the value of N, δ reaches its minimum.

4. Analyses on connectivity structure and dynamics richness

We study the reservoir connectivity structure of the CESNs with different assignments to (N, n).

4.1. Relationships between connectivity structure and growth factor

There are three basic geometric quantities in complex network theory: the average path length (l), the clustering coefficient (C), and the cumulative degree distribution (CDD) function P_cum(d). They can effectively characterize network connectivity structure [18]. In total, 3 × 17 groups of CESN (51 groups) are used in Section 3: the first correspond to N = 100, n = 1, 5, 10, ..., 80; the second to N = 300, n = 3, 15, 30, ..., 240; and the last to N = 500, n = 5, 25, 50, ..., 400. For each combination of (N, n), the proposed cortex-like reservoir generation method is invoked independently 20 times, so there are 20 W_adj's for each combination of (N, n).

Fig. 5. Prediction results for the MSO task. Upper panel: ln(δ) versus the ratio of seed-neuron number to DR size for N = 100 and N = 300. Middle panel: signal amplitude (teacher, CESN output, JESN output) versus time steps. Lower panel: error amplitude (CESN error, JESN error; scale ×10^−5) versus time steps.

Fig. 6. Average path lengths and clustering coefficients of the CESNs. Upper panel: average path length versus the ratio of seed-neuron number to DR size for N = 100, 300 and 500. Lower panel: clustering coefficient versus the same ratio for N = 100, 300 and 500.


l, C and P_cum(d) are computed for each W_adj according to the equations described in Appendix B, and the results are averaged. Fig. 6 shows the computed relationships between (l, C) and (N, n). The upward-convex curves in the upper subgraph sketch the relationship between l and (N, n); the maxima of l occur at the points where n/N = 0.1. The downward-convex curves in the lower subgraph illustrate the relationship between C and (N, n); the minima of C also occur at the points where n/N = 0.1. Fig. 6 further demonstrates that the values of l increase with N, whereas the values of C are almost independent of N, especially when n/N ≤ 0.35. For the sake of conciseness, only a few calculated curves of P_cum(d) are depicted in Fig. 7. The curves correspond to six specific pairs of (N, n): (500, 5), (500, 40), (500, 50), (500, 80), (500, 100) and (500, 300). For each pair of (N, n), there are five curves. All curves have an approximately straight-line appearance in semi-logarithmic coordinates.

Fig. 7. Cumulative degree distribution curves of the CESNs, plotted on semi-logarithmic axes against the vertex degree, for n = 5, 40, 50, 80, 100 and 300 (N = 500) and for a JESN.


The Matlab polyfit solver of first order is employed to fit the discrete points (d, ln(P_cum(d))), d = 0, 1, ..., d_max, where d_max stands for the maximum vertex degree. The fitting errors are tolerable. The calculated slopes (denoted s_sl) of the fitted straight lines replace the CDD functions and are used as one of the parameters describing the reservoir connectivity structure. On the basis of the resulting l, C and s_sl, we define an integrated characteristic of reservoir connectivity structure, the connectivity index (CI). If C or s_sl equals zero, the corresponding prediction results are not the best [6], so we exclude such cases here.

CI ∝ l / (C · |s_sl|)                                                                     (13)

The curves of the relationship between CI and (N, n) correlate positively with the curves in the upper subgraph of Fig. 6. When N is fixed, an upward-convex relationship exists between CI and n; when n is fixed, the value of CI increases as N increases. The maxima of CI occur at the points where n/N = 0.1.
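As an illustration of Eq. (13) and of the slope fitting described above, the following sketch computes CI (up to its proportionality constant) from l, C and the vertex degrees of a reservoir graph; the guard against empty degree bins is our own choice.

import numpy as np

def connectivity_index(l, C, degrees):
    # l, C    : average path length and clustering coefficient of the reservoir graph
    # degrees : vertex degrees of all reservoir neurons
    degrees = np.asarray(degrees)
    d_vals = np.arange(int(np.max(degrees)) + 1)
    p_cum = np.array([np.mean(degrees >= d) for d in d_vals])   # Pcum(d) = P(degree >= d)
    mask = p_cum > 0                                            # guard against log(0)
    s_sl = np.polyfit(d_vals[mask], np.log(p_cum[mask]), 1)[0]  # first-order fit slope
    return l / (C * abs(s_sl))                                  # Eq. (13), up to a constant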


4.2. Relationships between state richness and growth factor

We measure the richness of the CESN reservoir states. Since the reservoirs are cortex-like, the N-dimensional DR state vectors are intuitively regarded as N-channel EEG signals. The correlation between N-channel EEG signals is well evaluated by the omega (Ω) complexity: a large value of Ω indicates slight correlation, and a small value of Ω indicates high correlation [24,28]. We now define the Ω-complexity on the reservoir states to measure their richness.

K = X(k) X(k)^T                                                                           (14)

Ω(k) = exp(−Σ_{i=1}^{N} λ'_i ln λ'_i)                                                     (15)

Ω̄ = (1 / (k_2 − k_1 + 1)) Σ_{k=k_1}^{k_2} Ω(k)                                            (16)


where Ω(k) is the instantaneous complexity at time step k; λ'_i = λ_i / Σ_j λ_j, and λ_i, i = 1, ..., N, is the i-th eigenvalue of the dyadic matrix K. Ω̄ averages all Ω(k) values from k_1 to k_2. For each trained CESN, Ω̄ is computed while it runs freely. The averaged results for the three tasks are shown, in turn, in the upper, middle and lower panels of Fig. 8. It can be seen that the larger the reservoir size, the higher the value of Ω̄. If N is fixed, Ω̄ has a non-monotonic, upward-convex relation with n. The maxima of Ω̄, which indicate the greatest state richness and also correspond to the best prediction performances, occur at the points where n/N = 0.1.

Fig. 8. Omega complexity of the CESNs (upper panel: MG task; middle panel: Lorenz task; lower panel: MSO task).
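The paper defines an instantaneous Ω(k) from K = X(k) X(k)^T and averages it over the free run (Eqs. (14)-(16)). As a hedged illustration only, the sketch below computes the standard epoch-wise omega-complexity of Wackermann [28] from the spatial covariance of the collected states; it captures the same eigenvalue-entropy idea but is not claimed to reproduce the per-step variant exactly, and the mean removal is our choice.

import numpy as np

def omega_complexity(states):
    # states : (T, N) array of reservoir states X(k) collected while the CESN runs freely.
    X = states - states.mean(axis=0)            # remove each neuron's mean
    K = (X.T @ X) / X.shape[0]                  # N x N spatial covariance of the states
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]                      # drop numerically zero eigenvalues
    lam = lam / lam.sum()                       # normalized eigenvalues lambda'_i
    return float(np.exp(-np.sum(lam * np.log(lam))))   # entropy of the spectrum, exponentiated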

5. Conclusion and discussion

According to the obtained relations, we conclude that the connectivity structure of the reservoirs has a significant effect on their prediction performance, that the omega-complexity index (Ω̄) can be used as a performance predictor, and, in particular, that the performance is optimal when the value of n equals 10% of N. This finding can be taken as a prior heuristic rule when constructing CESNs for such prediction applications in the future.

The stability issue of trained CESNs running in a closed-loop generative mode is tough but appealing. Although the trick 'stabilization through wobbling' is applied every time the modified training-and-exploitation algorithm is invoked, it can hardly contribute to the stability of ESNs with auxiliary feedbacks [16]. Several other similar tricks, such as the noise immunization method and the addition of noise in the frequency domain, have been experimentally shown to be effective for certain tasks [12,26]. Ridge regression has also been used successfully to train the readout for stability [30]. However, there is no thorough theoretical explanation of why these tricks work; this remains to be further investigated.

Acknowledgments

The authors would like to thank the anonymous reviewers for their detailed comments and suggestions, and also thank Dr. Marcus Kaiser and Dr. Tim Waegeman for their valuable suggestions and discussions. This work was supported in part by the State Key Development Program for Basic Research of China (Grant no. 2007CB311006) and the National Natural Science Foundation of China (Grant nos. 60875043 and 60905044).

Appendix A

Construction method of the cortex-like reservoirs

(1) Initialization. The parameters (N, n, ω, α, β) determining the reservoir construction process are assigned. All seed-neurons are released at time points j/(n + 1) (j = 1, ..., n) along a time axis with range [0, 1], in a two-dimensional plane region with X-axis range [0, 1] and Y-axis range [0, 1]. Their spatial coordinates (x, y) are stored in a row-growing position matrix W_pos. There remain m = N − n non-seed-neurons to be released one by one with a time step of length 1/m. The number of time steps is denoted ξ and the number of existing neurons is denoted ζ. Initially ξ = 1 and ζ = n.

(2) Randomly releasing one non-seed-neuron u and computing its time-window dependent probability. The neuron u is placed in the plane region at random, and its position is recorded in W_pos. The distances from each seed-neuron n_j to u, i.e., {d(u, n_j)}, are computed. The seed-neuron n_s corresponding to the shortest distance, i.e., d(u, n_s) = min{d(u, n_j)}, is considered the seed-neuron for u. If more than one seed-neuron attains min{d(u, n_j)}, one of them is chosen at random as the seed-neuron for u. According to the time-window function of the seed-neuron n_s, the time-window dependent probability of u, i.e., P_time^{n_s}(ξ · 1/m), is computed.

(3) Computing the connection probabilities of the neuron u. The distances from all ζ existing neurons {v} to u, i.e., {d(u, v)}, are calculated, and hence the distance-dependent probabilities {P_dist^{u,v}(d(u, v))} are obtained. The time-window dependent probabilities of those existing neurons, i.e., {P_time^{n_v}(ξ · 1/m)}, are also computed. Combining both probabilities, the connection probabilities {P(u, v, ξ · 1/m)} between u and each of the ζ existing neurons are obtained.

(4) Network development. An edge survives if its connection probability is larger than some random number, so the number of connections established on the neuron u may be zero, one or many. If it is zero, one connection is established between the neuron u and one randomly picked seed-neuron. Then ξ is increased by one, and ζ is increased by one as well.

(5) Achieving one complex network. Steps (2) to (4) are repeated until ζ = N. The resulting complex network is represented by its position matrix W_pos and adjacency matrix W_adj. The latter is a 0–1 matrix of size N × N; its entry is assigned 1 if there is a connection between the corresponding two neurons, and 0 otherwise.
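A compact sketch of the Appendix A growth process is given below. It assumes a symmetric 0-1 adjacency matrix, takes the scaling factor φ(μ_i, ω) as a given constant shared by all seed-neurons, and uses the reconstruction of Eq. (6) adopted in Section 2.2; the names and minor details (for example, the fallback connection to a random seed-neuron and arbitrary tie-breaking) follow our reading of the appendix.

import numpy as np

def grow_cortexlike_adjacency(N, n, phi, alpha, beta, rng=None):
    # N : reservoir size; n : number of seed-neurons; phi : scaling factor phi(mu_i, omega),
    # taken as identical for all seed-neurons; alpha, beta : sensitivity indices of Eq. (7).
    rng = np.random.default_rng() if rng is None else rng
    pos = rng.random((N, 2))                    # random positions in the unit square
    mu = np.arange(1, n + 1) / (n + 1)          # seed release times mu_i = i/(n + 1)
    seed_of = np.arange(N)                      # seed index per neuron (seeds map to themselves)
    seed_of[n:] = -1
    W_adj = np.zeros((N, N))
    m = N - n                                   # number of non-seed-neurons

    def p_time(t, mu_i):                        # Eq. (6), reconstruction used in Section 2.2
        lam = -np.log(2.0) / np.log(mu_i)
        return (16.0 * (t**lam * (t**lam - 1.0)) ** 2) ** (1.0 / phi)

    for step, u in enumerate(range(n, N), start=1):
        t = step / m
        d_seed = np.linalg.norm(pos[:n] - pos[u], axis=1)
        seed_of[u] = int(np.argmin(d_seed))     # nearest seed-neuron (ties broken arbitrarily)
        made = False
        for v in range(u):                      # all neurons already in the network
            d = np.linalg.norm(pos[u] - pos[v])
            p = beta * np.exp(-alpha * d) \
                * p_time(t, mu[seed_of[u]]) * p_time(t, mu[seed_of[v]])   # Eq. (8)
            if rng.random() < p:                # the edge survives
                W_adj[u, v] = W_adj[v, u] = 1.0
                made = True
        if not made:                            # fall back to one random seed-neuron
            v = int(rng.integers(n))
            W_adj[u, v] = W_adj[v, u] = 1.0
    return W_adj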

Appendix B

B.1. Average path length

Let d_ij stand for the number of edges along the shortest path between neuron i and neuron j within the DR, where i, j = 1, ..., N. The average path length l is computed as [18]

l^{-1} = (1 / ((1/2) N(N + 1))) Σ_{i ≥ j} d_ij^{-1}                                       (B.1)

B.2. Clustering coefficient

Let |E(G_i)| stand for the number of edges actually existing between neuron i's neighbors, and let k_i(k_i − 1)/2 denote the number of edges that could possibly exist between neuron i's neighbors, k_i being the number of neuron i's neighbors. Neuron i's clustering coefficient C_i and the whole DR's clustering coefficient C are computed as Eqs. (B.2) and (B.3), respectively [18].

C_i = |E(G_i)| / (k_i(k_i − 1)/2)                                                         (B.2)

C = (1/N) Σ_i C_i                                                                         (B.3)

B.3. Degree distribution

The number of edges connected to a neuron is defined as its degree, denoted d. The probability that a randomly picked neuron has exactly degree d is defined as the degree distribution P(d). The cumulative degree distribution P_cum(d) is obtained as

P_cum(d) = Σ_{d' ≥ d} P(d')                                                               (B.4)
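The three quantities of Appendix B can be computed from an adjacency matrix as sketched below; the graph is assumed undirected, and self-pairs and unreachable pairs are taken to contribute zero to the sum in Eq. (B.1), which is our reading of the harmonic-mean formula.

import numpy as np

def graph_metrics(W_adj):
    A = (np.asarray(W_adj) > 0).astype(int)
    np.fill_diagonal(A, 0)
    N = A.shape[0]
    neighbors = [np.flatnonzero(A[i]) for i in range(N)]

    # All-pairs shortest path lengths by breadth-first search (unweighted, undirected graph).
    dist = np.full((N, N), np.inf)
    for s in range(N):
        dist[s, s] = 0.0
        frontier, d = [s], 0
        while frontier:
            d += 1
            nxt = []
            for u in frontier:
                for v in neighbors[u]:
                    if np.isinf(dist[s, v]):
                        dist[s, v] = d
                        nxt.append(v)
            frontier = nxt

    # Eq. (B.1): harmonic-mean path length; self-pairs and unreachable pairs contribute 0.
    pair_d = dist[np.triu_indices(N)]
    inv_d = np.zeros_like(pair_d)
    ok = np.isfinite(pair_d) & (pair_d > 0)
    inv_d[ok] = 1.0 / pair_d[ok]
    l = 1.0 / (inv_d.sum() / (0.5 * N * (N + 1)))

    # Eqs. (B.2)-(B.3): clustering coefficient.
    Ci = []
    for i in range(N):
        k = len(neighbors[i])
        if k < 2:
            Ci.append(0.0)
            continue
        links = A[np.ix_(neighbors[i], neighbors[i])].sum() / 2   # edges among i's neighbors
        Ci.append(links / (k * (k - 1) / 2))
    C = float(np.mean(Ci))

    # Eq. (B.4): cumulative degree distribution Pcum(d) = P(degree >= d).
    degrees = A.sum(axis=1)
    p_cum = np.array([np.mean(degrees >= d) for d in range(int(degrees.max()) + 1)])
    return l, C, p_cum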


References

[1] D.V. Buonomano, M.M. Merzenich, Temporal information transformed into a spatial code by a neural network with realistic properties, Science 267 (1995) 1028–1030.
[2] K. Bush, B. Tsendjav, Improving the richness of echo state features using next ascent local search, in: Proceedings of Artificial Neural Networks in Engineering Conference, 2005, pp. 227–232.
[3] Y. Chen, B. Yang, J. Dong, A. Abraham, Time-series forecasting using flexible neural tree model, Information Sciences 174 (2005) 219–235.
[4] P.F. Dominey, Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning, Biological Cybernetics 73 (1995) 265–274.
[5] S. Haeusler, W. Maass, A statistical analysis of information-processing properties of lamina-specific cortical microcircuit models, Cerebral Cortex 17 (2007) 149–162.
[6] M.A. Hajnal, A. Lorincz, Critical echo state networks, in: Proceedings of ICANN 2006, Part I, Lecture Notes in Computer Science, vol. 4131, 2006, pp. 658–667.
[7] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, New Jersey, 1998.
[8] C.C. Hilgetag, M. Kaiser, Clustered organization of cortical connectivity, Neuroinformatics 2 (2004) 353–360.
[9] H. Jaeger, The echo state approach to analyzing and training recurrent neural networks, Technical Report No. 148, German National Research Center for Information Technology, Bremen, 2001.
[10] H. Jaeger, Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach, Technical Report No. 159, German National Research Center for Information Technology, Bremen, 2002.
[11] H. Jaeger, H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304 (2004) 78–80.
[12] H. Jaeger, M. Lukoševičius, D. Popovici, U. Siewert, Optimization and applications of echo state networks with leaky-integrator neurons, Neural Networks 20 (2007) 335–352.
[13] M. Kaiser, C.C. Hilgetag, Modeling the development of cortical systems networks, Neurocomputing 58–60 (2004) 297–302.
[14] M. Kaiser, C.C. Hilgetag, Development of multi-cluster cortical networks by time windows for spatial growth, Neurocomputing 70 (2007) 1829–1832.
[15] B. Liebald, Exploration of effects of different network topologies on the ESN signal crosscorrelation matrix spectrum, B.Sc. Thesis, School of Engineering and Science, International University Bremen, 2004.
[16] M. Lukoševičius, Echo state networks with trained feedbacks, Technical Report No. 4, School of Engineering and Science, Jacobs University Bremen, 2007.
[17] W. Maass, T. Natschläger, H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbation, Neural Computation 14 (2002) 2531–2560.
[18] M.E.J. Newman, The structure and function of complex networks, SIAM Review 45 (2003) 167–256.
[19] F. Nisbach, M. Kaiser, Developmental time windows for spatial growth generate multiple-cluster small-world networks, The European Physical Journal B 58 (2007) 185–191.
[20] M.C. Ozturk, D. Xu, J.C. Principe, Analysis and design of echo state networks, Neural Computation 19 (2007) 111–138.
[21] D. Prokhorov, Echo state networks: appeal and challenges, in: Proceedings of International Joint Conference on Neural Networks, vol. 2, 2005, pp. 1463–1466.
[22] M. Salmen, P.G. Ploger, Echo state networks used for motor control, in: Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005, pp. 1953–1958.
[23] M.D. Skowronski, J.G. Harris, Noise-robust automatic speech recognition using a predictive echo state network, IEEE Transactions on Audio, Speech, and Language Processing 15 (2007) 1724–1730.
[24] J.A. Stancak, J. Wackermann, Spatial EEG synchronization over sensorimotor hand areas in brisk and slow self-paced index finger movements, Brain Topography 11 (1998) 23–30.
[25] J.J. Steil, Online stability of backpropagation–decorrelation recurrent learning, Neurocomputing 69 (2006) 642–650.
[26] J.J. Steil, Several ways to solve the MSO problems, in: Proceedings of the European Symposium on Artificial Neural Networks, 2007, pp. 489–494.
[27] D. Verstraeten, B. Schrauwen, M. D'Haene, D. Stroobandt, An experimental unification of reservoir computing methods, Neural Networks 20 (2007) 391–403.
[28] J. Wackermann, Towards a quantitative characterization of functional states of the brain: from the non-linear methodology to the global linear description, International Journal of Psychophysiology 34 (1999) 65–80.
[29] N. Wiener, Cybernetics, The MIT Press, 1965.
[30] F. Wyffels, B. Schrauwen, D. Stroobandt, Stable output feedback in reservoir computing using ridge regression, in: Proceedings of the 18th ICANN, Part I, Lecture Notes in Computer Science, vol. 5163, 2008, pp. 808–817.
[31] Y. Xue, L. Yang, S. Haykin, Decoupled echo state networks with lateral inhibition, Neural Networks 20 (2007) 365–376.
Qingsong Song received the M.S. degree in System Engineering in Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, China, in March 2005. He is currently working toward the Ph.D. degree in System Engineering in Systems Engineering Institute, Xi’an Jiaotong University. His current research interests are in the areas of complex network theory and nonlinear time series predictions.

Zuren Feng received his M.Eng. and Ph.D. degrees in Information and Control Engineering from Xi'an Jiaotong University, Xi'an, China, in 1982 and 1988, respectively. He was appointed as a Lecturer at Xi'an Jiaotong University in 1982 and as an Associate Professor in 1990. Since 1994, he has been a Professor in the Systems Engineering Institute, Xi'an Jiaotong University. In 1992, he worked as a visiting scholar at INRIA, France, on manipulator control with flexible joints and applications of Petri nets in DEDS. In 1994, he was invited by Kassel University, Germany, for research on mobile service robots. Professor Feng has been Deputy Dean of the School of Electronics and Information Engineering at Xi'an Jiaotong University since 2001. From 1998 to 2001, he was the Deputy Dean of the Academy of Engineering and Science, and also the Head of the Systems Engineering Institute at Xi'an Jiaotong University. He is now a member of the Committee of Deep Space Exploration Technology, Chinese Society of Astronautics, a member of the Council of the Shaanxi Society of Automation, China, and a member of the Academic Council of Chinese Key Labs in Manufacturing. He was the winner of the National Outstanding Achievement Award for Ph.D. Scholars in 1991. His research interests include robotics and automation, multi-agent systems, intelligent optimization, adaptive control and vision-based robot navigation. He has been involved in many academic and practical projects and has published nearly 100 papers on relevant topics.