Neurocomputing 20 (1998) 253-263

Time-delay recurrent neural network for temporal correlations and prediction

Sung-Suk Kim
Department of Computer Science and Statistics, Yongin University, Yongin, South Korea

Received 18 February 1997; accepted 23 March 1998
Abstract

This paper presents a time-delay recurrent neural network (TDRNN) for temporal correlations and prediction. The TDRNN employs adaptive time delays and recurrences: the adaptive time delays let the network choose the optimal values of time delays for the temporal location of the important information in the input sequence, and the recurrences enable the network to encode and integrate temporal context information of sequences. The TDRNN and the multiple recurrent neural network (MRNN) described in this paper, the adaptive time-delay neural network (ATNN) proposed by Lin, and the time-delay neural network (TDNN) introduced by Waibel were simulated and applied to the chaotic time series prediction of the Mackey-Glass delay-differential equation and to Korean stock market index prediction. The simulation results suggest that employing time-delayed recurrences in a layered network is more effective for temporal correlations and prediction than putting multiple time delays into the neurons or their connections. The best performance is attained by the TDRNN. The TDRNN should be well applicable to temporal signal recognition, prediction and identification. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Adaptive time-delay; Recurrence; Prediction; Temporal correlation
1. Introduction

Learning and recognition of temporal correlations are crucial in hearing, vision, motor control, and time series prediction. The key issue in learning temporal sequences is that there needs to be some means of recognizing and storing the temporal nature of sequences. There are multiple methods for representing temporal information in
the neural network [1,2,9,11,13]. These include: (1) creating a spatial representation of the temporal pattern, (2) putting time delays into the neurons or their connections, (3) employing recurrent connections, (4) using neurons with activations that sum inputs over time, and (5) using a combination of the above. The simplest and most common approach is to encode the temporal information spatially, as can be done with Fourier-transform preprocessing. Time-delay networks have shown great promise in applications such as speech recognition and are highly competitive with existing technologies. However, both spatial coding and time-delay approaches have inherent scaling problems, in terms of how much (or how long a) temporal context can be encoded.

The recurrent neural network (RNN) is a useful and interesting possibility, and has been applied to speech recognition and robot control tasks. The RNN provides a means of incorporating long-term context, but it does not choose optimal values for the temporal location of the important information in the input sequence. Thus, the system may perform poorly because it misses that temporal location, and it may require a great deal of training time to converge. The time-delay neural network (TDNN) proposed by Waibel has been successfully applied to phoneme recognition and trajectory recognition [4,10]. However, a limitation of the TDNN is its inability to learn or adapt the values of the time delays. Time delays are fixed initially and remain the same throughout training. As a result, the system may perform poorly because of the inflexibility of the time delays and a mismatch between the chosen time-delay values and the temporal location of the important information in the input sequence. To overcome this limitation, Lin proposed the adaptive time-delay neural network (ATNN), which adapts time delays as well as synaptic weights during training, to better accommodate changing temporal patterns and to provide more flexibility in the system architecture [5]. Time-delay networks such as the ATNN and TDNN cannot integrate temporal context information explicitly and thus cannot recognize temporal patterns that have arbitrary interval times and arbitrary lengths of temporal context.

In this paper, we present a time-delay recurrent neural network (TDRNN) for learning temporal correlations of sequences and for prediction, which addresses the issues mentioned above. The paper is organized in four sections. In Section 2, we introduce the architecture and learning algorithm of the TDRNN. In Section 3, the prediction performance of the TDRNN is evaluated on the chaotic time series prediction of the Mackey-Glass delay-differential equation and on stock market index prediction, and a performance comparison between the TDRNN and the other architectures is made. Finally, conclusions are presented in Section 4.
2. Time-delay recurrent neural network (TDRNN)

The TDRNN is inspired by neurobiology: time delays do occur along axons owing to different conduction times and different lengths of axon fibers [1]; within areas of the cortex there is a great deal of feedback between the cortical strata [8]; and temporal properties such as temporal decay and integration occur at synapses [5]. The TDRNN makes use of adaptive time delays and feedback.
The architecture of the TDRNN is shown in Fig. 1. The network employs adaptive time delays and recurrent connections to process the temporal information of input sequences. The network consists mainly of a three-layer perceptron and also has an internal state layer. The activations of the second hidden units at time t-1 are copied into the internal state units, which encode and integrate temporal context information of the input sequence and act as additional inputs at time t. The interconnections between the input units and the first hidden units carry both modifiable weights and modifiable time delays, and both are adjusted during training. The adaptive time delays make the TDRNN choose the optimal values of time delays for the temporal location of the important information in the input sequence. The interconnections between the first hidden units and the second hidden units, and from the second hidden units to the output units, have only modifiable weights, whereas the interconnections from the internal state units to the first hidden units have modifiable weights and fixed time delays. The feedback connections realized through the copy are not subject to training. The configuration of n interconnections, each with its own delay, from an input unit to a first hidden unit or from an internal state unit to a first hidden unit is called a Delay box, depicted in Fig. 2. Node i of layer h-1 is connected to node j of the next layer h, with each connection line having an independent time delay τ_{jik,h-1} and synaptic weight w_{jik,h-1}.
Fig. 1. The architecture of the TDRNN (L_{h+i} denotes layer h+i).
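As a compact summary of the connection scheme above, the sketch below lists which parameter groups of the TDRNN are trained. The dictionary layout and the layer names are illustrative, not the authors' notation.

```python
# A structural sketch of the TDRNN connection groups described in the text.
# The grouping follows the description above; names are illustrative.
TDRNN_CONNECTIONS = {
    # (source layer, target layer): which parameters exist and how they are treated
    ("input",          "first hidden"):  {"weights": "adaptive", "time_delays": "adaptive"},
    ("internal state", "first hidden"):  {"weights": "adaptive", "time_delays": "fixed"},
    ("first hidden",   "second hidden"): {"weights": "adaptive", "time_delays": None},
    ("second hidden",  "output"):        {"weights": "adaptive", "time_delays": None},
}
# The feedback path second hidden(t-1) -> internal state(t) is a pure copy
# and is not subject to training.
```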
Fig. 2. The Delay box (N_{h-1} denotes the set of nodes of layer h-1) [5].
Each node sums up the net inputs from the activation values of the previous neurons, through the corresponding time delays on each connection line, i.e., at time $t_n$, unit $j$ on layer $h$ receives a weighted sum [5]:

$$\mathrm{net}_{j,h}(t_n) = \sum_{i \in N_{h-1}} \sum_{k=1}^{K_{ji,h-1}} w_{jik,h-1}\, a_{i,h-1}(t_n - \tau_{jik,h-1}), \qquad (1)$$

where $a_{i,h-1}(t_n - \tau_{jik,h-1})$ is the activation level of unit $i$ on layer $h-1$ at time $t_n - \tau_{jik,h-1}$, $N_{h-1}$ denotes the set of nodes of layer $h-1$, and $K_{ji,h-1}$ represents the total number of connections to node $j$ of layer $h$ from node $i$ of layer $h-1$. The output of node $j$ is then governed by a non-decreasing differentiable function $f_{j,h}$ of the net input (a sigmoidal function is selected in this paper):

$$a_{j,h}(t_n) = \begin{cases} f_{j,h}(\mathrm{net}_{j,h}(t_n)) & \text{if } h \ge 2, \\ a_{j,0}(t_n) & \text{if } h = 1, \end{cases} \qquad (2)$$

where

$$f_{j,h}(\mathrm{net}) = \frac{b_{j,h}}{1 + e^{-a_{j,h}\,\mathrm{net}}} - c_{j,h}, \qquad (3)$$

and $a_{j,0}(t_n)$ denotes the $j$th channel of the input signal at time $t_n$, and $a_{j,h}$, $b_{j,h}$ and $c_{j,h}$ are real numbers. $-c_{j,h}$ and $b_{j,h} - c_{j,h}$ are the lower and upper bounds of the sigmoidal function, respectively, and the steepness of $f_{j,h}(\mathrm{net})$ at the origin is $f'_{j,h}(0) = a_{j,h} b_{j,h}/4$ [5].

The internal state vector at time $t_n$, $S_{h-1}(t_n)$, is described by Eq. (4):

$$S_{h-1}(t_n) = A_{h+1}(t_{n-1}), \qquad (4)$$

where $A_{h+1}(t_{n-1})$ is the activation vector of the second hidden units at time $t_{n-1}$.

An instantaneous error measure is defined as the MSE [5]:

$$E(t_n) = \frac{1}{2} \sum_{j \in N_{h+2}} \bigl(d_j(t_n) - a_{j,h+2}(t_n)\bigr)^2, \qquad (5)$$

where $N_{h+2}$ denotes the set of nodes of the output layer and $d_j(t_n)$ indicates the desired (target) value of output node $j$ at time $t_n$.
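To make Eqs. (1)-(5) concrete, the following is a minimal sketch of the per-unit computations; it is an interpretation rather than the authors' implementation, the function and variable names are invented, and the real-valued time delays are rounded to stored time steps.

```python
import numpy as np

def sigmoid(net, a, b, c):
    """Eq. (3): f(net) = b / (1 + exp(-a * net)) - c."""
    return b / (1.0 + np.exp(-a * net)) - c

def net_input(histories, weights, delays, n):
    """Eq. (1): net_{j,h}(t_n) = sum_i sum_k w_{jik,h-1} * a_{i,h-1}(t_n - tau_{jik,h-1}).

    histories[i]                : 1-D array of past activations of source unit i
    weights[i][k], delays[i][k] : k-th weight and time delay on the i -> j connection
    """
    net = 0.0
    for i, hist in enumerate(histories):
        for w, tau in zip(weights[i], delays[i]):
            net += w * hist[n - int(round(tau))]   # delays rounded in this sketch
    return net

# Eq. (4): the internal state at step n is a copy of the second-hidden-layer
# activation vector from step n-1:  S[n] = A_second_hidden[n - 1]

def instantaneous_error(targets, outputs):
    """Eq. (5): E(t_n) = 1/2 * sum_j (d_j(t_n) - a_j(t_n))^2."""
    targets, outputs = np.asarray(targets, float), np.asarray(outputs, float)
    return 0.5 * np.sum((targets - outputs) ** 2)
```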
The weights and time delays are each updated by an amount proportional to the opposite direction of the error gradient [5]:

$$\Delta w_{jik,h} = -\eta_1 \frac{\partial E(t_n)}{\partial w_{jik,h}}, \qquad (6)$$

$$\Delta \tau_{jik,h} = -\eta_2 \frac{\partial E(t_n)}{\partial \tau_{jik,h}}, \qquad (7)$$

where $\eta_1$ and $\eta_2$ are the learning rates. The learning rules are summarized as follows [5]:

$$\Delta w_{jik,h-1} = \eta_1\, \delta_{j,h}(t_n)\, a_{i,h-1}(t_n - \tau_{jik,h-1}), \qquad (8)$$

$$\Delta \tau_{jik,h-1} = \eta_2\, \rho_{j,h}(t_n)\, w_{jik,h-1}\, a'_{i,h-1}(t_n - \tau_{jik,h-1}), \qquad (9)$$

where

$$\delta_{j,h}(t_n) = \begin{cases} \bigl(d_j(t_n) - a_{j,h}(t_n)\bigr)\, f'_{j,h}(\mathrm{net}_{j,h}(t_n)) & \text{if } j \text{ is an output unit}, \\ \displaystyle\sum_{p \in N_{h+1}} \sum_{q=1}^{K_{pj,h}} \delta_{p,h+1}(t_n)\, w_{pjq,h}(t_n)\, f'_{j,h}(\mathrm{net}_{j,h}(t_n)) & \text{if } j \text{ is a hidden unit}, \end{cases} \qquad (10)$$

and

$$\rho_{j,h}(t_n) = \begin{cases} -\bigl(d_j(t_n) - a_{j,h}(t_n)\bigr)\, f'_{j,h}(\mathrm{net}_{j,h}(t_n)) & \text{if } j \text{ is an output unit}, \\ \displaystyle\sum_{p \in N_{h+1}} \sum_{q=1}^{K_{pj,h}} \rho_{p,h+1}(t_n)\, w_{pjq,h}(t_n)\, f'_{j,h}(\mathrm{net}_{j,h}(t_n)) & \text{if } j \text{ is a hidden unit}. \end{cases} \qquad (11)$$
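The learning rules above can be sketched for a single connection feeding an output unit as follows. This is a hedged illustration: the names and learning-rate values are invented, the hidden-unit case would instead accumulate the sums of Eqs. (10) and (11), and the activation derivative a'_{i,h-1} would in practice be estimated, for example by a finite difference of the stored activation history.

```python
import numpy as np

def sigmoid_deriv(net, a, b, c):
    """Derivative of Eq. (3); note f'(0) = a * b / 4, as stated in the text."""
    s = 1.0 / (1.0 + np.exp(-a * net))
    return a * b * s * (1.0 - s)

def update_output_connection(w, tau, target, output, net,
                             a_delayed, a_delayed_deriv,
                             a, b, c, eta1=0.1, eta2=0.05):
    """One step of Eqs. (8)-(9) for a connection into an output unit.

    a_delayed       : a_{i,h-1}(t_n - tau), the delayed source activation
    a_delayed_deriv : a'_{i,h-1}(t_n - tau), its estimated time derivative
    eta1, eta2      : learning rates for weights and time delays (illustrative values)
    """
    fprime = sigmoid_deriv(net, a, b, c)
    delta = (target - output) * fprime                 # Eq. (10), output-unit case
    rho = -(target - output) * fprime                  # Eq. (11), output-unit case
    w_new = w + eta1 * delta * a_delayed               # Eq. (8)
    tau_new = tau + eta2 * rho * w * a_delayed_deriv   # Eq. (9)
    return w_new, max(tau_new, 0.0)   # clamping the delay at zero is an added safeguard
```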
3. Simulation results

3.1. Chaotic time series prediction

We have carried out the chaotic time series prediction of the Mackey-Glass delay-differential equation [3]:

$$\frac{dx(t)}{dt} = -b\,x(t) + a\,\frac{x(t-\tau)}{1 + x(t-\tau)^{10}}. \qquad (12)$$
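A minimal sketch for generating the series of Eq. (12) is given below. The parameter values (τ = 17, a = 0.2, b = 0.1, step size 1, integration up to t = 1000) match those chosen in the text; the initial value x0 = 1.2, the constant initial history, and holding the delayed term fixed within each Runge-Kutta step are simplifying assumptions.

```python
import numpy as np

def mackey_glass(n_steps=1000, tau=17, a=0.2, b=0.1, x0=1.2, dt=1.0):
    """Integrate Eq. (12) with a classical four-stage Runge-Kutta step.
    The delayed term x(t - tau) is held fixed within each step, which is a
    simplification for this delay-differential equation."""
    lag = int(round(tau / dt))
    x = np.full(n_steps + lag + 1, x0)   # constant initial history (assumption)

    def f(xt, xlag):
        return -b * xt + a * xlag / (1.0 + xlag ** 10)

    for n in range(lag, n_steps + lag):
        xlag = x[n - lag]
        k1 = f(x[n], xlag)
        k2 = f(x[n] + 0.5 * dt * k1, xlag)
        k3 = f(x[n] + 0.5 * dt * k2, xlag)
        k4 = f(x[n] + dt * k3, xlag)
        x[n + 1] = x[n] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x[lag:lag + n_steps]

# x = mackey_glass()   # 1000 samples covering t = 1, ..., 1000
```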
This differential equation possesses many dynamic properties, such as non-linearity, limit-cycle oscillations, aperiodic waveforms and other dynamic behaviors, and provides a useful benchmark for testing predictive techniques [3,5,7]. We chose τ = 17, a = 0.2, b = 0.1, and integrated Eq. (12) using a four-point Runge-Kutta method with a step size of 1 up to t = 1000. To evaluate prediction performance, the TDRNN and the multiple recurrent neural network (MRNN) proposed in this paper, the ATNN, and the TDNN were simulated. The schematic architecture of the ATNN is depicted in Fig. 3. The TDNN is constructed
Fig. 3. Adaptive time-delay neural network [5].
from the ATNN by fixing the time delays of the ATNN and applying weight learning without updating the time-delay variables. The MRNN is obtained from the TDRNN by removing the input time delays of the TDRNN. The simulated networks were trained to predict six time steps into the future using the same training samples. The training set consisted of the first 300 samples (from t = 1 to 300), and the test set included 1000 samples (from t = 1 to 1000) of the differential Eq. (12). The TDRNN and MRNN were configured with 1 input unit, 2 first hidden units, 1 second hidden unit and 1 output unit, and the ATNN and TDNN were constructed with 1 input unit, 3 hidden units, and 1 output unit, so that all networks comprised the same number of processing units. We simulated the networks with various connection parameters. The connection parameters of the simulated networks are summarized in Table 1, where K_{ji,h-1} and K_{js,h-1} represent the total number of connections (i.e., time delays) to the first hidden node j from the input node i and from the internal state node s, respectively, K_{kj,h} is the number of connections to the second hidden node k from the first hidden node j, and K_{lk,h+1} is the number of connections to the output node l from the second hidden node k. K_{hi,h-1} represents the number of connections to the hidden node h from the input node i, and K_{oh,h} is the number of connections to the output node o from the hidden node h in the ATNN and TDNN.
Table 1
The connection parameters and the simulation results of TDRNN, MRNN, ATNN, and TDNN

Network        K_{ji,h-1}  K_{js,h-1}  K_{kj,h}  K_{lk,h+1}  K_{hi,h-1}  K_{oh,h}  Network size          Prediction
architecture                                                                       (number of weights)   NMSE
TDRNN          12          4           1         1           x           x         35                    0.00893936
               13          3           1         1           x           x         35                    0.00672498
               14          1           1         1           x           x         33                    0.00609035
               14          2           1         1           x           x         35                    0.00539166
               14          3           1         1           x           x         37                    0.00439992
               14          4           1         1           x           x         39                    0.00422026
               14          5           1         1           x           x         41                    0.00422970
               15          1           1         1           x           x         35                    0.00463554
               16          1           1         1           x           x         37                    0.00376471
MRNN           1           15          1         1           x           x         35                    0.00974136
               1           16          1         1           x           x         37                    0.00956206
ATNN           x           x           x         x           4           8         36                    0.0343383
               x           x           x         x           8           4         36                    0.0138575
               x           x           x         x           10          2         36                    0.0185332
               x           x           x         x           10          4         42                    0.0106871
               x           x           x         x           12          4         48                    0.00424409
               x           x           x         x           14          2         48                    0.00557528
TDNN           x           x           x         x           4           8         36                    0.0880004
               x           x           x         x           8           4         36                    0.0467372
               x           x           x         x           10          2         36                    0.0407395
               x           x           x         x           10          4         42                    0.0148453
               x           x           x         x           12          4         48                    0.0057593
               x           x           x         x           14          2         48                    0.0108520

Note: (x) None.
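For reference, the six-step-ahead training and test pairs used in the experiments above can be formed as in the sketch below; the exact framing of the single input channel through the network's delay lines is not spelled out in the text, so this slicing is an assumption.

```python
import numpy as np

def step_ahead_pairs(series, horizon=6):
    """Pair each sample with the value `horizon` steps later."""
    series = np.asarray(series, dtype=float)
    return series[:-horizon], series[horizon:]

# x = mackey_glass()                        # see the sketch following Eq. (12)
# inputs, targets = step_ahead_pairs(x)     # predict six time steps ahead
# train_in, train_tgt = inputs[:300], targets[:300]   # first 300 samples
# test_in, test_tgt = inputs, targets                 # evaluated over t = 1 ... 1000
```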
We used the normalized mean-squared error (NMSE) to monitor the prediction performance of the networks. The NMSE is defined as follows [6]:

$$\mathrm{NMSE} = \frac{E[(x(t) - \hat{x}(t))^2]}{E[(x(t) - E[x(t)])^2]}, \qquad (13)$$

where x(t) is the original signal and x̂(t) is the network prediction. An NMSE of zero indicates perfect prediction, and an NMSE of unity means the prediction is no better than a constant predictor at E[x(t)]. The performance of each network was monitored after 3000 iterations. Fig. 4 shows learning curves for the networks configured with the same network size. The convergence of the TDRNN is faster and steadier than that of the other networks. We tested the networks to predict the Mackey-Glass chaotic signal six time steps into the future.
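A small computational illustration of Eq. (13), using the sample mean and variance in place of the expectations (array names are invented):

```python
import numpy as np

def nmse(x_true, x_pred):
    """Eq. (13): mean squared prediction error normalised by the signal variance.
    A value of 0 means perfect prediction; 1 means no better than predicting the mean."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    return np.mean((x_true - x_pred) ** 2) / np.var(x_true)
```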
Fig. 4. Learning curves showing the normalized mean-squared error (NMSE) of the networks trained to predict the Mackey-Glass signal an interval of T = 6 into the future. The solid line is for the TDRNN, the dashed double-dot line is for the MRNN, the dotted line is for the ATNN, and the dash-dot line is for the TDNN.
Fig. 5. The simulation result of the TDRNN. The solid curve is for the Mackey-Glass signal and the point curve is for the TDRNN outputs (NMSE = 0.00376).
The predicted NMSE values of the TDRNN, MRNN, ATNN and TDNN are shown in Table 1. When time delays were included but fixed and only the weights were adapted (e.g., TDNN), the NMSE was 0.01484. When both weights and time delays were adapted (e.g., ATNN), the NMSE improved to 0.01068. When weights and input time delays were adapted and recurrences were included (e.g., TDRNN), the NMSE improved considerably, to 0.00376. The prediction outputs of the networks are shown in Figs. 5-8, where the TDRNN and MRNN were constructed with fewer connections than the ATNN and TDNN. As these figures show, the TDRNN tracks the variations of the bumping peaks better than the other networks do, and the best performance is attained by the TDRNN. These results suggest that adapting time delays offers the network more flexibility to achieve accurate pattern matching than fixing the time delays arbitrarily, and that employing time-delayed recurrences in a layered network is more effective for temporal correlations and prediction than putting multiple time delays into the neurons or their connections.
Fig. 6. The simulation result of the ATNN. The solid curve is for the Mackey-Glass signal and the point curve is for the ATNN outputs (NMSE = 0.01068).

Fig. 7. The simulation result of the TDNN. The solid curve is for the Mackey-Glass signal and the point curve is for the TDNN outputs (NMSE = 0.01484).

Fig. 8. The simulation result of the MRNN. The solid curve is for the Mackey-Glass signal and the point curve is for the MRNN outputs (NMSE = 0.00956).
Fig. 9. The simulation result of the TDRNN. The solid curve is for the real stock price index and the point curve is for the TDRNN outputs (NMSE = 0.015046).
3.2. Stock market index prediction

Stock price movements are so complex that researchers have not been able to discover a suitable model that handles the behavior of the stock market efficiently [12]. In this section, we carried out prediction of the Korean stock market index. The TDRNN, ATNN and TDNN were constructed with the same network size and trained to predict the stock price index one day into the future using the same training data. The training set consisted of the first 250 stock price indices and the test set included the next 150 indices. The prediction NMSE values of the TDRNN, ATNN and TDNN were 0.015046, 0.068610, and 0.069934, respectively. The prediction outputs of the TDRNN are shown in Fig. 9. Our experimental results show that the TDRNN learns the temporal correlations between current data and past events well by using dynamic time delays and recurrences, and that it is capable of forecasting the stock market trend with better accuracy than the ATNN and TDNN.
4. Conclusions

We have presented a time-delay recurrent neural network (TDRNN) for temporal correlations and prediction. The TDRNN employs adaptive time delays and recurrences for the temporal processing of sequences. The adaptive time delays let the TDRNN choose the optimal values of time delays for the temporal location of the important information in the input sequence, and the recurrences enable the network to encode and integrate temporal context information of sequences that have arbitrary interval times and arbitrary lengths of temporal context. The TDRNN and MRNN described in this paper, the ATNN proposed by Lin, and the TDNN introduced by Waibel were simulated and applied to the chaotic time series prediction of the Mackey-Glass delay-differential equation and to Korean stock market index prediction. The simulation results have shown that the TDRNN learns the temporal correlations between current data and past events well by using dynamic time
delays and recurrences, and that the best performance is attained by the TDRNN. The results suggest that adaptive time delays improve performance, and that they are even more useful when combined with recurrences. The TDRNN should be well applicable to temporally continuous domains, such as speech recognition, language processing and temporal signal identification.
References

[1] U. Bodenhausen, A. Waibel, The Tempo 2 algorithm: adjusting time-delays by supervised learning, in: R.P. Lippmann, J.E. Moody, D.S. Touretzky (Eds.), Advances in Neural Information Processing Systems 3, Denver, vol. 3, Morgan Kaufmann, San Mateo, 1991, pp. 155-161.
[2] S.P. Day, M.R. Davenport, Continuous-time temporal back-propagation with adaptive time delays, IEEE Trans. Neural Networks 4 (2) (1993) 348-354.
[3] D. Farmer, Predicting chaotic time series, Phys. Rev. Lett. 59 (1987) 845-848.
[4] D.T. Lin, J.E. Dayhoff, P.A. Ligomenides, Trajectory recognition with a time-delay neural network, Proc. IJCNN'92 3 (1992) 197-202.
[5] D.T. Lin, J.E. Dayhoff, P.A. Ligomenides, Adaptive time-delay neural network for temporal correlations and prediction, in: SPIE Intelligent Robots and Computer Vision XI: Biological, Neural Net, and 3-D Methods, vol. 1826, Boston, 1992, pp. 170-181.
[6] D.T. Lin, P.A. Ligomenides, J.E. Dayhoff, Learning spatiotemporal topology using an adaptive time-delay neural network, Proc. WCNN'93 1 (1993) 291-294.
[7] J. Moody, C.J. Darken, Fast learning in networks of locally-tuned processing units, Neural Comput. 1 (2) (1989) 281-294.
[8] F.E. Norrod, M.D. O'Neill, E. Gat, Feedback-induced sequentiality in neural networks, Proc. ICNN'87 2 (1987) 251-258.
[9] T. Uchiyama, K. Shimohara, Y. Tokunaga, A modified leaky integrator network for temporal pattern processing, Proc. IJCNN'89 1 (1989) 469-475.
[10] A. Waibel, Modular construction of time-delay neural networks for speech recognition, Neural Comput. 1 (2) (1989) 39-46.
[11] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K.J. Lang, Phoneme recognition using time-delay neural networks, IEEE Trans. Acoust. Speech Signal Process. 37 (3) (1989) 328-339.
[12] J.H. Wang, J.Y. Leu, Stock market trend prediction using ARIMA-based neural networks, Proc. ICNN'96 4 (1996) 2160-2165.
[13] R.J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Comput. 1 (2) (1989) 270-280.
Sung-Suk Kim received the B.S. degree in electrical engineering from Yeungnam University, Kyungsan, Korea, in 1985, and the M.S. and Ph.D. degrees in electronics and computer engineering from the University of Ulsan, Ulsan, Korea, in 1987 and 1990, respectively. From 1985 to 1991, he was with the Korea Electric Power Corporation (KEPCO). Since 1995, he has been with the Department of Computer Science and Statistics, Yongin University, Yongin, Korea, where he is an Assistant Professor. His research interests include neural networks, speech recognition, and multimedia.