Neurocomputing 2 (1990/91) 147-159 Elsevier
Time series forecasting using backpropagation neural networks
F.S. Wong
National University of Singapore, Institute of Systems Science, Kent Ridge, Republic of Singapore 0511
Abstract
Wong, F.S., Time series forecasting using backpropagation neural networks, Neurocomputing 2 (1990/91) 147-159.
This paper describes a neural network approach for time series forecasting. This approach has several significant advantages over conventional forecasting methods such as regression and Box-Jenkins: besides simplicity, a major advantage is that it does not require any assumption to be made about the underlying function or model to be used. All it needs are the historical data of the target and of the relevant input factors for training the network. In some cases, even the historical targets alone are sufficient to train the network for forecasting. Once the network is well trained and the error between the target and the network forecasts has converged to an acceptable level, it is ready for use. The proposed network has a three-dimensional structure designed to capture the temporal information contained in the input time series. Several real applications, including forecasting of electricity load, stock market prices and interbank interest rates, were tested with the proposed network and the findings were very encouraging.
Keywords: Backpropagation; forecasting; prediction; time series
1. Introduction
A problem that occurs repeatedly among the business, government and scientific communities is that of predicting future sample values of a time series. The predicted tracks of aircraft are often needed for a number of applications such as prevention of mid-air collisions. Prediction of the coordinates of an aircraft's position at some time in the future, on the basis of measured positions provided by a radar sensor, can be viewed as a problem in time series prediction. Prediction of sunspot activity can also be viewed in this light. And, of course, there is always a great deal of interest in predicting tourist arrival patterns, migration and immigration trends, passenger registrations, national GNP, manpower and employment, bank interest and foreign exchange rates, future
values of common stocks, electricity usage, property prices, profitability of new ventures, etc., based on various historical and current composite indices. Very little research work has been reported in the literature on using neural networks for time series prediction [7]; most work on neural network applications has focused on pattern recognition and classification. However, Alan Lapedes and Robert Farber [1] attempted to predict time series using a multilayer perceptron, and they found that the neural network gave superior predictions for certain types of nonlinear, dynamical systems compared with several conventional methods based on adaptive filtering and polynomial curve-fitting. It is also reported in the 1988 DARPA Neural Network Study that neural network signal processors for time series prediction applications will likely outperform
those using conventional techniques only, and costs are also expected to be lower .
2. The Integrated Neural Network (INN) approach
The objective of our work is to use neural networks as the underlying technique for predictive analysis, and then to integrate them with conventional predictive techniques such as adaptive filtering, spectral analysis, moving averages, parabolic interpolation, trigonometric curve fitting, multiple regression, autoregression, the Box-Jenkins method [2], etc. Other areas of artificial intelligence (AI) such as expert systems and fuzzy theory are also being investigated for such integration. A proposed stock selection system [8] which integrates neural networks and fuzzy logic for assessing country risks and rating stocks is shown in Fig. 1. The use of neural networks introduces nonlinear processing which requires massive floating-point computation, to such an extent that supercomputing or parallel processing is needed to deliver the speed. This is especially so during the design phase, where a large number of tests and training runs of the networks are required. To make the situation worse, integration with other techniques gives rise to an even more substantial increase in the computation load, and supercomputing or parallel processing naturally becomes a must. On the other hand, the neural network approach adds more flexibility to the system's capabilities and leads in certain circumstances to much better performance, as has been shown by Lapedes and Farber [1]. As for the speed requirement, a low-cost alternative is to implement the neural network on a hardware accelerator such as the INMOS transputer using an appropriate parallel computing model (e.g. [6]). In general, a time series can be broken down into the following four components: secular trend, cyclical variation, seasonal fluctuation and irregular fluctuation [2]. Secular trends are long-term, slow moving, low-frequency components.
Fig. 1. The proposed Intelligent Stock Selection System. The FuzzyNet makes use of fuzzy logic and neural network techniques for processing of fuzzy rules and data.
Cyclical variations usually take a few years to complete a cycle, and seasonal fluctuations complete within a year [3]; irregular fluctuations are random in nature and usually difficult to predict. Using spectral analysis, one can determine the relative strength of these components for a particular time series, and extract these components to extrapolate the future values of that time series in the time domain. The Fast Fourier Transform (FFT) or the Fast Hartley Transform (FHT) techniques could be used for this part of the analysis. As for the irregular, random components, they can be further divided into two subcomponents: the deterministic chaotic behavior and the stochastic noise. Conventional signal processing techniques, such as correlation function analysis, cannot distinguish between these two subcomponents. In this paper, a feedforward, backpropagation neural network is proposed to uncover the underlying, deterministic algorithm of the chaotic subcomponent. The stochastic subcomponent is random and thus indeterministic; however, one can always extract statistical information about a time series from its past values. The scope of this paper includes the description of the proposed neural network and how it can be applied to time series forecasting.
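To make the spectral-analysis step above concrete, the following is a minimal sketch, not taken from the paper, of gauging the relative strength of the slow (trend/cyclical) and seasonal components of a monthly series with a naive discrete Fourier transform. The toy series, its length and the chosen frequency bins are illustrative assumptions; a practical implementation would use the FFT or FHT as the text suggests.

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846
#define N  120                       /* ten years of monthly data (assumption) */

int main(void)
{
    double x[N], power[N / 2];
    int t, k;

    /* toy series: linear trend + annual seasonality + a fast ripple */
    for (t = 0; t < N; t++)
        x[t] = 0.05 * t + sin(2.0 * PI * t / 12.0) + 0.3 * sin(2.0 * PI * t / 2.7);

    /* naive DFT power spectrum (an FFT or FHT would be used in practice) */
    for (k = 0; k < N / 2; k++) {
        double re = 0.0, im = 0.0;
        for (t = 0; t < N; t++) {
            re += x[t] * cos(2.0 * PI * k * t / N);
            im -= x[t] * sin(2.0 * PI * k * t / N);
        }
        power[k] = re * re + im * im;
    }

    /* relative strength: the lowest bins ~ trend/cycle, bin N/12 ~ seasonal */
    double low = 0.0, total = 0.0, seasonal = power[N / 12];
    for (k = 1; k < N / 2; k++) total += power[k];
    for (k = 1; k < 4; k++)     low   += power[k];
    printf("trend/cycle share %.2f, seasonal share %.2f\n", low / total, seasonal / total);
    return 0;
}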
[Figure 2 layout: a Single Record of Input Data feeds an Input Layer, two Hidden Layers and an Output Layer (plus a constant input), producing a Single Predicted Output.]
Fig. 2. The 2-D neural forecast model. The numbers in the layers indicate the number of cells.
3. Back Propagation Neural Network (BPNN)
3.1. Historical perspective and overview
Back propagation (BP) is a technique for solving the credit assignment problem posed by Minsky and Papert in Perceptrons. Rumelhart et al. [4] are usually associated with the invention of BP. Parker introduced a similar algorithm at about the same time [5], and others have studied similar kinds of networks. Figure 2 shows the schematic diagram of a typical BP network. It always has an input layer, an output layer, and at least one hidden layer. There is no theoretical limit on the number of hidden layers, but typically one or two are sufficient to handle most problems; a network with three hidden layers can solve some very complicated pattern classification problems. In a BP network, each layer is fully connected to the succeeding layer. The arrows indicate the flow of information during computation. There are two phases in using the network: learning and recall. The network can be either hetero-associative or auto-associative (to be discussed later). The standard BP processing element (also referred to as a cell or node) is illustrated in Fig. 3, where:
[Figure 3 layout: inputs and variable weights feed a Summation stage, followed by a Transfer Function producing the outputs.]
Fig. 3. A typical processing cell.
x_j[s]: current output state of the jth neuron in layer s.
w_ji[s]: weight of the connection joining the ith neuron in layer (s-1) to the jth neuron in layer s.
I_j[s]: weighted summation of the inputs to the jth neuron in layer s.
A BP cell therefore transforms its inputs as follows:
x_j[s] = f( Σ_i w_ji[s] * x_i[s-1] ) = f( I_j[s] ),
where f is commonly the sigmoid function but can be any function, preferably differentiable. The sigmoid function is defined as f(z) = (1.0 + e^-z)^-1. It is shown in Fig. 4 together with three other commonly used functions.
3.2. Backpropagating the local error
Suppose now that the network has a global error function E associated with it which is a differentiable function of all the connection weights in the network. The actual error function is unimportant for understanding the mechanism of BP. The critical value that is propagated back through the network layers is defined by:
e_j[s] = -∂E/∂I_j[s].
It will be shown that this can be considered a measure of the local error at cell j in layer s. Using the chain rule twice in succession gives rise to a relationship between the local error at a particular cell in layer s and all the local errors in layer s+1:
e_j[s] = f'(I_j[s]) * Σ_k ( e_k[s+1] * w_kj[s+1] ).
If f is the sigmoid function as defined previously, then its derivative can be expressed as a simple function of itself, f'(z) = f(z) * (1.0 - f(z)), and the local error can therefore be rewritten as:
e_j[s] = x_j[s] * (1.0 - x_j[s]) * Σ_k ( e_k[s+1] * w_kj[s+1] ),
provided the transfer function is a sigmoid. The same procedure can be used to derive the local error when the transfer function is a hyperbolic tangent or sine. In short, the main mechanism in a BP network is to feed the input forward through the hidden layer(s) to the output layer, determine the error between the actual output and the desired output (target), and then backpropagate the error from the output back to the input through the hidden layer(s).
3.3. Minimizing the global error
Fig. 4. Some common transfer functions: the sigmoid y(s) = 1/(1 + e^-s), sine y(s) = sin(s), tanh y(s) = tanh(s), Heaviside, signum and hard-limiting functions.
During the learning phase, the objective is to minimize the global error E by modifying the weights associated with the links. Given the current set of weights w_ji[s], one needs to know how to increase or decrease them in order to reduce the global error. This can be done using a gradient descent rule as follows:
Δw_ji[s] = -b * (∂E/∂w_ji[s]),
where b is the learning coefficient. The relationship between b and E can easily be obtained from a set of simple experiments, and the recommended starting value for b is around 0.9. As a rule of thumb, the larger the value of b, the faster the global error will subside; on the other hand, a large value of b is likely to give rise to an unstable network. The above gradient descent rule changes each weight according to the size and direction of the negative gradient on the error surface. The partial derivatives with respect to w_ji[s] can be calculated directly from the local error values discussed in the previous subsection using the chain rule:
∂E/∂w_ji[s] = (∂E/∂I_j[s]) * (∂I_j[s]/∂w_ji[s]) = -e_j[s] * x_i[s-1].
Combining ∂E/∂w_ji[s] into Δw_ji[s]:
Δw_ji[s] = b * e_j[s] * x_i[s-1].
3.4. The global error function
The above discussion has assumed the existence of a global error function without actually specifying it. This function is needed to define the local errors at the output layer so that they can be propagated back through the network. Suppose a vector i is presented to the network, d is the desired output (target) and o is the actual output; then a measure of the error is given by:
E = 0.5 * Σ_k (d_k - o_k)^2,
where k indexes the components of d and o. From the previous definition of the local error at each output cell,
e_k(o) = -∂E/∂I_k(o) = -∂E/∂o_k = d_k - o_k,
which is indeed the local error. E defines the global error of the network for a particular pair (i, d). An overall global error function can be defined as the sum of all the pattern-specific error functions. Then each time a particular (i, d) is presented, the BP algorithm modifies the weights to reduce that particular component of the overall error function.
3.5. The momentum rate
One of the problems of a gradient descent algorithm is how to set the learning rate. Changing the weights as a linear function of the partial derivative, as defined above, assumes that the error surface is locally linear. At points of high curvature this linearity assumption does not hold, and divergent behavior might occur near such points. It is therefore important to keep the learning coefficient low in order to avoid such behavior. On the other hand, a small learning coefficient can lead to very slow learning. The momentum term was introduced to resolve this dichotomy. The delta weight equation is modified so that a portion of the previous delta weight is fed through to the current delta weight:
Δw_ji[s] = b * e_j[s] * x_i[s-1] + a * Δw_ji[s]_previous.
This acts as a low-pass filter on the delta weight terms, since general trends are reinforced whereas oscillatory behavior cancels itself out. This allows a low learning coefficient but a faster learning rate. a is referred to as the momentum rate, and its recommended starting value is around 0.7.
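As a small illustration of the global error and the momentum-smoothed delta weight just described, the following self-contained C sketch uses toy output and input values; the numbers themselves are illustrative assumptions, while b = 0.9 and a = 0.7 follow the starting values suggested in the text.

#include <stdio.h>

#define K 2                          /* number of output cells (illustrative) */

int main(void)
{
    double d[K] = {0.8, 0.3};        /* desired outputs (toy values) */
    double o[K] = {0.6, 0.5};        /* actual outputs  (toy values) */
    double x_prev = 0.7;             /* output of a cell in the preceding layer (toy) */
    double prev_dw = 0.0;            /* previous delta weight, initially zero */
    const double b = 0.9, a = 0.7;   /* learning coefficient and momentum rate */
    double E = 0.0;
    int k;

    for (k = 0; k < K; k++) {
        double e_k = d[k] - o[k];    /* local error at output cell k */
        E += 0.5 * e_k * e_k;        /* contribution to the global error */
    }

    /* momentum-smoothed update for one weight feeding output cell 0:
       the previous delta weight is fed through, acting as a low-pass filter */
    double dw = b * (d[0] - o[0]) * x_prev + a * prev_dw;
    prev_dw = dw;                    /* remembered for the next presentation */

    printf("E = %f, delta w = %f\n", E, dw);
    return 0;
}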
3.6. Creating a BP network
A typical BP network consists of at least three layers. The input layer acts as a buffer for accepting input data; the hidden layer(s) and the output layer are associated with modifiable weights which are adjusted during the learning process to approximate the system function. A BP network can be either auto-associative (unsupervised learning) or hetero-associative (supervised learning), depending on the objective of the network. In our study the network is hetero-associative, i.e. it has to be trained with historical data patterns. Although one hidden layer is sufficient for most cases, there is no guarantee that a network with one hidden layer will attain the performance of those with more than one. The number of input cells must always match the number of fields in the input record set. As a rule of thumb in this application, the number of cells in the first hidden layer is usually twice the number of input cells. The number of output cell(s) must match that of the target(s) to be forecasted, and is usually one.
3.7. The BP training algorithm in summary
The BP training algorithm is an iterative gradient descent algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward neural network and the desired output. It requires continuous, differentiable nonlinearities such as the sigmoid logistic function f(z) = (1 + e^-z)^-1.
Step 1: initialize weights and offsets.
Set all weights and node offsets to small random values.
Step 2: present input and desired outputs.
Present a continuous-valued input vector x_0, x_1, ..., x_{N-1} and specify the desired outputs d_0, d_1, ..., d_{M-1}. If the net is used as a classifier then all desired outputs are typically set to zero except the one corresponding to the class the input is from, which is set to 1. The input could be new on each trial, or samples from a training set could be presented cyclically until the weights stabilize.
Step 3: calculate actual outputs.
Use the sigmoid nonlinearity from above to calculate the outputs y_0, y_1, ..., y_{M-1}.
Step 4: adjust weights.
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust the weights using
w_ij(t+1) = w_ij(t) + β * δ_j * x'_i.
In this equation, w_ij(t) is the weight from hidden node i (or from an input) to node j at time t, x'_i is either the output of node i or an input, β is the learning coefficient, and δ_j is an error term for node j. If node j is an output node, then
δ_j = y_j * (1 - y_j) * (d_j - y_j),
where d_j is the desired output of node j and y_j is the actual output. If node j is an internal hidden node, then
δ_j = x'_j * (1 - x'_j) * Σ_k δ_k * w_jk,
where k runs over all nodes in the layer after node j. Internal node thresholds are adjusted in a similar manner by treating them as connection weights on links from auxiliary constant-valued inputs. Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by
w_ij(t+1) = w_ij(t) + β * δ_j * x'_i + α * (w_ij(t) - w_ij(t-1)),
where 0 < α < 1 is the momentum rate coefficient.
Step 5: repeat by going to Step 2 until the error term is reduced to an acceptable level.
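The following self-contained C sketch puts Steps 1 to 5 together for a single hidden layer of sigmoid cells. It is a minimal illustration, not the paper's implementation: the toy task (predicting the next point of a sine-like series from the previous four points), the number of passes and the stopping threshold are assumptions, while β = 0.9 and α = 0.7 follow the starting values suggested earlier in the text and the hidden layer size follows the paper's rule of thumb of twice the number of input cells.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define SERIES_LEN 200
#define N_IN   4                     /* input cells = fields per record */
#define N_HID  (2 * N_IN)            /* rule of thumb: twice the input cells */
#define BETA   0.9                   /* learning coefficient */
#define ALPHA  0.7                   /* momentum rate */
#define PASSES 400

static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }
static double small_rand(void)  { return 0.1 * ((double)rand() / RAND_MAX - 0.5); }

int main(void)
{
    double series[SERIES_LEN];
    double w_h[N_HID][N_IN + 1], w_o[N_HID + 1];             /* last weight = threshold */
    double dw_h[N_HID][N_IN + 1] = {{0}}, dw_o[N_HID + 1] = {0};
    int i, j, p, pass;

    for (i = 0; i < SERIES_LEN; i++)                          /* toy series in (0,1) */
        series[i] = 0.5 + 0.4 * sin(0.3 * i);

    /* Step 1: small random weights and thresholds */
    srand(1);
    for (j = 0; j < N_HID; j++)
        for (i = 0; i <= N_IN; i++) w_h[j][i] = small_rand();
    for (j = 0; j <= N_HID; j++) w_o[j] = small_rand();

    for (pass = 0; pass < PASSES; pass++) {                   /* Step 5: repeat */
        double mse = 0.0;
        for (p = N_IN; p < SERIES_LEN; p++) {                 /* Step 2: present a pattern */
            const double *x = &series[p - N_IN];              /* previous N_IN values */
            double d = series[p];                             /* desired output: next value */
            double h[N_HID], y, sum, delta_o, delta_h[N_HID];

            /* Step 3: forward pass through the hidden and output layers */
            for (j = 0; j < N_HID; j++) {
                sum = w_h[j][N_IN];                           /* threshold (constant input 1) */
                for (i = 0; i < N_IN; i++) sum += w_h[j][i] * x[i];
                h[j] = sigmoid(sum);
            }
            sum = w_o[N_HID];
            for (j = 0; j < N_HID; j++) sum += w_o[j] * h[j];
            y = sigmoid(sum);

            /* Step 4: local errors, output node first, then the hidden nodes */
            delta_o = y * (1.0 - y) * (d - y);
            for (j = 0; j < N_HID; j++)
                delta_h[j] = h[j] * (1.0 - h[j]) * delta_o * w_o[j];

            /* weight adjustment with momentum smoothing */
            for (j = 0; j < N_HID; j++) {
                dw_o[j] = BETA * delta_o * h[j] + ALPHA * dw_o[j];
                w_o[j] += dw_o[j];
                for (i = 0; i < N_IN; i++) {
                    dw_h[j][i] = BETA * delta_h[j] * x[i] + ALPHA * dw_h[j][i];
                    w_h[j][i] += dw_h[j][i];
                }
                dw_h[j][N_IN] = BETA * delta_h[j] + ALPHA * dw_h[j][N_IN];
                w_h[j][N_IN] += dw_h[j][N_IN];
            }
            dw_o[N_HID] = BETA * delta_o + ALPHA * dw_o[N_HID];
            w_o[N_HID] += dw_o[N_HID];

            mse += (d - y) * (d - y);
        }
        if (mse / (SERIES_LEN - N_IN) < 1e-4) break;          /* acceptable error level */
    }
    printf("finished at pass %d\n", pass);
    return 0;
}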
3.8. Discussions
The generally good performance found for the BP algorithm is somewhat surprising considering that it is a gradient search technique that may find a local minimum in the error term instead of the desired global minimum . Suggestions to avoid this problem include : (1) train the BP network many times, each with different initial random weights (this can be achieved by varying the seed value used by the random generator) ; (2) lower the learning coefficient ; (3) add extra hidden layers . Another problem noted with the BP algorithm is the rather large amount of training required for convergence (e .g . 100 to 400 passes through all the training data) . Although a number of training algorithms have been proposed to speed up the convergence, it seems unlikely that a significant reduction in the amount of training can be achieved .
4. Using a BPNN for time series forecasting
4.1. Overview
A great deal of interest has been shown in using neural networks to perform qualitative reasoning, but relatively little work has been done in exploring their ability to process floating-point numbers in a massively parallel fashion. This section will examine the forecasting of 'chaotic' time series using a backpropagation (BP) network. The results presented here can be applied to many engineering, business and industrial management problems which require accurate forecasts. Many conventional signal processing tests, e.g. the correlation function, cannot distinguish deterministic chaotic behavior from stochastic noise. Particularly difficult systems to predict are nonlinear and chaotic ones. Chaos has a technical definition based on nonlinear, dynamical systems theory, but intuitively a chaotic system is
one which is deterministic but 'random', rather similar to deterministic, pseudo-random number generators. A time series, by definition, is an ordered succession of numbers representing the values of a particular variable over a given period of time (e.g. monthly sales figures for 1970 through now). Clearly, if one can uncover the underlying, deterministic algorithm from a chaotic time series, using a neural network or a non-neural network approach, one is likely to forecast (extrapolate) its future values quite accurately.
4.2. The forecasting power of BPNN
BP applications involve using binary (0, 1) or analog data (any floating-point numbers) as inputs and outputs. Let us consider a natural system described by nonlinear differential equations which have an infinite-dimensional phase space (i.e. an infinite number of values is necessary to describe the initial conditions). The Glass-Mackey equation is one such delay, nonlinear differential equation:
x'(t) = a * x(t - τ) / (1 + x^10(t - τ)) - b * x(t).
The initial condition is specified by an initial function defined over a strip of width τ (hence the infinite-dimensional phase space, i.e. initial functions, not initial constants, are required). By choosing this function to be a constant function, and by integrating x'(t) with respect to time t, one obtains a time series x(t) which is a function of a, b and τ. For example, with a = 0.2, b = 0.1 and τ = 17, a time series x(t) which is chaotic with a fractal attractor of dimension 2.1 is obtained. By varying the value of τ, one is able to obtain other chaotic systems commonly found in nature. The time series can now be used to determine the forecasting accuracy of BP networks.
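The following minimal C sketch, not taken from the paper, generates such a Glass-Mackey series by simple Euler integration with a constant initial function; the step size of one time unit, the run length and the value of the initial constant are illustrative assumptions.

#include <stdio.h>
#include <math.h>

#define A    0.2                     /* parameters quoted in the text */
#define B    0.1
#define TAU  17                      /* delay */
#define N    1000                    /* samples to generate (assumption) */

int main(void)
{
    static double x[N + TAU + 1];
    int t;

    /* constant initial function over a strip of width tau, as described above */
    for (t = 0; t <= TAU; t++)
        x[t] = 1.2;                  /* illustrative constant */

    /* Euler step of x'(t) = A*x(t-tau)/(1 + x(t-tau)^10) - B*x(t), dt = 1 */
    for (t = TAU; t < N + TAU; t++) {
        double xd = x[t - TAU];
        x[t + 1] = x[t] + A * xd / (1.0 + pow(xd, 10.0)) - B * x[t];
    }

    for (t = TAU; t <= N + TAU; t++) /* the series used for training and testing */
        printf("%f\n", x[t]);
    return 0;
}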
The procedure is as follows: take a set of n values x(t-n+1), ..., x(t-1), x(t) as training inputs and x(t+P) as the target output to train the BP network, such that (t+P) ≤ T ≤ T_current, where P is the number of forecasting steps ahead of time t. Repeat the training process until the mean square error between the network output and the target has dropped to a certain acceptable level. Now feed the BP network with test inputs x(t-n+1), ..., x(t-1), x(t) such that min(T, T_current - P) ≤ t ≤ T_current, compare the output (i.e. the forecast) with the actual values of x(t), and compute the accuracy measurements for P. Repeat the training and testing procedures for a range of values of P and obtain the corresponding measurements. The fundamental nature of chaos dictates that forecasting accuracy will decrease as P increases; this applies to all existing forecasting methods including the BP approach. The questions now are "How rapidly does the degradation occur?" and "Does the BP approach fare better than others?" The work of Lapedes and Farber showed that the BP approach can be more accurate than conventional methods, especially at large values of P.
4.3. Discussions
The nonlinear nature of the activation (transfer) function used by the hidden neurons allows chaotic time series to be forecasted . Chaotic time series are emitted by deterministic nonlinear systems and are sufficiently complicated that they appear to be `random' time series . However, because there is an underlying deterministic map that generates the series, there is a closer analogy to pseudo-random number generators than to stochastic randomness . BP networks are able to forecast well because they extract, and accurately approximate these underlying maps . Deterministic chaos has been implicated in a large number of physical situations including the onset of turbulence in fluids, chemical reactions, lasers and plasma physics, to name but a few . In addition to these engineering applications, there are also several independent studies on using BP for financial applications with very promising findings .
In addition, there are other nonlinear system modelling examples that show the ability of neural networks to infer the correct mappings used to transform the input data sets. This somewhat mysterious ability to infer mappings is actually nothing more than real-valued function approximation when viewed in the context of signal processing.
5. The neural forecaster models
5.1. The 2-D model
Based on the BP network described before, a two-dimensional neural network model has been built; it is shown in Fig. 5. The cells in the input layer use linear transfer functions; half of the cells in the second layer use the sigmoid function and the other half the sine function; all the cells in the third layer use the sigmoid function; and the output cells use a linear function. The input data file structure used by the model is presented in Figs. 6 and 7. The model accepts only a single record at a time and generates a single predicted output. This 2-D model is found to be useful for classification problems such as credit rating.
Fig. 5. The transfer functions of the various layers in a 2-D neural forecast network.
[Figure 6 shows a sample input file: monthly records with a Date column, the Target (In1) and further input fields In2 to In8; a single input record and multiple input records are marked, together with the number of fields.]
Fig. 6. A typical input file structure used to train and test the Neural Forecaster. When the number of fields equals 1, the training/testing is done using the target time series itself.
[Figure 7 shows the same file with the current record(s) marked and a block of multiple-step-ahead target columns on the right.]
Fig. 7. The current record(s) for training/testing and the targets.
For such classification problems, the inputs to the network are attributes of the problem and need not form a time series.
5.2. The 3-D model
The three-dimensional model is an extension of the 2-D model (see Fig. 8). Instead of taking one set of attributes at a time, it is designed to accept several sets together, thus capturing the temporal information contained in the sets. This model is also used to generate multiple forecasts, as depicted in Figs. 7 and 8. Because it is designed to capture the temporal information of the input record sets, it is only suitable for processing time series. For non-time series applications, the number of records per set should be reduced to one, in which case the 3-D model is the same as the 2-D model.
Fig. 8. The architecture of a 3-D neural forecast network.
5.3. Discussions
The 2-D model is easier to implement than the 3-D model and has a wider range of applications. On the other hand, the 3-D model is able to capture the temporal information contained in each set of records, giving rise to more accurate results when processing time series. The penalties one has to pay for using the 3-D model are much higher computer storage requirements and computation time.
6. Implementation
Both the 2-D and 3-D Neural Forecasters have been implemented in the C language and run on the Apple Macintosh II (compiled with Think C) and the IBM PC/AT and PS/2 (compiled with Turbo C), and will be implemented on workstations soon. Currently they are developed as individual packages, but they can be integrated with other applications.
7. Application examples
To illustrate the forecasting power of the neural network, we applied it to forecast the S&P 500 weekly closing price using the 3-D network. The 3-D network was chosen because the training and testing data were in the form of a time series. The training file contained the closing prices for 140 consecutive weeks, whereas the test file contained the same prices with the following 12 weeks appended. With reference to the terms used in Figs. 6 and 7, the number of fields in this case is 1 (only the target itself), the number of records per set is 10, and the forecasting step is 1. Therefore the total number of sets available for training is (140 - 1 - 10) = 129, and (140 + 12 - 10) = 142 in the case of testing. The neural network forecasts are shown in Fig. 9 together with the 10-week moving average for comparison. Similar tests were also carried out on the ST and KLSE indices, Singapore (Fig. 10) and US stock market returns, interest rates, and the Singapore electricity usage pattern (Fig. 11).
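As an illustration of how the training sets for this example are formed, the following C sketch slides a window of 10 records over the 140-week training series and pairs each window with the closing price one step beyond it. The counting convention follows the formula quoted in the text; the placeholder data array and the omitted file handling are assumptions.

#include <stdio.h>

#define N_WEEKS          140         /* weeks of training data, as in the text */
#define RECORDS_PER_SET   10         /* records per set, as in the text */
#define STEP               1         /* forecasting step, as in the text */

int main(void)
{
    double close[N_WEEKS];           /* normalized weekly closing prices (placeholder) */
    int s, r;

    for (s = 0; s < N_WEEKS; s++)
        close[s] = 0.0;              /* in a real run these would be read from the data file */

    /* number of training sets, following the counting used in the text: 140 - 1 - 10 = 129 */
    int n_sets = N_WEEKS - STEP - RECORDS_PER_SET;

    for (s = 0; s < n_sets; s++) {
        double inputs[RECORDS_PER_SET];
        double target;
        /* inputs: RECORDS_PER_SET consecutive records starting at week s */
        for (r = 0; r < RECORDS_PER_SET; r++)
            inputs[r] = close[s + r];
        /* target: the closing price STEP week(s) after the last record of the set */
        target = close[s + RECORDS_PER_SET - 1 + STEP];
        (void)inputs; (void)target;  /* would be passed to the 3-D network trainer */
    }
    printf("%d training sets\n", n_sets);
    return 0;
}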
8. Other potential applications
Forecasts are one of the most important methods managers use to support the decision-making process. Virtually every important operating decision depends to some extent on a forecast. Inventory accumulation is related to the forecast of expected demand; the production department has to schedule employment needs and raw materials orders for the next month or two; the finance department must arrange short-term financing for the next quarter; the personnel department must determine hiring and layoff requirements. The list of forecasting applications is quite lengthy. Executives are keenly aware of the importance of forecasting. Indeed, a great deal of time is spent studying trends in economic and political affairs and how events might affect demand for products and/or services. One issue of interest here is the importance executives place on quantitative forecasting versus their own opinions. One problem with quantitative forecasting methods is that they depend on historical data; for this reason they are less effective in calling the turn that often results in sharply higher or lower demand. Through techniques such as neural networks, the computer can aid managers in automating sensitivity to sharp changes in demand. The description so far has centred around the forecasting of time series. In fact, the BP network can also be used for classification applications which do not involve time series. One promising use is the processing of credit card or bank loan applications, based on facts about the applicants. These facts usually include such things as salary, number of checking accounts and previous credit history. Large banks and
Fig. 9. The S&P 500 weekly closing price (the targets) from 1985 to 1987 in the left (L) columns, the neural network predictions in the middle (M), and the ten-week moving average on the right (R). All the data were normalized to the range [0, 1].
lenders lose millions each year from bad debts. Even a small increase in the ability to predict the creditworthiness of applicants can result in hundreds of thousands of dollars saved each year. Neural network techniques have also been applied to the field of marketing. For years, advertising agencies and marketing companies have been trying to identify and sell to target, or specific, markets. For example, consider a direct
marketing company which sends out advertisement brochures enclosed in monthly credit card bills on a regular basis . What the company would like to do is to send out only a small percentage of these brochures and keep information on those who respond . Once the company has these data, it can build a predictive model using neural networks to select only those who are likely to respond, thus cutting down the expenses .
Fig. 10. The actual stock market returns, outputs from the neural forecaster and regression. The neural network was trained on data prior to January 1987.
Fig. 11. The Singapore power demand pattern (daily, from 2 pm to 12 noon the next day) and the 3-D neural forecaster output (thick line); the horizontal axis is time.
Another possible application is in evaluating and forecasting property prices . The network can be trained with attributes such as the location, land size, built-in area, number of rooms, together with other economic input data, to suggest the current and future selling prices .
9. Conclusions
We have outlined an integrated neural network (INN) approach for business and industrial forecasting, using the backpropagation neural network (BPNN) as the underlying technique. We have also presented the construction, training, testing and performance of the BPNN for use in forecasting, together with the 2-D and 3-D Neural Forecasters. Although the work presented here is mainly centred around time series forecasting, we expect that many other business and industrial application areas may be fruitfully investigated with this approach, especially areas where no a priori theoretical model or underlying mathematical function is available or can easily be determined.
References
[1] A. Lapedes and R. Farber, How neural nets work, in: Proc. Conf. on Neural Information Processing Systems (AIP, 1987).
[2] J. Hanke and A. Reitsch, Business Forecasting, 3rd ed. (Allyn & Bacon, Newton, MA, 1989).
[3] E. Helfert, Techniques of Financial Analysis, 6th ed. (Irwin, Homewood, IL, 1987).
[4] D. Rumelhart, G. Hinton and R. Williams, Parallel Distributed Processing (MIT Press, Cambridge, MA, 1986).
[5] D. Parker, Learning-Logic, Report TR-47, MIT Center for Computational Research in Economics and Management Science, 1985.
[6] F. Wong, P. Tan and K. Tan, Parallel implementation of the neocognitron neural network on an array of transputers, Technical Report, Institute of Systems Science, National University of Singapore, Jan 1990.
[7] F. Wong, Time series predictive analysis using back propagation neural network, Technical Report, Institute of Systems Science, National University of Singapore, May 1990.
[8] F. Wong, P.Z. Wang and H.H. Heng, A stock selection strategy using fuzzy neural networks, to appear in: L.F. Pau, ed., Comput. Sci. in Economics and Management (Kluwer, Dordrecht, The Netherlands).
Dr. Francis Wong is a Research Staff Member of the Institute of Systems Science, National University of Singapore. Prior to joining ISS, he worked as an assistant professor and technical consultant in the US and Canada. He holds B.Eng. (Hons), MASc and PhD degrees in Electrical and Computer Engineering. His current research interests include parallel processing, transputer programming, pattern recognition, neural networks and fuzzy engineering for various financial and industrial applications, with emphasis on forecasting, selection and monitoring problems. He has implemented several prototypes and commercial systems for the above-mentioned applications and published over 25 technical papers in these areas.