NEURAL NETWORKS FOR ECONOMIC FORECASTING PROBLEMS KAZUHIRO KOHARA NTT Cyber Solutions Laboratories, Musashino-shi, Tokyo 180-8585, Japan
I. INTRODUCTION
II. UNIVARIATE TIME-SERIES FORECASTING
III. MULTIVARIATE PREDICTION
   A. Multivariate Stock Market Prediction
   B. Bond Rating Prediction
   C. Electricity Load Forecasting
IV. HYBRID SYSTEMS
   A. Integration of Knowledge and Neural Networks
   B. Prior Knowledge and Event-Knowledge
   C. Selective Presentation Learning for Forecasting
   D. Stock Market Prediction
V. RECURRENT NEURAL NETWORKS
VI. SUMMARY
REFERENCES
I. INTRODUCTION

Predicting the future behavior of real-world time series using neural networks [1] has been extensively investigated (e.g., [2-7]), because neural networks can learn nonlinear relationships between inputs and desired outputs. Various attempts have been made to apply neural networks to financial market prediction (e.g., [8-10]), electricity-load forecasting (e.g., [11, 12]), and other areas (e.g., flour price prediction [13]). This chapter describes various neural network approaches to economic forecasting problems: univariate time-series forecasting, multivariate prediction, hybrid systems, and recurrent neural networks.

II. UNIVARIATE TIME-SERIES FORECASTING

White [14] reported some results on neural network modeling and learning techniques to search for and decode nonlinear regularities in asset price movements, focusing on the case of IBM common stock daily returns. Figure 1 shows the neural prediction model for forecasting IBM daily stock returns.
FIGURE 1 Neural network for forecasting IBM daily stock returns [14]. Inputs: RETURN(t), RETURN(t-1), RETURN(t-2), RETURN(t-3), RETURN(t-4), where RETURN(t) is the one-day rate of return to holding IBM common stock on day t; network size: 5-5-1; output: RETURN(t+1).
The structure of the neural network was 5-5-1 (five neurons in the input layer, five in the hidden layer, and one in the output layer). Inputs were the five previous days' rates of return. The one-day return r_t is defined as

    r_t = (p_t + d_t - p_(t-1)) / p_(t-1),

where p_t is the closing price on day t and d_t is the dividend paid on day t. As a standard procedure, White [14] used a linear autoregressive model for asset returns of the form

    r_t = w_0 + w_1 r_(t-1) + ... + w_p r_(t-p) + e_t,    t = 1, 2, ...,

where w is an unknown column vector of weights, p is a positive integer determining the order of the autoregression, and e_t is a stochastic error. During the training (in-sample) period of 1000 days, the linear autoregressive model did not detect nonlinear regularities (i.e., did not learn the nonlinear input-output relationship), but the neural prediction model detected them (i.e., learned the nonlinear input-output relationship very well). The backpropagation algorithm [1] was used to train the network. Table 1 shows the experimental results for the out-of-sample periods (a post-sample period of 500 days and a pre-sample period of 500 days). Correlation in the out-of-sample periods was small for both models, so the in-sample results are either the result of overfitting or of learning evanescent features. White [14] pointed out that the research could be expanded by elaborating the network to allow additional inputs (e.g., other stock prices, leading indicators, macroeconomic data) and by permitting recurrent connections.
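To make this setup concrete, here is a minimal sketch of the comparison, assuming synthetic returns in place of the IBM series and scikit-learn's MLPRegressor in place of the original 5-5-1 backpropagation network; the data, helper names, and hyperparameters are illustrative assumptions, not White's.

    # Sketch: linear AR(5) vs. a 5-5-1 network on lagged daily returns.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    returns = rng.normal(0.0, 0.01, size=1505)   # stand-in for real returns

    def make_lagged(r, p=5):
        # Each row holds p lagged returns; the target is the next return.
        X = np.column_stack([r[i:len(r) - p + i] for i in range(p)])
        return X, r[p:]

    X, y = make_lagged(returns)
    X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

    # Linear autoregression r_t = w_0 + w_1 r_(t-1) + ... + w_5 r_(t-5) + e_t,
    # fitted by ordinary least squares.
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    ar_pred = np.column_stack([np.ones(len(X_te)), X_te]) @ w

    # 5-5-1 feedforward network trained by backpropagation.
    net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=0)
    nn_pred = net.fit(X_tr, y_tr).predict(X_te)

    # Out-of-sample correlations between real and predicted values (cf. Table 1).
    print(np.corrcoef(y_te, ar_pred)[0, 1], np.corrcoef(y_te, nn_pred)[0, 1])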
TABLE 1 Experimental Results for Forecasting IBM Daily Stock Returns [14]

                                                      Neural network    Linear autoregressive model
Correlation* in a post-sample period of 500 days      -0.0699           0.0751
Correlation in a pre-sample period of 500 days        -0.207            0.0996

*Correlation coefficient between real values and predicted values.
White [14] also pointed out that neural networks for market trading purposes should be evaluated and trained using the profit and loss in dollars generated by trades, not squared forecast error. Tang [15] discussed the results of a comparative study of neural networks versus the Box-Jenkins methodology [16] in forecasting univariate time series: airline passenger data, domestic car sales, and foreign car sales. Tang [15] concluded that neural networks represent a promising alternative forecasting approach, although there are problems in determining the optimal topology and parameters for efficient learning.
III. MULTIVARIATE PREDICTION

A. Multivariate Stock Market Prediction

1. German Stock Market Prediction

Freisleben [17] evaluated the performance of backpropagation neural networks applied to the problem of predicting German stock market prices. Figures 2, 3, and 4 show the three neural prediction models for predicting the weekly price of the FAZ Index, which is calculated on the basis of 100 major German stocks and may be regarded as one of the German equivalents of the Dow-Jones Index in the United States.
FIGURE 2 Neural Network 1 for forecasting German weekly stock price [17]. Network size: 12-11-1; output: a1(t+1). Inputs: a1: price of FAZ Index; a2: absolute difference to previous price of FAZ Index; a3: trend of FAZ Index price movement; b1: value of 5-day moving average; b2: absolute difference to previous value of 5-day moving average; b3: trend of 5-day moving average; c1: value of 10-day moving average; c2: absolute difference to previous value of 10-day moving average; c3: trend of 10-day moving average; e1: value of bond market index; e2: absolute difference to previous value of bond market index; e3: trend of bond market index.
FIGURE 3 Neural Network 2 for forecasting German weekly stock price [17]. Network size: 21-20-1; output: a1(t+1). Inputs: the twelve inputs of Figure 2 plus d1: value of 90-day moving average; d2: absolute difference to previous value of 90-day moving average; d3: trend of 90-day moving average; f1: value of order index; f2: absolute difference to previous value of order index; f3: trend of order index; g1: value of U.S.-dollar exchange rate; g2: absolute difference to previous value of U.S.-dollar exchange rate; g3: trend of U.S.-dollar exchange rate.
FIGURE 4 Neural Network 3 for forecasting German weekly stock price [17]. Network size: 10-9-1; inputs: the ten most recent FAZ Index prices a1(t), a1(t-1), ..., a1(t-9); output: a1(t+1).
TABLE 2 Experimental Results of the Neural Network 1 (Fig. 2) for Forecasting German Weekly Stock Price [17]

                           Learning set    Test set
Minimum error              0.0004          0.0000
Maximum error              0.0584          0.1545
Average error              0.0160          0.0519
Trend correct (absolute)   90% (90/99)     74% (38/51)
Not only the FAZ Index but also a variety of other economic factors (for example, moving averages, an order index, the U.S.-dollar exchange rate, and a bond market index) were used as inputs. In the first prediction model (Fig. 2), the FAZ Index, the 5-day and 10-day moving averages, and the bond market index were used. In the second model (Fig. 3), the 90-day moving average, the order index, and the U.S.-dollar exchange rate were additionally used. In the third model (Fig. 4), only the ten most recent FAZ Index prices were used. The time period spanned January 23, 1987 to December 22, 1989. The first 100 weeks were used as the training set and the remaining 53 weeks were used as the test set. All the data were scaled to the interval [0.1, 0.9]. All weights of a network were initialized to small random values between -0.1 and 0.1. The learning rate was 0.7 in the second model (Fig. 3) and 0.9 in the first model (Fig. 2) and the third model (Fig. 4). The momentum parameter was 0.9 in all models. Network learning was stopped after 3000 cycles. Tables 2, 3, and 4 show the experimental results. Too many factors did not necessarily improve the learning capabilities of the network, but the presence of several fundamental factors led to better prediction results. Regression methods applied to the time series used in these experiments correctly predicted the trend between two successive values in the test set in 51-62% of the cases, whereas the range of correct trend predictions in the test set with the neural networks lay between 72% and 76%.
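The scaling step above is a simple linear min-max mapping. The following sketch shows one way to implement it and to invert it for reading network outputs back as index values; the function names and toy values are illustrative assumptions.

    import numpy as np

    def scale_to_range(x, lo=0.1, hi=0.9):
        # Linear min-max scaling of a series into [lo, hi], as used in [17].
        x = np.asarray(x, dtype=float)
        return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

    def unscale(y, x_min, x_max, lo=0.1, hi=0.9):
        # Inverse mapping: recover index values from network outputs.
        return x_min + (y - lo) * (x_max - x_min) / (hi - lo)

    faz = np.array([1500.0, 1520.5, 1498.2, 1533.7])   # toy FAZ Index values
    scaled = scale_to_range(faz)
    print(scaled, unscale(scaled, faz.min(), faz.max()))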
2. UK Stock Market Prediction

Refenes and co-workers [9, 18] examined the use of neural networks as an alternative to classical statistical techniques for forecasting within the framework of the arbitrage pricing theory (APT) model [19] for stock ranking. The APT model is widely used in portfolio management as an alternative to the capital asset pricing model (CAPM) [20].
TABLE 3 Experimental Results of the Neural Network 2 (Fig. 3) for Forecasting German Weekly Stock Price [17]

                           Learning set    Test set
Minimum error              0.0001          0.0007
Maximum error              0.0656          0.1722
Average error              0.0208          0.0482
Trend correct (absolute)   78% (78/99)     76% (39/51)
TABLE 4 Experimental Results of the Neural Network 3 (Fig. 4) for Forecasting German Weekly Stock Price [17]

                           Learning set    Test set
Minimum error              0.0004          0.0001
Maximum error              0.1042          0.1655
Average error              0.0327          0.0485
Trend correct (absolute)   81% (73/90)     72% (37/51)
According to the APT framework, three stages are necessary during the investment process: (1) preprocessing the data to calculate relative values for the factors involved; (2) ranking the stocks; (3) constructing the portfolio. The purpose of stock ranking is the construction of a portfolio: stock ranking is defined as the task of assigning ratings to different stocks within a universe. Figure 5 shows the neural prediction model for U.K. stock ranking. Inputs A, B, and C were parameters extracted from the balance sheets of the companies in the universe of U.K. stocks (details of these factors are not specified). The output was the outperformance of each stock six months ahead. The data set covered the period May 1985 to December 1991 on a monthly basis and concerned 143 stocks. The learning rate was 0.3, the momentum rate was 0.3, and network learning was stopped at 25,000 iterations. Multiple linear regression models were also examined. Table 5 shows the experimental results: the neural network yielded much better in-sample fitness than regression, and its test results were also much better than those of regression.
3. Japanese Stock Market Prediction

Kimoto [21] discussed a buying- and selling-timing prediction system for TOPIX based on modular neural networks. TOPIX (the Tokyo Stock Exchange Price Index) is a weighted average of the market prices of all stocks listed on the First Section of the Tokyo Stock Exchange and may be regarded as one of the Japanese equivalents of the Dow-Jones Index in the United States. Several modular neural networks learned the relationships between past technical and economic indexes and the timing for when to buy and sell.
FIGURE 5 Neural network for forecasting U.K. stock price [9]. Inputs A, B, C: parameters extracted from the balance sheets of the companies in the universe of U.K. stocks; network size: 3-32-16-1; output Y: outperformance of each stock six months ahead.
TABLE 5 Experimental Results for Forecasting U.K. Stock Price [18]

                    Multiple linear regression     Neural network
                    Training       Test            Training       Test
Root mean square    0.138          0.128           0.044          0.066
Figure 6 shows one neural network module. Each module is a three-layer neural network and has its own training data set. Kimoto [21] improved prediction accuracy by averaging the prediction results of modular networks trained on different learning data. Prediction was done for the 33 months from January 1987 to September 1989, with the TOPIX index of January 1987 taken as 1.00. The "buy-and-hold" strategy yielded 1.67 at the end of September 1989, whereas the prediction system produced a profit index of 1.98, better than "buy-and-hold." Table 6 shows other experimental results, compared with multiple regression analysis. Weekly learning data from January 1985 to September 1989 were used for modeling, and the network learned for 100,000 iterations. The neural network produced a much higher correlation coefficient on these training data than did multiple regression. Baba [22] tried to forecast Japanese stock prices using a neural network (Fig. 7) with 15 input variables. The objective of the neural network was to produce a value close to "1" when the stock price becomes high and a value close to "0" when the stock price becomes low. The network was trained by the original backpropagation method and by a hybrid algorithm that combines a modified backpropagation method with a random optimization method. Baba [22] reported that (1) the trained network produces outputs quite close to the real values when the current trend is the same as that under which the network was trained, and (2) the trained network produces fuzzy outputs when the current trend is different from that under which it was trained. Baba [22] therefore suggested that one should (1) use two kinds of neural network (one trained on a decreasing trend and one on an increasing trend), (2) input the current data to both, (3) watch the outputs of both networks, and (4) decide whether to buy, sell, stay, and so on, as sketched below.
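The following sketch illustrates one possible reading of that two-network decision scheme; the confidence thresholds and the decide() helper are assumptions for illustration, not Baba's published procedure.

    def decide(out_up_trained: float, out_down_trained: float,
               hi: float = 0.8, lo: float = 0.2) -> str:
        # Each network emits ~1 when it expects a rise, ~0 when it expects a
        # fall, and fuzzy mid-range values when the current trend differs from
        # the trend it was trained under. Trust a confident (non-fuzzy) output.
        confident = [o for o in (out_up_trained, out_down_trained)
                     if o >= hi or o <= lo]
        if not confident:
            return "stay"       # both outputs fuzzy: the regime is unclear
        strongest = max(confident, key=lambda o: abs(o - 0.5))
        return "buy" if strongest >= hi else "sell"

    print(decide(0.92, 0.55))   # -> "buy"
    print(decide(0.55, 0.45))   # -> "stay"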
FIGURE 6 Neural network module for forecasting TOPIX buying/selling [21]. Inputs: VECTOR (vector curve), TURNOVER (turnover), EXCHANGE (foreign exchange rate), INTEREST (interest rate); output: BUY/SELL (TOPIX buying or selling decision).
TABLE 6 Experimental Results for Forecasting TOPIX [21]

Correlation coefficient between real values and predicted values:
    Multiple linear regression (training data)    0.543
    Neural network (training data)                0.991
B. Bond Rating Prediction

Dutta [23] applied neural networks to the generalization problem of predicting the ratings of corporate bonds, where conventional mathematical modeling techniques had yielded poor results and it was difficult to build rule-based artificial intelligence systems. Figures 8 and 9 show the neural prediction models for bond ratings. Dutta [23] selected ten financial variables for predicting bond ratings, considering the influence of each variable on the bond rating and the ease of availability of data.
FIGURE 7 Neural network for forecasting Japanese stock price [22]. Network size: 15-10-5-1; output UP/DOWN: 1 when the stock price becomes high, 0 when the stock price becomes low. Inputs: x1: final stock price; x2: changes of the stock price (today); x3: turnover (today) / average turnover in the last week; x4: turnover (today) / average turnover in the last month; x5: changes of the turnover (today); x6: 50 - PER; x7: bank rate; x8: the highest stock price in this year - final stock price (today); x9: changes of the capital of stockholders; x10: changes of the currency rate; x11: pattern of the changes of the stock price; x12: changes of the profit of unit stock; x13: anticipation of the capital increase; x14: changes of the Dow-Jones averages; x15: (final stock price - the lowest stock price in the last three years) / (the highest price in the last three years - the lowest price in the last three years).
FIGURE 8 Neural Network 1 for forecasting bond rating [23]. Network size: 6-x-4; inputs: V1: liability / (cash + assets); V2: debt proportion; V3: sales / net worth; V4: profit / sales; V5: financial strength; V6: earning / fixed costs; outputs: Rating AAA, Rating AA, Rating A, Rating BBB. AAA: the highest rating assigned; capacity to pay interest and principal very strong. AA: very strong capacity to pay interest and principal; differs from the highest-rated issues only in small degree. A: strong capacity to repay interest and principal but may be susceptible to adverse changes in economic conditions. BBB: adequate protection to repay interest and principal but more likely to have weakened capacity in periods of adverse economic conditions.
Bond ratings (AAA, AA, A, BBB) given by S&P (Standard & Poor's) were used. Tables 7 and 8 show the experimental results. Neural networks consistently outperformed the regression model in predicting bond ratings from the given set of financial ratios. Kwon [24] and Maher [25] also reported bond rating prediction using neural networks.
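As a concrete illustration of this kind of classifier, here is a minimal sketch of a Dutta-style rating network: six financial ratios in, one of four rating classes out. The synthetic data, the choice of six hidden units, and the other hyperparameters are assumptions for illustration only.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(47, 6))        # six ratios V1..V6 per bond (toy data)
    y = rng.integers(0, 4, size=47)     # 0 = AAA, 1 = AA, 2 = A, 3 = BBB

    # One hidden layer, i.e., a 6-x-4 architecture with x = 6 hidden units.
    clf = MLPClassifier(hidden_layer_sizes=(6,), max_iter=3000, random_state=0)
    clf.fit(X[:30], y[:30])             # learning sample
    print("test accuracy:", clf.score(X[30:], y[30:]))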
FIGURE 9 Neural Network 2 for forecasting bond rating [23]. Network size: 10-x-4; inputs: V1-V6 as in Figure 8, plus V7: past 5-year revenue growth rate; V8: projected next 5-year revenue growth rate; V9: working capital / sales; V10: subjective prospect of company; outputs: Rating AAA, Rating AA, Rating A, Rating BBB.
TABLE 7 Experimental Results of the Neural Network 1 (Fig. 8) for Forecasting Bond Rating [23]

                             Learning                             Testing
                             Neural net                           Neural net
                             2-layers   3-layers   Regression     2-layers   3-layers   Regression
Correctness of prediction    80%        80%        63.33%         82.4%      76.5%      64.7%
Errors in prediction*        0.2365     0.1753     1.107          0.198      0.1939     1.528

*The sum of the squares of the errors in prediction.
TABLE 8 Experimental Results of the Neural Network 2 (Fig. 9) for Forecasting Bond Rating [23]

                             Learning                             Testing
                             Neural net                           Neural net
                             2-layers   3-layers   Regression     2-layers   3-layers   Regression
Correctness of prediction    80%        92.4%      66.7%          88.3%      82.4%      64.7%
Errors in prediction*        0.2241     0.0538     0.924          0.1638     0.2278     1.643

*The sum of the squares of the errors in prediction.

C. Electricity Load Forecasting

Park [11] used neural networks to learn the relationship among past, current, and future temperatures and electricity loads for the Seattle/Tacoma area in the United States. Figures 10, 11, and 12 show the neural prediction models for peak load, total load, and hourly load. Table 9 shows the experimental results. The average absolute errors of the peak-load, total-load, and hourly-load forecasts were 2.04%, 1.68%, and 1.40%, respectively. This compares quite favorably with errors of about 3% to 4% from a currently used forecasting technique applied to the same data. Caire [12] studied the use of neural networks to forecast electricity consumption in France, considering their ability to include exogenous variables, to use data from more than one step back, and to change the minimization criterion according to the economic conditions. In the first model, most of the variables that are correlated with the forecast consumption are introduced as inputs; this network is called the "maximum model." In the second model, only the most important variables are retained as inputs; this network is called the "minimum model." Caire [12] also tried to reduce the number of maximum-model connections by using a minimization criterion from [2]; this last model is called the "reduced maximum model." Figures 13 and 14 show the maximum model and the minimum model. The reduced maximum model has the same inputs as the maximum model; an algorithm based on weight elimination was used, which reduced the number of connections by 30%. The purpose of the new criterion was to try to reduce to zero the connections not carrying data.
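For reference, here is a minimal sketch of a weight-elimination cost of the kind introduced in [2]: a complexity penalty is added to the squared error so that connections carrying no information are driven toward zero and can be pruned. The scale w0 and the coefficient lam are illustrative assumptions.

    import numpy as np

    def weight_elimination_penalty(weights, w0=1.0):
        # Each weight costs (w/w0)^2 / (1 + (w/w0)^2): near 0 for small
        # weights, saturating at 1 for large ones.
        r = (np.asarray(weights) / w0) ** 2
        return np.sum(r / (1.0 + r))

    def total_cost(y_true, y_pred, weights, lam=1e-3, w0=1.0):
        # Squared-error term plus the complexity term that training minimizes.
        mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
        return mse + lam * weight_elimination_penalty(weights, w0)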
FIGURE 10 Neural network for forecasting peak electricity load [11]. Inputs: T1(t): average temperature on day t; T2(t): peak temperature on day t; T3(t): lowest temperature on day t; network size: 3-5-1; output: L1(t): peak load on day t, where t is the day of the predicted load.
FIGURE 11 Neural network for forecasting total electricity load [11]. Inputs: T1(t), T2(t), and T3(t) as in Figure 10; network size: 3-5-1; output: L2(t): total load on day t.
FIGURE 12 Neural network for forecasting hourly electricity load [11]. Inputs: recent loads L(x) and temperatures T(x) for the hours preceding h, together with the predicted temperature Tp(h); network size: 6-10-1; output: L(h): load at hour h, where h is the hour of the predicted load.
TABLE 9 Experimental Results for Forecasting Electricity Load [11]

                                       Set 1   Set 2   Set 3   Set 4   Set 5   Average
Error (%) of peak load forecasting     1.73    2.40    1.55    2.60    1.91    2.04
Error (%) of total load forecasting    1.78    1.07    3.39    1.15    1.03    1.68
Error (%) of hourly load forecasting   1.35    1.39    1.29    1.36    1.64    1.40

Test data sets — Set 1: 01/23/89-01/30/89; Set 2: 11/09/88-11/17/88; Set 3: 11/18/88-11/29/88; Set 4: 12/08/88-12/15/88; Set 5: 12/27/88-01/04/89.
FIGURE 13 Neural Network 1 (maximum model) for forecasting electricity consumption [12]. Network size: 134-3-1; inputs: C(t), C(t-1), ..., C(t-34): the last 35 electricity consumption values; B1, B2, ..., B7: 7 Boolean neurons, one for each day of the week; T1, T2, ..., T72: 72 temperature values, i.e., the maxima and the minima for six French cities for the last 5 days and the forecasting day; N1, N2, ..., N12: 12 nebulosity values, i.e., the maxima and minima for the six cities for the forecasting day; E1, E2, ..., E7: the 7 last errors made by the network; output: C(t+1): the electricity consumption value for the forecasting day.
FIGURE 14 Neural Network 2 (minimum model) for forecasting electricity consumption [12]. Network size: 18-3-1; inputs: C(t), C(t-1), ..., C(t-4): the last 5 electricity consumption values; B1, B2, ..., B7: 7 Boolean neurons, one for each day of the week; Tav1, Tav2, ..., Tav5: the last 5 temperature averages; output: C(t+1): the electricity consumption value for the forecasting day.
Table 10 shows the experimental results. The minimum model was not efficient for generalizing. Both the maximum and the reduced maximum models were better than the ARIMA model, and the reduced maximum model was the best.
TABLE 10 Experimental Results for Forecasting Electricity Consumption [12]

                          1986-1989 (Training)              1990 (Test)
                          Relative     Error standard       Relative     Error standard
                          error (%)    deviation (%)        error (%)    deviation (%)
ARIMA model               2.11         2.75                 1.80         2.64
Maximum model             1.75         2.51                 1.75         2.68
Minimum model             1.79         2.67                 1.84         3.08
Reduced maximum model     1.57         2.38                 1.72         2.60

IV. HYBRID SYSTEMS

A. Integration of Knowledge and Neural Networks

Integration of knowledge and neural networks has been extensively investigated (e.g., [26-29]), because such integration holds great promise in solving complicated real-world problems.
One method is to insert prior knowledge into the initial network structure and refine it by learning from examples (e.g., [30, 31]). Another method is to represent prior knowledge in the form of error measurements for training neural networks [32]. The use of deterministic prior knowledge in foreign exchange trading was demonstrated in [33]. When prior knowledge is deterministic and can be represented as a rule in a rule-based system, the above methods are effective. When prior knowledge is nondeterministic and shows only tendencies, however, the above methods are not effective. In this section, our approach [34, 35] is described in detail as a case study of the integration of knowledge and neural networks for multivariate prediction of stock markets. Kohara [35] investigated ways to use prior knowledge and network learning techniques to improve neural multivariate prediction ability. Daily stock prices were predicted as a complicated real-world problem, taking nonnumerical factors such as political and international events into account. Types of prior knowledge that are difficult to insert into initial network structures or to represent in the form of error measurements were studied. Kohara [35] made use of prior knowledge of stock-price predictions and newspaper information on domestic and foreign events. Event-knowledge was extracted from newspaper headlines according to prior knowledge. Several economic indicators were also chosen according to prior knowledge and were input together with the event-knowledge into neural networks. A learning technique that can learn large changes more effectively and predict them more accurately was also investigated: Kohara [34] presented training data corresponding to large changes in the prediction-target time series more often than those corresponding to small changes. Another stopping criterion for financial predictions was also investigated, in which network learning is stopped at the point having the maximum profit through experimental stock trading.

B. Prior Knowledge and Event-Knowledge

1. Prior Knowledge

The following types of prior knowledge (PK) for predicting Tokyo stock prices were considered.

PK1. If the domestic political situation deteriorates, stock prices tend to decrease, and vice versa.
PK2. If business prospects deteriorate, stock prices tend to decrease, and vice versa.
PK3. If the international situation deteriorates, stock prices tend to decrease, and vice versa.
PK4. If interest rates decrease, stock prices tend to increase, and vice versa.
PK5. If the dollar-to-yen exchange rate decreases, stock prices tend to decrease, and vice versa.
PK6. If the price of crude oil increases, stock prices tend to decrease, and vice versa.

Prior knowledge PK1-3 involves nonnumerical factors, while PK4-6 involves numerical economic indicators. PK1-6 are nondeterministic knowledge and show some of the tendencies of stock-price prediction.
2. Extracting Event-Knowledge from Newspaper Headlines

Kohara [35] divided knowledge of the events that influence stock prices into two types: negative event-knowledge, for events which tend to reduce stock prices, and positive event-knowledge, for events which tend to raise them. Event-knowledge is nondeterministic knowledge showing only tendencies for stock-price prediction. When a headline indicates a bad political situation, a bad business prospect, or a bad international situation, negative event-knowledge is extracted according to prior knowledge PK1-3. When a headline indicates a good political situation, a good business prospect, or a good international situation, positive event-knowledge is extracted. Otherwise, no event-knowledge is extracted. Here are some examples of headlines for negative event-knowledge (originally in Japanese): "House of Representatives to dissolve on 24th" (bad political situation); "Business prospects to deteriorate early in the New Year" (bad business prospect); and "Gulf war comes to a crisis" (bad international situation). Some examples of headlines for positive event-knowledge are: "The first Kaifu Cabinet starts" (good political situation); "Business profits increase 21%" (good business prospect); and "Cold war ends and new epoch begins" (good international situation). As the first step in the study using event-knowledge, Kohara [35] extracted event-knowledge manually from the daily headlines of a Japanese newspaper, The Nihon Keizai Shinbun. When a headline such as "Cold war ends" appears, positive event-knowledge is extracted according to PK3, "If the international situation improves, stock prices tend to increase."

Automatic event-knowledge extraction using event-dictionaries and prior-knowledge rulebases is an area that needs further study. Kohara [35] described a future plan to extract event-knowledge automatically, using two event-dictionaries and a prior-knowledge rulebase. The first dictionary will comprise three event-categories (international affairs, domestic politics, business prospects) and the terms commonly used to describe them. The second dictionary will comprise two event-evaluation levels (good, bad) and the terms used for their expression. The rulebase will consist of prior knowledge of stock market predictions in an if-then rule format:

1. If (domestic politics = bad), then (event = negative-event).
2. If (business prospects = bad), then (event = negative-event).
3. If (international affairs = bad), then (event = negative-event).
4. If (domestic politics = good), then (event = positive-event).
5. If (business prospects = good), then (event = positive-event).
6. If (international affairs = good), then (event = positive-event).

Rules 1 and 4 correspond to PK1, Rules 2 and 5 correspond to PK2, and Rules 3 and 6 correspond to PK3. First, preliminary event-knowledge (e.g., domestic politics = bad) will be extracted by matching headlines against the two dictionaries. Second, event-knowledge will be extracted by applying the above if-then rules to the preliminary event-knowledge.
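A minimal sketch of how such a dictionary-and-rulebase extractor might look follows; every term in the two dictionaries and the extract_event_knowledge() helper are illustrative assumptions, not the planned dictionaries themselves. The 0/0.5/1 encoding anticipates the network input described in Section IV.D.

    CATEGORY_TERMS = {
        "domestic politics": ["cabinet", "house of representatives"],
        "business prospects": ["business prospects", "profits"],
        "international affairs": ["war", "cold war"],
    }
    EVALUATION_TERMS = {
        "good": ["starts", "increase", "ends and new epoch"],
        "bad": ["dissolve", "deteriorate", "crisis"],
    }

    def extract_event_knowledge(headline: str) -> float:
        # Step 1: match the headline against both dictionaries to obtain
        # preliminary event-knowledge (category, evaluation).
        h = headline.lower()
        category = next((c for c, ts in CATEGORY_TERMS.items()
                         if any(t in h for t in ts)), None)
        evaluation = next((e for e, ts in EVALUATION_TERMS.items()
                           if any(t in h for t in ts)), None)
        # Step 2: apply the if-then rules; 0 = negative event, 1 = positive
        # event, 0.5 = no event-knowledge extracted.
        if category is None or evaluation is None:
            return 0.5
        return 1.0 if evaluation == "good" else 0.0

    print(extract_event_knowledge("Gulf war comes to a crisis"))     # 0.0
    print(extract_event_knowledge("Business profits increase 21%"))  # 1.0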
C. Selective Presentation Learning for Forecasting

In the conventional neural prediction approach, all training data are presented to the neural network equally (the number of presentations of all training data is the same), independent of the size of the changes in the prediction-target time series. Also, network learning is usually stopped at the point of minimal mean squared error between the network's outputs and the actual outputs. The following is an outline of the conventional learning (CVL) algorithm.

CVL Algorithm
[CVL1] Train backpropagation networks with equal presentation of all training data.
[CVL2] Stop network learning at the point having the minimal mean squared error.

Kohara [34] considered the following two problems with CVL.

1. Generally, the predictability of large changes is more important than that of small changes. When all training data are presented equally, as in the conventional approach, the neural network learns small changes as carefully as large changes and so cannot learn large changes more effectively.
2. The target function for forecasting problems is not always to minimize the average prediction error. In financial market prediction, maximizing the profit obtained through financial trading on the predicted values is more important than minimizing errors, and minimizing the average prediction error does not always correspond to maximizing profit.

We [34, 35] therefore investigated a learning technique that can learn large changes more effectively and predict them more accurately, together with another stopping criterion for financial predictions. To allow neural networks to learn large changes in the prediction-target time series more effectively, the training data are separated into large-change data and small-change data: large-change (small-change) data have next-day changes that are larger (smaller) than a preset value. Large-change data are then presented to the neural network more often than small-change data; for example, all training data are presented in the first learning cycle, only large-change data are presented in the second cycle, and so forth. The outline of the selective presentation learning (SEL) algorithm is as follows (a code sketch follows the listing).

SEL Algorithm
[SEL1] Separate the training data into large-change data and small-change data.
[SEL2] Train backpropagation networks with more presentations of large-change data than of small-change data.
[SEL3] Stop network learning at the point satisfying a certain stopping criterion (e.g., stop at the point having the maximal profit).
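The sketch below implements the presentation schedule of [SEL1] and [SEL2] at the level of one training epoch, assuming the repeat factor of five used in the experiments of Section IV.D; the helper name and the shuffling choice are illustrative.

    import numpy as np

    def selective_epoch(X, y, threshold, large_repeats=5, seed=0):
        # [SEL1] Separate training pairs by the size of the next-day change.
        large = np.abs(y) > threshold
        # [SEL2] Present large-change pairs large_repeats times per epoch and
        # small-change pairs once, shuffled together. [SEL3], the stopping
        # criterion (e.g., maximal validation profit), is applied outside,
        # after each epoch.
        X_rep = np.concatenate([X[~large]] + [X[large]] * large_repeats)
        y_rep = np.concatenate([y[~large]] + [y[large]] * large_repeats)
        order = np.random.default_rng(seed).permutation(len(y_rep))
        return X_rep[order], y_rep[order]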
D. Stock Market Prediction

1. Multiple Time Series

Several economic indicators are chosen according to prior knowledge and are input together with the event-knowledge into neural networks. The following five indicators were used as inputs to the neural network.

• TOPIX: the chief Tokyo stock exchange price index
• EXCHANGE: the dollar-to-yen exchange rate (yen/dollar)
• INTEREST: an interest rate (3-month CD, new issue, offered rates) (%)
• OIL: the price of crude oil (dollars/barrel)
• NY: the New York Dow-Jones average of the closing prices of 30 industrial stocks (dollars)
TOPIX was the target of prediction. EXCHANGE, INTEREST, and OIL were chosen according to the prior knowledge (PK4-6) regarding numerical economic indicators. The Dow-Jones average was used because Tokyo stock market prices are often influenced by New York Exchange prices. The information for the five indicators was also obtained from The Nihon Keizai Shinbun. The multivariate time series used in the experiments are shown in Figure 15; the time period is from August 1, 1989 to March 31, 1991. The Pearson correlation coefficients between TOPIX and the other time series, which take values between -1 and +1, are shown in Table 11. EXCHANGE was positively correlated with TOPIX, as indicated by PK5. INTEREST and OIL were negatively correlated with TOPIX, as indicated by PK4 and PK6. NY was slightly positively correlated with TOPIX.

2. Neural Prediction Model
Kohara [35] supposed that tomorrow's change in TOPIX is determined by today's changes in the above five indicators and today's event-knowledge, according to the prior knowledge PK1-6. Therefore, the daily changes in these five indicators [e.g., ΔTOPIX(t) = TOPIX(t) - TOPIX(t-1)] and the daily event-knowledge were input into neural networks, and the next day's change in TOPIX was presented to the neural network as the desired output. In neural prediction model 1 (Fig. 16), the above five indicators were used as inputs. In model 2 (Fig. 17), the daily event-knowledge was additionally input. As the task was to predict how much TOPIX will rise or fall, the simple difference TOPIX(t) - TOPIX(t-1) was used as input and target data. Instead of the difference, the relative difference [i.e., (TOPIX(t) - TOPIX(t-1))/TOPIX(t-1)] or the natural logarithm of the relative variable [i.e., ln(TOPIX(t)/TOPIX(t-1))] could have been considered [8]. "0" was given to negative event-knowledge, "1" to positive event-knowledge, and "0.5" to no event-knowledge. A feedforward neural network (FFN) was used in the experiments. A small network was chosen because it was considered more likely to generalize to new time-series data than a larger network [4]. The number of hidden layers was one, and the number of neurons in the hidden layer was chosen so that it was not larger than the number of neurons in the input layer and gave the best prediction performance. The structure of the FFN that did not use event-knowledge was 5-5-1 (five neurons in the input layer, five in the hidden layer, and one in the output layer); the structure of the knowledge-using FFN was 6-6-1.
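A sketch of this input/target construction, assuming aligned daily series as NumPy arrays; the build_dataset() helper and its argument names are illustrative.

    import numpy as np

    def build_dataset(topix, exchange, interest, oil, ny, events):
        # Daily differences, e.g., dTOPIX(t) = TOPIX(t) - TOPIX(t-1).
        diffs = [np.diff(s) for s in (topix, exchange, interest, oil, ny)]
        # events[t] is 0 (negative), 1 (positive), or 0.5 (none); drop day 0
        # so it aligns with the differenced series.
        ev = np.asarray(events, dtype=float)[1:]
        X = np.column_stack(diffs + [ev])[:-1]   # inputs for day t
        y = np.diff(topix)[1:]                   # target: dTOPIX(t+1)
        return X, y

For model 1 (Fig. 16), the event-knowledge column is simply omitted, giving the five-input network.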
FIGURE 15 Multiple time series (TOPIX, EXCHANGE, INTEREST, OIL, and NY, plotted over about 400 days) [34].

TABLE 11 Correlation Coefficient with TOPIX [34]

                                           Correlation coefficient
Dollar-to-yen exchange rate (EXCHANGE)      0.510
Interest rate (INTEREST)                   -0.830
Crude oil price (OIL)                      -0.613
NY stock market prices (NY)                 0.251

FIGURE 16 Neural Network 1 for forecasting the Tokyo stock market [34]. Inputs: ΔTOPIX(t), ΔEXCHANGE(t), ΔINTEREST(t), ΔOIL(t), ΔNY(t); network size: 5-5-1; output: ΔTOPIX(t+1).

FIGURE 17 Neural Network 2 for forecasting the Tokyo stock market [35]. Inputs: ΔTOPIX(t), ΔEXCHANGE(t), ΔINTEREST(t), ΔOIL(t), ΔNY(t), Event-Knowledge(t); network size: 6-6-1; output: ΔTOPIX(t+1).

3. Experiments

Data from a total of 408 days were used (from August 1, 1989 to March 31, 1991): data from 300 days for training, data from 30 days for validation (making decisions on the stopping of network training), and data from 78 days for making predictions. In experiments 1 and 2, all training data were presented equally. In experiments 3 and 4, large-change data were presented five times as often as small-change data. Here, the large-change threshold was 14.78 points (about US$1.40), which was the median of the TOPIX daily changes in the training data. In experiments 1 and 3, only today's changes in the five indicators were input. In experiments 2 and 4, today's event-knowledge was also input. In each experiment, two stopping criteria were examined: minimal error (the network learning was stopped at the point having the minimal mean prediction error on the validation data) and maximum profit (the learning was stopped at the point having the maximum profit on the validation data during 8000 learning cycles). The prediction error and profit were monitored after every 100 learning cycles. When a large change of TOPIX was predicted, "Profit" was calculated as follows: when the predicted direction was the same as the actual direction, the daily change in TOPIX was earned, and when the predicted direction was different from the actual direction, the daily change in TOPIX was lost. This calculation of profit corresponds to the following experimental TOPIX trading system. A buy (or sell) signal is issued when the predicted next-day rise (or fall) in TOPIX is larger than a preset value corresponding to a large change. When a buy (or sell) signal is issued, the system buys (or sells) shares of TOPIX at the current price and subsequently sells (or buys) the shares back at the next-day price. Transaction costs on the trades were ignored in calculating the profit. The more accurately a large change is predicted, the larger the profit becomes. In the experiments, the learning parameter was 0.7 and the momentum parameter was also 0.7. All the weights and biases in the neural network were initialized randomly between -0.3 and 0.3. In each experiment, the neural network was run four times on the same training data with different initial weights and the average was taken. These results are shown in Table 12, where "prediction error for large-change data" is the mean absolute value of the prediction error on the test data exhibiting a large change. Using maximum profit as the stopping criterion reduced the prediction error on large-change data and improved profit.
Applying selective presentation learning further reduced the prediction error on large-change data and improved profit. Using selective presentation learning and the maximum-profit stopping criterion, the prediction error on large-change data was reduced by 11% and the profit was improved by 67% (in the case of no event-knowledge) to 81% (in the case of event-knowledge).
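A sketch of the profit rule described above, with transaction costs ignored; the function name, the vectorized form, and the example numbers are illustrative.

    import numpy as np

    def trading_profit(pred_change, actual_change, threshold):
        pred = np.asarray(pred_change, dtype=float)
        act = np.asarray(actual_change, dtype=float)
        trade = np.abs(pred) > threshold      # buy if pred > 0, sell if < 0
        hit = np.sign(pred) == np.sign(act)   # predicted direction correct?
        gain = np.where(hit, np.abs(act), -np.abs(act))
        return float(np.sum(np.where(trade, gain, 0.0)))

    # Threshold 14.78 points: the large-change threshold used above.
    print(trading_profit([20, -30, 5], [15, -25, -40], 14.78))  # 40.0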
TABLE 12 Experimental Results for Forecasting Tokyo Stock Market [35]

                                         Experiment 1    Experiment 2    Experiment 3     Experiment 4
Using event-knowledge                    no              yes             no               yes
Presentation method                      equal           equal           selective        selective
Stopping criterion                       minimal error   minimal error   maximum profit   maximum profit
Prediction error for large-change data   23.9            22.9            21.3             20.5
Profit on test data                      329             410             550              743

V. RECURRENT NEURAL NETWORKS

Kamijo [36] applied "simple recurrent neural networks" [37-39] to the recognition of stock-price patterns. In stock trading, triangle patterns indicate an important clue to the trend of future change in stock prices, but the patterns are not clearly defined by a rule-based approach. Sixteen triangles were extracted from the stock-price data for all names of corporations listed on the First Section of the Tokyo Stock Exchange. Fifteen triangle patterns were used as training patterns and one triangle pattern was used as a test pattern. Figure 18 shows the recurrent neural network model for stock-price pattern recognition. All the initial values for the context layer were zero, and network training was iterated 2000 times. The experiments revealed that the given test triangle was accurately recognized in 15 out of 16 experiments, and that the number of mismatching patterns was 1.06 per name on average.
FIGURE 18 Recurrent neural network for stock-price pattern recognition [36]. Inputs: V(t) = (A(t) - A(t-1))/A(t-1), U(t) = (A(t) - H(t))/A(t), and D(t) = (A(t) - L(t))/A(t), where A(t) is the average price, H(t) the high price, and L(t) the low price; the first hidden layer and the context layer have 64 units and the second hidden layer has 24 units, with the hidden state copied back into the context layer at each step. The output is 0.5 during the period ranging from the beginning of the triangle pattern to the appearance of the earliest peak, and 0 otherwise.
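To show what the context-layer "copy" in Figure 18 amounts to, here is a minimal NumPy sketch of one Elman-style recurrent step [37]; the random weights and input values are placeholders, not trained parameters from [36].

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_h1, n_h2 = 3, 64, 24   # inputs V(t), U(t), D(t); sizes per Fig. 18
    W_in = rng.normal(scale=0.1, size=(n_h1, n_in))
    W_ctx = rng.normal(scale=0.1, size=(n_h1, n_h1))  # context -> hidden 1
    W_12 = rng.normal(scale=0.1, size=(n_h2, n_h1))
    W_out = rng.normal(scale=0.1, size=(1, n_h2))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    context = np.zeros(n_h1)       # all initial context values are zero
    for x in np.array([[0.01, 0.02, 0.03], [0.00, 0.01, 0.02]]):
        h1 = sigmoid(W_in @ x + W_ctx @ context)
        context = h1.copy()        # the "copy" link back to the context layer
        out = sigmoid(W_out @ sigmoid(W_12 @ h1))
        print(out)                 # an output near 0.5 marks the triangle period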
VI. SUMMARY

This chapter has described various neural network approaches to economic forecasting problems: stock market prediction, bond rating prediction, and electricity-load forecasting. The results show that (1) multivariate prediction results were better than those from univariate time-series forecasting, (2) using knowledge improved prediction results, (3) learning techniques such as weight elimination and selective presentation improved network performance, and (4) recurrent networks were also effective. How to integrate statistical analysis and neural networks is an important area for further study.
REFERENCES

1. Rumelhart, D., Hinton, G., and Williams, R. Learning internal representations by error propagation. In Parallel Distributed Processing (D. Rumelhart, J. McClelland, and the PDP Research Group, Eds.), Vol. 1. MIT Press, Cambridge, MA, 1986.
2. Weigend, A., Huberman, B., and Rumelhart, D. Predicting the future: A connectionist approach. Int. J. Neural Syst. 1(3):193-209, 1990.
3. Weigend, A. and Gershenfeld, N., Eds. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, Reading, MA, 1993.
4. Vemuri, V. and Rogers, R., Eds. Artificial Neural Networks: Forecasting Time Series. IEEE Computer Society Press, New York, 1994.
5. Pham, D. and Liu, X. Neural Networks for Identification, Prediction and Control. Springer, New York, 1995.
6. Hunt, K., Irwin, G., and Warwick, K., Eds. Neural Network Engineering in Dynamic Control Systems. Springer, 1995.
7. Kil, D. and Shin, F. Pattern Recognition and Prediction with Applications to Signal Characterization. American Institute of Physics Press, 1996.
8. Azoff, E. Neural Network Time Series Forecasting of Financial Markets. Wiley, New York, 1994.
9. Refenes, A. and Azema-Barac, M. Neural network applications in financial asset management. Neural Comput. Appl. 2(1):13-39, 1994.
10. Goonatilake, S. and Treleaven, P., Eds. Intelligent Systems for Finance and Business. Wiley, New York, 1995.
11. Park, D., El-Sharkawi, M., Marks II, R., Atlas, L., and Damborg, M. Electric load forecasting using an artificial neural network. IEEE Trans. Power Syst. 6(2):442-449, 1991.
12. Caire, P., Hatabian, G., and Muller, C. Progress in forecasting by neural networks. In Proceedings of International Joint Conference on Neural Networks, 1992, pp. II-540-II-545.
13. Chakraborty, K., Mehrotra, K., Mohan, C., and Ranka, S. Forecasting the behavior of multivariate time series using neural networks. Neural Networks 5:961-970. Pergamon, Elmsford, NY, 1992.
14. White, H. Economic prediction using neural networks: The case of IBM daily stock returns. In Proceedings of International Conference on Neural Networks, 1988, pp. II-451-II-458.
15. Tang, Z., Almeida, C., and Fishwick, P. Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation 57(5):303-310, 1991.
16. Box, G. and Jenkins, G. Time Series Analysis: Forecasting and Control, revised ed. Prentice-Hall, Englewood Cliffs, NJ, 1976.
17. Freisleben, B. Stock market prediction with backpropagation networks. In Lecture Notes in Computer Science, Vol. 604, pp. 451-460. Springer-Verlag, Heidelberg, 1992.
18. Refenes, A., Zapranis, A., and Francis, G. Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks 7(2):375-388, 1994.
19. Roll, R. and Ross, S. An empirical investigation of the arbitrage pricing theory. J. Finance 44:1-18, 1990.
20. Sharpe, W. Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19:425-442, 1964.
21. Kimoto, T., Asakawa, K., Yoda, M., and Takeoka, M. Stock market prediction with modular neural networks. In Proceedings of International Joint Conference on Neural Networks, 1990, pp. I-1-I-6.
22. Baba, N. and Kozaki, M. An intelligent forecasting system of stock price using neural networks. In Proceedings of International Joint Conference on Neural Networks, 1992, pp. I-371-I-377.
23. Dutta, S. and Shekhar, S. Bond rating: A non-conservative application of neural networks. In Proceedings of International Conference on Neural Networks, 1988, pp. II-443-II-450.
24. Kwon, Y., Han, I., and Lee, K. Ordinal pairwise partitioning (OPP) approach to neural networks training in bond rating. Int. J. Intell. Syst. Account. Finance Manage. 6(1):23-40, 1997.
25. Maher, J. and Sen, T. Predicting bond ratings using neural networks: A comparison with logistic regression. Int. J. Intell. Syst. Account. Finance Manage. 6(1):59-72, 1997.
26. Fu, L. Knowledge-based connectionism for revising domain theories. IEEE Trans. Syst. Man Cybern. 23(1):173-182, 1993.
27. Khosla, R. and Dillon, T. Learning knowledge and strategy of a generic neuro-expert system architecture.
In Proceedings of International Symposium on Integrating Knowledge and Neural Heuristics, 1994, pp. 103-112.
28. Goonatilake, S. and Khebbal, S., Eds. Intelligent Hybrid Systems. Wiley, New York, 1995.
29. Sun, R. and Bookman, L., Eds. Computational Architectures Integrating Symbolic and Connectionist Processing. Kluwer Academic Publishers, Dordrecht, 1997.
30. Towell, G., Shavlik, J., and Noordewier, M. Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of National Conference on Artificial Intelligence, 1990, pp. 861-866.
31. Giles, C. and Omlin, C. Rule refinement with recurrent neural networks. In Proceedings of International Conference on Neural Networks, 1993, pp. 801-806.
32. Abu-Mostafa, Y. A method for learning from hints. In Advances in Neural Information Processing Systems (S. Hanson, J. Moody, and R. Lippmann, Eds.), Vol. 5. Morgan Kaufmann, Los Altos, CA, 1993.
33. Abu-Mostafa, Y. Financial market applications of learning from hints. In Neural Networks in the Capital Markets (A. Refenes, Ed.). Wiley, New York, 1995.
34. Kohara, K. Selective presentation learning for forecasting by neural networks. In Proceedings of International Workshop on Applications of Neural Networks to Telecommunications, 1995, pp. 316-323.
35. Kohara, K. Neural multivariate prediction using event-knowledge and selective presentation learning. In Proceedings of International Conference on Neural Networks, 1995, pp. 359-364.
36. Kamijo, K. and Tanigawa, T. Stock price pattern recognition—A recurrent neural network approach. In Proceedings of International Joint Conference on Neural Networks, 1990, pp. I-215-I-221.
37. Elman, J. Finding structure in time. CRL Technical Report 8801, Center for Research in Language, University of California, San Diego, 1988.
38. Bartell, B. and Cottrell, G. A model of symbol grounding in a temporal environment. In Proceedings of International Joint Conference on Neural Networks, 1991, pp. I-805-I-810.
39. Ghahramani, Z. and Allen, R. Temporal processing with connectionist networks. In Proceedings of International Joint Conference on Neural Networks, 1991, pp. II-541-II-546.