Predicting the direction of stock markets using optimized neural networks with Google Trends


ARTICLE IN PRESS

JID: NEUCOM

[m5G;January 30, 2018;13:42]

Neurocomputing 000 (2018) 1–8

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Brief papers

Predicting the direction of stock markets using optimized neural networks with Google Trends

Hongping Hu a, Li Tang b, Shuhua Zhang b, Haiyan Wang c,∗

a School of Science, North University of China, Taiyuan, Shanxi 030051, China
b Coordinated Innovation Center for Computable Modeling in Management Science, Tianjin University of Finance and Economics, Tianjin 300222, China
c School of Mathematical & Natural Sciences, Arizona State University, Phoenix, AZ 85069-7100, USA

Article info

Article history: Received 11 July 2017; Revised 4 January 2018; Accepted 15 January 2018; Available online xxx. Communicated by Wei Chiang Hong.

Keywords: Back propagation neural network; Sine cosine algorithm; Google Trends; Stock price

Abstract

The stock market is affected by many factors, such as political events, general economic conditions, and traders' expectations. Predicting the direction of stock market movement has been one of the most widely investigated and challenging problems for both investors and researchers. Many researchers analyze the stock market using advanced knowledge of mathematics, computer science, economics and many other disciplines. In this paper, we present an improved sine cosine algorithm (ISCA), which introduces an additional parameter into the sine cosine algorithm (SCA), to optimize the weights and biases of back propagation neural networks (BPNN). ISCA and BPNN are combined to create a new network, ISCA-BPNN, for predicting the directions of the opening stock prices for the S&P 500 and Dow Jones Industrial Average Indices, respectively. In addition, Google Trends data are taken into consideration to improve stock prediction. We analyze two types of prediction: Type I is the prediction without Google Trends and Type II is the prediction with Google Trends. The predictability of stock price direction is verified by using the hybrid ISCA-BPNN model. The experimental results indicate that ISCA-BPNN outperforms BPNN, GWO-BPNN, PSO-BPNN, WOA-BPNN and SCA-BPNN in predicting the direction of the opening price for both types, and significantly so for Type II. The hit ratios for ISCA-BPNN with Google Trends reach 86.81% for the S&P 500 Index and 88.98% for the Dow Jones Industrial Average Index. Our results show that Google Trends can help in predicting the direction of the stock market index. © 2018 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (H. Wang).

https://doi.org/10.1016/j.neucom.2018.01.038
0925-2312/© 2018 Elsevier B.V. All rights reserved.

Please cite this article as: H. Hu et al., Predicting the direction of stock markets using optimized neural networks with Google Trends, Neurocomputing (2018), https://doi.org/10.1016/j.neucom.2018.01.038

1. Introduction

The stock market is one of the most vital components of the modern economy. Many people attempt to predict stock prices, including the opening prices, the highest prices, the lowest prices, the closing prices and the trading volumes of their favorite stocks. Accurate prediction of stock prices gives investors more opportunities to profit in the stock exchange [1]. However, predicting stock prices is difficult. Under influences such as the economic environment, political policy, industrial development, market news, and natural factors, stocks are dynamic and exhibit wide variation, and their highly nonlinear nature and complex dimensionality make prediction of the stock market a highly challenging task [2,3].

In recent years, many techniques and models have been studied and applied to stock prediction. These methods include artificial neural networks (ANNs) [1,4–18], support vector machines (SVMs) [16,19,20], the autoregressive integrated moving average (ARIMA) [6,19], adaptive exponential smoothing [5], prospect theory [21], and multiple regression [22,23]. ANNs are nonlinear networks with the advantages of being self-organizing, data-driven, self-learning, and self-adaptive, and of having associative memory similar to human behavior; they have been applied to classification, prediction, and pattern recognition. In addition, ANNs can learn and capture hidden functional relationships. Researchers have used many kinds of ANNs to predict financial time series, including back propagation neural networks (BPNN) [1,4,10,16], Elman recurrent neural networks [11,24], radial basis function neural networks [12], generalized regression neural networks [13], wavelet neural networks [10,14], and so on. Unfortunately, the functional relationships between the given data and the outputs are unknown or difficult to identify. To optimize the parameters of ANNs and make ANNs stable, population-based algorithms such as the particle swarm optimization algorithm [15], artificial fish swarm algorithm [12], artificial bee colony algorithm (ABC) [9], fruit fly optimization algorithm [25], and sine cosine algorithm (SCA) [26], and evolutionary algorithms such as the genetic algorithm (GA) [1,4,18,27], have been proposed.

A number of the methods above have been combined to create new models for predicting stock trends or stock prices. For example, Qiu and Song [1] optimized the ANN model using GA to determine the optimal set of weights and biases for predicting the direction of the next day's price of the Japanese stock market index. Zahedi and Rounaghi [7] applied ANN and the principal component analysis method to predict stock prices on the Tehran Stock Exchange; the method can accurately predict and identify effective factors in stock prices by using real values. Kumar Chandar et al. [10] used the discrete wavelet transform to decompose the financial time series data into input variables of BPNN to forecast future stock prices. Pai and Lin [19] proposed a hybrid model of ARIMA and SVMs that greatly improves on the prediction performance of the single ARIMA model or the single SVM model in forecasting stock prices. Enke et al. [22] used a fuzzy neural network with fuzzy clustering and multiple regression analysis for stock market prediction.

The search engine Google provides access to aggregated information on the volume of queries for different search terms and how these volumes change over time, via the publicly available service Google Trends, an online search tool that allows the user to see how often specific keywords, subjects and phrases have been queried over a specific period of time. Because of the popularity of the Internet, similar patterns of spikes in stocks can be observed in Google Trends. The data not only reflect stock changes but can also be used to anticipate certain future trends [28].
Hamid and Heiden [29] introduced an economically motivated model for using Google search frequency data to forecast volatility, based on the concept of empirical similarity (ES), and highlighted that the forecasting performance is indeed driven by the use of Google Trends data in combination with the ES framework.

In this paper, we present an improved sine cosine algorithm (ISCA) to predict the directions of the opening stock prices for the S&P 500 and Dow Jones Industrial Average (DJIA) Indices, respectively. We introduce an additional parameter into the sine cosine algorithm (SCA) to optimize the weights and biases of the back propagation neural network (BPNN). In addition, Google Trends data are taken into consideration to enhance stock prediction. We analyze two types of prediction: Type I is the prediction without Google Trends and Type II is the prediction with Google Trends. The predictability of stock price direction is verified by using the hybrid ISCA-BPNN model. The experimental results indicate that ISCA-BPNN outperforms BPNN, GWO-BPNN, PSO-BPNN, WOA-BPNN and SCA-BPNN in predicting the direction of the opening stock price for both types, and significantly so for Type II. The hit ratios for ISCA-BPNN with Google Trends reach 86.81% for the S&P 500 Index and 88.98% for the DJIA Index, the highest among the models. Our results show that Google Trends can help in predicting the direction of stock prices more accurately. In summary, we improve the performance of the optimized artificial neural network (ANN) by combining Google Trends data with an improved population-based sine cosine algorithm.

The rest of the paper is organized as follows. Section 2 introduces the related works. Section 3 proposes the improved sine cosine algorithm (ISCA) and analyzes the parameters in ISCA; ISCA is combined with the back propagation neural network (BPNN) to create ISCA-BPNN for predicting the direction of the stock market index. Section 4 presents experimental results of ISCA-BPNN based on data from Yahoo Finance and compares them with other models to demonstrate the advantage of ISCA-BPNN; Google Trends data are also taken into consideration to obtain a better stock prediction. The paper concludes with some discussions in Section 5.

2. Related works

There is a rich literature on the prediction of stock markets with machine learning techniques. Here, we discuss only a number of related works; more can be found in the references therein.

In addition to the works discussed in Section 1, Chong et al. [30] recently presented a systematic analysis of the applications of deep learning networks to stock market prediction and used deep learning networks to predict Korea KOSPI 38 stock returns, where four measures were evaluated for prediction performance: normalized mean squared error (NMSE), root mean squared error (RMSE), mean absolute error (MAE), and mutual information (MI). Their results were compared to previous results for a number of well-known stock indices, including the US S&P 500 [31], Korea KOSPI200 [32], US Dow Jones [33], India CNX and BSE [34], Taiwan TAIEX and US NASDAQ [35], World 22 stock market [36], Greece ASE general [37], Japan Nikkei 225 [1,18], US Apple stock [38], US SPDR S&P 500 ETF (SPY) [39], and Korea KOSPI 38 stock returns [30] Indices. Their results demonstrate that deep learning networks can be used effectively for stock prediction.

Qiu and Song [1] used GA to optimize BPNN for prediction of the Japanese Nikkei 225 Index and used the hit ratio, defined as the percentage of trials in which the predicted direction is correct, as the criterion for predicting the direction of the stock index. They compared two basic types of input variables for predicting the direction of the daily stock market index and concluded that the Type II input variables generate a higher forecast accuracy with the proposed GA-BPNN. Qiu et al. [18] also selected the input variables of BPNN for prediction of the Japanese Nikkei 225 Index, and employed GA and simulated annealing (SA) to improve the prediction accuracy of BPNN and to overcome its tendency to converge to local optima. Mean square error (MSE) was used as the criterion for predicting the price of the stock index in [18].

Chen and Hao [40] used a weighted support vector machine to identify the weights and then used a feature-weighted K-nearest neighbor method to predict two well-known Chinese stock market indices, the Shanghai and Shenzhen stock exchange Indices. MAPE and RMSE were employed to verify the performance of the models. Zhang et al. [41] proposed a new approach named the status box method and used machine learning techniques to classify the boxes. They then constructed a new ensemble method integrating the AdaBoost algorithm, the probabilistic support vector machine (PSVM), and GA to perform the status box classification. They focused on the Shenzhen Stock Exchange (SZSE) and the National Association of Securities Dealers Automated Quotations (NASDAQ) for predicting the stock direction. Moghaddam et al. [42] applied several feedforward ANNs with different numbers of nodes in one or two hidden layers to predict the NASDAQ Index based on two input datasets (four prior days and nine prior days). They used different transfer functions of BPNN in the hidden layers and employed the value of R^2 as the criterion for the models. In addition, de Oliveira et al. [43] built a neural model to predict stock closing prices in the short term. On the other hand, the authors of Refs. [16,20,27,44] studied prediction of stock indices in the long term, with the hit ratio employed as the criterion for measuring accuracy.

Lahmiri [45–47] combined multiresolution techniques with ANN and SVR to perform training and prediction. Specifically, Lahmiri [45] used multiresolution analysis techniques to decompose interest rates, and used a feedforward neural network with PSO to predict interest rates. In addition, various other methods have been used for stock prediction.
For example, while variational mode decomposition (VMD) has often been used as an advanced multiresolution technique for signal processing, Lahmiri [46] proposed a VMD-based generalized regression neural network


to perform the prediction of California electricity and Brent crude oil prices. Lahmiri [47] used the singular spectrum analysis (SSA) to decompose stock price time series into a small number of independent components to help predictions, and implemented the support vector regression (SVR) coupled with particle swarm optimization (PSO) for stock forecasting.

3. Predictive model

3.1. Improved sine cosine algorithm

Mirjalili [26] proposed a population-based optimization algorithm called the sine cosine algorithm (SCA). SCA produces multiple initial random candidate solutions and requires them to fluctuate outwards or towards the best solution, using a mathematical model based on sine and cosine functions. In this study, we introduce a parameter $w$ into SCA and call the result the improved sine cosine algorithm (ISCA). In ISCA, the position updating equations are

$$X_i^{t+1} = w \times X_i^{t} + r_1 \times \sin(r_2) \times \left| r_3 P_i^{t} - X_i^{t} \right|, \tag{1}$$

$$X_i^{t+1} = w \times X_i^{t} + r_1 \times \cos(r_2) \times \left| r_3 P_i^{t} - X_i^{t} \right|, \tag{2}$$

where $X_i^{t}$ is the position of the current solution in the $i$th dimension at the $t$th iteration, $r_1$, $r_2$, and $r_3$ are three random numbers, $P_i^{t}$ is the position of the destination point in the $i$th dimension, $|\cdot|$ indicates the absolute value, and $w$ is the additional parameter. ISCA reduces to SCA when $w = 1$. We combine Eqs. (1) and (2) as follows:

$$X_i^{t+1} = \begin{cases} w \times X_i^{t} + r_1 \times \sin(r_2) \times \left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 < 0.5, \\ w \times X_i^{t} + r_1 \times \cos(r_2) \times \left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 \ge 0.5, \end{cases} \tag{3}$$

where $r_4$ is a random number in $[0, 1]$. In principle, ISCA is able to determine the global optimum of optimization problems, as SCA is in [26]. There are four main parameters in ISCA, $r_1$, $r_2$, $r_3$, and $r_4$, which have expressions similar to those for SCA. For convenience, $r_2$ is a random number in $[0, 2\pi]$ and

$$r_1 = a - \frac{a t}{T}, \tag{4}$$

as for SCA, where $t$ is the current iteration, $T$ is the maximum number of iterations, and $a$ is a constant equal to 2.

3.2. The proposed model

In this section, we use ISCA to determine the optimal parameters of the back propagation neural network (BPNN) and create the ISCA-BPNN network model. In ISCA, a total of $p$ search agents are allowed to determine the global optimum over $T$ iterations. Every search agent $L$ is a vector to be optimized, which represents the parameters of BPNN: the connection weights $w^{0}_{ji}$, $w^{1}_{sj}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; $s = 1, 2, \ldots, r$) and the biases $b_j$ ($j = 1, 2, \ldots, n$), $b_s$ ($s = 1, 2, \ldots, r$). Therefore the dimension $D$ of each agent $L$ in ISCA-BPNN is $D = m \times n + n + n \times r + r$. We map each agent $L$ to BPNN and obtain the prediction output. The fitness function of ISCA-BPNN is defined as the mean square error (MSE) between the prediction output values and the actual values in the network training process, so that the fitness value of the network is minimized by the powerful search performance of ISCA. The fitness function is

$$f = e = \frac{1}{Q} \sum_{t=1}^{Q} \sum_{s=1}^{r} \left( y_{ts} - \hat{y}_{ts} \right)^2, \tag{5}$$

where $\hat{y}_{ts}$ and $y_{ts}$ are the actual value and the predicted value of the $s$th output neuron for the $t$th training sample, $Q$ is the number of training samples, and $r$ is the number of output neurons. The implementation of the ISCA-BPNN model is described by the flow chart in Fig. 1.

4. Numerical simulations

4.1. Data

In this paper, we study two US stock indices, the Standard & Poor's 500 (S&P 500) and Dow Jones Industrial Average (DJIA) Indices, over 1877 trading days from January 1, 2010 to June 16, 2017, and take these data sets from Yahoo Finance as our data sources. We also choose the two specific key words "S&P 500" and "DJIA" to obtain the Google Trends data over the corresponding period for predicting the directions of the S&P 500 and DJIA Indices, respectively. Before we state our prediction algorithm, we need to scale the S&P 500 and DJIA Indices data and the corresponding Google Trends data to the interval [−1, 1], according to the following formula:

$$y = y_{\min} + (y_{\max} - y_{\min}) \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{6}$$

where $x$ is the original data before normalization, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values before normalization, $y$ is the data after normalization, and $y_{\min}$ and $y_{\max}$ are the minimum and maximum values after normalization. In this paper, $y_{\min} = -1$ and $y_{\max} = 1$.

To assess the prediction performance, we use the hit ratio, which denotes the percentage of trials in which the predicted direction is correct. It is evaluated using the following equations:

$$\text{hit ratio} = \frac{1}{s} \sum_{i=1}^{s} P_i, \tag{7}$$

$$P_i = \begin{cases} 1, & (y_{i+1} - y_i)(\hat{y}_{i+1} - \hat{y}_i) > 0, \\ 0, & \text{otherwise,} \end{cases} \tag{8}$$

where $P_i$ is the prediction result for the $i$th trading day, defined by Eq. (8), $y_i$ denotes the actual value of the opening price for the $i$th trading day, $\hat{y}_i$ is the predicted value for the $i$th trading day, and $s$ denotes the number of testing samples.

4.2. Models

In this study, we use two methods to predict the direction of the opening prices of the S&P 500 and DJIA Indices, respectively. The first method, named Type I, uses the opening price, the highest price, the lowest price, the closing price, and the trading volume of the current trading day, without Google Trends data, to predict the opening price of the next day. The second method, named Type II, uses the opening price, the highest price, the lowest price, the closing price, the trading volume, and the Google Trends data of the current trading day to predict the opening price of the next day.

In order to ensure the prediction accuracy of the ISCA-BPNN model, for Type I we choose 1276 sets of data from Jan. 1, 2010 to Jan. 28, 2015 obtained from the S&P 500 or DJIA Index as the training data and the remaining 600 sets of data from Jan. 29, 2015 to Jun. 16, 2017 as the testing data. For Type II, we choose the same 1276 sets of data together with the corresponding Google Trends data as the training data, and the remaining 600 sets of data from Jan. 29, 2015 to Jun. 16, 2017 as the testing data. For Type I, each set of data has five input features: the opening price, the closing price, the highest price, the lowest price, and the trading volume. For Type II, each set of data has six input features: the opening price, the closing price, the highest price, the lowest price, the trading volume, and the Google Trends data. We set the opening price of the next day as the output for both Type I and Type II.

For example, Table 1 lists the opening price, the highest price, the lowest price, the closing price, the trading volume, and the Google Trends data of the DJIA on Sep. 13, 2012 and Sep. 14, 2012. The vector (13329.70996, 13573.33008, 13325.11035, 13539.86035, 151770000) is one input for Type I and the vector (13329.70996, 13573.33008, 13325.11035, 13539.86035, 151770000, 13) is one input for Type II. The opening price 13540.40039 on Sep. 14, 2012 is the target output of the above two vectors for Sep. 13, 2012.

Table 1
Input data of neural networks.

| Date | Opening price | Highest price | Lowest price | Closing price | Trading volume | Google Trends |
|---|---|---|---|---|---|---|
| Sep. 13, 2012 | 13329.70996 | 13573.33008 | 13325.11035 | 13539.86035 | 151,770,000 | 13 |
| Sep. 14, 2012 | 13540.40039 | 13653.24023 | 13533.94043 | 13593.37012 | 185,160,000 | 25 |

We build a BPNN with an m–n–r structure, composed of an m-dimensional input, n nodes in the hidden layer and an r-dimensional output, where m = 5, r = 1 for Type I and m = 6, r = 1 for Type II. After many experiments, we take n = 10, 5000 iterations, a momentum constant of 0.95, and a learning rate of 0.002 for both Type I and Type II. We also use GWO, PSO, and WOA to optimize the parameters of BPNN and obtain the models GWO-BPNN, PSO-BPNN and WOA-BPNN, respectively. Every model is run 10 times. The parameters of the BPNN parts of ISCA-BPNN, SCA-BPNN, GWO-BPNN, PSO-BPNN and WOA-BPNN are set to the same values as those of BPNN. The population size and the number of iterations of ISCA, SCA, GWO, PSO and WOA are 30 and 500, respectively. In particular, the inertia weight in the PSO algorithm is the same as the parameter w in ISCA. The goal is to obtain the average hit ratio and the best hit ratio.

Fig. 1. The flow chart of ISCA-BPNN. The steps of the flow chart are:
1. Determine the fitness function f(x) according to formula (5) and the structure of BPNN.
2. Initialize the relevant parameters: the number of search agents, SN; the lower boundary, lb, and the upper boundary, ub, of each agent; the maximum number of iterations, T; and the constant, a. Let t = 1.
3. Initialize the agents according to lb and ub. Map the agents to the parameters of BPNN and calculate the fitness values. Let destination_fitness be the minimum fitness value and Destination_position be the corresponding agent.
4. If t <= T (branch Y): give the parameter w in ISCA and take r1, r2, r3 and r4 as random numbers; update the agents according to formula (3) and then correct them using lb and ub; map the agents to the parameters of BPNN and calculate the fitness values; if destination_fitness is larger than the minimum fitness value, set destination_fitness to the minimum fitness value and Destination_position to the corresponding agent; let t = t + 1 and repeat this step.
5. Otherwise (branch N): map Destination_position to the parameters of BPNN, input the training data, and train BPNN. Use the optimal structure of BPNN to perform the prediction.

Fig. 2. The determination of c1 and c2 in ISCA-BPNN. [Two surface plots of the hit ratio against c1 and c2: Type I, with the maximum marked at (0.40, 1.60, 86.64%), and Type II, with the maximum marked at (0.10, 1.20, 86.64%).]

Fig. 3. The determination of s in the hidden layer in ISCA-BPNN. [Two plots of the hit ratio against the number of nodes in the hidden layer: Type I and Type II, each with the maximum marked at (10.00, 86.64%).]

4.3. Analysis of parameter w in ISCA

In ISCA, the parameter w can be chosen as follows:



$$w = \frac{1 - \left( \dfrac{t}{T} \right)^{c_1}}{1 + c_2 \left( \dfrac{t}{T} \right)^{c_2}}, \tag{9}$$

where T is the maximum number of iterations, and c1 ∈ [0, 1] and c2 ∈ [1, 2] are real parameters. We use the S&P 500 Index data to determine the optimal values of c1 and c2 in ISCA-BPNN and then use the same values for the DJIA Index. We vary c1 from 0 to 1 and c2 from 1 to 2, both in steps of 0.1, and run ISCA-BPNN once for each pair with 10 nodes in the hidden layer, which gives 121 hit ratios, shown in Fig. 2. From Fig. 2, the maximal hit ratio of 86.64% is obtained at c1 = 0.4 and c2 = 1.6 for Type I, and at c1 = 0.1 and c2 = 1.2 for Type II. Finally, with these values of c1 and c2, we run ISCA-BPNN once for each number s of hidden-layer nodes from 3 to 20 and obtain the maximal hit ratio of 86.64% at s = 10 for both Type I and Type II, as shown in Fig. 3.
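The ISCA search of Sections 3.1, 3.2 and 4.3 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`inertia_weight`, `decode`, `mse_fitness`, `isca_minimize`) are hypothetical. Eq. (9) is read here as w = (1 − (t/T)^c1) / (1 + c2 (t/T)^c2). The range [0, 2] for r3 is borrowed from the original SCA, the random numbers are drawn per agent and per dimension, and the tanh hidden layer with a linear output layer is assumed because the text does not state the transfer functions. The back propagation training that follows the search in Fig. 1 is omitted.

```python
import numpy as np

def inertia_weight(t, T, c1, c2):
    """Assumed reading of Eq. (9): w = (1 - (t/T)^c1) / (1 + c2 * (t/T)^c2)."""
    ratio = t / T
    return (1.0 - ratio ** c1) / (1.0 + c2 * ratio ** c2)

def decode(agent, m, n, r):
    """Split a flat agent of length D = m*n + n + n*r + r into BPNN parameters."""
    i = 0
    W0 = agent[i:i + m * n].reshape(n, m); i += m * n   # input -> hidden weights
    b0 = agent[i:i + n]; i += n                         # hidden biases
    W1 = agent[i:i + n * r].reshape(r, n); i += n * r   # hidden -> output weights
    b1 = agent[i:i + r]                                 # output biases
    return W0, b0, W1, b1

def mse_fitness(agent, X, Y, m, n, r):
    """Eq. (5): squared error summed over outputs, averaged over the Q samples."""
    W0, b0, W1, b1 = decode(agent, m, n, r)
    hidden = np.tanh(X @ W0.T + b0)   # assumption: tanh hidden layer
    out = hidden @ W1.T + b1          # assumption: linear output layer
    return float(np.sum((Y - out) ** 2) / X.shape[0])

def isca_minimize(fitness, dim, lb, ub, agents=30, T=500, a=2.0,
                  c1=0.4, c2=1.6, seed=0):
    """Population search using the update rule of Eq. (3)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(agents, dim))
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    for t in range(1, T + 1):
        w = inertia_weight(t, T, c1, c2)
        r1 = a - a * t / T                               # Eq. (4)
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)
        r3 = rng.uniform(0.0, 2.0, size=X.shape)         # assumption: range from SCA
        r4 = rng.uniform(0.0, 1.0, size=X.shape)
        trig = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
        X = w * X + r1 * trig * np.abs(r3 * best - X)    # Eq. (3)
        X = np.clip(X, lb, ub)                           # keep agents inside [lb, ub]
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_fit:                         # keep the best agent so far
            best, best_fit = X[fit.argmin()].copy(), float(fit.min())
    return best, best_fit
```

For a Type I network (m = 5, n = 10, r = 1) the agent length is D = 71, and a call such as `isca_minimize(lambda v: mse_fitness(v, X_train, Y_train, 5, 10, 1), dim=71, lb=-1.0, ub=1.0)` would return the best agent found, where `X_train` and `Y_train` stand for normalized training arrays and the bounds are placeholder values.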

4.4. Numerical results

For convenience of comparison, we refer to the models BPNN, GWO-BPNN, PSO-BPNN, WOA-BPNN, SCA-BPNN and ISCA-BPNN as model 1, model 2, model 3, model 4, model 5 and model 6, respectively.

Table 2
The hit ratios (%) of the S&P 500 index.

| Model | Type I Best | Type I Average | Type II Best | Type II Average |
|---|---|---|---|---|
| model 1 (BPNN) | 82.47 | 71.30 | 78.13 | 71.84 |
| model 2 (GWO-BPNN) | 85.48 | 83.89 | 85.64 | 84.26 |
| model 3 (PSO-BPNN) with c1 = 0.1, c2 = 1.2 | 83.84 | 82.59 | 84.81 | 81.12 |
| model 3 (PSO-BPNN) with c1 = 0.4, c2 = 1.6 | 83.14 | 81.22 | 84.97 | 81.65 |
| model 3 (PSO-BPNN) with c1 = 0.3, c2 = 2 | 84.31 | 81.24 | 85.14 | 81.70 |
| model 4 (WOA-BPNN) | 85.48 | 82.87 | 85.64 | 83.46 |
| model 5 (SCA-BPNN) | 84.64 | 82.94 | 86.14 | 84.02 |
| model 6 (ISCA-BPNN) with c1 = 0.1, c2 = 1.2 | 86.48 | 84.09 | 86.64 | 84.87 |
| model 6 (ISCA-BPNN) with c1 = 0.4, c2 = 1.6 | 86.64 | 84.32 | 86.64 | 84.39 |
| model 6 (ISCA-BPNN) with c1 = 0.3, c2 = 2 | 85.64 | 83.64 | 86.81 | 84.17 |

Note: Type I is without Google Trends and Type II is with Google Trends.

Table 3
The hit ratios (%) of the DJIA index.

| Model | Type I Best | Type I Average | Type II Best | Type II Average |
|---|---|---|---|---|
| model 1 (BPNN) | 84.47 | 71.94 | 80.80 | 73.17 |
| model 2 (GWO-BPNN) | 88.48 | 87.51 | 88.81 | 87.76 |
| model 3 (PSO-BPNN) with c1 = 0.1, c2 = 1.2 | 87.65 | 85.29 | 88.48 | 85.71 |
| model 3 (PSO-BPNN) with c1 = 0.4, c2 = 1.6 | 87.31 | 84.62 | 85.48 | 80.30 |
| model 3 (PSO-BPNN) with c1 = 0.3, c2 = 2 | 87.15 | 83.57 | 88.15 | 84.42 |
| model 4 (WOA-BPNN) | 88.65 | 87.26 | 88.31 | 86.66 |
| model 5 (SCA-BPNN) | 88.31 | 86.54 | 88.15 | 87.11 |
| model 6 (ISCA-BPNN) with c1 = 0.1, c2 = 1.2 | 88.48 | 87.93 | 88.98 | 88.13 |
| model 6 (ISCA-BPNN) with c1 = 0.4, c2 = 1.6 | 88.65 | 87.95 | 88.65 | 87.55 |
| model 6 (ISCA-BPNN) with c1 = 0.3, c2 = 2 | 88.81 | 87.80 | 88.65 | 88.00 |

Note: Type I is without Google Trends and Type II is with Google Trends.

Table 4
Related results.

| Studies | Methods | Market stock | The best hit ratio (%) |
|---|---|---|---|
| Kim and Han [27] | GA feature discretization | Korea | 61.70 |
| Leung et al. [44] | Classification model | US, UK, Japan | 68 (Nikkei 225) |
| Huang et al. [20] | SVM | Japan | 75 |
| Kara et al. [16] | BPNN | Istanbul | 75.74 |
| Qiu et al. [1] | GA-ANN hybrid model | Japan | 81.27 |
| Our study | ISCA-BPNN hybrid model | S&P 500 in USA | 86.81 for Type II |
| Our study | ISCA-BPNN hybrid model | S&P 500 in USA | 86.64 for Type I |
| Our study | ISCA-BPNN hybrid model | DJIA in USA | 88.98 for Type II |
| Our study | ISCA-BPNN hybrid model | DJIA in USA | 88.81 for Type I |

We use the same values of c1 and c2 for both the DJIA and S&P 500 Indices. Based on the parameters selected above, we first train the six models and then test them. We then map the normalized output values back to the original range to calculate the hit ratio of every model. We conduct the experiments 10 times, and report the best hit ratio and the average hit ratio over those 10 runs for each model. We repeat the experiment with c1 = 0.3, c2 = 2 and s = 10 for Type I and Type II. The corresponding results are given in Tables 2 and 3, which show the average and best hit ratios of the testing output of the six models for both Type I and Type II. From Table 2 on the S&P 500 Index, the model with the highest best hit ratio is model 6, for both Type I (86.64% with c1 = 0.4 and c2 = 1.6) and Type II (86.81% with c1 = 0.3 and c2 = 2). The model with the highest average hit ratio is also model 6, for both Type I (84.32% with c1 = 0.4 and c2 = 1.6) and Type II (84.87% with c1 = 0.1 and c2 = 1.2). From Table 3 on the DJIA Index, the model with the highest best hit ratio is model 6, for both Type I (88.81% with c1 = 0.3 and c2 = 2) and Type II (88.98% with c1 = 0.1 and c2 = 1.2). The model with the highest average hit ratio is again model 6, for both Type I (87.95% with c1 = 0.4 and c2 = 1.6) and Type II (88.13% with c1 = 0.1 and c2 = 1.2).
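The evaluation procedure described above (map normalized outputs back to the original range, compute the hit ratio, then report the best and average over repeated runs) can be sketched as follows. This is an illustrative sketch only: the helper names `normalize`, `denormalize`, `hit_ratio` and `summarize_runs` are hypothetical, and the sample numbers in the usage note are made up for demonstration rather than taken from the experiments.

```python
import numpy as np

def normalize(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Eq. (6): scale x from [x_min, x_max] to [y_min, y_max]."""
    return y_min + (y_max - y_min) * (x - x_min) / (x_max - x_min)

def denormalize(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Inverse of Eq. (6): map a normalized value back to the original range."""
    return x_min + (x_max - x_min) * (y - y_min) / (y_max - y_min)

def hit_ratio(actual, predicted):
    """Eqs. (7)-(8): share of day-to-day moves whose predicted sign matches.

    Both sequences hold s + 1 consecutive opening prices, giving s comparisons.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    hits = np.diff(actual) * np.diff(predicted) > 0   # P_i = 1 when both moves agree
    return float(hits.mean())

def summarize_runs(ratios):
    """Best and average hit ratio over repeated runs (10 runs in the paper)."""
    ratios = np.asarray(ratios, dtype=float)
    return float(ratios.max()), float(ratios.mean())
```

For instance, `hit_ratio([1, 2, 1, 3], [1, 3, 4, 5])` compares three moves (up, down, up) against three predicted moves (up, up, up) and counts two agreements out of three.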

Based on Tables 2 and 3, we conclude that model 6 for Type II is more effective in predicting the directions of the daily opening prices of the S&P 500 and DJIA Indices. Our study indicates that model 6 for Type II is useful for investors and can be a good candidate for predicting the direction of the next day's opening price. We also observe that, for the same model, the average hit ratio for Type II is mostly larger than that for Type I. From Tables 2 and 3, we obtain three further results: (1) Google Trends data can help in predicting the direction of stock prices more accurately; (2) model 6 is superior to the other five models; (3) ISCA optimizes BPNN better than the other heuristic algorithms SCA, GWO, PSO and WOA.

4.5. Related results

Predicting the direction of stocks is an important topic for most investors, and many recent studies focus on the prediction of these movements. In [1], Qiu and Song optimized the ANN model using a genetic algorithm (GA) to predict the direction of the stock price and compared its performance with the prior studies listed in [1]. Here, we add the method proposed in this paper to the results listed in [1] and obtain Table 4. It is worth noting that the results listed in Table 4 are not based on the same data set.

(Grant No. 201701D22111439 and 201701D221121) and Shanxi Scholarship Council of China (Grant No. 2016-088).

5. Conclusion and discussion

References

In this paper, we focus on predicting the directions of the opening prices of the S&P 500 and DJIA Indices using a back propagation neural network optimized by an improved sine cosine algorithm (ISCA-BPNN) together with Google Trends data. In [28], it was shown that Google Trends data reflected aspects of the current state of the economy and also provided some insight into future trends in the behavior of economic actors during the investigated period. Google Trends has options to define a set of key words for collecting data. In this paper, we choose the specific key words “S&P 500” and “DJIA” to collect data from Google Trends for the respective indices. To overcome the limitations of BP gradient search, we present optimized back propagation neural network models to improve the prediction of the direction of stock markets. We compare the proposed model, ISCA-BPNN, with the BPNN, GWO-BPNN, PSO-BPNN, WOA-BPNN and SCA-BPNN models and conclude that the proposed ISCA-BPNN model is better than the other five models in predicting the direction of the opening price. The data for the S&P 500 and DJIA Indices are taken from Yahoo Finance for the period between Jan. 1, 2010 and Jun. 16, 2017. We also use Google Trends to improve the accuracy of the stock prediction. By comparison, we conclude that ISCA is able to efficiently optimize the parameters of the neural network, which results in a better prediction. The hit ratio for ISCA-BPNN with Google Trends reaches 86.81% for the S&P 500 Index and 88.98% for the DJIA Index. Our results indicate that the proposed ISCA-BPNN is capable of predicting the direction of stock prices and, in particular, show that Google Trends can help in predicting future financial returns.

In this paper, we predict the directions of the opening prices of the S&P 500 and DJIA Indices. In future work, we will use optimized neural networks to predict prices and trading volumes instead of directions. In addition, we will use data mining technology to select a set of key words associated with the topic and discuss how Google Trends affects stock prices and trading volumes. Many other stock time series besides the three major US stock market indices are available, such as the Japanese Nikkei 225 Index and two Chinese stock market indices: the Shanghai Composite and Shenzhen Component Indices. In future work, we will propose new models to predict these stock market indices. In addition, we will apply the existing models to the same data set for performance comparison and determine appropriate models for specific stock time series.

The model proposed in this paper is built from a BP neural network optimized by the improved sine cosine algorithm. The results show that this model is suitable for predicting the directions of the S&P 500 and DJIA Indices. This suggests that swarm intelligence algorithms are able to optimize the parameters of other artificial neural networks for prediction and classification. Therefore, we will propose new or improved swarm intelligence algorithms and combine multiple swarm intelligence algorithms for prediction.
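The hit ratios reported in this section can be read as the fraction of days on which the predicted direction of the opening price matches the actual direction. The following is a minimal sketch of that calculation, not the authors' implementation. It assumes that a direction is encoded as 1 when the opening price rises relative to the previous day and 0 otherwise. The helper names `directions` and `hit_ratio` and the sample prices are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def directions(prices: Sequence[float]) -> List[int]:
    """Encode day-to-day moves: 1 if the price rose versus the previous day, else 0.

    This encoding is an assumption made for the sketch.
    """
    return [1 if today > prev else 0 for prev, today in zip(prices, prices[1:])]


def hit_ratio(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of days on which the predicted direction equals the actual one."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Made-up opening prices, for illustration only (not data from the paper).
opening = [100.0, 101.5, 101.0, 102.2, 102.0, 103.1]
actual = directions(opening)
predicted = [1, 0, 1, 1, 1]  # hypothetical model output
ratio = hit_ratio(actual, predicted)
print(f"hit ratio = {ratio:.2%}")
```

In this toy case four of the five predicted directions match the actual ones. A model that always guessed "up" would score the share of up days in the sample, which is a useful baseline when reading any hit ratio.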

References

[1] M.Y. Qiu, Y. Song, Predicting the direction of stock market index movement using an optimized artificial neural network model, PLoS ONE 11 (5) (2016) e0155133, doi:10.1371/journal.pone.0155133.
[2] E. Guresen, G. Kayakutlu, T.U. Daim, Using artificial neural network models in stock market index prediction, Expert Syst. Appl. 38 (8) (2011) 89–97.
[3] T. Lee, C. Chiu, Neural network forecasting of an opening cash price index, Int. J. Syst. Sci. 33 (3) (2002) 29–37.
[4] M. Inthachot, V. Boonjing, S. Intakosum, Artificial neural network and genetic algorithm hybrid intelligence for predicting Thai stock price index trend, Comput. Intell. Neurosci. (2016) 3045254, doi:10.1155/2016/3045254.
[5] E.L. de Faria, M.P. Albuquerque, J.L. Gonzalez, J. Cavalcante, Predicting the Brazilian stock market through neural networks and adaptive exponential smoothing methods, Expert Syst. Appl. 36 (2009) 12506–12509.
[6] A.A. Adebiyi, A.O. Adewumi, C. Ayo, Comparison of ARIMA and artificial neural networks models for stock price prediction, J. Appl. Math. (2014) 614342, doi:10.1155/2014/614342.
[7] J. Zahedi, M. Rounaghi, Application of artificial neural network models and principal component analysis method in predicting stock prices on Tehran stock exchange, Phys. A 438 (2015) 178–187.
[8] A.M. Rather, A. Agarwal, V. Sastry, Recurrent neural network and a hybrid model for prediction of stock returns, Expert Syst. Appl. 42 (2015) 3234–3241.
[9] T.T. Khuat, Q.C. Le, B.L. Nguyen, M. Le, Forecasting stock price using wavelet neural network optimized by directed artificial bee colony algorithm, J. Telecommun. Inf. Technol. 2 (2016) 43–52.
[10] S. Kumar Chandar, M. Sumathi, S. Sivanandam, Prediction of stock market price using hybrid of wavelet transform and artificial neural network, Indian J. Sci. Technol. 9 (8) (2016) 1–5, doi:10.17485/ijst/2016/v9i8/87905.
[11] J. Wang, J. Wang, W. Fang, H. Niu, Financial time series prediction using ELMAN recurrent random neural networks, Comput. Intell. Neurosci. (2016) 4742515, doi:10.1155/2016/4742515.
[12] W. Shen, X.P. Guo, C. Wu, D. Wu, Forecasting stock indices using radial basis function neural networks optimized by artificial fish swarm algorithm, Knowl. Based Syst. 24 (2011) 378–385.
[13] M.A. Fernández-Gámez, A.M. Gil-Corral, F. Galán-Valdivieso, Corporate reputation and market value: evidence with generalized regression neural networks, Expert Syst. Appl. 46 (2016) 69–76.
[14] N. Chauhan, V. Ravi, D. Karthik Chandra, Differential evolution trained wavelet neural networks: application to bankruptcy prediction in banks, Expert Syst. Appl. 36 (2009) 7659–7665.
[15] M. Pulido, P. Melin, O. Castillo, Particle swarm optimization of ensemble neural networks with fuzzy aggregation for time series prediction of the Mexican stock exchange, Inf. Sci. 280 (2014) 188–204.
[16] Y. Kara, M.A. Boyacioglu, Ö. Baykan, Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul stock exchange, Expert Syst. Appl. 38 (2011) 5311–5319.
[17] A.H. Moghaddam, M.H. Moghaddam, M. Esfandyari, Stock market index prediction using artificial neural network, J. Econ. Financ. Adm. Sci. 21 (2016) 89–93.
[18] M.Y. Qiu, Y. Song, F. Akagi, Application of artificial neural network for the prediction of stock market returns: the case of the Japanese stock market, Chaos Solitons Fract. 85 (2016) 1–7.
[19] P.F. Pai, C. Lin, A hybrid ARIMA and support vector machines model in stock price forecasting, Omega 33 (2005) 497–505.
[20] W. Huang, Y. Nakamori, S. Wang, Forecasting stock market movement direction with support vector machine, Comput. Oper. Res. 32 (10) (2005) 13–22.
[21] D. Velumoni, S. Rau, Cognitive intelligence based expert system for predicting stock markets using prospect theory, Indian J. Sci. Technol. 9 (10) (2016) 1–6, doi:10.17485/ijst/2016/v9i10/88886.
[22] D. Enke, M. Grauer, N. Mehdiyev, Stock market prediction with multiple regression, fuzzy type-2 clustering and neural networks, Procedia Comput. Sci. 6 (2011) 201–206.
[23] G. Forslund, D. Åkesson, Predicting share price by using multiple linear regression, Bachelor thesis in Mathematical Statistics, Vehicle Engineering, KTH, May 21, 2013.
[24] J. Elman, Finding structure in time, Cognit. Sci. 14 (2) (1990) 179–211.
[25] W. Pan, A new fruit fly optimization algorithm: taking the financial distress model as an example, Knowl. Based Syst. 26 (2012) 69–74.
[26] S. Mirjalili, SCA: a sine cosine algorithm for solving optimization problems, Knowl. Based Syst. 000 (2016) 1–14.
[27] Y. Chauvin, D. Rumelhart, Backpropagation: Theory, Architectures, and Applications, first ed., Psychology Press, 1995.
[28] T. Preis, H.S. Moat, H.E. Stanley, Quantifying trading behavior in financial markets using Google Trends, Sci. Rep. 3 (2013) 1684, doi:10.1038/srep01684.
[29] A. Hamid, M. Heiden, Forecasting volatility with empirical similarity and Google Trends, J. Econ. Behav. Org. 117 (2015) 62–81.
[30] E. Chong, C. Han, C.P. Frank, Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies, Expert Syst. Appl. 83 (2017) 187–205.
[31] D. Enke, N. Mehdiyev, Stock market prediction using a combination of stepwise regression analysis, differential evolution-based fuzzy clustering, and a fuzzy inference neural network, Intell. Autom. Soft Comput. 19 (4) (2013) 636–648.

Acknowledgments

The authors would like to thank the editor and referees for their helpful comments, which improved the paper. The authors would also like to acknowledge the help of Taylor Meginnis in proofreading the paper. This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61774137, 11061030, 11261052, 91430108, 11771322), the Natural Science Foundation of Tianjin City of China (Grant No. 15JCYBJC16000), Natural Science Foundation of Shanxi Province

Please cite this article as: H. Hu et al., Predicting the direction of stock markets using optimized neural networks with Google Trends, Neurocomputing (2018), https://doi.org/10.1016/j.neucom.2018.01.038

[32] S. Niaki, S. Hoseinzade, Forecasting S&P 500 index using artificial neural networks and design of experiments, J. Ind. Eng. Int. 9 (1) (2013) 1.
[33] R. Cervelló-Royo, F. Guijarro, K. Michniuk, Stock market trading rule based on pattern recognition and technical analysis: forecasting the DJIA index with intraday data, Expert Syst. Appl. 42 (14) (2015) 5963–5975.
[34] J. Patel, S. Shah, P. Thakkar, K. Kotecha, Predicting stock market index using fusion of machine learning techniques, Expert Syst. Appl. 42 (4) (2015) 2162–2172.
[35] T.L. Chen, F. Chen, An intelligent pattern recognition model for supporting investment decisions in stock market, Inf. Sci. 346 (2016) 261–274.
[36] W.C. Chiang, D. Enke, T. Wu, R. Wang, An adaptive stock index trading decision support system, Expert Syst. Appl. 59 (2016) 195–207.
[37] K. Chourmouziadis, P. Chatzoglou, An intelligent short term stock trading fuzzy system for assisting investors in portfolio management, Expert Syst. Appl. 43 (2016) 298–311.
[38] A. Arévalo, J. Niño, G. Hernández, J. Sandoval, High-frequency trading strategy based on deep neural networks, in: Proceedings of the International Conference on Intelligent Computing, Springer, 2016, pp. 424–436.
[39] X. Zhong, D. Enke, Forecasting daily stock market return using dimensionality reduction, Expert Syst. Appl. 67 (2017) 126–139.
[40] Y.J. Chen, Y. Hao, A feature weighted support vector machine and k-nearest neighbor algorithm for stock market indices prediction, Expert Syst. Appl. 80 (2017) 340–355.
[41] X.D. Zhang, A. Li, R. Pan, Stock trend prediction based on a new status box method and adaboost probabilistic support vector machine, Appl. Soft Comput. 49 (2016) 385–398.
[42] A.H. Moghaddam, M.H. Moghaddam, E. Morteza, Stock market index prediction using artificial neural network, J. Econ. Finance Adm. Sci. 21 (2016) 89–93.
[43] F.A. de Oliveira, C.N. Nobre, E.Z. Luis, Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index: case study of PETR4, Petrobras, Brazil, Expert Syst. Appl. 40 (2013) 7596–7606.
[44] M.T. Leung, H. Daouk, A. Chen, Forecasting stock indices: a comparison of classification and level estimation models, Int. J. Forecast. 16 (2) (2000) 173–190.
[45] S. Lahmiri, Interest rate next-day variation prediction based on hybrid feedforward neural network, particle swarm optimization, and multiresolution techniques, Phys. A 444 (2016) 388–396.
[46] S. Lahmiri, Comparing variational and empirical mode decomposition in forecasting day-ahead energy prices, IEEE Syst. J. 11 (3) (2017) 1907–1910.
[47] S. Lahmiri, Minute-ahead stock price forecasting based on singular spectrum analysis and support vector regression, Appl. Math. Comput. 320 (2018) 444–451.

Li Tang is a lecturer in the Department of Information Science and Technology of Tianjin University of Finance and Economics. She received her Ph.D. degree in computer application from Tianjin University in 2015. She has published several research papers indexed by EI and SCI. Her research interests include machine learning and data mining.

Shuhua Zhang is a Professor at the Coordinated Innovation Center for Computable Modeling in Management Science, Tianjin University of Finance and Economics. He received his Ph.D. in Computational Mathematics from the Institute of Systems Science of the Chinese Academy of Sciences. His research interests include numerical methods for partial differential equations, risk management in climate change, supply chain networks, and financial mathematics and engineering.

Haiyan Wang is a Professor of Applied Mathematics at Arizona State University. He received his Ph.D. from Michigan State University. His research interests include applied mathematics and data science.

Hongping Hu was born in July 1973. She is an associate professor at the School of Science, North University of China. She received her Ph.D. degree from North University of China in 2009. Her research interests include combinatorial mathematics, artificial intelligence, and image processing in applied mathematics.
