Neural network earnings per share forecasting models: A comparison of backward propagation and the genetic algorithm


Decision Support Systems 47 (2009) 32–41


Qing Cao a,⁎, Mark E. Parry b

a Rawls College of Business, Texas Tech University, Lubbock, TX 79409-2101, United States
b 318 Bloch School, University of Missouri-Kansas City, 5110 Cherry Street, Kansas City, MO 64110-2499, United States

Article history: Received 22 June 2007; received in revised form 22 December 2008; accepted 23 December 2008; available online 20 January 2009.

Keywords: Artificial neural network; Comparative analysis; Forecasting methods

Abstract

Zhang, Cao, and Schniederjans [W. Zhang, Q. Cao, M. Schniederjans, Neural Network Earnings Per Share Forecasting Models: A Comparative Analysis of Alternative Methods, Decision Sciences 35(2) (2004) 205–237, hereafter ZCS] examined the relative ability of neural network models to forecast earnings per share. Their results indicate that the univariate NN model significantly outperformed four alternative univariate models examined in prior research. The authors also found that a neural network forecasting model incorporating fundamental accounting signals outperformed two variations of the multivariate forecasting model examined by Abarbanell and Bushee [J.S. Abarbanell, B.J. Bushee, Fundamental Analysis, Future EPS, and Stock Prices, Journal of Accounting Research 35(1) (1997) 1–24]. To estimate the neural network weights of their neural network models, ZCS used backward propagation (BP). In this paper we compare the forecasting accuracy of neural network weights estimated with BP to ones derived from an alternative estimation procedure, the genetic algorithm [R.S. Sexton, R.E. Dorsey, N.A. Sikander, Simultaneous Optimization of Neural Network Function and Architecture Algorithm, Decision Support Systems 36(3) (2004) 283–296]. We find that the genetic algorithm produces models that are significantly more accurate than the models examined by ZCS.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Earnings expectations exert a significant influence on the resource allocation decisions of managers and the investment decisions of investors. Given the importance of accurate earnings forecasts, researchers have examined the relative effectiveness of different statistical methods for forecasting earnings and compared the accuracy of those methods with that of analysts' judgmental forecasts. Several studies have concluded that forecast accuracy can be improved by combining statistical and judgmental forecasts [7,13,24]. This conclusion has inspired a more recent stream of research that explores the predictive usefulness in statistical forecasting models of variables used by analysts in generating judgmental forecasts. For example, Lev and Thiagarajan [23, p. 190] examined the explanatory power of fundamental variables, which they defined as financial variables "claimed by analysts to be useful in security valuation." Through a review of financial statement analyses the authors identified 12 fundamental signals, including variables such as inventory, capital expenditures, and gross margin. The authors found

⁎ Corresponding author. Tel.: +1 806 742 3919; fax: +1 806 742 3193. doi:10.1016/j.dss.2008.12.011

that these signals added significant explanatory power to regressions of excess returns on contemporaneous earnings. The study of fundamental signals was extended by Abarbanell and Bushee [1], who used a linear regression model and found significant linkages between a number of fundamental signals and future earnings. However, their empirical analysis did not confirm several hypothesized linkages, and the authors speculated that these variables might have a nonlinear relationship with future earnings. Zhang, Cao, and Schniederjans [37] relaxed the linearity assumption through the “arbitrary functional mapping ability” of a neural network model. Their results indicated that the univariate NN model significantly outperformed four alternative univariate models examined in prior research. The authors also found that a neural network forecasting model incorporating fundamental signals outperformed two variations of the linear regression model examined by Abarbanell and Bushee [1]. The research of Zhang and her colleagues (hereafter referred to as ZCS) raises two important questions. The first question concerns the ability of alternative estimation algorithms to provide improved NN earnings forecasts. To estimate the neural network weights of their neural network models, ZCS used backward propagation (BP), which involves the iterative adjustment of a single parameter vector with the goal of minimizing the sum of the squared differences between the observed and predicted values of future earnings per share (EPS). Existing research indicates that BP algorithms are inferior to genetic algorithms, in which a new generation of parameter vectors is created


by modifying the parameter vectors in the existing generation [9,29,30]. Genetic algorithms are designed to increase the probability of finding a global solution to a minimization problem by simultaneously exploring different parts of the parameter space [9].

A second important question concerns the reasons underlying the superior performance of the neural network (NN) models observed by ZCS. One possible explanation for their results arises from the fact that their NN models incorporated more explanatory variables than their linear models. In particular, the univariate NN model estimated by ZCS [37] contained four lagged values of EPS, while their univariate linear regression model contained two lagged values of EPS. Similarly, the ZCS multivariate NN model contained four sets of lagged fundamental accounting variables, while their two linear models each contained a single set of lagged fundamental accounting variables. Thus it is unclear whether the superior performance of their NN models reflected the nonlinear nature of financial data or the use of an expanded set of independent predictors.

In this paper we examine the relative effectiveness of neural network-genetic algorithm (NNGA) models in predicting future earnings per share. We also explore the impact of independent variable specification on the relative accuracy of NN and multivariate linear regression models. To simplify comparisons with existing research, we adopt the methodology of ZCS [37], who compared the forecasting accuracy of ARIMA and linear regression models with neural network models estimated using BP. We derive alternative NN parameter estimates using a genetic algorithm and compare the forecasting accuracy of these parameter estimates with that of the models studied by ZCS. We find that the genetic algorithm produces more parsimonious models that are significantly more accurate than the models examined by ZCS. We also find that expanding the set of predictor variables used in linear regression does not change the relative forecasting accuracy of the NN and linear forecasting models.

Our paper contributes to the neural network literature by providing evidence regarding the relative effectiveness of two different approaches to estimating neural network weights. Our findings are consistent with Monte Carlo studies indicating that a genetic algorithm can find a more parsimonious model structure by reducing the number of hidden nodes and setting some parameter weights to zero. The remainder of our discussion is organized as follows. In the next section we provide a brief introduction to NN models and describe the genetic algorithm used in this paper. After describing our data and our research methodology, we present the results of our analysis. We close with a discussion of implications and directions for future research.

2. Neural network methodologies

2.1. Neural network models

Neural network models are inspired by studies of the information-processing abilities of the human brain. Key attributes of the brain's information network include a nonlinear, parallel structure and dense connections between information nodes [14]. Neural network models have been successfully applied in a variety of business fields including accounting [20,21,22], management information systems [15,19,38], marketing [8,17,31], and production management [16,32,36].
Neural network (NN) models represent the information-processing characteristics of the brain by linking layers of input and output variables through processing units called hidden nodes. Following Callen et al. [6] and ZCS, we use a three-layer neural network model consisting of an input layer, a hidden layer, and an output layer. Each independent variable (i.e., each input layer node) has a weighted connection to each node in the hidden layer. Similarly, each hidden layer node has a weighted connection to each dependent variable (i.e., each output layer node). In this paper we will consider a model with a single output variable.


Formally, let Y_t denote the output of the neural network and let x_i and z_j denote, respectively, the ith input variable (i = 1,…, k) and the jth middle layer variable. According to Qi [26] this model is commonly implemented with (1) logistic functions relating the input variables to the middle layer (hidden) variables and (2) an identity transfer function relating the middle layer variables to the output variable. Under these assumptions we can write the generic three-layer network model as:

Y_t = f(X; \alpha, \beta) = \sum_{j=1}^{n} \alpha_j z_j = \sum_{j=1}^{n} \alpha_j \, \mathrm{logsig}\!\left( \sum_{i=1}^{k} \beta_{ij} x_i + \beta_{0j} \right)    (1)

where:
\alpha_j is the weight linking the jth hidden layer variable with the output variable;
\beta_{ij} is the weight linking the ith input to the jth hidden layer variable;
\beta_{0j} is the bias weight of the jth hidden layer variable; and
logsig is the logistic transfer function, \mathrm{logsig}(a) = 1/(1 + \exp(-a)).
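To make the notation concrete, a minimal sketch of the forward pass in Eq. (1) follows. It is our own illustration; the array shapes and function names are assumptions rather than part of the original study.

```python
import numpy as np

def logsig(a):
    """Logistic transfer function of Eq. (1): logsig(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def three_layer_nn(x, alpha, beta, beta0):
    """Forward pass of the generic three-layer network in Eq. (1).

    x     : (k,)   input variables x_i
    beta  : (k, n) weights beta_ij linking input i to hidden node j
    beta0 : (n,)   bias weights beta_0j of the hidden nodes
    alpha : (n,)   weights alpha_j linking hidden node j to the output
    """
    z = logsig(x @ beta + beta0)   # hidden-layer variables z_j
    return float(alpha @ z)        # identity transfer function at the output layer
```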

The model in Eq. (1) is commonly estimated using a backward propagation (BP) algorithm that specifies an initial set of neural network weights and then adjusts these weights to reduce an overall measure of model fit or forecast accuracy. For example, in their study of earnings per share, ZCS [37] used a BP algorithm to minimize the mean squared error of their NN model. BP algorithms are popular because they are easy to implement and provide "effective solutions to large and difficult problems" [14, p. 172]. However, like other gradient search techniques, BP algorithms evaluate a fit function at a single point (i.e., a single vector of parameter estimates) and use information about the curvature and steepness of the fit function around that point to generate a new point for evaluation. This process is repeated until the improvement in fit observed in successive iterations falls below a user-specified threshold level. The resulting solution is typically the local optimum nearest the algorithm's starting point [9]. As a result, parameter estimates produced by these algorithms may not be optimal over the entire parameter space. In fact, a literature review led Sexton, Dorsey and Johnson [28, pp. 598–590] to conclude that backpropagation is "plagued with inconsistent and unpredictable performances."
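As an illustration of the gradient adjustment described above, the following sketch performs a single BP-style step that moves all weights of the Eq. (1) network down the mean-squared-error gradient. It is a simplified stand-in of our own, not the NeuroShell 2 routine used by ZCS.

```python
import numpy as np

def bp_step(X, y, alpha, beta, beta0, lr=0.01):
    """One backward-propagation pass: adjust all weights down the MSE gradient.

    X: (n_obs, k) inputs; y: (n_obs,) targets; remaining arguments as in Eq. (1).
    """
    n_obs = len(y)
    Z = 1.0 / (1.0 + np.exp(-(X @ beta + beta0)))      # hidden activations, (n_obs, n)
    err = Z @ alpha - y                                # forecast errors under the identity output
    # Gradients of MSE = mean(err^2) with respect to each weight group
    g_alpha = 2.0 * (Z.T @ err) / n_obs
    back = np.outer(err, alpha) * Z * (1.0 - Z)        # error signal propagated to the hidden layer
    g_beta = 2.0 * (X.T @ back) / n_obs
    g_beta0 = 2.0 * back.sum(axis=0) / n_obs
    # Single gradient step from the current point in weight space
    return alpha - lr * g_alpha, beta - lr * g_beta, beta0 - lr * g_beta0
```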


2.2. Genetic algorithms

Genetic algorithms are a family of search techniques inspired by studies of evolution and natural selection. Unlike BP, genetic algorithms (GAs) do not depend on the iterative modification of a single vector of neural network weight estimates. Instead, GAs work with sets (a population) of NN weight vectors. A randomly-generated initial population of vectors evolves through a series of operations that typically includes reproduction, crossover, and mutation (we define these operators below). The result is a "randomized but structured" search that "sweeps through the parameter space in many directions simultaneously and thereby reduces the probability of convergence to false optima" [9, p. 54].

Several studies suggest that GAs can outperform alternative estimation algorithms. Dorsey and Mayer [9] found that a genetic algorithm outperformed four alternative algorithms in estimating the neural network weights of higher dimensional problems (i.e., problems involving six or more neural network weights). In the context of neural network models, Sexton, Dorsey, and Johnson [28] considered seven different problems and found that, in every case, the GA solution statistically dominated the corresponding BP solution. In a related Monte Carlo study, the same authors reported that the GA solutions outperformed those generated by an alternative global search technique, simulated annealing [28].

In this paper we use the GA outlined in Table 1, which is based on the work of Dorsey and Mayer [9] and Sexton, Dorsey and Sikander [30]. This algorithm begins by assuming a single hidden node, evolving 1000 generations, adding a second node, and evolving another 1000 generations. This process continues until the addition of three consecutive nodes fails to produce a new set of parameter estimates with better fit characteristics than all preceding solutions.

Given a specific assumption about the number of hidden nodes, the GA described in Table 1 generates 20 random NN weight vectors that constitute the first generation of solutions. To create the next generation of weight vectors, a reproduction operator assigns each vector a probability that reflects the fitness of that vector relative to the remaining 19 vectors. Then a mating pool is created by choosing vectors with replacement from the existing generation. The vectors within the mating pool are adjusted in three steps. First, a crossover operator randomly pairs the vectors in the mating pool, identifies any weights that exceed a randomly-drawn number, and switches those weights between the paired vectors. Second, a mutation operator replaces a small, random number (on average, about 5%) of the individual neural network weights in the mating pool with random numbers drawn from the range of possible values for that weight. This step is designed to increase the probability that the algorithm finds a global solution. Third, a second mutation operator replaces a small, random number of weights with a hard zero [30]. The resulting vectors define the second generation. This process is repeated until 1000 generations have occurred.

The GA algorithm described in Table 1 reflects two key recommendations of Sexton, Dorsey, and Sikander [30]. First, in order to obtain a parsimonious set of NN weight estimates, Sexton and his colleagues proposed the inclusion of the second mutation operator that randomly sets small weights to zero. Second, the authors also recommended the use of the following objective function, which is an increasing function of the number of model neural network weights:

\mathrm{Min}\ E = \sum_{j=1}^{N} \left( O_j - \hat{O}_j \right)^2 + C \sqrt{ \frac{ \sum_{i=1}^{N} \left( O_i - \hat{O}_i \right)^2 }{N} }    (2)

Here N is the number of observations in the training data set, O is the observed value of the dependent variable, Ô is the fitted value of the dependent variable, and C is the number of non-zero NN weights. As Sexton et al. [30, p. 288] explained, the "penalty for keeping an additional weight varies during the search and is equal to the current value of RMSE." As a result, the added-weight penalty is high at the beginning of the search process but declines as new generations evolve and the RMSE declines.

In a Monte Carlo experiment Sexton and his colleagues compared the relative performance of a GA algorithm that included these two modifications with three different backward propagation algorithms. In tests involving 11 problems used in previous NN research, the authors found that their proposed algorithm "dominated the BP solutions in every test set at the 99% level of significance" [30, p. 292]. The authors also found that their GA algorithm successfully eliminated irrelevant variables in 101 of 130 training runs.

2.3. Hypotheses

To evaluate the usefulness of the genetic algorithm described in Table 1, we use the financial data set employed by ZCS and statistically evaluate the relative forecasting accuracy of NN models estimated with BP and comparably-specified models estimated with the GA algorithm described in Table 1. We test the following null hypothesis:

H1. There will be no forecasting accuracy difference between NN models estimated with BP and NN models estimated with GA.

For purposes of statistical testing we break this hypothesis into two parts in order to distinguish between univariate and multivariate models:

H1a. There will be no forecasting accuracy difference between univariate NN models estimated with BP and univariate NN models estimated with GA.

H1b. There will be no forecasting accuracy difference between multivariate NN models estimated with BP and multivariate NN models estimated with GA, where the predictor variables include both lagged dependent variables and fundamental accounting variables.

Based on prior research, we expect that the use of a genetic algorithm for estimation purposes will improve forecasting accuracy, leading to a rejection of these null hypotheses.

3. Data

Following ZCS [37], we collected quarterly EPS and fundamental accounting data from Compustat, which includes data on fundamental accounting variables for 283 firms over 45 quarters. The 283 firms in our sample represent 41 industries, which are listed in Table 2.

Table 1
The neural network simultaneous optimization algorithm (NNSOA). We follow Sexton et al. and set the total number of generations per node to 1000. The algorithm then proceeds in three stages.

Stage 1: Determine the best solution given one hidden node.
  Step 1. Randomly generate 20 solutions.
  Step 2. Compute the fitness of each solution and assign a probability to each solution that reflects the fitness of that solution relative to the fitness of the other 19 solutions.
  Step 3 (reproduction). Select neural network weight vectors for the mating pool by (1) selecting a string from the current population, (2) comparing the probability of the current vector with a randomly drawn number, and (3) assigning the current vector to the next generation if its probability is greater than the random number. Repeat this process, cycling repeatedly through the current population of vectors until the mating pool contains 20 vectors. Note that the same vector may appear multiple times in the mating pool.
  Step 4 (crossover). Randomly pair the 20 vectors. For each pair of parents, compare the neural network weights in the parent vectors with a randomly chosen number. If a weight is greater than the random number, switch that weight with the corresponding weight in the other parent.
  Step 5 (mutation 1). For each weight in the mating pool, draw a random number. If the number is less than 0.01, replace the weight with a number drawn from the range of possible values for that neural network weight. The critical value of 0.01 comes from Goldberg [11].
  Step 6 (mutation 2). For each weight in the mating pool, draw a random number. If the number is less than 0.01 and if the weight is less than 0.01, replace the weight with a hard zero. The resulting vectors define the next generation.
  Step 7. Repeat steps 1 to 6 until 1000 generations are complete.
  Step 8. Review the 1000 generations and save the best-fitting vector.
Stage 2: Add one hidden node and repeat the steps in Stage 1. Save the best-fitting vector from the new set of 1000 generations. Repeat this process until the addition of three successive hidden nodes produces no improvement in fit.
Stage 3: Select the best solution identified in all the generations created through Stage 2 and train this best solution through an additional 1000 generations using steps 2 through 8 of Stage 1.
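The sketch below illustrates, in simplified form, one stage of the search summarized in Table 1 for a fixed number of hidden nodes: fitness is the penalized objective of Eq. (2), and each generation applies reproduction, crossover, and the two mutation operators. The population size, weight bounds, and helper names are our assumptions, and the code is an illustration of the ideas rather than a reproduction of the NNSOA implementation of Sexton et al. [30].

```python
import numpy as np

rng = np.random.default_rng(0)

def penalized_sse(w, predict, X, y):
    """Objective of Eq. (2): SSE plus (number of non-zero weights) times the current RMSE."""
    err = y - predict(w, X)
    sse = float(err @ err)
    rmse = (sse / len(y)) ** 0.5
    return sse + np.count_nonzero(w) * rmse

def evolve_stage(predict, X, y, n_weights, pop_size=20, generations=1000, bound=10.0):
    """One NNSOA-style stage for a fixed number of hidden nodes (simplified sketch)."""
    pop = rng.uniform(-bound, bound, size=(pop_size, n_weights))
    best_fit, best_w = np.inf, None
    for _ in range(generations):
        fit = np.array([penalized_sse(w, predict, X, y) for w in pop])
        if fit.min() < best_fit:
            best_fit, best_w = fit.min(), pop[fit.argmin()].copy()
        # Reproduction: selection probability rises as the objective value falls
        score = fit.max() - fit + 1e-12
        pool = pop[rng.choice(pop_size, size=pop_size, p=score / score.sum())]
        # Crossover: swap weights that exceed a randomly chosen cut-off between paired parents
        for a in range(0, pop_size - 1, 2):
            mask = pool[a] > rng.uniform(-bound, bound)
            tmp = pool[a, mask].copy()
            pool[a, mask] = pool[a + 1, mask]
            pool[a + 1, mask] = tmp
        # Mutation 1: replace about 1% of weights with fresh random draws
        m1 = rng.random(pool.shape) < 0.01
        pool[m1] = rng.uniform(-bound, bound, size=int(m1.sum()))
        # Mutation 2: with probability 0.01, set small weights to a hard zero
        m2 = (rng.random(pool.shape) < 0.01) & (np.abs(pool) < 0.01)
        pool[m2] = 0.0
        pop = pool
    return best_w, best_fit
```

In the full procedure of Table 1, this stage would be rerun with one additional hidden node at a time until three successive additions fail to improve the best penalized fit.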

Table 2
A breakdown of the sample by industries used in the study.

Two-digit SIC code   Industry                                      Number of firms
13                   Oil and gas extraction                        10
20                   Food and kindred products                     13
26                   Paper and allied products                     17
27                   Printing and publishing                       8
28                   Chemicals and allied products                 24
30                   Rubber and miscellaneous plastic products     9
33                   Primary metal industries                      8
34                   Fabricated metal products                     16
35                   Industrial machinery and equipment            24
36                   Electrical and electronic equipment           27
37                   Transportation equipment                      12
38                   Instruments and related products              22
39                   Miscellaneous manufacturing industries        6
50                   Wholesale trade — durable goods               16
51                   Wholesale trade — nondurable goods            7
73                   Business services                             10
                     Other industries                              54
Total                                                              283

Four industries (electrical and electronic equipment, chemical products, industrial machinery, and instruments) are represented by 24 or more firms. These four industries account for about 34% of our sample. Note that our selection criteria exclude both newly-formed firms and failed ones. As a result, our analysis applies to successful firms with a history of 11 or more years [25].

Our dependent variable is earnings per share. Our predictor variables are based on the work of Lev and Thiagarajan [23], who studied the written statements of financial analysts in order to identify accounting variables used in security valuation and earnings prediction, and the subsequent analysis of Abarbanell and Bushee [1]. Later studies [2,4] have confirmed the usefulness of these variables in predicting EPS. The formal definitions of our predictor variables are as follows:

INV    Dollar value of inventory scaled by the number of common shares;
AR     Accounts receivable scaled by the number of common shares;
CAPX   Capital expenditure per Schedule V, scaled by the number of common shares (Schedule V reports property, plant, and equipment under Securities & Exchange Commission [SEC] Regulation S-X);
GM     Gross margin, defined as sales less cost of goods sold, scaled by the number of common shares;
SA     Selling and administrative expenses scaled by the number of common shares;
ETR    Effective tax rate, defined as income taxes divided by pretax income; and
LFP    Log of labor force productivity, defined as the log of the ratio of sales to the number of employees.
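A minimal sketch of how these seven signals might be constructed from quarterly, Compustat-style firm data is given below. The column names are hypothetical, and the sketch folds in the annual-to-quarterly adjustments for capital expenditures and employees discussed in the next paragraph.

```python
import numpy as np
import pandas as pd

def fundamental_signals(df: pd.DataFrame) -> pd.DataFrame:
    """Construct the per-share predictor variables from quarterly firm data.

    Assumes hypothetical column names: inventory, receivables, annual_capx,
    sales, cogs, sga, income_taxes, pretax_income, employees, shares.
    """
    out = pd.DataFrame(index=df.index)
    out["INV"]  = df["inventory"] / df["shares"]
    out["AR"]   = df["receivables"] / df["shares"]
    out["CAPX"] = (df["annual_capx"] / 4.0) / df["shares"]   # annual figure spread evenly over quarters
    out["GM"]   = (df["sales"] - df["cogs"]) / df["shares"]
    out["SA"]   = df["sga"] / df["shares"]
    out["ETR"]  = df["income_taxes"] / df["pretax_income"]
    out["LFP"]  = np.log(df["sales"] / df["employees"])      # annual employee count held constant within the year
    return out
```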

We take the log of the labor force productivity ratio in order to make the range of this variable (which extends into the millions) more comparable with the remaining variables. Note also that the meaning of the “number of employees” variable is not constant across companies in the Compustat database. Some companies report the number of employees at the end of the year, while others report the average number of employees during the year. Because Compustat reports annual capital expenditures, we divide the reported number by four to approximate quarterly capital expenditures. Similarly, because Compustat reports the annual number of employees, we assume that the number of employees in each quarter was constant through the year. ZCS [37] argued that any bias introduced by these assumptions would “very likely” be


independent of the functional forms of the models incorporating these two variables. They also examined the sensitivity of their results to (1) dropping the labor force productivity variable and (2) replacing the capital expenditures variable with an alternative measure available from quarterly cash flow statements. These changes had no impact on the relative accuracy of the different multivariate models examined by the authors. For these reasons we follow ZCS [37] and assume that (1) annual capital expenditures are distributed evenly across the quarters within a year and (2) the number of employees is relatively constant over the course of the year.

Following ZCS [37], our analysis omits two variables considered by Abarbanell and Bushee [1]: a dummy variable for inventory policy and an auditor opinion variable. Most of the firms in our sample (242 of 283) used a LIFO inventory policy and received an unqualified auditor opinion.

4. Analysis

To evaluate the relative performance of the GA algorithm developed by Sexton et al. [30], we compare the neural network-genetic algorithm (NNGA) forecasting model with (1) univariate ARIMA and linear forecasting models and (2) neural network models estimated with backward propagation (NNBP). We also distinguish between the performance of models that include fundamental accounting variables and those that do not. Finally, we consider two additional linear regression models that contain the same predictor variables used in our NN models. As a result, our research design, which is summarized in Table 3, consists of eight model categories.

The first four categories in Table 3 correspond to the research design used by ZCS [37]. Category 1 consists of four Univariate Linear Models (denoted ULM) that have performed well in previous studies of quarterly earnings forecast accuracy. Time series analyses of quarterly earnings indicate that "there are two components to the quarterly earnings process: (1) a four-period seasonal component and (2) an adjacent quarter component which describes the seasonally adjusted series" [12, p. 71]. Several different models have been proposed to represent these two components. Foster [10] used quarterly data ranging from 1946 to 1974 to examine the predictive accuracy of six univariate forecasting models. He found that the ARIMA model labeled ULM.1 in Table 3, which contains a single autoregressive parameter, produced the most accurate one-step-ahead forecast of earnings and sales. Brown and Rozeff [5] used Value Line data on 23 firms and compared the accuracy of Model ULM.1 with two alternative models. Model ULM.2 adds a seasonal moving average (SMA) parameter [5], while Model ULM.3 adds an additional moving average parameter (see [12,33]). Brown and Rozeff concluded that (1) Models ULM.2 and ULM.3 produced insignificantly different one-period-ahead forecasts and (2) both outperformed the Foster model (ULM.1). In a later study using Compustat data on 240 firms, Bathke and Lorek [3] concluded that Model ULM.2 produced better predictive forecasts than Models ULM.1 and ULM.3. Given this conflicting evidence, we will use all three models as benchmarks to evaluate the performance of our univariate NN models. Model ULM.4 is an OLS model that, like the Brown and Rozeff [5] ARIMA model, features two lagged values of earnings per share [37].
This model serves as a benchmark for evaluating the value of the fundamental signals incorporated into the Multivariate Linear Models (denoted MLM) of Category 2, which contains two multivariate OLS models inspired by the work of Lev and Thiagarajan [23] and Abarbanell and Bushee [1]. Both models feature two lagged values of earnings per share and lagged values of the seven fundamental accounting variables defined in the preceding section. In the first model the accounting variables are lagged one quarter, while in the second model the accounting variables are lagged four quarters.
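For illustration, the two multivariate linear models could be estimated firm-by-firm with ordinary least squares along the following lines. This is a sketch under assumed column names, not the authors' original code.

```python
import numpy as np
import pandas as pd

FUNDAMENTALS = ["INV", "AR", "CAPX", "GM", "SA", "ETR", "LFP"]

def fit_mlm(firm: pd.DataFrame, lag: int = 1):
    """OLS fit of a multivariate linear model (MLM.1 when lag=1, MLM.2 when lag=4).

    `firm` is a quarterly DataFrame in time order with an 'EPS' column and the
    seven fundamental signals defined above.
    """
    X = pd.DataFrame({
        "EPS_lag1": firm["EPS"].shift(1),
        "EPS_lag4": firm["EPS"].shift(4),
        **{f"{v}_lag{lag}": firm[v].shift(lag) for v in FUNDAMENTALS},
    })
    data = pd.concat([firm["EPS"], X], axis=1).dropna()
    y = data["EPS"].to_numpy()
    A = np.column_stack([np.ones(len(data)), data.drop(columns="EPS").to_numpy()])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # intercept a, then b1 ... b9
    return coef
```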


Table 3
Research design.

Category 1: Univariate Linear Models (ULM)(a)
  ULM.1: E(Y_t) = Y_{t-4} + \phi_1 (Y_{t-1} - Y_{t-5}) + \delta
  ULM.2: E(Y_t) = Y_{t-4} + \phi_1 (Y_{t-1} - Y_{t-5}) - \Theta_1 a_{t-4} + \delta
  ULM.3: E(Y_t) = Y_{t-4} + \phi_1 (Y_{t-1} - Y_{t-5}) - \theta_1 a_{t-1} - \Theta_1 a_{t-4} + \theta_1 \Theta_1 a_{t-5} + \delta
  ULM.4: E(Y_t) = b_0 + b_1 Y_{t-1} + b_2 Y_{t-4} + \delta

Category 2: Multivariate Linear Models (MLM)(b)
  MLM.1: E(Y_t) = a + b_1 Y_{t-1} + b_2 Y_{t-4} + b_3 INV_{t-1} + b_4 AR_{t-1} + b_5 CAPX_{t-1} + b_6 GM_{t-1} + b_7 SA_{t-1} + b_8 ETR_{t-1} + b_9 LFP_{t-1}
  MLM.2: E(Y_t) = a + b_1 Y_{t-1} + b_2 Y_{t-4} + b_3 INV_{t-4} + b_4 AR_{t-4} + b_5 CAPX_{t-4} + b_6 GM_{t-4} + b_7 SA_{t-4} + b_8 ETR_{t-4} + b_9 LFP_{t-4}

Category 3: Univariate neural network model estimated with BP (UBP)
  UBP: E(Y_t) = \sum_{j=1}^{n} \alpha_j \, \mathrm{logsig}\!\left( \beta_{0j} + \sum_{i=1}^{4} \beta_{ij} Y_{t-i} \right)

Category 4: Multivariate neural network model estimated with BP (MBP)
  MBP: E(Y_t) = \sum_{j=1}^{n} \alpha_j \, \mathrm{logsig}\!\left( \beta_{0j} + \sum_{i=1}^{4} \left[ \beta_{1i} Y_{t-i} + \beta_{2i} INV_{t-i} + \beta_{3i} AR_{t-i} + \beta_{4i} CAPX_{t-i} + \beta_{5i} GM_{t-i} + \beta_{6i} SA_{t-i} + \beta_{7i} ETR_{t-i} + \beta_{8i} LFP_{t-i} \right] \right)

Category 5: Univariate neural network model estimated with GA (UGA)
  UGA: E(Y_t) = \sum_{j=1}^{n} \alpha_j \, \mathrm{logsig}\!\left( \beta_{0j} + \sum_{i=1}^{4} \beta_{ij} Y_{t-i} \right)

Category 6: Multivariate neural network model estimated with GA (MGA)
  MGA: E(Y_t) = \sum_{j=1}^{n} \alpha_j \, \mathrm{logsig}\!\left( \beta_{0j} + \sum_{i=1}^{4} \left[ \beta_{1i} Y_{t-i} + \beta_{2i} INV_{t-i} + \beta_{3i} AR_{t-i} + \beta_{4i} CAPX_{t-i} + \beta_{5i} GM_{t-i} + \beta_{6i} SA_{t-i} + \beta_{7i} ETR_{t-i} + \beta_{8i} LFP_{t-i} \right] \right)

Category 7: Revised Univariate Linear Regression Model (RULM)
  RULM: E(Y_t) = a + b_1 Y_{t-1} + b_2 Y_{t-2} + b_3 Y_{t-3} + b_4 Y_{t-4}

Category 8: Revised Multivariate Linear Regression Model (RMLM)
  RMLM: E(Y_t) = a + b_1 Y_{t-1} + b_2 Y_{t-2} + b_3 Y_{t-3} + b_4 Y_{t-4} + b_5 INV_{t-4} + b_6 AR_{t-4} + b_7 CAPX_{t-4} + b_8 GM_{t-4} + b_9 SA_{t-4} + b_{10} ETR_{t-4} + b_{11} LFP_{t-4}

(a) In the ARIMA models we use the following notation: \phi, \theta, and \Theta denote, respectively, the autoregressive (AR), moving average (MA), and seasonal moving average (SMA) weights. The parameter \delta denotes the ARIMA model constant.
(b) The multivariate models of Categories 2, 4, and 6 include the following fundamental accounting variables: INV (inventory), AR (accounts receivable), CAPX (capital expenditure), GM (gross margin), SA (selling and administrative expenses), ETR (effective tax rate), and LFP (labor force productivity).
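As an illustration of the input structure shared by the univariate NN models in Table 3, the following sketch builds the matrix of four lagged EPS values and the matching targets (a hypothetical helper, not part of the original study).

```python
import numpy as np

def lagged_inputs(eps, n_lags=4):
    """Build inputs for the univariate NN models: rows of [Y_{t-1}, ..., Y_{t-4}].

    eps: 1-D array of quarterly EPS in time order. Returns (X, y) with y = Y_t.
    """
    eps = np.asarray(eps, float)
    X = np.column_stack([eps[n_lags - lag:len(eps) - lag] for lag in range(1, n_lags + 1)])
    y = eps[n_lags:]
    return X, y
```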

Categories 3 and 4 each consist of a single neural network model estimated with backward propagation. In both cases we use 30 quarters of data to create 26 rolling, five-quarter samples. In each sample the first four quarters are used to predict earnings per share in the fifth quarter. The forecast error is then computed and revised parameter estimates are generated using the BP technique. In the Category 3 model (denoted UBP for Univariate BP), the input variables consist of lagged values of EPS. In the Category 4 model (denoted MBP for Multivariate BP), the input variables consist of lagged values of EPS and the fundamental accounting variables.

Categories 5 and 6 use the same sets of input variables as the preceding two categories, but the neural network weights are estimated with the genetic algorithm outlined in Table 1. The models in these categories are denoted by UGA (for Univariate GA) and MGA (for Multivariate GA). Unlike the BP procedure, the GA algorithm does not use a sequence of rolling samples. Instead, the 30 quarters of training data are used to evolve the NN model neural network weights through 1000 generations, a process that is repeated under different assumptions regarding the number of hidden nodes.

Categories 7 and 8 contain alternative specifications of the linear regression models in Categories 1 and 2. The univariate BP model estimated by ZCS [37] contained four lagged values of EPS, while their univariate linear model (ULM.4) contained two lagged values of EPS. To determine the impact of the omitted lagged variables on the forecast accuracy of the linear model, the Revised Univariate Linear Model (RULM) in Category 7 contains the same four lagged variables as the univariate NN models in Categories 3 and 5. Similarly, the Revised Multivariate Linear Model (RMLM) in Category 8 contains the same four sets of lagged fundamental accounting variables included in the multivariate NN models of Categories 4 and 6.

4.1. Forecasting accuracy procedure

As described above, our analysis is based on Compustat data for 283 firms over 45 quarters. For each firm we lose 5 data points due to differencing. Thus our analysis sample consists of 40 observations for each firm ranging from the second quarter of 1992 to the first quarter of 2002. We group these observations to form rolling samples of 31 quarters. Within each rolling sample we use the first 30 observations to estimate the neural network weights of each forecasting model and use the last observations in the estimation set to make a one-step

ahead forecast. Thus, for each model, we make 10 forecasts based on data from the preceding 30 quarters. To assess forecast accuracy, we use the following measures of fit [6,37]:

Mean Absolute Percentage Error (MAPE) = \frac{1}{10} \sum_{t=31}^{40} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right|    (3)

Mean Squared Error (MSE) = \frac{1}{10} \sum_{t=31}^{40} \left( \frac{Y_t - \hat{Y}_t}{Y_t} \right)^2    (4)

where \hat{Y}_t is the forecasted value of EPS_t. We omit observations for which EPS_t = 0. Consistent with prior work by ZCS, Brown and Rozeff [5], and Lorek and Willinger [25], we impose an upper bound of one on individual errors and report the percentage of errors affected by this constraint.
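A minimal sketch of how Eqs. (3) and (4), the zero-EPS exclusion, and the upper bound of one on individual errors might be computed for a single firm is shown below. This is our own illustration; in particular, applying the cap inside the MSE is our reading of the text.

```python
import numpy as np

def forecast_accuracy(y, y_hat):
    """MAPE and MSE of Eqs. (3)-(4) over the one-step-ahead forecasts (t = 31, ..., 40)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    keep = y != 0.0                          # omit firm-quarters with EPS = 0
    pct_err = np.abs((y[keep] - y_hat[keep]) / y[keep])
    capped = np.minimum(pct_err, 1.0)        # upper bound of one on individual errors
    mape = capped.mean()
    mse = (capped ** 2).mean()
    large_share = (pct_err > 1.0).mean()     # share of forecasts affected by the cap
    return mape, mse, large_share
```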


5. Empirical results

5.1. The relative performance of the GA algorithm

Table 4 reports the averages of two error metrics within each quarter and across all four quarters. An examination of this table indicates that the univariate NNGA model generates MAPEs that are uniformly lower than those of the corresponding univariate models in Categories 1 and 3. For example, if we examine the error metrics from the pooled sample containing data from all four quarters, the UGA model (Category 5) has a MAPE of 0.41, while the UBP model (Category 3) has a MAPE of 0.51 and the best-fitting ULM model (ULM.4 in Category 1) has a MAPE of 0.55. A similar conclusion follows from a comparison of the models that include the fundamental accounting variables (Categories 2, 4, and 6): the MAPE of the MGA model (0.22) is lower than the MAPE of the MBP model (0.36) and the lowest MAPE generated by the MLM models (0.60). This same pattern emerges from (1) a within-quarter comparison of the MSE measures for each model and (2) a within-quarter comparison of the percentage of large errors (defined as the number of forecast errors greater than one) for each model.

Table 4
Comparing forecast accuracy of one-step-ahead quarterly EPS forecasts for ten forecast models. Each cell reports MAPE(a) / MSE(b) / large forecast error(c) (%).

Model                 1st quarter        2nd quarter        3rd quarter        4th quarter        Overall
Category 1
  ULM.1               0.541/0.449/35.5   0.575/0.480/33.5   0.506/0.407/29.9   0.584/0.491/36.2   0.557/0.462/34.2
  ULM.2               0.546/0.451/37.6   0.603/0.511/33.5   0.538/0.437/31.0   0.591/0.499/35.7   0.575/0.481/34.9
  ULM.3               0.529/0.444/33.7   0.564/0.467/33.8   0.514/0.419/31.3   0.590/0.500/38.5   0.555/0.463/34.7
  ULM.4               0.544/0.438/31.0   0.548/0.443/29.9   0.525/0.419/28.6   0.574/0.469/31.7   0.550/0.445/30.5
Category 2
  MLM.1               0.631/0.543/41.6   0.625/0.535/41.3   0.558/0.454/32.4   0.612/0.525/39.6   0.609/0.518/39.1
  MLM.2               0.629/0.541/42.6   0.552/0.456/34.3   0.571/0.477/36.5   0.624/0.529/38.6   0.600/0.507/38.5
Category 3
  UBP.1               0.534/0.432/29.4   0.492/0.380/24.6   0.486/0.376/24.2   0.526/0.413/25.4   0.514/0.405/26.2
Category 4
  MBP.1               0.372/0.268/16.5   0.368/0.261/16.4   0.346/0.234/13.9   0.359/0.246/13.3   0.362/0.253/15.0
Category 5
  UGA.1               0.341/0.282/21.5   0.375/0.288/20.1   0.399/0.241/19.3   0.410/0.298/20.1   0.406/0.301/20.7
Category 6
  MGA.1               0.233/0.171/11.6   0.251/0.189/12.7   0.242/0.187/11.0   0.224/0.199/11.2   0.222/0.178/11.7
Number of forecasts   N = 836            N = 562            N = 562            N = 840            Total = 2800(d)

(a) Mean Absolute Percentage Error (MAPE).
(b) Mean Squared Error (MSE).
(c) We set the forecast error (|(Q_t − Q̂_t)/Q_t|) = 100% when this expression exceeded 100% (large forecast error).
(d) We excluded firm-quarters whose EPS are zero, thus reducing the total firm-quarters from 2830 to 2800.

The results in Table 4 also provide interesting insight into the impact of fundamental accounting variables on forecasting accuracy. As noted by ZCS, the inclusion of fundamental accounting variables

had a negative impact on the forecast accuracy of the linear regression model (compare Model ULM.4 with MLM.1 and MLM.2). In contrast, the inclusion of fundamental accounting variables had a dramatic positive impact on the forecast accuracy of the NN models estimated with BP. For example, in the pooled sample the inclusion of these variables lowered the MAPE of the BP model from 0.51 (Model UBP) to 0.36 (Model MBP). Similarly, for the same sample, adding the fundamental accounting variables to the GA univariate model reduced the average MAPE from 0.41 (Model UGA) to 0.22 (Model MGA). These results support the ZCS argument that fundamental accounting variables have a non-linear relationship with EPS.

To formally test H1a, we rank-ordered the MAPE error metrics generated by each model for each testing sample and applied the Friedman test for multiple treatments. Panel A of Table 5 reports the results of this analysis for the six univariate models in Categories 1, 3, and 5. The last two columns of Panel A summarize the test results for the overall pooled sample. The F-statistic for the pooled analysis (12.13) is significant at the 1% level of confidence, indicating that forecast accuracy, as measured by ranks, varies across the six models. The univariate GA model (Category 5) has the lowest average rank (2.33) and the lowest performance score (1), which indicates that, among the univariate models, the UGA model provided the most accurate set of forecasts. This conclusion is also supported by a quarter-by-quarter analysis of the data. In each quarterly sample, the UGA model has the lowest average ranking and smallest performance scores of the six univariate models. Based on these results, we reject Hypothesis 1a for the results pooled across quarters.

A review of Panel A also indicates that, in the pooled sample, the univariate BP model provided the second most accurate set of forecasts, with a mean rank of 3.40. Importantly, the UBP model also dominated all four Category 1 models in each of the quarterly samples. These results contradict the findings of ZCS, who found that the UBP model did not significantly dominate the four Category 1 models in Quarters 2 and 3. Importantly, this variation in results does not reflect differences in parameter estimates. In particular, the MSEs and MAPEs reported in Table 4 are identical to those reported by ZCS. Thus the difference in conclusions drawn from the Friedman test reflects our

inclusion of an additional model (the univariate GA model) in the ranking procedure.

Panel B of Table 5 reports the Friedman test results relevant for H1b, which concerns the relative forecasting performance of the four multivariate models in Categories 2, 4, and 6. As in Panel A, the relevant F statistic is significant in the pooled sample and in all four quarterly samples. Moreover, in each sample the average rank of the multivariate GA model is significantly lower than the average rank of the remaining three multivariate models. Based on these results, we reject null hypothesis H1b in both the pooled sample and in the quarterly samples. Note also that, in all five samples, the multivariate BP model statistically dominates the two linear models in Category 2. This result is consistent with the findings of ZCS.

Panel C of Table 5, which reports the Friedman test results relevant for all of the models in the first six categories, addresses the relative performance of the univariate and multivariate models. As in the preceding panels, the Friedman F statistic is significant in the pooled sample and in all four quarterly samples. In each sample the average rank of the multivariate GA model is significantly lower than the average rank of the remaining models. These results provide further evidence for the rejection of null hypothesis H1 in both the pooled sample and in the quarterly samples. It is also worth noting that, in the overall pooled sample, the second most accurate forecasting model is the univariate GA model. This result also holds in all four quarterly samples.

In summary, the results reported in Tables 4 and 5 indicate that there is a significant difference in forecasting accuracy between the NNGA models and the alternative models of Categories 1 through 4. We conclude that the use of the genetic algorithm improves the accuracy of EPS forecasts.

Relative to the BP algorithm, the GA algorithm did require greater computing time. For the univariate models, the average training time difference between the NNGA and NNBP algorithms was 36 s (UGA = 40 s, UBP = 6 s). For the multivariate models, the difference was 37 s (MGA = 45 s, MBP = 7 s). These results were obtained using a Dell laptop with an Intel T 2300 processor, 1.66 GHz, and 0.99 GB of RAM. We attribute these differences in computing time to the fact that, in the BP estimation procedure (implemented using NeuroShell 2


Table 5
Ranking forecast accuracy of one-step-ahead quarterly EPS forecasts for ten forecast models. Each cell reports the average rank(1), followed in parentheses by the performance score determined by the Friedman test(2).

Panel A: Comparisons of Univariate Linear and NN Models (H1a)
Model      1st quarter          2nd quarter          3rd quarter          4th quarter          Overall
ULM.1      3.6672 (3)           3.5856 (5)           3.5712 (3)           3.6756 (3)           3.6336 (3)
ULM.2      3.8040 (6)           3.6120 (3)           3.6828 (3)           3.6324 (3)           3.6888 (5)
ULM.3      3.5532 (3)           3.5172 (3)           3.5472 (3)           3.7164 (3)           3.5940 (3)
ULM.4      3.5688 (3)           3.8220 (6)           3.7344 (3)           3.6732 (3)           3.6840 (5)
UBP.1      3.4068 (2)           3.4632 (2)           3.4644 (2)           3.3036 (2)           3.3984 (2)
UGA.1      2.7853 (1)           2.3956 (1)           3.1368 (1)           2.8135 (1)           2.3292 (1)
F(3)       7.413 (p = 0.000)    4.254 (p = 0.000)    3.126 (p = 0.005)    7.828 (p = 0.000)    12.131 (p = 0.000)

Panel B: Comparisons of Multivariate Linear and NN Models (H1b)
MLM.1      2.7156 (3)           2.8428 (4)           2.7036 (4)           2.7108 (4)           2.7372 (4)
MLM.2      2.7311 (4)           2.5584 (3)           2.6928 (3)           2.7384 (3)           2.6904 (3)
MBP.1      1.7556 (2)           1.7988 (2)           1.8048 (2)           1.7524 (2)           1.7724 (2)
MGA.1      1.4397 (1)           1.2573 (1)           1.3806 (1)           1.3290 (1)           1.2784 (1)
F          188.576 (p = 0.000)  150.934 (p = 0.000)  145.475 (p = 0.000)  250.416 (p = 0.000)  800.432 (p = 0.000)

Panel C: Comparisons of the Multivariate NN Model and all others
ULM.1      5.5896 (7)           5.4360 (5)           5.4432 (8)           5.6832 (5)           5.5572 (8)
ULM.2      5.8020 (8)           5.4636 (6)           5.6220 (5)           5.6520 (5)           5.6532 (5)
ULM.3      5.4228 (5)           5.3220 (4)           5.4300 (5)           5.7252 (5)           5.4948 (5)
ULM.4      5.3916 (5)           5.7876 (8)           5.7276 (5)           5.6904 (5)           5.6280 (5)
MLM.1      6.1440 (9)           6.4140 (9)           6.0264 (9)           5.9232 (9)           6.1080 (9)
MLM.2      6.2100 (9)           5.7576 (7)           6.0132 (9)           6.0480 (10)          6.0312 (9)
UBP.1      5.1864 (4)           5.2692 (5)           5.2980 (4)           5.1852 (3)           5.2248 (4)
MBP.1      3.4536 (3)           3.7488 (3)           3.6408 (3)           3.2916 (3)           3.5016 (3)
UGA.1      3.0287 (2)           3.3426 (2)           2.8437 (2)           2.4720 (2)           3.1413 (2)
MGA.1      2.4287 (1)           2.5139 (1)           2.4245 (1)           2.1466 (1)           2.5390 (1)
F          89.412 (p = 0.000)   52.572 (p = 0.000)   51.436 (p = 0.000)   111.468 (p = 0.000)  260.385 (p = 0.000)

(1) To compute average ranks, we assigned a rank of 1 to the model with the lowest MAPE and 6 to the model with the highest MAPE in Panel A and averaged the ranks across all firm-quarters. The same procedure was implemented for Panels B and C, but the model with the highest MAPE in Panel B was assigned a rank of 4 and the highest MAPE in Panel C was assigned a rank of 10.
(2) In Panel A, we used the Friedman test for two treatments to conduct pair-wise comparisons for all the possible combinations of the six models. Based on the pair-wise results, we assigned a performance score of 1 to the model that outperformed all others, 2 to the model that performed second best, and so on. We defined two models as being significantly different in forecast accuracy when the relevant two-tailed t-test yielded a probability equal to or lower than 0.1. Identical rankings for two models indicate that the associated t-test was insignificant.
(3) F statistics were obtained from the Friedman test for multiple treatments (in our case, six treatments for Panel A, four treatments for Panel B, and ten treatments for Panel C).

from Ward Systems Group), the number of hidden nodes was determined by the following rule of thumb:

number of hidden neurons = 1/2 (number of input variables + number of output variables) + square root of the number of training patterns.
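For illustration, the rule of thumb can be written as a small helper; the example counts (four lagged EPS inputs, one output, 30 training quarters) are an assumed case corresponding to the univariate model described earlier.

```python
import math

def rule_of_thumb_hidden_nodes(n_inputs, n_outputs, n_training_patterns):
    """NeuroShell-style default: 1/2 (inputs + outputs) + sqrt(training patterns)."""
    return round(0.5 * (n_inputs + n_outputs) + math.sqrt(n_training_patterns))

# e.g., a univariate model with 4 lagged EPS inputs, 1 output, and 30 training quarters
print(rule_of_thumb_hidden_nodes(4, 1, 30))   # -> 8
```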

In contrast, the number of hidden nodes for the GA algorithm was determined by varying the assumed number of hidden nodes and performing a separate estimation procedure for each hidden node assumption. If we had used a similar search procedure to determine the number of hidden nodes in the BP algorithm, the differences in times would have been much smaller.

5.2. The impact of comparable sets of predictor variables

To assess the sensitivity of the preceding results to the predictor variables included in the linear models, we compared the NN models in Categories 3–6 with the expanded linear regression models in Categories 7 and 8. Table 6 reports the averages of two error metrics within each quarterly sample and in the pooled sample, along with the percentage of large forecast errors in each sample. An examination of this table indicates that, in all five samples, the expanded univariate regression model (Model RULM) is dominated by both univariate NN models (Models UBP and UGA). A similar conclusion follows from a comparison of the models that include the fundamental accounting variables (Models MBP, MGA, and RMLM).

As in the preceding sub-section, we again rank-ordered the error metrics generated by the models in Categories 3–8 and applied the Friedman test for multiple treatments. This analysis is summarized in

Table 7. The results in Panel A indicate that the expanded univariate model (Category 7) is always dominated by the two univariate NN models. Similarly, the results in Panel B indicate that the expanded multivariate model is always dominated by the two multivariate NN models. A review of Panel C indicates that: (1) the multivariate GA model again dominates the remaining models; (2) both GA models dominate both BP models; (3) all four NN models dominate the linear regression models; and (4) unlike in Table 4, the multivariate linear model dominates the univariate linear regression model. We conclude that, with one exception (i.e., the relative accuracy of the univariate and multivariate linear models), expanding the set of predictor variables included in the linear regression models does not alter the conclusions drawn from our earlier analysis of the models examined by ZCS.

5.3. Modifications of algorithm assumptions

To test the robustness of our results, we performed several additional analyses. First, we altered the estimation procedure for Models UBP and MBP to reflect the recommendations of Sexton, Dorsey, and Sikander [30] regarding the GA algorithm. In particular, we (a) replaced the standard sum-of-squared errors objective function with the objective function in Eq. (2) and (b) constrained randomly-selected small neural network weights to zero.

Table 8 reports the results of this analysis. The rows labeled UBP.1 and MBP.1 repeat the results reported in Table 4, while the rows


Table 6
Comparing forecast accuracy of one-step-ahead quarterly EPS forecasts for six forecast models (Categories 3–8). Each cell reports MAPE(a) / MSE(b) / large forecast error(c) (%).

Model                 1st quarter         2nd quarter         3rd quarter         4th quarter         Overall
Category 3: UBP.1     0.534/0.432/29.40   0.492/0.380/24.60   0.486/0.376/24.20   0.526/0.413/25.40   0.514/0.405/26.20
Category 4: MBP.1     0.372/0.268/16.50   0.368/0.261/16.40   0.346/0.234/13.90   0.359/0.246/13.30   0.362/0.253/15.00
Category 5: UGA.1     0.341/0.282/21.5    0.375/0.288/20.1    0.399/0.241/19.3    0.410/0.298/20.1    0.406/0.301/20.7
Category 6: MGA.1     0.233/0.171/11.6    0.251/0.189/12.7    0.242/0.187/11.0    0.224/0.199/11.2    0.222/0.178/11.7
Category 7: RULM.1    0.568/0.436/32.42   0.524/0.430/30.82   0.495/0.404/30.05   0.567/0.471/32.92   0.550/0.452/31.02
Category 8: RMLM.1    0.511/0.434/31.47   0.469/0.401/30.23   0.496/0.384/32.74   0.542/0.473/32.01   0.507/0.410/31.55
Number of forecasts   N = 836             N = 562             N = 562             N = 840             Total = 2800(d)

(a) Mean Absolute Percentage Error (MAPE).
(b) Mean Squared Error (MSE).
(c) We set the forecast error (|(Q_t − Q̂_t)/Q_t|) = 100% when this expression exceeded 100% (large forecast error).
(d) We excluded firm-quarters whose EPS are zero, thus reducing the total firm-quarters from 2830 to 2800.

labeled UBP.2 and MBP.2 report the error statistics obtained when the Sexton et al. [30] recommendations are imposed on the BP algorithm. The rows labeled "%CHANGE" report the percentage change in the error statistics relative to the results reported in Table 4. For example, in the first quarter MAPE column, the %CHANGE for the UBP.2 model is 1.31%, computed as 100 ⁎ (0.541 − 0.534) / 0.534. In every case, the incorporation of these recommendations resulted in larger MAPEs and MSEs and more forecast errors, although in some cases the changes were relatively small. For example, of the twelve %CHANGE statistics reported for the Category 3 model, nine are less than 5% and all are less than 7%. While these increases are small, it is clear that an application of the Sexton et al. recommendations to both NN estimation procedures does not alter our earlier conclusions regarding the relative superiority of the GA algorithm.

In addition, we examined the impact of a reduction in the number of generations used to evolve the GA neural network weights. Our base case results, which were based on 1000 generations and reported earlier in Table 4, are denoted in Table 8 as Models UGA.1 and MGA.1. The impact on forecast accuracy of reducing the number of generations to 500 is reported in the rows labeled UGA.2 and MGA.2. In the case of the univariate GA model, halving the number of generations produced an increase in error statistics ranging from 3.8% (second

Table 7
Ranking forecast accuracy of one-step-ahead quarterly EPS forecasts for six forecast models. Each cell reports the average rank(1), followed in parentheses by the performance score determined by the Friedman test(2).

Panel A: Comparisons of Univariate Regression and NN Models
Model      1st quarter          2nd quarter          3rd quarter          4th quarter          Overall
UBP.1      1.7458 (2)           1.9237 (2)           1.7836 (2)           1.7098 (2)           1.7556 (2)
UGA.1      1.2586 (1)           1.2130 (1)           1.2567 (1)           1.1732 (1)           1.2403 (1)
RULM.1     2.5761 (3)           2.6313 (3)           2.4988 (3)           2.7674 (3)           2.7008 (3)
F          78.413 (p = 0.000)   49.539 (p = 0.000)   62.534 (p = 0.000)   55.451 (p = 0.000)   61.988 (p = 0.000)

Panel B: Comparisons of Multivariate Regression and NN Models
MBP.1      1.5315 (2)           1.6067 (2)           1.5934 (2)           1.6186 (2)           1.5974 (2)
MGA.1      1.1933 (1)           1.0098 (1)           1.1712 (1)           1.1777 (1)           1.1259 (1)
RMLM.1     2.7917 (3)           2.6468 (3)           2.5989 (3)           2.7836 (3)           2.7045 (3)
F          85.578 (p = 0.000)   78.529 (p = 0.000)   72.414 (p = 0.000)   79.954 (p = 0.000)   90.356 (p = 0.000)

Panel C: Comparisons of the Multivariate NN Model and Regression Models
UBP.1      2.7421 (4)           2.8405 (4)           2.7602 (4)           2.9466 (4)           2.8787 (4)
MBP.1      2.3110 (3)           2.2425 (3)           2.0937 (3)           2.1637 (3)           2.1570 (3)
UGA.1      1.4513 (2)           1.7093 (2)           1.7343 (2)           1.7328 (2)           1.6345 (2)
MGA.1      1.2361 (1)           1.4062 (1)           1.2345 (1)           1.2893 (1)           1.5135 (1)
RULM.1     3.9831 (6)           3.7989 (6)           3.9700 (6)           3.7657 (6)           3.7841 (6)
RMLM.1     3.5872 (5)           3.4551 (5)           3.3747 (5)           3.5050 (5)           3.5706 (5)
F          39.772 (p = 0.000)   47.434 (p = 0.000)   29.876 (p = 0.000)   31.521 (p = 0.000)   44.675 (p = 0.000)

(1) To compute average ranks, we assigned a rank of 1 to the model with the lowest MAPE and 3 to the model with the highest MAPE in Panel A and averaged the ranks across all firm-quarters. The same procedure was implemented for Panels B and C, but the model with the highest MAPE in Panel C is assigned a rank of 6.
(2) In Panel A, we used the Friedman test for two treatments to conduct pair-wise comparisons for all the possible combinations of the models. Based on the pair-wise results, we assigned a performance score of 1 to the model that outperformed all others, 2 to the model that performed second best, and so on. We defined two models as being significantly different in forecast accuracy when the relevant two-tailed t-test yielded a probability equal to or lower than 0.1. Identical rankings for two models indicate that the associated t-test was insignificant.


Table 8
Robustness tests of forecast accuracy for the BP and GA models. Each cell reports MAPE(a) / MSE(b) / large forecast error(c) (%).

Model                 1st quarter          2nd quarter          3rd quarter          4th quarter
Category 3
  UBP.1(d)            0.534/0.432/29.4     0.492/0.380/24.6     0.486/0.376/24.2     0.526/0.413/25.4
  UBP.2               0.541/0.451/31.20    0.499/0.392/25.60    0.498/0.383/25.55    0.534/0.429/27.17
  %CHANGE(e)          1.31/4.40/6.12       1.42/3.16/4.07       2.47/1.86/5.58       1.52/3.87/6.97
Category 4
  MBP.1               0.372/0.268/16.5     0.368/0.261/16.4     0.346/0.234/13.9     0.359/0.246/13.3
  MBP.2               0.385/0.281/17.80    0.376/0.279/17.09    0.358/0.248/14.89    0.382/0.275/13.97
  %CHANGE             3.49/4.85/7.87       2.17/6.90/4.21       3.47/5.98/7.12       6.41/11.79/5.04
Category 5
  UGA.1(f)            0.341/0.282/21.50    0.375/0.288/20.10    0.399/0.241/19.30    0.410/0.298/20.10
  UGA.2               0.356/0.299/22.43    0.392/0.299/21.06    0.421/0.256/20.75    0.431/0.309/21.10
  %CHANGE             4.40/6.03/4.33       4.53/3.82/4.78       5.51/6.22/7.51       5.12/3.69/4.98
  UGA.3               0.419/0.342/24.71    0.428/0.314/21.85    0.352/0.296/21.51    0.473/0.341/22.10
  %CHANGE             22.87/21.28/14.93    14.13/9.03/8.71      11.78/22.82/11.45    15.37/14.43/9.95
Category 6
  MGA.1               0.233/0.171/11.60    0.251/0.189/12.70    0.242/0.187/11.00    0.224/0.199/11.20
  MGA.2               0.245/0.188/13.14    0.261/0.201/13.12    0.261/0.199/12.36    0.237/0.201/11.71
  %CHANGE             5.15/9.94/13.28      3.98/6.35/3.31       7.85/6.42/12.36      5.80/1.01/4.55
  MGA.3               0.275/0.202/13.36    0.286/0.219/14.56    0.299/0.214/13.66    0.272/0.222/12.99
  %CHANGE             18.03/18.13/15.17    13.94/15.87/14.65    23.55/14.44/24.18    21.43/11.56/15.98
Number of forecasts   N = 836              N = 562              N = 562              N = 840

(a) Mean Absolute Percentage Error (MAPE).
(b) Mean Squared Error (MSE).
(c) We set the forecast error (|(Q_t − Q̂_t)/Q_t|) = 100% when this expression exceeded 100% (large forecast error).
(d) The estimation procedure for Models UBP.1 and MBP.1 used a sum-of-squared errors objective function and did not restrict small neural network weights to zero, while the estimation procedure for Models UBP.2 and MBP.2 used the objective function in Eq. (2) and did restrict small neural network weights to zero.
(e) The percent change (denoted by "%CHANGE") was computed using the base case (i.e., the original estimate from Table 4) as the denominator. For example, the %CHANGE for the UBP.2 model in Quarter 1 was computed as 100 ⁎ (UBP.2 − UBP.1) / UBP.1 = 100 ⁎ (0.541 − 0.534) / 0.534 = 1.31%.
(f) Models UGA.1, MGA.1, UGA.3, and MGA.3 were estimated with 1000 generations, while Models UGA.2 and MGA.2 were estimated with 500 generations. In addition, Models UGA.1, MGA.1, UGA.2, and MGA.2 used the objective function in Eq. (2), while Models UGA.3 and MGA.3 were estimated with a sum-of-squared-errors objective function.

quarter MSE) to 7.5% (third quarter MSE). For the multivariate GA model, the increase in error statistics ranged from 1.0% (fourth quarter MSE) to 13.3% (first quarter large forecast errors).

Finally, we also examined the impact on model parsimony and forecast accuracy of a change in the GA objective function. In particular, the UGA.1 and MGA.1 models were estimated with the objective function proposed by Sexton et al. [27] and reproduced in Eq. (2) above. In contrast, the results reported in the rows labeled UGA.3 and MGA.3 were generated using a standard sum-of-squared errors objective function. In both sets of models, neural network weights were permitted to evolve for 1000 generations. One important question concerns the impact of this change in objective function on the number of hidden nodes. Model UGA.1 had an average of 11.1 hidden nodes, while Model UGA.3 had an average of 12.2 hidden nodes. Similarly, Model MGA.1 had an average of 6.9 hidden nodes, compared with 9.4 for MGA.3. These results support the Sexton et al. [30] argument that, for GA algorithms, the objective function in Eq. (2) produces more parsimonious models than the conventional sum-of-squared-errors objective function.

With regard to the error statistics for Models UGA.3 and MGA.3, notice that, in many cases, the GA error statistics are now much closer to the corresponding BP statistics. For example, consider the first quarter MAPE statistic for the univariate BP and GA models. The difference in MAPE for models UBP.1 and UGA.1 is 0.178, but replacing UGA.1 with UGA.3 reduces the difference to 0.115. In addition, for both the univariate and multivariate GA models, three of the %CHANGE statistics are greater than 20%. Moreover, for the multivariate GA model, eight of the %CHANGE statistics are greater than 15%. These results suggest that replacing the objective function in Eq. (2) with the standard sum-of-squared-errors objective function reduces the difference in forecast accuracy between the BP and GA algorithms.

6. Conclusions

In this paper we have examined the relative effectiveness of neural network models estimated with a genetic algorithm in predicting future earnings per share based on fundamental signals. To simplify comparisons with existing research, we adopted the methodology of ZCS [37], who compared the forecasting accuracy of ARIMA and linear regression models with that of neural network models estimated using BP. Using the same data analyzed by ZCS, we derived alternative NN parameter estimates using the GA algorithm proposed by Sexton et al. [30] and compared the forecasting accuracy of those estimates with that of neural network weights estimated using BP. We found that the genetic algorithm yielded forecasts that were significantly more accurate than those produced through the use of BP. Our results also indicate that the two GA algorithm modifications suggested by Sexton et al. (i.e., using the objective function in Eq. (2) and constraining small neural network weights to zero) produced more parsimonious sets of GA model estimates. Finally, we found that expanding the set of predictor variables used in the linear regression models did not change our conclusions regarding the relative forecasting accuracy of the NN and linear forecasting models.

Our findings reinforce several important conclusions reached by ZCS. First, univariate NN models outperform univariate ARIMA and linear regression models. Second, the inclusion of fundamental accounting variables adds significant explanatory power to EPS forecasting models that incorporate only lagged dependent variables as predictors.

Our findings also have important implications for other studies of the effectiveness of NN models in forecasting financial variables like EPS. For example, Callen et al. [6] found that linear ARIMA models outperformed neural network models estimated with backward propagation.

Our results raise the possibility that a re-estimation of the Callen et al. model using the genetic algorithm would produce improved forecasts and alter their conclusions regarding the relative forecasting accuracy of neural network models. Given the results presented here, it is also important to compare the relative forecasting accuracy of neural network GA models and the judgmental forecasts of analysts. We leave these topics to be addressed by future research.

We close with a cautionary note. In this paper we have shown that, for one specific objective function (the sum-of-squared-errors objective function), the GA algorithm outperforms the BP algorithm. We believe that our focus on the SSE objective function is appropriate given its widespread use in forecasting. However, the No Free Lunch Theorem [18,34,35] indicates that alternative objective functions exist for which the GA algorithm will be inferior to the BP algorithm. For this reason, the relative superiority of the GA algorithm for alternative objective functions remains a topic for future research.

References

[1] J.S. Abarbanell, B.J. Bushee, Fundamental analysis, future EPS, and stock prices, Journal of Accounting Research 35 (1) (1997) 1–24.
[2] J.S. Abarbanell, B.J. Bushee, Abnormal returns to a fundamental analysis strategy, The Accounting Review 73 (1) (1998) 19–45.
[3] A.W. Bathke, K.S. Lorek, The relationship between time-series models and the security market's expectation of quarterly earnings, The Accounting Review 59 (2) (1984) 163–176.
[4] M.D. Beneish, C.M.C. Lee, R.L. Tarpley, Contextual fundamental analysis through the prediction of extreme returns, Review of Accounting Studies 6 (2,3) (2001) 165–189.
[5] L.D. Brown, M.S. Rozeff, Univariate time-series models of quarterly accounting earnings per share: a proposed model, Journal of Accounting Research 17 (Spring) (1979) 179–189.
[6] J.L. Callen, C.C.Y. Kwan, P.C.Y. Yip, Y. Yuan, Neural network forecasting of quarterly accounting earnings, International Journal of Forecasting 12 (4) (1996) 475–482.
[7] R. Conroy, R. Harris, Consensus forecasts of corporate earnings: analysts' forecasts and time series methods, Management Science 33 (6) (1987) 725–738.
[8] G. Cui, M.L. Wong, H.K. Lui, Machine learning for direct marketing response models: Bayesian networks with evolutionary programming, Management Science 52 (4) (2006) 597–612.
[9] R.E. Dorsey, W.J. Mayer, Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features, Journal of Business & Economic Statistics 13 (3) (1995) 53–66.
[10] G. Foster, Quarterly accounting data: time series properties and predictive ability results, The Accounting Review 52 (1) (1977) 1–21.
[11] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[12] P. Griffin, The time series behavior of quarterly earnings: preliminary evidence, Journal of Accounting Research 15 (Spring) (1977) 71–83.
[13] J.B. Guerard, Linear constraints, robust-weighting and efficient composite modeling, Journal of Forecasting 6 (3) (1987) 193–199.
[14] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1998.
[15] S.J. Huang, N.H. Chiu, L.W. Chen, Integration of the grey relational analysis with genetic algorithm for software effort estimation, European Journal of Operational Research 188 (3) (2008) 898–909.
[16] S. Kaparthi, N.C. Suresh, Performance of selecting part-machine grouping technique for data sets of wide ranging sizes and imperfection, Decision Sciences 25 (4) (1994) 515–532.
[17] Y. Kim, N.K. Street, G.J. Russell, F. Menczer, Customer targeting: a neural network approach guided by genetic algorithms, Management Science 51 (2) (2005) 264–276.
[18] S.O. Kimbrough, G.J. Koehler, M. Lu, D.H. Wood, On a feasible-infeasible two-population (FI-2Pop) genetic algorithm for constrained optimization: distance tracing and no free lunch, European Journal of Operational Research 190 (2) (2007) 310–327.
[19] T. Kuflik, Z. Boger, P. Shoval, Filtering search results using an optimal set of terms identified by an artificial neural network, Information Processing & Management 42 (2) (2006) 469–483.
[20] K. Kuldeep, B. Sukanto, Artificial neural network vs. linear discriminant analysis in credit ratings forecast: a comparative study of prediction performances, Review of Accounting & Finance 5 (3) (2006) 216–227.
[21] M. Landajo, J. de Andrés, Robust neural modeling for the cross-sectional analysis of accounting information, European Journal of Operational Research 177 (2) (2007) 1232–1252.
[22] M.J. Lenard, P. Alam, G.R. Madey, The application of neural networks and a qualitative response model to the auditor's going concern uncertainty decision, Decision Sciences 26 (2) (1995) 209–224.
[23] B. Lev, S.R. Thiagarajan, Fundamental information analysis, Journal of Accounting Research 31 (2) (1993) 190–215.
[24] G.J. Lobo, R.D. Nair, Combining judgmental and statistical forecasts: an application to earnings forecasts, Decision Sciences 21 (2) (1990) 446–460.
[25] K.S. Lorek, G.L. Willinger, A multivariate time-series prediction model for cash-flow data, The Accounting Review 71 (1) (1996) 81–102.
[26] M. Qi, Nonlinear predictability of stock returns using financial and economic variables, Journal of Business & Economic Statistics 17 (4) (1999) 419–429.
[27] R.S. Sexton, R.E. Dorsey, J.D. Johnson, Toward global optimization of neural networks: a comparison of the genetic algorithm and backpropagation, Decision Support Systems 22 (2) (1998) 171–185.
[28] R.S. Sexton, R.E. Dorsey, J.D. Johnson, Optimization of neural networks: a comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research 114 (3) (1999) 589–601.
[29] R.S. Sexton, R.S. Sriram, H. Etheridge, Improving decision effectiveness of artificial neural networks: a modified genetic algorithm approach, Decision Sciences 34 (3) (2003) 421–442.
[30] R.S. Sexton, R.E. Dorsey, N.A. Sikander, Simultaneous optimization of neural network function and architecture algorithm, Decision Support Systems 36 (3) (2004) 283–296.
[31] R.J. Thieme, M. Song, R.J. Calantone, Artificial neural network decision support systems for new product development project selection, Journal of Marketing Research 37 (2) (2000) 499–506.
[32] K.J. Wang, J.C. Chen, Y.S. Lin, A hybrid knowledge discovery model using decision tree and neural network for selecting dispatching rules of a semiconductor final testing factory, Production Planning & Control 16 (7) (2005) 665–680.
[33] R.L. Watts, R.W. Leftwich, The time series of annual accounting EPS, Journal of Accounting Research 15 (2) (1977) 253–271.
[34] D.H. Wolpert, W.G. Macready, No free lunch theorems for search, Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995.
[35] D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 67–82.
[36] R.C. Wu, R.S. Chen, S.S. Chian, Design of a product quality control system based on the use of data mining techniques, IIE Transactions 38 (1) (2006) 39–51.
[37] W. Zhang, Q. Cao, M. Schniederjans, Neural network earnings per share forecasting models: a comparative analysis of alternative methods, Decision Sciences 35 (2) (2004) 205–237.
[38] D. Zhu, G. Premkumar, X. Zhang, C.H. Chu, Data mining for network intrusion detection: a comparison of alternative methods, Decision Sciences 32 (4) (2001) 635–653.

Qing Cao is a Jerry Rawls Professor of MIS in the Department of Information Systems and Quantitative Sciences at the Rawls College of Business, Texas Tech University. Cao holds a Ph.D. from the College of Business Administration at the University of Nebraska; he also earned an MBA from the University of Wisconsin System and a B.S. in Mechanical Engineering from Shanghai JiaoTong University. Cao's research interests include software quality metrics, e-commerce and m-commerce, IT outsourcing, supply chain information management, and artificial intelligence applications. Cao is a recipient of the University of Missouri–Kansas City Trustee's Faculty Research Award (2005). He has published more than 20 research papers in journals such as Journal of Operations Management, Decision Sciences, Communications of the ACM, Decision Support Systems, IEEE Transactions on Systems, Man, and Cybernetics, Information and Management, International Journal of Production Research, European Journal of Operational Research, Computers and Operations Research, Journal of Database Management, and International Journal of Production Economics, among others.
Cao serves as a member of the editorial review boards of Computers & Operations Research, International Journal of Service Sciences, and International Journal of Information Technology and Management. He is serving as the Associate Program Chair for the 2008 Decision Sciences Institute Annual Meeting.

Mark Parry is the Ewing M. Kauffman/Missouri Endowed Chair in Entrepreneurial Leadership and Professor of Marketing at the Henry W. Bloch School of Business and Public Administration at UMKC. He is a senior leader in the Institute for Innovation and Entrepreneurship at UMKC's Bloch School. Parry teaches graduate and executive education courses. He received an M.A. from the University of Texas at Arlington in 1984 and a Ph.D. in Management Science from the University of Texas at Dallas in 1988. Parry conducts research and teaches courses on innovation, new product launch strategies, marketing strategy, and entrepreneurship. As a Professor of Business Administration at the Darden Graduate School of Business at the University of Virginia, Parry designed a number of executive programs and executive program modules dealing with marketing strategy, brand strategy, pricing, and the launching of new products. His teaching and consulting assignments have included work for AAA South, Allied Signal, Carrier Corporation, Coca-Cola, Cerveceria Cuauhtemoc Moctezuma, Graphics Controls, GTech, IBM, Ingersoll-Rand, Lexis Law Publishing, The Mars Corporation, Milliken & Company, Ortho Biotech, and the Swedish Institute of Management. Parry is the author of more than 30 articles in Management Science, Academy of Management Journal, Journal of Marketing Research, Marketing Science, Journal of Marketing, the Journal of the Academy of Marketing Science, Journal of Retailing, Journal of Operations Management, Journal of Product Innovation Management, Marketing Letters, IEEE Transactions on Engineering Management, and others. He is also the author of three books: Cases in Marketing Strategy (with Yoshinobu Sato), Strategic Marketing Management: A Means-End Approach, and Mathematical Models of Distribution Channels (with Charles A. Ingene). His research has won several awards, including the 2005 Excellence in Global Marketing Research Award from the American Marketing Association, shared with Michael Song, Ph.D., holder of the Charles N. Kimball, MRI/Missouri Endowed Chair in Management of Technology and Innovation and Professor of Marketing.